Class TextAnalyzer


  • public class TextAnalyzer
    extends Object
    Analyze text data to determine type information and other key metrics associated with a text stream. A key objective is that the analysis should be fast enough to run in-line (i.e. as the data is read from some source, it should be possible to stream it through this class without undue performance degradation).

    Typical usage is:

     
     		TextAnalyzer analysis = new TextAnalyzer("Age");
    
     		analysis.train("12");
     		analysis.train("62");
     		analysis.train("21");
     		analysis.train("37");
     		...
    
     		TextAnalysisResult result = analysis.getResult();
     
     
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  TextAnalyzer.Feature
      Enumeration that defines all on/off features for parsers.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static int REFLECTION_SAMPLES  
    • Constructor Summary

      Constructors 
      Constructor Description
      TextAnalyzer()
      Construct an anonymous Text Analyzer for a data stream.
      TextAnalyzer​(AnalyzerContext context)
      Construct a Text Analyzer using the supplied context.
      TextAnalyzer​(String name)
      Construct a Text Analyzer for the named data stream.
      TextAnalyzer​(String name, com.cobber.fta.dates.DateTimeParser.DateResolutionMode resolutionMode)
      Construct a Text Analyzer for the named data stream with the supplied DateResolutionMode.
    • Method Summary

      Modifier and Type Method Description
      void configure​(TextAnalyzer.Feature feature, boolean state)
      Method for changing state of an on/off feature for this TextAnalyzer.
      static TextAnalyzer deserialize​(String serialized)
      Create a new TextAnalyzer from a serialized representation - used in concert with serialize() and merge(TextAnalyzer, TextAnalyzer) to merge TextAnalyzers run on separate shards into a single TextAnalyzer and hence a single TextAnalysisResult.
      protected static int distanceLevenshtein​(String source, Set<String> universe)
      Calculate the Levenshtein distance of the source string from the 'closest' string from the provided universe.
      boolean equals​(Object obj)  
      boolean equals​(Object obj, double epsilon)  
      AnalysisConfig getConfig()
      Get the configuration associated with this TextAnalyzer.
      AnalyzerContext getContext()
      Get the context supplied to the TextAnalyzer.
      int getDetectWindow()
      Get the size of the Detect Window (i.e. the number of samples collected before attempting to determine the type).
      protected Facts getFacts()  
      int getHistogramBins()
      Gets the number of bins to use for the underlying approximation used to hold the Histogram once maxCardinality is exceeded.
      int getMaxCardinality()
      Get the maximum cardinality that will be tracked.
      int getMaxInputLength()
      Gets the current maximum input length for sampling.
      int getMaxInvalids()
      Get the maximum number of invalid entries that will be tracked.
      int getMaxOutliers()
      Get the maximum number of outliers that will be tracked.
      Plugins getPlugins()  
      int getPluginThreshold()
      Get the current detection Threshold for Semantic Type plugins.
      double getQuantileRelativeAccuracy()
      Gets the relative-error guarantee for quantiles.
      int getReflectionSampleSize()
      Get the number of Samples required before we will 'reflect' on the analysis and potentially change determination.
      protected String getRegExp​(KnownTypes.ID id)  
      TextAnalysisResult getResult()
      Determine the result of the training completed to date.
      String getStreamName()
      Get the name of the Data Stream.
      int getThreshold()
      Get the current detection Threshold.
      String getTraceFilePath()
      Return the full path to the trace file, or null if no tracing configured.
      List<String> getTrainingSet()
      Access the training set - this will typically be the first AnalysisConfig.DETECT_WINDOW_DEFAULT records.
      boolean isEnabled​(TextAnalyzer.Feature feature)
      Method for checking whether given TextAnalyzer feature is enabled.
      protected boolean isNullEquivalent​(String input)  
      static TextAnalyzer merge​(TextAnalyzer first, TextAnalyzer second)
      Create a new TextAnalyzer which is the result of merging two separate TextAnalyzers.
      protected TextAnalysisResult reAnalyze​(Map<String,​Long> details)  
      void registerDefaultPlugins​(AnalysisConfig analysisConfig)
      Register the default set of plugins for Semantic Type detection.
      String serialize()
      Serialize a TextAnalyzer - commonly used in concert with deserialize(String) and merge(TextAnalyzer, TextAnalyzer) to merge TextAnalyzers run on separate shards into a single TextAnalyzer and hence a single TextAnalysisResult.
      protected void setConfig​(AnalysisConfig analysisConfig)
      Set the configuration associated with this TextAnalyzer.
      protected void setContext​(AnalyzerContext context)
      Set the context supplied to the TextAnalyzer.
      void setDebug​(int debug)
      Internal Only.
      int setDetectWindow​(int detectWindow)
      Set the size of the Detect Window (that is, number of samples) to collect before attempting to determine the type.
      void setDistinctCount​(long distinctCount)
      Set the Distinct Count - commonly used where there is an external source that has visibility into the entire data set and 'knows' the distinct count of the set as a whole.
      protected void setExternalFacts​(Facts.ExternalFacts externalFacts)  
      int setHistogramBins​(int histogramBins)
      Sets the number of bins to use for the underlying approximation used to hold the Histogram once maxCardinality is exceeded.
      void setKeyConfidence​(double keyConfidence)
      Set the Key Confidence - typically used where there is an external source that indicates definitively that this is a key.
      void setLocale​(Locale locale)
      Override the default Locale.
      int setMaxCardinality​(int newCardinality)
      Set the maximum cardinality that will be tracked.
      int setMaxInputLength​(int maxInputLength)
      Sets the maximum input length for sampling.
      int setMaxInvalids​(int newMaxInvalids)
      Set the maximum number of invalid entries that will be tracked.
      int setMaxOutliers​(int newMaxOutliers)
      Set the maximum number of outliers that will be tracked.
      void setPluginThreshold​(int threshold)
      Set the percentage (0 - 100) at which we declare success for Semantic Type plugins.
      double setQuantileRelativeAccuracy​(double quantileRelativeAccuracy)
      Sets the relative-error guarantee for quantiles.
      void setThreshold​(int threshold)
      Set the percentage (0 - 100) at which we declare success.
      void setTotalBlankCount​(long totalBlankCount)
      Set the count of all blank elements in the entire data stream.
      void setTotalCount​(long totalCount)
      Set the total number of elements in the Data Stream.
      void setTotalMaxLength​(int totalMaxLength)
      Set the maximum length for Numeric, Boolean and String types across the entire data stream.
      void setTotalMaxValue​(String totalMaxValue)
      Set the maximum value for Numeric, Boolean and String types across the entire data stream.
      void setTotalMean​(Double totalMean)
      Set the mean for Numeric types (Long, Double) across the entire data stream.
      void setTotalMinLength​(int totalMinLength)
      Set the minimum length for Numeric, Boolean and String types across the entire data stream.
      void setTotalMinValue​(String totalMinValue)
      Set the minimum value for Numeric, Boolean and String types across the entire data stream.
      void setTotalNullCount​(long totalNullCount)
      Set the count of all null elements in the entire data stream.
      void setTotalStandardDeviation​(Double totalStandardDeviation)
      Set the Standard Deviation for Numeric types (Long, Double) across the entire data stream (if known).
      void setTrace​(String traceOptions)
      Set tracing options.
      void setUniqueness​(double uniqueness)
      Set the Uniqueness - typically used where there is an external source that has visibility into the entire data set and 'knows' the uniqueness of the set as a whole.
      boolean train​(String rawInput)
      Train is the streaming entry point used to supply input to the Text Analyzer.
      void trainBulk​(Map<String,​Long> observed)
      TrainBulk is the core bulk entry point used to supply input to the Text Analyzer.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TextAnalyzer

        public TextAnalyzer​(AnalyzerContext context)
        Construct a Text Analyzer using the supplied context.
        Parameters:
        context - The context used to interpret the stream.
      • TextAnalyzer

        public TextAnalyzer​(String name)
        Construct a Text Analyzer for the named data stream.

        Note: The DateResolutionMode mode will be 'None'.

        Parameters:
        name - The name of the data stream (e.g. the column of the CSV file)
      • TextAnalyzer

        public TextAnalyzer()
        Construct an anonymous Text Analyzer for a data stream.

        Note: The DateResolutionMode mode will be 'None'.

      • TextAnalyzer

        public TextAnalyzer​(String name,
                            com.cobber.fta.dates.DateTimeParser.DateResolutionMode resolutionMode)
        Construct a Text Analyzer for the named data stream with the supplied DateResolutionMode.
        Parameters:
        name - The name of the data stream (e.g. the column of the CSV file)
        resolutionMode - Determines what to do when the Date field is ambiguous (i.e. we cannot determine which of the fields is the day or the month). If resolutionMode is DayFirst then assume day is first; if it is MonthFirst then assume month is first; if it is Auto then choose either DayFirst or MonthFirst based on the locale; if it is None then the pattern returned will contain '?' to represent any ambiguity present.
    • Method Detail

      • configure

        public void configure​(TextAnalyzer.Feature feature,
                              boolean state)
        Method for changing state of an on/off feature for this TextAnalyzer.
        Parameters:
        feature - The feature to be set.
        state - The new state of the feature.
      • isEnabled

        public boolean isEnabled​(TextAnalyzer.Feature feature)
        Method for checking whether given TextAnalyzer feature is enabled.
        Parameters:
        feature - The feature to be tested.
        Returns:
        Whether the identified feature is enabled.
      • getStreamName

        public String getStreamName()
        Get the name of the Data Stream.
        Returns:
        The name of the Data Stream.
      • getContext

        public AnalyzerContext getContext()
        Get the context supplied to the TextAnalyzer.
        Returns:
        The AnalyzerContext of the TextAnalyzer.
      • setContext

        protected void setContext​(AnalyzerContext context)
        Set the context supplied to the TextAnalyzer.
        Parameters:
        context - The Context for this analysis.
      • getConfig

        public AnalysisConfig getConfig()
        Get the configuration associated with this TextAnalyzer.
        Returns:
        The AnalysisConfig of the TextAnalyzer.
      • setConfig

        protected void setConfig​(AnalysisConfig analysisConfig)
        Set the configuration associated with this TextAnalyzer. Note: Internal only.
        Parameters:
        analysisConfig - The replacement AnalysisConfig
      • setDebug

        public void setDebug​(int debug)
        Internal Only. Enable internal debugging.
        Parameters:
        debug - The debug level.
      • setTrace

        public void setTrace​(String traceOptions)
        Set tracing options. General form of options is <attribute1>=<value1>,<attribute2>=<value2> ... Supported attributes are:
        • enabled=true/false,
        • stream=<name of stream> (defaults to all)
        • directory=<directory for trace file> (defaults to java.io.tmpdir)
        • samples=<# samples to trace> (defaults to 1000)
        Parameters:
        traceOptions - The trace options.
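
        As an illustration of the option format described above, the following is a minimal sketch that assembles a trace options string from attribute/value pairs. The class TraceOptionsSketch and its build method are hypothetical helpers, not part of the library, and the stream name and sample count are illustrative values only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the library): joins attribute/value pairs
// into the "<attribute1>=<value1>,<attribute2>=<value2>" form described above.
final class TraceOptionsSketch {
    static String build(Map<String, String> attributes) {
        return attributes.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in insertion order.
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("enabled", "true");
        attributes.put("stream", "Age");   // illustrative stream name
        attributes.put("samples", "500");  // illustrative sample count
        String traceOptions = build(attributes);
        System.out.println(traceOptions);  // prints enabled=true,stream=Age,samples=500
        // The resulting string is what would be passed to setTrace(traceOptions).
    }
}
```

        Attributes that are left out (for example directory) would presumably fall back to the defaults listed above.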
      • setThreshold

        public void setThreshold​(int threshold)
        Set the percentage (0 - 100) at which we declare success. Typically this should not be adjusted; if you want to run in Strict mode, set this to 100.
        Parameters:
        threshold - The new threshold for detection.
      • getThreshold

        public int getThreshold()
        Get the current detection Threshold.
        Returns:
        The current threshold.
      • setPluginThreshold

        public void setPluginThreshold​(int threshold)
        Set the percentage (0 - 100) at which we declare success for Semantic Type plugins. Typically this should not be adjusted; if you want to run in Strict mode, set this to 100.
        Parameters:
        threshold - The new threshold used for detection.
      • getPluginThreshold

        public int getPluginThreshold()
        Get the current detection Threshold for Semantic Type plugins. If not set, this will return -1, which means that each plugin is using its own (sensible) default threshold.
        Returns:
        The current threshold.
      • setLocale

        public void setLocale​(Locale locale)
        Override the default Locale.

        Note: There is no support for Locales that do not use the Gregorian Calendar.

        Parameters:
        locale - The new Locale used to determine separators in numbers, date processing, default plugins, etc.
      • setDetectWindow

        public int setDetectWindow​(int detectWindow)
        Set the size of the Detect Window (that is, number of samples) to collect before attempting to determine the type. Default is AnalysisConfig.DETECT_WINDOW_DEFAULT.

        Note: It is not possible to change the Sample Size once training has started.

        Parameters:
        detectWindow - The number of samples to collect
        Returns:
        The previous value of this parameter.
      • getDetectWindow

        public int getDetectWindow()
        Get the size of the Detect Window (i.e. the number of samples collected before attempting to determine the type).
        Returns:
        The current size of the Detect Window.
      • getReflectionSampleSize

        public int getReflectionSampleSize()
        Get the number of Samples required before we will 'reflect' on the analysis and potentially change determination.
        Returns:
        The current size of the reflection window.
      • setMaxCardinality

        public int setMaxCardinality​(int newCardinality)
        Set the maximum cardinality that will be tracked. Default is AnalysisConfig.MAX_CARDINALITY_DEFAULT.

        Note:

        • The Cardinality must be larger than the Cardinality of the largest Finite Semantic type (if Semantic Type detection is enabled - see configure(Feature, boolean)).
        • It is not possible to change the cardinality once training has started.
        Parameters:
        newCardinality - The maximum Cardinality that will be tracked (0 implies no tracking)
        Returns:
        The previous value of this parameter.
      • getMaxCardinality

        public int getMaxCardinality()
        Get the maximum cardinality that will be tracked. See setMaxCardinality() method.
        Returns:
        The maximum cardinality.
      • setMaxOutliers

        public int setMaxOutliers​(int newMaxOutliers)
        Set the maximum number of outliers that will be tracked. Default is AnalysisConfig.MAX_OUTLIERS_DEFAULT.

        Note: It is not possible to change the outlier count once training has started.

        Parameters:
        newMaxOutliers - The maximum number of outliers that will be tracked (0 implies no tracking)
        Returns:
        The previous value of this parameter.
      • getMaxOutliers

        public int getMaxOutliers()
        Get the maximum number of outliers that will be tracked. See setMaxOutliers() method.
        Returns:
        The maximum number of outliers to track.
      • setMaxInvalids

        public int setMaxInvalids​(int newMaxInvalids)
        Set the maximum number of invalid entries that will be tracked. Default is AnalysisConfig.MAX_INVALID_DEFAULT.

        Note: It is not possible to change the invalid count once training has started.

        Parameters:
        newMaxInvalids - The maximum number of invalid entries that will be tracked (0 implies no tracking)
        Returns:
        The previous value of this parameter.
      • getMaxInvalids

        public int getMaxInvalids()
        Get the maximum number of invalid entries that will be tracked. See setMaxInvalids() method.
        Returns:
        The maximum number of invalid entries to track.
      • setKeyConfidence

        public void setKeyConfidence​(double keyConfidence)
        Set the Key Confidence - typically used where there is an external source that indicates definitively that this is a key.
        Parameters:
        keyConfidence - The new keyConfidence
      • setUniqueness

        public void setUniqueness​(double uniqueness)
        Set the Uniqueness - typically used where there is an external source that has visibility into the entire data set and 'knows' the uniqueness of the set as a whole.
        Parameters:
        uniqueness - The new Uniqueness
      • setDistinctCount

        public void setDistinctCount​(long distinctCount)
        Set the Distinct Count - commonly used where there is an external source that has visibility into the entire data set and 'knows' the distinct count of the set as a whole. If determined by FTA it will typically indicate that the distinct count is less than the maximum cardinality being tracked.
        Parameters:
        distinctCount - The new Distinct Count
      • setTotalCount

        public void setTotalCount​(long totalCount)
        Set the total number of elements in the Data Stream. Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalCount - The total number of elements, as opposed to the number sampled.
      • setTotalNullCount

        public void setTotalNullCount​(long totalNullCount)
        Set the count of all null elements in the entire data stream. Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalNullCount - The total number of null elements, as opposed to the number of nulls in the sample set.
      • setTotalBlankCount

        public void setTotalBlankCount​(long totalBlankCount)
        Set the count of all blank elements in the entire data stream. Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalBlankCount - The total number of blank elements, as opposed to the number of blanks in the sample set.
      • setTotalMean

        public void setTotalMean​(Double totalMean)
        Set the mean for Numeric types (Long, Double) across the entire data stream. Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalMean - The mean of all elements in the data stream, as opposed to the mean of the sampled set.
      • setTotalStandardDeviation

        public void setTotalStandardDeviation​(Double totalStandardDeviation)
        Set the Standard Deviation for Numeric types (Long, Double) across the entire data stream (if known). Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalStandardDeviation - The Standard Deviation of all elements in the data stream, as opposed to the Standard Deviation of the sampled set.
      • setTotalMinValue

        public void setTotalMinValue​(String totalMinValue)
        Set the minimum value for Numeric, Boolean and String types across the entire data stream. Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalMinValue - The minimum value of all elements in the data stream, as opposed to the minimum of the sampled set.
      • setTotalMaxValue

        public void setTotalMaxValue​(String totalMaxValue)
        Set the maximum value for Numeric, Boolean and String types across the entire data stream. Only used when there is an external source that has visibility into the entire data stream.
        Parameters:
        totalMaxValue - The maximum value of all elements in the data stream, as opposed to the maximum of the sampled set.
      • setTotalMinLength

        public void setTotalMinLength​(int totalMinLength)
        Set the minimum length for Numeric, Boolean and String types across the entire data stream. Only used when there is an external source that has visibility into the entire data stream. Note: For String and Boolean types this length includes any whitespace.
        Parameters:
        totalMinLength - The minimum length of all elements in the data stream, as opposed to the minimum length of the sampled set.
      • setTotalMaxLength

        public void setTotalMaxLength​(int totalMaxLength)
        Set the maximum length for Numeric, Boolean and String types across the entire data stream. Only used when there is an external source that has visibility into the entire data stream. Note: For String and Boolean types this length includes any whitespace.
        Parameters:
        totalMaxLength - The maximum length of all elements in the data stream, as opposed to the maximum length of the sampled set.
      • setMaxInputLength

        public int setMaxInputLength​(int maxInputLength)
        Sets the maximum input length for sampling. Default is AnalysisConfig.MAX_INPUT_LENGTH_DEFAULT.
        Parameters:
        maxInputLength - The maximum length of samples; any samples longer than this will be truncated to this length.
        Returns:
        The previous value of this parameter.
      • getQuantileRelativeAccuracy

        public double getQuantileRelativeAccuracy()
        Gets the relative-error guarantee for quantiles.
        Returns:
        The relative-error guarantee for quantiles (relevant only if cardinality > maxCardinality).
      • setQuantileRelativeAccuracy

        public double setQuantileRelativeAccuracy​(double quantileRelativeAccuracy)
        Sets the relative-error guarantee for quantiles. Default is AnalysisConfig.QUANTILE_RELATIVE_ACCURACY_DEFAULT.
        Parameters:
        quantileRelativeAccuracy - The relative-error guarantee desired for quantile determination, note smaller values require more memory!
        Returns:
        The previous value of this parameter.
      • getHistogramBins

        public int getHistogramBins()
        Gets the number of bins to use for the underlying approximation used to hold the Histogram once maxCardinality is exceeded.
        Returns:
        The number of underlying bins used for the approximation (relevant only if cardinality > maxCardinality).
      • setHistogramBins

        public int setHistogramBins​(int histogramBins)
        Sets the number of bins to use for the underlying approximation used to hold the Histogram once maxCardinality is exceeded. Default is AnalysisConfig.HISTOGRAM_BINS_DEFAULT.
        Parameters:
        histogramBins - the number of bins to use for the underlying approximation, note larger values require more memory!
        Returns:
        The previous value of this parameter.
      • getTraceFilePath

        public String getTraceFilePath()
        Return the full path to the trace file, or null if no tracing configured. Note: This will only be valid (i.e. non-null) after the first invocation of train() or trainBulk().
        Returns:
        The Path to the trace file.
      • getMaxInputLength

        public int getMaxInputLength()
        Gets the current maximum input length for sampling.
        Returns:
        The current maximum length before an input sample is truncated.
      • getRegExp

        protected String getRegExp​(KnownTypes.ID id)
      • getPlugins

        public Plugins getPlugins()
      • registerDefaultPlugins

        public void registerDefaultPlugins​(AnalysisConfig analysisConfig)
        Register the default set of plugins for Semantic Type detection.
        Parameters:
        analysisConfig - The Analysis configuration used for this analysis. Note: The Locale (on the configuration) will impact both the set of plugins registered and the behavior of the individual plugins.
      • trainBulk

        public void trainBulk​(Map<String,​Long> observed)
                       throws FTAPluginException,
                              FTAUnsupportedLocaleException
        TrainBulk is the core bulk entry point used to supply input to the Text Analyzer. This routine is commonly used to support training using the results aggregated from a database query.
        Parameters:
        observed - A Map containing the observed items and the corresponding count
        Throws:
        FTAPluginException - Thrown when a registered plugin has detected an issue
        FTAUnsupportedLocaleException - Thrown when a requested locale is not supported
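
        The observed map pairs each distinct value with the number of times it occurred. The following is a minimal sketch of building such a map in memory from a list of raw values; in practice the aggregation would more likely be done by the database query itself. The class ObservedCountsSketch and its countValues method are hypothetical names introduced for illustration, and the sketch assumes the input contains no null values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the library): collapses raw values into a
// value -> occurrence count map, the shape accepted by trainBulk.
final class ObservedCountsSketch {
    // Assumes no null elements: groupingBy rejects null keys.
    static Map<String, Long> countValues(List<String> values) {
        return values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("12", "62", "12", "37", "12");
        Map<String, Long> observed = countValues(raw);
        // "12" maps to 3, "62" and "37" each map to 1.
        System.out.println(observed);
        // The map is what would be passed to trainBulk(observed).
    }
}
```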
      • train

        public boolean train​(String rawInput)
                      throws FTAPluginException,
                             FTAUnsupportedLocaleException
        Train is the streaming entry point used to supply input to the Text Analyzer.
        Parameters:
        rawInput - The raw input as a String
        Returns:
        A boolean indicating if the resultant type is currently known.
        Throws:
        FTAPluginException - Thrown when a registered plugin has detected an issue
        FTAUnsupportedLocaleException - Thrown when a requested locale is not supported
      • isNullEquivalent

        protected boolean isNullEquivalent​(String input)
      • distanceLevenshtein

        protected static int distanceLevenshtein​(String source,
                                                 Set<String> universe)
        Calculate the Levenshtein distance of the source string from the 'closest' string from the provided universe.
        Parameters:
        source - The source string to test.
        universe - The universe of strings to test for distance
        Returns:
        The Levenshtein distance from the best match.
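
        To make the idea of distance to the 'closest' string concrete, the following is a minimal sketch using the classic dynamic-programming formulation of Levenshtein distance. It is not the library's implementation; the class LevenshteinSketch and its method names are hypothetical, and returning Integer.MAX_VALUE for an empty universe is a choice made for this sketch only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration (not the library's code).
final class LevenshteinSketch {
    // Two-row dynamic-programming edit distance between two strings.
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // cost of deleting a's first i characters
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Smallest distance from source to any member of the universe.
    // Returns Integer.MAX_VALUE when the universe is empty (sketch-only choice).
    static int distanceToClosest(String source, Set<String> universe) {
        int best = Integer.MAX_VALUE;
        for (String candidate : universe) {
            best = Math.min(best, distance(source, candidate));
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> universe = new HashSet<>(Arrays.asList("France", "Germany"));
        System.out.println(distanceToClosest("Frnace", universe)); // prints 2
    }
}
```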
      • reAnalyze

        protected TextAnalysisResult reAnalyze​(Map<String,​Long> details)
                                        throws FTAPluginException,
                                               FTAUnsupportedLocaleException
        Throws:
        FTAPluginException
        FTAUnsupportedLocaleException
      • getResult

        public TextAnalysisResult getResult()
                                     throws FTAPluginException,
                                            FTAUnsupportedLocaleException
        Determine the result of the training completed to date. Typically invoked after all training is complete, but may be invoked at any stage.
        Returns:
        A TextAnalysisResult with the analysis of any training completed.
        Throws:
        FTAPluginException - Thrown when a registered plugin has detected an issue
        FTAUnsupportedLocaleException - Thrown when a requested locale is not supported
      • getTrainingSet

        public List<String> getTrainingSet()
        Access the training set - this will typically be the first AnalysisConfig.DETECT_WINDOW_DEFAULT records.
        Returns:
        A List of the raw input strings.
      • serialize

        public String serialize()
                         throws FTAPluginException,
                                FTAUnsupportedLocaleException
        Serialize a TextAnalyzer - commonly used in concert with deserialize(String) and merge(TextAnalyzer, TextAnalyzer) to merge TextAnalyzers run on separate shards into a single TextAnalyzer and hence a single TextAnalysisResult.
        Returns:
        A Serialized version of this TextAnalyzer which can be hydrated via deserialize().
        Throws:
        FTAPluginException - Thrown when a registered plugin has detected an issue
        FTAUnsupportedLocaleException - Thrown when a requested locale is not supported
      • deserialize

        public static TextAnalyzer deserialize​(String serialized)
                                        throws FTAMergeException,
                                               FTAPluginException,
                                               FTAUnsupportedLocaleException
        Create a new TextAnalyzer from a serialized representation - used in concert with serialize() and merge(TextAnalyzer, TextAnalyzer) to merge TextAnalyzers run on separate shards into a single TextAnalyzer and hence a single TextAnalysisResult.
        Parameters:
        serialized - The serialized form of a TextAnalyzer.
        Returns:
        A new TextAnalyzer which can be merged with another TextAnalyzer to produce a single result.
        Throws:
        FTAMergeException - When we fail to de-serialize the provided String.
        FTAUnsupportedLocaleException - Thrown when a requested locale is not supported
        FTAPluginException - Thrown when a registered plugin has detected an issue
      • merge

        public static TextAnalyzer merge​(TextAnalyzer first,
                                         TextAnalyzer second)
                                  throws FTAMergeException,
                                         FTAPluginException,
                                         FTAUnsupportedLocaleException
        Create a new TextAnalyzer which is the result of merging two separate TextAnalyzers. This is typically used to merge TextAnalyzers run on separate shards into a single TextAnalyzer and hence a single TextAnalysisResult. See also serialize() and deserialize(String).
        Parameters:
        first - The first TextAnalyzer
        second - The second TextAnalyzer
        Returns:
        A new TextAnalyzer which is a merge of the two arguments.
        Throws:
        FTAMergeException - If the AnalysisConfigs of the two TextAnalyzers are not identical
        FTAUnsupportedLocaleException - Thrown when a requested locale is not supported
        FTAPluginException - Thrown when a registered plugin has detected an issue
      • getFacts

        protected Facts getFacts()
      • equals

        public boolean equals​(Object obj)
        Overrides:
        equals in class Object
      • equals

        public boolean equals​(Object obj,
                              double epsilon)