Class TextAnalysisResult


  • public class TextAnalysisResult
    extends Object
    TextAnalysisResult is the result of a TextAnalyzer analysis of a data stream.
    • Method Summary

      Modifier and Type Method Description
      String asJSON(boolean pretty, int verbose)
      A JSON representation of the Analysis.
      String asPlugin()
      A plugin definition to use to match this type.
      long getBlankCount()
      Get the count of all blank samples.
      Set<String> getBottomK()
      Get the bottomK values.
      int getCardinality()
      Get the cardinality for the current data stream.
      Map<String,Long> getCardinalityDetails()
      Get the cardinality details for the current data stream.
      double getConfidence()
      Confidence in the type classification.
      String getDataRegExp()
      Get the Regular Expression that reflects the non-white space element in the data stream.
      String getDataSignature()
      A SHA-1 hash that reflects the data stream contents.
      com.cobber.fta.dates.DateTimeParser.DateResolutionMode getDateResolutionMode()
      Get the DateResolutionMode actually used to process Dates.
      char getDecimalSeparator()
      Get the Decimal Separator used to interpret Doubles.
      long getDistinctCount()
      Return the number of distinct valid values in this stream.
      double getKeyConfidence()
      Is this field a key?
      boolean getLeadingWhiteSpace()
      Does the set of elements contain any elements with leading White Space?
      long getLeadingZeroCount()
      Get the count of all samples with leading zeros (Type long only).
      long getMatchCount()
      Get the count of all (non-blank/non-null) samples that matched the determined type.
      int getMaxLength()
      Get the maximum length for Numeric, Boolean and String types.
      String getMaxValue()
      Get the maximum value for Numeric, Boolean and String types.
      Double getMean()
      Get the mean for Numeric types (Long, Double).
      int getMinLength()
      Get the minimum length for Numeric, Boolean and String types.
      String getMinValue()
      Get the minimum value for Numeric, Boolean and String types.
      boolean getMultiline()
      Does the set of elements contain any multi-line elements?
      String getName()
      Name of the data stream being analyzed.
      long getNullCount()
      Get the count of all null samples.
      int getOutlierCount()
      Get the number of distinct outliers for the current data stream.
      Map<String,Long> getOutlierDetails()
      Get the outlier details for the current data stream.
      String getRegExp()
      Get the Regular Expression that reflects the data stream.
      long getSampleCount()
      Get the count of all samples observed.
      int getShapeCount()
      Get the number of distinct shapes for the current data stream.
      Map<String,Long> getShapeDetails()
      Get the shape details for the current data stream.
      Double getStandardDeviation()
      Get the Standard Deviation for Numeric types (Long, Double).
      String getStructureSignature()
      A SHA-1 hash that reflects the data stream structure.
      Set<String> getTopK()
      Get the topK values.
      long getTotalCount()
      Get the total number of elements in the Data Stream (if known).
      boolean getTrailingWhiteSpace()
      Does the set of elements contain any elements with trailing White Space?
      FTAType getType()
      Get 'Type' as determined by training to date.
      String getTypeQualifier()
      Get the optional Type Qualifier.
      double getUniqueness()
      How unique is this field, i.e. the number of elements in the set with a cardinality of one / cardinality.
      boolean isLogicalType()
      Is this a Logical Type?
      boolean statisticsEnabled()
      Was statistics collection enabled for this analysis?
      String toString()
      A String representation of the Analysis.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Method Detail

      • getName

        public String getName()
        Name of the data stream being analyzed.
        Returns:
        Name of data stream.
      • getConfidence

        public double getConfidence()
        Confidence in the type classification. Typically this will be the number of matches divided by the number of real samples, where a real sample excludes nulls and blanks.
        Returns:
        Confidence as a percentage.
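        The ratio described above can be illustrated with plain arithmetic. This is a minimal sketch rather than the library's implementation: the class ConfidenceSketch, the method confidence and the choice to return 0.0 when there are no real samples are all assumptions made for illustration.

```java
final class ConfidenceSketch {
    /**
     * Hypothetical helper: matches divided by real samples,
     * where real samples exclude nulls and blanks.
     */
    static double confidence(long matchCount, long sampleCount, long nullCount, long blankCount) {
        long realSamples = sampleCount - nullCount - blankCount;
        // Assumption: with no real samples there is nothing to classify, so report 0.0.
        return realSamples <= 0 ? 0.0 : (double) matchCount / realSamples;
    }

    public static void main(String[] args) {
        // 100 samples: 5 null, 5 blank, and 81 of the remaining 90 matched the type.
        System.out.println(confidence(81, 100, 5, 5)); // prints 0.9
    }
}
```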
      • getType

        public FTAType getType()
        Get 'Type' as determined by training to date.
        Returns:
        The Type of the data stream.
      • getTypeQualifier

        public String getTypeQualifier()
        Get the optional Type Qualifier. Predefined qualifiers are:
        • Type: BOOLEAN - "TRUE_FALSE", "YES_NO", "ONE_ZERO"
        • Type: STRING - "BLANK", "BLANKORNULL", "NULL"
        • Type: LONG - "GROUPING", "SIGNED", "SIGNED_TRAILING". Note: "GROUPING" and "SIGNED" are independent and can both be present.
        • Type: DOUBLE - "GROUPING", "SIGNED", "SIGNED_TRAILING", "NON_LOCALIZED". Note: "GROUPING" and "SIGNED" are independent and can both be present.
        • Type: DATE, TIME, DATETIME, ZONEDDATETIME, OFFSETDATETIME - The qualifier is the detailed date format string
        Note: Boolean TRUE_FALSE is not localized, i.e. it will only be detected if the field contains the literal values true/false.
        Note: Additional Type Qualifiers may be returned if any Logical Type plugins are installed. For example, if the Month Abbreviation plugin is installed, the Base Type will be STRING and the Qualifier will be "MONTHABBR".
        Returns:
        The Type Qualifier for the Type.
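        A caller might use the BOOLEAN qualifiers listed above to decide how to interpret a raw value. The sketch below is hypothetical: BooleanQualifierSketch and parse are not part of the library, and the handling of YES_NO assumes English text only.

```java
import java.util.Locale;

final class BooleanQualifierSketch {
    /** Hypothetical helper: interpret a raw value according to a BOOLEAN qualifier. */
    static boolean parse(String qualifier, String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        switch (qualifier) {
            case "TRUE_FALSE":
                return v.equals("true");
            case "YES_NO":
                // Assumption: English only; localized variants are not handled here.
                return v.equals("yes");
            case "ONE_ZERO":
                return v.equals("1");
            default:
                throw new IllegalArgumentException("Not a BOOLEAN qualifier: " + qualifier);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("YES_NO", "Yes"));
        System.out.println(parse("ONE_ZERO", "0"));
    }
}
```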
      • isLogicalType

        public boolean isLogicalType()
        Is this a Logical Type?
        Returns:
        True if this is a Logical Type.
      • getMinValue

        public String getMinValue()
        Get the minimum value for Numeric, Boolean and String types.
        Returns:
        The minimum value as a String.
      • getMaxValue

        public String getMaxValue()
        Get the maximum value for Numeric, Boolean and String types.
        Returns:
        The maximum value as a String.
      • getMinLength

        public int getMinLength()
        Get the minimum length for Numeric, Boolean and String types. Note: For String and Boolean types this length includes any whitespace.
        Returns:
        The minimum length.
      • getMaxLength

        public int getMaxLength()
        Get the maximum length for Numeric, Boolean and String types. Note: For String and Boolean types this length includes any whitespace.
        Returns:
        The maximum length.
      • getDecimalSeparator

        public char getDecimalSeparator()
        Get the Decimal Separator used to interpret Doubles. Note: This will either be the Decimal Separator as per the locale or possibly a period.
        Returns:
        The Decimal Separator.
      • getDateResolutionMode

        public com.cobber.fta.dates.DateTimeParser.DateResolutionMode getDateResolutionMode()
        Get the DateResolutionMode actually used to process Dates.
        Returns:
        The DateResolutionMode used to process Dates.
      • getMean

        public Double getMean()
        Get the mean for Numeric types (Long, Double).
        Returns:
        The mean.
      • getStandardDeviation

        public Double getStandardDeviation()
        Get the Standard Deviation for Numeric types (Long, Double).
        Returns:
        The Standard Deviation.
      • getTopK

        public Set<String> getTopK()
        Get the topK values.
        Returns:
        The top K values (default: 10).
      • getBottomK

        public Set<String> getBottomK()
        Get the bottomK values.
        Returns:
        The bottom K values (default: 10).
      • getRegExp

        public String getRegExp()
        Get the Regular Expression that reflects the data stream. All valid inputs should match this Regular Expression, however in some instances, not all inputs that match this RE are necessarily valid. For example, 28/13/2017 will match the RE (\d{2}/\d{2}/\d{4}) however this is not a valid date with pattern dd/MM/yyyy (there is no 13th month).
        Returns:
        The Regular Expression.
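        The 28/13/2017 example can be reproduced with the standard library alone. In this sketch the class and method names are hypothetical, and java.time's default resolver stands in for whatever date validation the library performs.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

final class RegExpVersusDateSketch {
    /** True if the input has the shape described by the Regular Expression. */
    static boolean matchesRegExp(String input) {
        return Pattern.matches("\\d{2}/\\d{2}/\\d{4}", input);
    }

    /** True if the input also parses as a date with pattern dd/MM/yyyy. */
    static boolean isValidDate(String input) {
        try {
            LocalDate.parse(input, DateTimeFormatter.ofPattern("dd/MM/yyyy"));
            return true;
        } catch (DateTimeParseException e) {
            return false; // e.g. month 13 is out of range
        }
    }

    public static void main(String[] args) {
        String input = "28/13/2017";
        System.out.println(matchesRegExp(input)); // shape matches
        System.out.println(isValidDate(input));   // but it is not a real date
    }
}
```

        A consumer that needs strictly valid values should therefore treat the Regular Expression as a necessary condition only.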
      • getDataRegExp

        public String getDataRegExp()
        Get the Regular Expression that reflects the non-white space element in the data stream. For example, if a stream contains ' hello' and 'world ' this would return '(?i)(HELLO|WORLD)'.
        Returns:
        The Regular Expression reflecting the non-white space data.
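        To see why this expression describes only the non-white space part, the sketch below applies the pattern from the example to trimmed and untrimmed input. DataRegExpSketch and its methods are hypothetical names; only the pattern string comes from the description above.

```java
import java.util.regex.Pattern;

final class DataRegExpSketch {
    // Pattern taken from the ' hello' / 'world ' example.
    private static final Pattern DATA = Pattern.compile("(?i)(HELLO|WORLD)");

    /** Match after stripping surrounding white space. */
    static boolean matchesData(String raw) {
        return DATA.matcher(raw.trim()).matches();
    }

    /** Match the raw value, white space included. */
    static boolean matchesRaw(String raw) {
        return DATA.matcher(raw).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesData(" hello")); // trimmed value matches
        System.out.println(matchesRaw(" hello"));  // raw value does not
    }
}
```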
      • getMatchCount

        public long getMatchCount()
        Get the count of all (non-blank/non-null) samples that matched the determined type. More formally, the SampleCount is equal to the MatchCount + BlankCount + NullCount.
        Returns:
        Count of all matches.
      • getLeadingWhiteSpace

        public boolean getLeadingWhiteSpace()
        Does the set of elements contain any elements with leading White Space?
        Returns:
        True if any elements matched have leading White Space.
      • getTrailingWhiteSpace

        public boolean getTrailingWhiteSpace()
        Does the set of elements contain any elements with trailing White Space?
        Returns:
        True if any elements matched have trailing White Space.
      • getMultiline

        public boolean getMultiline()
        Does the set of elements contain any multi-line elements?
        Returns:
        True if any elements matched are multi-line.
      • getTotalCount

        public long getTotalCount()
        Get the total number of elements in the Data Stream (if known).
        Returns:
        The total number of elements in the Data Stream if known, otherwise -1.
      • getSampleCount

        public long getSampleCount()
        Get the count of all samples observed.
        Returns:
        Count of all samples observed.
      • getNullCount

        public long getNullCount()
        Get the count of all null samples.
        Returns:
        Count of all null samples.
      • getBlankCount

        public long getBlankCount()
        Get the count of all blank samples. Note: a sample consisting of any number of spaces (including zero) is Blank.
        Returns:
        Count of all blank samples.
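        The distinction between null, blank and other samples can be sketched as a simple tally. This is illustrative only: BlankCountSketch is a hypothetical class, and it takes the note literally by treating only the space character as blank, which may be narrower than what the library does.

```java
import java.util.Arrays;
import java.util.List;

final class BlankCountSketch {
    /** Assumption: a blank sample is non-null and contains zero or more spaces only. */
    static boolean isBlankSample(String sample) {
        return sample != null && sample.chars().allMatch(c -> c == ' ');
    }

    /** Returns {nullCount, blankCount, otherCount} for the given samples. */
    static long[] tally(List<String> samples) {
        long nulls = 0;
        long blanks = 0;
        long others = 0;
        for (String sample : samples) {
            if (sample == null) {
                nulls++;
            } else if (isBlankSample(sample)) {
                blanks++;
            } else {
                others++;
            }
        }
        return new long[] { nulls, blanks, others };
    }

    public static void main(String[] args) {
        long[] counts = tally(Arrays.asList("a", "", "   ", null, " b "));
        System.out.println(Arrays.toString(counts));
    }
}
```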
      • getLeadingZeroCount

        public long getLeadingZeroCount()
        Get the count of all samples with leading zeros (Type long only). Note: a single '0' does not constitute a sample with a leading zero.
        Returns:
        Count of all leading zero samples.
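        The rule that a lone '0' does not count can be expressed as a length check. The sketch below is an assumption about how such a test might look, not the library's code; LeadingZeroSketch is a hypothetical name and signed values are not considered.

```java
import java.util.Arrays;
import java.util.List;

final class LeadingZeroSketch {
    /** A sample has a leading zero if it starts with '0' and has more than one character. */
    static boolean hasLeadingZero(String sample) {
        String s = sample.trim();
        return s.length() > 1 && s.charAt(0) == '0';
    }

    /** Count the samples that have a leading zero. */
    static long count(List<String> samples) {
        return samples.stream().filter(LeadingZeroSketch::hasLeadingZero).count();
    }

    public static void main(String[] args) {
        // "007" and "0123" count; "0" and "10" do not.
        System.out.println(count(Arrays.asList("007", "0", "10", "0123"))); // prints 2
    }
}
```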
      • getCardinality

        public int getCardinality()
        Get the cardinality for the current data stream. Note: The cardinality returned is the cardinality of the valid samples. For example, if a date is invalid it will not be included in the cardinality. Note: This is not a complete cardinality analysis unless the cardinality of the data stream is less than the maximum cardinality (Default: 12000). See also the setMaxCardinality() method in TextAnalyzer.
        Returns:
        The cardinality of the valid samples observed.
      • getCardinalityDetails

        public Map<String,Long> getCardinalityDetails()
        Get the cardinality details for the current data stream. This is a Map of Strings and the count of occurrences.
        Returns:
        A Map of values and their occurrence frequency of the data stream to date.
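        One way to picture why a capped analysis may be incomplete is a frequency map that stops admitting new values once it reaches a maximum size. The sketch below is purely an assumption about how such a cap could behave; CappedCardinalitySketch and all of its members are hypothetical and do not describe the library's internals.

```java
import java.util.HashMap;
import java.util.Map;

final class CappedCardinalitySketch {
    private final int maxCardinality;
    private final Map<String, Long> details = new HashMap<>();
    private boolean overflowed = false;

    CappedCardinalitySketch(int maxCardinality) {
        this.maxCardinality = maxCardinality;
    }

    /** Count the value if it is already tracked or there is still room for a new one. */
    void observe(String value) {
        if (details.containsKey(value) || details.size() < maxCardinality) {
            details.merge(value, 1L, Long::sum);
        } else {
            overflowed = true; // a distinct value was dropped, so the map is incomplete
        }
    }

    Map<String, Long> details() {
        return details;
    }

    boolean isComplete() {
        return !overflowed;
    }

    public static void main(String[] args) {
        CappedCardinalitySketch sketch = new CappedCardinalitySketch(2);
        for (String v : new String[] { "a", "b", "a", "c" }) {
            sketch.observe(v);
        }
        System.out.println(sketch.details().size() + " tracked, complete=" + sketch.isComplete());
    }
}
```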
      • getOutlierCount

        public int getOutlierCount()
        Get the number of distinct outliers for the current data stream. Note: This is not a complete outlier analysis unless the outlier count of the data stream is less than the maximum outlier count (Default: 50). See also the setMaxOutliers() method in TextAnalyzer.
        Returns:
        Count of the distinct outliers.
      • getOutlierDetails

        public Map<String,Long> getOutlierDetails()
        Get the outlier details for the current data stream. This is a Map of Strings and the count of occurrences.
        Returns:
        A Map of outlier values and their occurrence frequency in the data stream to date.
      • getShapeCount

        public int getShapeCount()
        Get the number of distinct shapes for the current data stream. Note: This is not a complete shape analysis unless the shape count of the data stream is less than the maximum shape count (Default: 400).
        Returns:
        Count of the distinct shapes.
      • getShapeDetails

        public Map<String,Long> getShapeDetails()
        Get the shape details for the current data stream. This is a Map of Strings and the count of occurrences.
        Returns:
        A Map of shapes and their occurrence frequency of the data stream to date.
      • getKeyConfidence

        public double getKeyConfidence()
        Is this field a key?
        Returns:
        A Double (0.0 ... 1.0) representing our confidence that this field is a key.
      • getUniqueness

        public double getUniqueness()
        How unique is this field, i.e. the number of elements in the set with a cardinality of one / cardinality. Note: Only supported if the cardinality presented is less than Max Cardinality.
        Returns:
        A Double (0.0 ... 1.0) representing the uniqueness of this field.
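        The ratio described above can be computed directly from a value-to-count map of the kind returned by getCardinalityDetails(). This is a minimal sketch under stated assumptions: UniquenessSketch is a hypothetical class, and returning 0.0 for an empty map is a choice made here, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UniquenessSketch {
    /** Values seen exactly once, divided by the number of distinct values. */
    static double uniqueness(Map<String, Long> cardinalityDetails) {
        if (cardinalityDetails.isEmpty()) {
            return 0.0; // assumption: no data means no measurable uniqueness
        }
        long seenOnce = cardinalityDetails.values().stream().filter(c -> c == 1L).count();
        return (double) seenOnce / cardinalityDetails.size();
    }

    public static void main(String[] args) {
        Map<String, Long> details = new LinkedHashMap<>();
        details.put("alpha", 1L);
        details.put("beta", 1L);
        details.put("gamma", 3L);
        details.put("delta", 1L);
        // Three of the four distinct values occur exactly once.
        System.out.println(uniqueness(details)); // prints 0.75
    }
}
```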
      • getDistinctCount

        public long getDistinctCount()
        Return the number of distinct valid values in this stream. Note: Typically only supported if the cardinality presented is less than Max Cardinality. Can be set by an external source.
        Returns:
        A long with the number of distinct values in this stream or -1 if unknown.
      • statisticsEnabled

        public boolean statisticsEnabled()
        Was statistics collection enabled for this analysis?
        Returns:
        True if statistics were collected.
      • toString

        public String toString()
        A String representation of the Analysis.
        Overrides:
        toString in class Object
        Returns:
        A String representation of the analysis to date.
      • getStructureSignature

        public String getStructureSignature()
        A SHA-1 hash that reflects the data stream structure. Note: If a Semantic type is detected then the SHA-1 hash will reflect this.
        Returns:
        A String SHA-1 hash that reflects the structure of the data stream.
      • getDataSignature

        public String getDataSignature()
        A SHA-1 hash that reflects the data stream contents. Note: The order of the data stream is not considered.
        Returns:
        A String SHA-1 hash that reflects the data stream contents.
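        An order-independent hash can be obtained by putting the values into a canonical order before digesting them. The sketch below shows that general idea only: which inputs the library hashes and how it encodes the result are not stated here, so the sorting, the zero-byte separator and the Base64 encoding are all assumptions, and OrderIndependentSignatureSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

final class OrderIndependentSignatureSketch {
    /** SHA-1 over the sorted values, so any permutation of the input hashes the same. */
    static String signature(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            for (String value : sorted) {
                sha1.update(value.getBytes(StandardCharsets.UTF_8));
                sha1.update((byte) 0); // separator so ["ab", "c"] differs from ["a", "bc"]
            }
            return Base64.getEncoder().encodeToString(sha1.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        String first = signature(Arrays.asList("a", "b", "c"));
        String second = signature(Arrays.asList("c", "a", "b"));
        System.out.println(first.equals(second)); // same contents, different order
    }
}
```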
      • asPlugin

        public String asPlugin()
        A plugin definition to use to match this type.
        Returns:
        A String containing the plugin definition.
      • asJSON

        public String asJSON(boolean pretty,
                             int verbose)
        A JSON representation of the Analysis.
        Parameters:
        pretty - If set, add minimal whitespace formatting.
        verbose - If > 0, provides additional details on the core, Outlier, and Shapes sets. A value of 1 will output the first 100 elements; a value > 1 will output the full set.
        Returns:
        A JSON representation of the analysis.