Package com.cobber.fta
Class TextAnalysisResult
- java.lang.Object
  - com.cobber.fta.TextAnalysisResult
public class TextAnalysisResult extends Object
TextAnalysisResult is the result of a TextAnalyzer analysis of a data stream.
Method Summary
- String asJSON(boolean pretty, int verbose): A JSON representation of the Analysis.
- String asPlugin(): A plugin definition to use to match this type.
- long getBlankCount(): Get the count of all blank samples.
- Set<String> getBottomK(): Get the bottomK values.
- int getCardinality(): Get the cardinality for the current data stream.
- Map<String,Long> getCardinalityDetails(): Get the cardinality details for the current data stream.
- double getConfidence(): Confidence in the type classification.
- String getDataRegExp(): Get the Regular Expression that reflects the non-white space element in the data stream.
- String getDataSignature(): A SHA-1 hash that reflects the data stream contents.
- com.cobber.fta.dates.DateTimeParser.DateResolutionMode getDateResolutionMode(): Get the DateResolutionMode actually used to process Dates.
- char getDecimalSeparator(): Get the Decimal Separator used to interpret Doubles.
- long getDistinctCount(): Return the number of distinct valid values in this stream.
- double getKeyConfidence(): Is this field a key?
- boolean getLeadingWhiteSpace(): Does the set of elements contain any elements with leading White Space?
- long getLeadingZeroCount(): Get the count of all samples with leading zeros (Type long only).
- long getMatchCount(): Get the count of all (non-blank/non-null) samples that matched the determined type.
- int getMaxLength(): Get the maximum length for Numeric, Boolean and String types.
- String getMaxValue(): Get the maximum value for Numeric, Boolean and String types.
- Double getMean(): Get the mean for Numeric types (Long, Double).
- int getMinLength(): Get the minimum length for Numeric, Boolean and String types.
- String getMinValue(): Get the minimum value for Numeric, Boolean and String types.
- boolean getMultiline(): Does the set of elements contain any multi-line elements?
- String getName(): Name of the data stream being analyzed.
- long getNullCount(): Get the count of all null samples.
- int getOutlierCount(): Get the number of distinct outliers for the current data stream.
- Map<String,Long> getOutlierDetails(): Get the outlier details for the current data stream.
- String getRegExp(): Get the Regular Expression that reflects the data stream.
- long getSampleCount(): Get the count of all samples observed.
- int getShapeCount(): Get the number of distinct shapes for the current data stream.
- Map<String,Long> getShapeDetails(): Get the shape details for the current data stream.
- Double getStandardDeviation(): Get the Standard Deviation for Numeric types (Long, Double).
- String getStructureSignature(): A SHA-1 hash that reflects the data stream structure.
- Set<String> getTopK(): Get the topK values.
- long getTotalCount(): Get the total number of elements in the Data Stream (if known).
- boolean getTrailingWhiteSpace(): Does the set of elements contain any elements with trailing White Space?
- FTAType getType(): Get 'Type' as determined by training to date.
- String getTypeQualifier(): Get the optional Type Qualifier.
- double getUniqueness(): How unique is this field, i.e. the proportion of distinct values that occur exactly once.
- boolean isLogicalType(): Is this a Logical Type?
- boolean statisticsEnabled(): Was statistics collection enabled for this analysis?
- String toString(): A String representation of the Analysis.
Method Detail

getName
public String getName()
Name of the data stream being analyzed.
Returns: Name of the data stream.

getConfidence
public double getConfidence()
Confidence in the type classification. Typically this is the number of matches divided by the number of real samples, where a real sample is one that is neither null nor blank.
Returns: Confidence as a percentage.
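
As a rough illustration of the ratio described above, the sketch below divides matches by real samples, excluding nulls and blanks. The class and method names are hypothetical, and this is not necessarily how FTA computes the value internally. The description calls the result a percentage; the sketch returns the underlying ratio (0.0 to 1.0), and whether the library scales it is not stated here.

```java
/** Hypothetical helper: confidence as matches / real samples (nulls and blanks excluded). */
final class ConfidenceSketch {
    static double confidence(long matchCount, long sampleCount, long nullCount, long blankCount) {
        // A "real" sample is one that is neither null nor blank.
        long realSamples = sampleCount - nullCount - blankCount;
        if (realSamples <= 0) {
            return 0.0; // nothing to classify, so report no confidence
        }
        return (double) matchCount / realSamples;
    }

    public static void main(String[] args) {
        // 100 samples, 5 null, 5 blank, 80 matched the detected type: 80 / 90
        System.out.println(confidence(80, 100, 5, 5));
    }
}
```

In practice the four inputs would correspond to the values reported by getMatchCount(), getSampleCount(), getNullCount() and getBlankCount().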

getType
public FTAType getType()
Get 'Type' as determined by training to date.
Returns: The Type of the data stream.

getTypeQualifier
public String getTypeQualifier()
Get the optional Type Qualifier. Predefined qualifiers are:
- Type: BOOLEAN - "TRUE_FALSE", "YES_NO", "ONE_ZERO"
- Type: STRING - "BLANK", "BLANKORNULL", "NULL"
- Type: LONG - "GROUPING", "SIGNED", "SIGNED_TRAILING". Note: "GROUPING" and "SIGNED" are independent and can both be present.
- Type: DOUBLE - "GROUPING", "SIGNED", "SIGNED_TRAILING", "NON_LOCALIZED". Note: "GROUPING" and "SIGNED" are independent and can both be present.
- Type: DATE, TIME, DATETIME, ZONEDDATETIME, OFFSETDATETIME - The qualifier is the detailed date format string.
Returns: The Type Qualifier for the Type.
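
For the date types the qualifier is a format string. Assuming it uses pattern letters that java.time.DateTimeFormatter accepts (the syntax is not specified here), a DATE qualifier could be applied to further values roughly as follows. The class name and the sample pattern "MM/dd/yyyy" are illustrative only, and the other date types would need LocalTime, LocalDateTime and so on instead of LocalDate.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper: checks a value against a DATE type qualifier, assumed to be a java.time pattern. */
final class DateQualifierSketch {
    static boolean matchesQualifier(String qualifier, String value) {
        // Assumption: the qualifier uses java.time pattern letters, e.g. "MM/dd/yyyy".
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(qualifier);
        try {
            LocalDate.parse(value, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false; // wrong layout or out-of-range field (e.g. month 13)
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesQualifier("MM/dd/yyyy", "02/28/2017")); // true
        System.out.println(matchesQualifier("MM/dd/yyyy", "2017-02-28")); // false
    }
}
```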

isLogicalType
public boolean isLogicalType()
Is this a Logical Type?
Returns: True if this is a Logical Type.

getMinValue
public String getMinValue()
Get the minimum value for Numeric, Boolean and String types.
Returns: The minimum value as a String.

getMaxValue
public String getMaxValue()
Get the maximum value for Numeric, Boolean and String types.
Returns: The maximum value as a String.

getMinLength
public int getMinLength()
Get the minimum length for Numeric, Boolean and String types. Note: For String and Boolean types this length includes any whitespace.
Returns: The minimum length.

getMaxLength
public int getMaxLength()
Get the maximum length for Numeric, Boolean and String types. Note: For String and Boolean types this length includes any whitespace.
Returns: The maximum length.

getDecimalSeparator
public char getDecimalSeparator()
Get the Decimal Separator used to interpret Doubles. Note: This will be either the Decimal Separator for the locale or, possibly, a period.
Returns: The Decimal Separator.
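
A locale's decimal separator can be read from java.text.DecimalFormatSymbols. The sketch below, which is illustrative and not FTA code, uses such a separator to normalize a value so that Double.parseDouble (which only accepts a period) can read it. Grouping separators are deliberately not handled.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class DecimalSeparatorSketch {
    /** Swap the given decimal separator for '.', the only separator Double.parseDouble accepts. */
    static double parseWithSeparator(String value, char decimalSeparator) {
        return Double.parseDouble(value.replace(decimalSeparator, '.'));
    }

    public static void main(String[] args) {
        char german = DecimalFormatSymbols.getInstance(Locale.GERMANY).getDecimalSeparator();
        System.out.println(german);                             // ,
        System.out.println(parseWithSeparator("3,25", german)); // 3.25
    }
}
```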

getDateResolutionMode
public com.cobber.fta.dates.DateTimeParser.DateResolutionMode getDateResolutionMode()
Get the DateResolutionMode actually used to process Dates.
Returns: The DateResolutionMode used to process Dates.

getMean
public Double getMean()
Get the mean for Numeric types (Long, Double).
Returns: The mean.

getStandardDeviation
public Double getStandardDeviation()
Get the Standard Deviation for Numeric types (Long, Double).
Returns: The Standard Deviation.
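
Mean and standard deviation can be maintained in a single pass over a stream. The sketch below uses Welford's online algorithm. Whether FTA uses this method, and whether it reports the population or the sample standard deviation, is not stated here, so the population form (dividing by n) is an assumption. Returning null when no values were seen mirrors the boxed Double return type but is also an assumption.

```java
/** Streaming mean and population standard deviation via Welford's online algorithm. */
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared differences from the current mean

    void accept(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the updated mean, as Welford's update requires
    }

    Double mean() {
        return count == 0 ? null : mean;
    }

    /** Population standard deviation (divides by n); null if no values were seen. */
    Double standardDeviation() {
        return count == 0 ? null : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.accept(v);
        }
        System.out.println(stats.mean());              // close to 5.0
        System.out.println(stats.standardDeviation()); // close to 2.0
    }
}
```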

getTopK
public Set<String> getTopK()
Get the topK values.
Returns: The top K values (default: 10).

getBottomK
public Set<String> getBottomK()
Get the bottomK values.
Returns: The bottom K values (default: 10).
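
One way to track the K largest and K smallest distinct values of a stream is with a pair of bounded sorted sets. This sketch assumes natural (lexicographic) String ordering; ordering by the detected type (for example numerically for Long) is not attempted. The documented default is K = 10; the demo uses K = 2 to keep the output short, and the class name is hypothetical.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class TopBottomK {
    private final int k;
    private final TreeSet<String> top = new TreeSet<>();    // the k largest distinct values seen so far
    private final TreeSet<String> bottom = new TreeSet<>(); // the k smallest distinct values seen so far

    TopBottomK(int k) {
        this.k = k;
    }

    void accept(String value) {
        top.add(value);
        if (top.size() > k) {
            top.pollFirst(); // evict the smallest so only the k largest remain
        }
        bottom.add(value);
        if (bottom.size() > k) {
            bottom.pollLast(); // evict the largest so only the k smallest remain
        }
    }

    SortedSet<String> topK() {
        return new TreeSet<>(top);
    }

    SortedSet<String> bottomK() {
        return new TreeSet<>(bottom);
    }

    public static void main(String[] args) {
        TopBottomK tracker = new TopBottomK(2);
        for (String v : List.of("c", "a", "d", "b", "a")) {
            tracker.accept(v);
        }
        System.out.println(tracker.topK());    // [c, d]
        System.out.println(tracker.bottomK()); // [a, b]
    }
}
```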

getRegExp
public String getRegExp()
Get the Regular Expression that reflects the data stream. All valid inputs should match this Regular Expression; however, not every input that matches it is necessarily valid. For example, 28/13/2017 matches the RE \d{2}/\d{2}/\d{4}, but it is not a valid date with pattern dd/MM/yyyy (there is no 13th month).
Returns: The Regular Expression.
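
The example in this description can be checked directly: the value matches the regular expression, yet a date parser rejects it. The sketch uses java.util.regex and java.time. The year is written as 'uuuu' because ResolverStyle.STRICT requires it in place of 'yyyy'; that is a java.time detail, not something FTA specifies. The class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Pattern;

final class RegExpVersusValidity {
    // Shape-level check: two digits, slash, two digits, slash, four digits.
    static final Pattern DATE_SHAPE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    // dd/MM/yyyy for java.time; STRICT resolution needs 'uuuu' (proleptic year) rather than 'yyyy'.
    static final DateTimeFormatter DD_MM_YYYY =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    static boolean matchesRegExp(String value) {
        return DATE_SHAPE.matcher(value).matches();
    }

    static boolean isValidDate(String value) {
        try {
            LocalDate.parse(value, DD_MM_YYYY);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "28/13/2017";
        System.out.println(matchesRegExp(value)); // true
        System.out.println(isValidDate(value));   // false: there is no 13th month
    }
}
```

STRICT resolution also rejects values such as 31/04/2017, which the default SMART style would quietly adjust to the last day of April.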

getDataRegExp
public String getDataRegExp()
Get the Regular Expression that reflects the non-white space element in the data stream. For example, if a stream contains ' hello' and 'world ', this would return '(?i)(HELLO|WORLD)'.
Returns: The Regular Expression reflecting the non-white space data.
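
The HELLO/WORLD example can be reproduced with a small helper that trims each value, upper-cases it, and builds a case-insensitive alternation. This is a hypothetical reconstruction of the documented output for a handful of plain words, not the algorithm FTA uses. It assumes the values contain no regex metacharacters and sorts them for a deterministic result.

```java
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class DataRegExpSketch {
    /** Case-insensitive alternation of the trimmed values; assumes no regex metacharacters in the values. */
    static String dataRegExp(List<String> values) {
        TreeSet<String> distinct = new TreeSet<>(); // sorted, so the output does not depend on input order
        for (String v : values) {
            distinct.add(v.strip().toUpperCase(Locale.ROOT)); // drop surrounding white space, normalize case
        }
        return "(?i)(" + String.join("|", distinct) + ")";
    }

    public static void main(String[] args) {
        String re = dataRegExp(List.of(" hello", "world "));
        System.out.println(re);                           // (?i)(HELLO|WORLD)
        System.out.println(Pattern.matches(re, "Hello")); // true
    }
}
```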

getMatchCount
public long getMatchCount()
Get the count of all (non-blank/non-null) samples that matched the determined type. More formally, SampleCount is equal to MatchCount + BlankCount + NullCount.
Returns: Count of all matches.

getLeadingWhiteSpace
public boolean getLeadingWhiteSpace()
Does the set of elements contain any elements with leading White Space?
Returns: True if any elements matched have leading White Space.

getTrailingWhiteSpace
public boolean getTrailingWhiteSpace()
Does the set of elements contain any elements with trailing White Space?
Returns: True if any elements matched have trailing White Space.

getMultiline
public boolean getMultiline()
Does the set of elements contain any multi-line elements?
Returns: True if any elements matched are multi-line.

getTotalCount
public long getTotalCount()
Get the total number of elements in the Data Stream (if known).
Returns: The total number of elements in the Data Stream if known, otherwise -1.

getSampleCount
public long getSampleCount()
Get the count of all samples observed.
Returns: Count of all samples observed.

getNullCount
public long getNullCount()
Get the count of all null samples.
Returns: Count of all null samples.

getBlankCount
public long getBlankCount()
Get the count of all blank samples. Note: a sample consisting of any number (including zero) of spaces is Blank.
Returns: Count of all blank samples.
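
The distinction between null and blank samples can be expressed as a small classifier. Following the note above, a string made up only of spaces, including the empty string, counts as blank. Treating only the space character (not tabs or other white space) as blank is an assumption of this sketch, and the class and field names are hypothetical.

```java
import java.util.Arrays;

final class BlankNullCounter {
    long nullCount;
    long blankCount;
    long otherCount; // non-null, non-blank samples, still to be checked against the detected type

    void accept(String sample) {
        if (sample == null) {
            nullCount++;
        } else if (sample.chars().allMatch(c -> c == ' ')) {
            blankCount++; // "", " " and "   " are all blank
        } else {
            otherCount++;
        }
    }

    public static void main(String[] args) {
        BlankNullCounter counter = new BlankNullCounter();
        // Arrays.asList (unlike List.of) permits null elements.
        for (String s : Arrays.asList("42", null, "", "   ", "7")) {
            counter.accept(s);
        }
        System.out.println(counter.nullCount + " null, " + counter.blankCount + " blank, "
                + counter.otherCount + " other"); // 1 null, 2 blank, 2 other
    }
}
```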

getLeadingZeroCount
public long getLeadingZeroCount()
Get the count of all samples with leading zeros (Type long only). Note: a single '0' does not constitute a sample with a leading zero.
Returns: Count of all leading zero samples.
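
A leading-zero test consistent with this note only has to exclude the single character '0'. The sketch assumes unsigned digit strings; how signed values such as "-007" are treated is not stated here. The class and method names are hypothetical.

```java
import java.util.List;

final class LeadingZeros {
    /** True for "007" or "00", false for "0" and "70". Assumes an unsigned string of digits. */
    static boolean hasLeadingZero(String digits) {
        return digits.length() > 1 && digits.charAt(0) == '0';
    }

    static long countLeadingZeros(Iterable<String> samples) {
        long count = 0;
        for (String s : samples) {
            if (hasLeadingZero(s)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLeadingZeros(List.of("007", "0", "10", "0042"))); // 2
    }
}
```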

getCardinality
public int getCardinality()
Get the cardinality for the current data stream. See the setMaxCardinality() method in TextAnalyzer. Note: The cardinality returned is the cardinality of the valid samples. For example, if a date is invalid it will not be included in the cardinality. Note: This is not a complete cardinality analysis unless the cardinality of the data stream is less than the maximum cardinality (Default: 12000).
Returns: The cardinality of the valid samples.

getCardinalityDetails
public Map<String,Long> getCardinalityDetails()
Get the cardinality details for the current data stream. This is a Map of Strings and the count of occurrences.
Returns: A Map of values and their occurrence frequency in the data stream to date.
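
The cap described above can be pictured as a frequency map that stops admitting new keys once it reaches the maximum cardinality. The class name, and the choice to keep counting already-seen values after the cap is reached, are assumptions of this sketch; only the default of 12000 comes from the text.

```java
import java.util.HashMap;
import java.util.Map;

final class BoundedCardinality {
    private final int maxCardinality;
    private final Map<String, Long> details = new HashMap<>();
    private boolean truncated; // becomes true once a new value had to be dropped

    BoundedCardinality(int maxCardinality) {
        this.maxCardinality = maxCardinality;
    }

    void accept(String value) {
        if (details.containsKey(value) || details.size() < maxCardinality) {
            details.merge(value, 1L, Long::sum); // count a known value, or admit a new one below the cap
        } else {
            truncated = true; // cap reached: the cardinality analysis is no longer complete
        }
    }

    int cardinality() {
        return details.size();
    }

    Map<String, Long> cardinalityDetails() {
        return Map.copyOf(details);
    }

    boolean isComplete() {
        return !truncated;
    }

    public static void main(String[] args) {
        BoundedCardinality c = new BoundedCardinality(2); // the documented default is 12000
        for (String v : new String[] {"a", "b", "a", "c"}) {
            c.accept(v);
        }
        System.out.println(c.cardinality()); // 2
        System.out.println(c.isComplete());  // false ("c" was dropped)
    }
}
```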

getOutlierCount
public int getOutlierCount()
Get the number of distinct outliers for the current data stream. See the setMaxOutliers() method in TextAnalyzer. Note: This is not a complete outlier analysis unless the outlier count of the data stream is less than the maximum outlier count (Default: 50).
Returns: Count of the distinct outliers.

getOutlierDetails
public Map<String,Long> getOutlierDetails()
Get the outlier details for the current data stream. This is a Map of Strings and the count of occurrences.
Returns: A Map of values and their occurrence frequency in the data stream to date.

getShapeCount
public int getShapeCount()
Get the number of distinct shapes for the current data stream. Note: This is not a complete shape analysis unless the shape count of the data stream is less than the maximum shape count (Default: 400).
Returns: Count of the distinct shapes.

getShapeDetails
public Map<String,Long> getShapeDetails()
Get the shape details for the current data stream. This is a Map of Strings and the count of occurrences.
Returns: A Map of shapes and their occurrence frequency in the data stream to date.

getKeyConfidence
public double getKeyConfidence()
Is this field a key?
Returns: A double (0.0 ... 1.0) representing our confidence that this field is a key.

getUniqueness
public double getUniqueness()
How unique is this field, i.e. the number of distinct values that occur exactly once divided by the cardinality. Note: Only supported if the cardinality presented is less than Max Cardinality.
Returns: A double (0.0 ... 1.0) representing the uniqueness of this field.
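
Given a cardinality details map (value to occurrence count), the uniqueness ratio described above can be computed as the number of values seen exactly once divided by the number of distinct values. The method name is hypothetical, and returning 0.0 for an empty map is an assumption.

```java
import java.util.Map;

final class UniquenessSketch {
    /** Values seen exactly once, divided by the number of distinct values. */
    static double uniqueness(Map<String, Long> cardinalityDetails) {
        if (cardinalityDetails.isEmpty()) {
            return 0.0; // no values observed
        }
        long singletons = cardinalityDetails.values().stream().filter(count -> count == 1L).count();
        return (double) singletons / cardinalityDetails.size();
    }

    public static void main(String[] args) {
        // "a" and "b" occur once, "c" three times: 2 of 3 distinct values are unique
        System.out.println(uniqueness(Map.of("a", 1L, "b", 1L, "c", 3L)));
    }
}
```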

getDistinctCount
public long getDistinctCount()
Return the number of distinct valid values in this stream. Note: Typically only supported if the cardinality presented is less than Max Cardinality. Can be set by an external source.
Returns: A long with the number of distinct values in this stream, or -1 if unknown.

statisticsEnabled
public boolean statisticsEnabled()
Was statistics collection enabled for this analysis?
Returns: True if statistics were collected.

toString
public String toString()
A String representation of the Analysis.
Overrides: toString in class Object
Returns: A String representation of the analysis to date.

getStructureSignature
public String getStructureSignature()
A SHA-1 hash that reflects the data stream structure. Note: If a Semantic type is detected then the SHA-1 hash will reflect this.
Returns: A String SHA-1 hash that reflects the structure of the data stream.

getDataSignature
public String getDataSignature()
A SHA-1 hash that reflects the data stream contents. Note: The order of the data stream is not considered.
Returns: A String SHA-1 hash that reflects the data stream contents.
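
One way to get a SHA-1 digest that ignores the order of the stream is to hash the values in sorted order. The sketch below does that with java.security.MessageDigest. It shows the idea only: its output will not match the signature FTA produces, whose exact input encoding is not documented here, and values containing a NUL character could still collide because NUL is used as the separator.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DataSignatureSketch {
    /** Order-independent SHA-1 over non-null values; duplicates still affect the result. */
    static String orderIndependentSha1(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted); // sorting removes any dependence on arrival order
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
        for (String v : sorted) {
            sha1.update(v.getBytes(StandardCharsets.UTF_8));
            sha1.update((byte) 0); // separator so ["ab", "c"] and ["a", "bc"] hash differently
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String s1 = orderIndependentSha1(List.of("b", "a", "c"));
        String s2 = orderIndependentSha1(List.of("c", "a", "b"));
        System.out.println(s1.equals(s2)); // true
    }
}
```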

asPlugin
public String asPlugin()
A plugin definition to use to match this type.
Returns: A JSON representation of the analysis.

asJSON
public String asJSON(boolean pretty, int verbose)
A JSON representation of the Analysis.
Parameters:
- pretty - If set, add minimal whitespace formatting.
- verbose - If > 0, provides additional details on the core, Outlier, and Shapes sets. A value of 1 will output the first 100 elements; a value > 1 will output the full set.
Returns: A JSON representation of the analysis.
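
The verbose levels amount to a rule for how many entries of a details set are emitted. The sketch below encodes that rule as documented: nothing extra at 0, the first 100 entries at 1, everything above 1. The method name and the use of the map's iteration order to decide which entries come "first" are assumptions, and it returns a map rather than JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VerboseLimit {
    static final int VERBOSE_ONE_LIMIT = 100; // verbose == 1 outputs the first 100 elements

    static <K, V> Map<K, V> limitForVerbose(Map<K, V> details, int verbose) {
        Map<K, V> out = new LinkedHashMap<>();
        if (verbose <= 0) {
            return out; // no additional details requested
        }
        int limit = verbose == 1 ? VERBOSE_ONE_LIMIT : Integer.MAX_VALUE; // verbose > 1: full set
        for (Map.Entry<K, V> e : details.entrySet()) {
            if (out.size() >= limit) {
                break;
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> details = new LinkedHashMap<>();
        for (int i = 0; i < 150; i++) {
            details.put(i, i);
        }
        System.out.println(limitForVerbose(details, 1).size()); // 100
        System.out.println(limitForVerbose(details, 2).size()); // 150
    }
}
```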