Package com.cobber.fta
Class LogicalType
- Object
-
- LogicalType
-
- All Implemented Interfaces:
LTRandom, Comparable<LogicalType>
- Direct Known Subclasses:
LogicalTypeCode, LogicalTypeRegExp
public abstract class LogicalType extends Object implements Comparable<LogicalType>, LTRandom
All Logical Types are derived from this abstract class. The LTRandom interface provides LTRandom.nextRandom(),
which creates a new valid example of the Semantic Type.
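The relationship between nextRandom() and isValid() can be illustrated with a minimal, self-contained sketch. ToyGenderType is a hypothetical stand-in written for this example; it does not extend LogicalType, and its seeded constructor is an assumption rather than the signature of LTRandom.seed. The point it shows is that every generated example should itself pass validation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical stand-in for a Semantic Type: not part of com.cobber.fta.
final class ToyGenderType {
    private static final List<String> MEMBERS = List.of("MALE", "FEMALE");
    private final Random random;

    // Assumption: a fixed seed makes the generated sequence reproducible.
    ToyGenderType(long seed) {
        this.random = new Random(seed);
    }

    // True if the input is one of the known members (case-insensitive).
    boolean isValid(String input) {
        return input != null && MEMBERS.contains(input.trim().toUpperCase(Locale.ROOT));
    }

    // Produce a new example that is valid by construction.
    String nextRandom() {
        return MEMBERS.get(random.nextInt(MEMBERS.size()));
    }

    public static void main(String[] args) {
        ToyGenderType type = new ToyGenderType(42L);
        String example = type.nextRandom();
        System.out.println(example + " valid? " + type.isValid(example));
    }
}
```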
-
-
Field Summary
Fields
Modifier and Type            Field
protected PluginDefinition   defn
protected Locale             locale
protected PluginLocaleEntry  pluginLocaleEntry
protected int                priority
protected int                threshold
-
Constructor Summary
Constructors
Constructor                            Description
LogicalType(PluginDefinition plugin)   LogicalType constructor.
-
Method Summary
Modifier and Type, Method, Description

boolean
acceptsBaseType(FTAType type)

abstract PluginAnalysis
analyzeSet(AnalyzerContext context, long matchCount, long realSamples, String currentRegExp, Facts facts, Map<String,Long> cardinality, Map<String,Long> outliers, TokenStreams tokenStreams, AnalysisConfig analysisConfig)
Given the data to date as embodied by the arguments return an analysis.

int
compareTo(LogicalType other)

FTAType
getBaseType()
The underlying type we are qualifying.

double
getConfidence(long matchCount, long realSamples, String dataStreamName)
Confidence in the type classification.

String
getDescription()
The user-friendly description of the Qualifier.

int
getHeaderConfidence(String dataStreamName)
Determine the confidence that the name of the data stream is likely a valid header for this Semantic Type.

PluginDefinition
getPluginDefinition()
Accessor for the Plugin Definition for this Logical Type.

int
getPriority()
The relative priority of this plugin.

abstract String
getQualifier()
The user-friendly name of the Qualifier.

abstract String
getRegExp()
The Regular Expression that most closely matches (see isRegExpComplete()) this Logical Type.

String
getSignature()
A SHA-1 hash that reflects the data stream structure.

int
getThreshold()
The percentage when we declare success 0 - 100.

boolean
initialize(Locale locale)
Called to perform any initialization.

abstract boolean
isClosed()
Does the set of members enumerated reflect the entire set.

boolean
isLocaleSensitive()
Is this plugin sensitive to the input locale.

boolean
isRegExpComplete()
Is the returned Regular Expression a true and complete representation of the Logical Type.

abstract boolean
isValid(String input)
Is the supplied String an instance of this logical type?

void
setThreshold(int threshold)
The percentage when we declare success 0 - 100.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.cobber.fta.LTRandom
nextRandom, seed
-
-
-
-
Field Detail
-
defn
protected PluginDefinition defn
-
locale
protected Locale locale
-
priority
protected int priority
-
threshold
protected int threshold
-
pluginLocaleEntry
protected PluginLocaleEntry pluginLocaleEntry
-
-
Constructor Detail
-
LogicalType
public LogicalType(PluginDefinition plugin)
LogicalType constructor.
- Parameters:
plugin
- The definition of this plugin.
-
-
Method Detail
-
compareTo
public int compareTo(LogicalType other)
- Specified by:
compareTo
in interface Comparable<LogicalType>
-
initialize
public boolean initialize(Locale locale) throws FTAPluginException
Called to perform any initialization.
- Parameters:
locale
- The locale used for this analysis
- Returns:
- True if initialization was successful.
- Throws:
FTAPluginException
- Thrown when the plugin is incorrectly configured.
-
getHeaderConfidence
public int getHeaderConfidence(String dataStreamName)
Determine the confidence that the name of the data stream is likely a valid header for this Semantic Type.
- Parameters:
dataStreamName
- The name of this data stream
- Returns:
- An integer between 0 and 100 reflecting the confidence that this stream name is a valid header.
-
getQualifier
public abstract String getQualifier()
The user-friendly name of the Qualifier. For example, EMAIL for an email address.
- Returns:
- The user-friendly name of the type-qualifier.
-
getDescription
public String getDescription()
The user-friendly description of the Qualifier. For example, 'Australian State' for the qualifier "STATE_PROVINCE.STATE_AU".
- Returns:
- The user-friendly description of the type-qualifier.
-
getPriority
public int getPriority()
The relative priority of this plugin.
- Returns:
- The relative priority of this plugin.
-
isLocaleSensitive
public boolean isLocaleSensitive()
Is this plugin sensitive to the input locale.
- Returns:
- True if the plugin is sensitive to the input locale.
-
getRegExp
public abstract String getRegExp()
The Regular Expression that most closely matches (see isRegExpComplete()) this Logical Type. Note: every valid value will match this RE, but a string that matches the RE is not necessarily valid.
- Returns:
- The Java Regular Expression that most closely matches this Logical Type.
-
isRegExpComplete
public boolean isRegExpComplete()
Is the returned Regular Expression a true and complete representation of the Logical Type. For example, \d{5} is not complete for US ZIP codes (e.g. 00000 matches but is not a valid ZIP), whereas (?i)(male|female) could be complete for a Gender.
- Returns:
- True if the Regular Expression is a true and complete representation of this Logical Type.
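The distinction between a Regular Expression that merely describes the shape of a value and one that is complete can be sketched with java.util.regex alone. The class and the isValidZip helper below are hypothetical and exist only for illustration; the single 00000 exclusion is a toy rule taken from the example in the text, not a real ZIP code validator.

```java
import java.util.regex.Pattern;

// Illustration only: these names are not part of com.cobber.fta.
final class RegExpCompletenessSketch {
    // Describes the shape of a US ZIP code, but also accepts codes that do not exist.
    static final Pattern ZIP_SHAPE = Pattern.compile("\\d{5}");
    // A small fixed vocabulary can be described completely by its Regular Expression.
    static final Pattern GENDER = Pattern.compile("(?i)(male|female)");

    // Toy validity check: the shape must match and the code must not be 00000.
    static boolean isValidZip(String input) {
        return ZIP_SHAPE.matcher(input).matches() && !"00000".equals(input);
    }

    public static void main(String[] args) {
        System.out.println("00000 matches shape: " + ZIP_SHAPE.matcher("00000").matches());
        System.out.println("00000 is valid:      " + isValidZip("00000"));
        System.out.println("Female matches:      " + GENDER.matcher("Female").matches());
    }
}
```

Here the ZIP pattern matches a value that the validity check rejects, so it would not count as complete, while for the Gender pattern matching and validity coincide.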
-
getThreshold
public int getThreshold()
The percentage (0 - 100) at which we declare success. We use this percentage in the determination of the Logical Type. When and how it is used varies based on the plugin.
- Returns:
- The threshold percentage.
-
setThreshold
public void setThreshold(int threshold)
The percentage (0 - 100) at which we declare success. We use this percentage in the determination of the Logical Type. When and how it is used varies based on the plugin.
- Parameters:
threshold
- The new threshold.
-
getConfidence
public double getConfidence(long matchCount, long realSamples, String dataStreamName)
Confidence in the type classification. Typically this will be the number of matches divided by the number of real samples.
- Parameters:
matchCount
- Number of matches (as determined by isValid())
realSamples
- Number of samples observed; does not include either nulls or blanks
dataStreamName
- Name of the Data Stream
- Returns:
- Confidence as a percentage.
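As a rough sketch of the typical case described above, the confidence can be computed as matches over real samples and then compared with a threshold percentage. The class and method names are hypothetical, the result is expressed as a fraction between 0.0 and 1.0 for simplicity, and the comparison rule is an assumption, since the way the threshold is applied varies by plugin.

```java
// Illustration only: not the library's implementation.
final class ConfidenceSketch {
    // Matches divided by real samples; 0.0 when nothing has been observed.
    static double confidence(long matchCount, long realSamples) {
        if (realSamples <= 0) {
            return 0.0;
        }
        return (double) matchCount / realSamples;
    }

    // Assumed rule: success when the match rate, as a percentage, reaches the threshold (0 - 100).
    // Integer arithmetic avoids floating-point rounding at the boundary.
    static boolean meetsThreshold(long matchCount, long realSamples, int threshold) {
        if (realSamples <= 0) {
            return false;
        }
        return matchCount * 100 >= (long) threshold * realSamples;
    }

    public static void main(String[] args) {
        System.out.println(confidence(95, 100));
        System.out.println(meetsThreshold(95, 100, 95));
        System.out.println(meetsThreshold(94, 100, 95));
    }
}
```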
-
getBaseType
public FTAType getBaseType()
The underlying type we are qualifying.
- Returns:
- The underlying type - e.g. STRING, LONG, etc.
-
acceptsBaseType
public boolean acceptsBaseType(FTAType type)
-
getSignature
public String getSignature()
A SHA-1 hash that reflects the data stream structure.
- Returns:
- A String SHA-1 hash that reflects the structure of the data stream.
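For readers unfamiliar with producing a SHA-1 hash as a String, the following sketch uses the standard java.security.MessageDigest class. Which fields make up the "structure" and how the hash is encoded are not stated here, so both the input string and the hexadecimal encoding below are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: the hashed input and the hex encoding are assumptions.
final class SignatureSketch {
    // Hash a textual description of a structure and return it as lowercase hex.
    static String sha1Hex(String structure) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(structure.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical structure description: a qualifier plus a Regular Expression.
        System.out.println(sha1Hex("GENDER:(?i)(male|female)"));
    }
}
```

Two streams with the same structure description yield the same signature, which is what makes such a hash useful for detecting structural change.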
-
isValid
public abstract boolean isValid(String input)
Is the supplied String an instance of this logical type?
- Parameters:
input
- String to check (trimmed for Numeric base Types, un-trimmed for String base Type)
- Returns:
- true iff the supplied String is an instance of this Logical type.
-
analyzeSet
public abstract PluginAnalysis analyzeSet(AnalyzerContext context, long matchCount, long realSamples, String currentRegExp, Facts facts, Map<String,Long> cardinality, Map<String,Long> outliers, TokenStreams tokenStreams, AnalysisConfig analysisConfig)
Given the data to date, as embodied by the arguments, return an analysis. If we think this is an instance of this logical type then valid will be true; otherwise valid will be false and a new Pattern will be returned.
- Parameters:
context
- The context used to interpret the Data Stream (for example, stream name, date resolution mode, etc.)
matchCount
- Number of samples that match so far (as determined by isValid())
realSamples
- Number of real (i.e. non-blank and non-null) samples that we have processed so far.
currentRegExp
- The current Regular Expression that we matched against
facts
- Facts (min, max, sum) for the analysis to date (optional, i.e. may be null)
cardinality
- Cardinality set, up to the maximum maintained
outliers
- Outlier set, up to the maximum maintained
tokenStreams
- Shapes observed
analysisConfig
- The Configuration of the current analysis
- Returns:
- Null if we think this is an instance of this logical type (backout pattern otherwise)
-
isClosed
public abstract boolean isClosed()
Does the set of members enumerated reflect the entire set? For example, any of the ISO sets are reference sets and hence complete, whereas for FirstName and LastName the set provided contains only the common names. If isClosed() is false then isValid() returning false does not imply that the input is invalid, only that it is not in the set of 'known' members.
- Returns:
- A boolean indicating if the set is closed.
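The practical consequence of a closed versus an open member set can be sketched as follows. Everything in this block is hypothetical: the class, the Verdict enum, and the two small member sets are stand-ins chosen for illustration (the state list is treated as complete only for the purposes of the example).

```java
import java.util.Locale;
import java.util.Set;

// Illustration only: these names and member sets are not part of com.cobber.fta.
final class ClosedSetSketch {
    enum Verdict { VALID, INVALID, UNKNOWN }

    // Treated here as a complete reference set: nothing outside it can be valid.
    static final Set<String> AU_STATES = Set.of("ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA");
    // A sample of common names only: a miss just means the name is not known.
    static final Set<String> COMMON_FIRST_NAMES = Set.of("ALICE", "JAMES", "MARIA");

    // A miss is conclusive only when the member set is closed.
    static Verdict classify(Set<String> members, boolean closed, String input) {
        if (members.contains(input.trim().toUpperCase(Locale.ROOT))) {
            return Verdict.VALID;
        }
        return closed ? Verdict.INVALID : Verdict.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(AU_STATES, true, "nsw"));
        System.out.println(classify(AU_STATES, true, "XYZ"));
        System.out.println(classify(COMMON_FIRST_NAMES, false, "Zebedee"));
    }
}
```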
-
getPluginDefinition
public PluginDefinition getPluginDefinition()
Accessor for the Plugin Definition for this Logical Type.
- Returns:
- The Plugin Definition.
-
-