Package com.cobber.fta
Class LogicalType
- Object
-
- LogicalType
-
- All Implemented Interfaces:
LTRandom, Comparable<LogicalType>
- Direct Known Subclasses:
LogicalTypeCode, LogicalTypeRegExp
public abstract class LogicalType extends Object implements Comparable<LogicalType>, LTRandom
All Semantic Types are derived from this abstract class. The LTRandom interface provides an LTRandom.nextRandom() method,
which creates a new valid example of the Semantic Type.
-
-
Field Summary
Modifier and Type / Field
protected AnalysisConfig
analysisConfig
protected PluginDefinition
defn
protected Locale
locale
protected com.cobber.fta.dates.LocaleInfo
localeInfo
protected PluginLocaleEntry
pluginLocaleEntry
protected int
priority
protected int
threshold
-
Constructor Summary
Constructor / Description
LogicalType(PluginDefinition plugin)
LogicalType constructor.
-
Method Summary
Modifier and Type / Method / Description
boolean
acceptsBaseType(FTAType type)
abstract PluginAnalysis
analyzeSet(AnalyzerContext context, long matchCount, long realSamples, String currentRegExp, Facts facts, FiniteMap cardinality, FiniteMap outliers, TokenStreams tokenStreams, AnalysisConfig analysisConfig)
Given the data to date, as embodied by the arguments, return an analysis.
int
compareTo(LogicalType other)
FTAType
getBaseType()
The underlying type we are qualifying.
double
getConfidence(long matchCount, long realSamples, AnalyzerContext context)
Confidence in the type classification.
String
getDescription()
The user-friendly description of the Semantic Type.
int
getHeaderConfidence(String dataStreamName)
Determine the confidence that the name of the data stream is likely a valid header for this Semantic Type.
PluginDefinition
getPluginDefinition()
Accessor for the Plugin Definition for this Semantic Type.
int
getPriority()
The relative priority of this plugin.
abstract String
getRegExp()
The Regular Expression that most closely matches (see isRegExpComplete()) this Semantic Type.
String
getSemanticType()
The name of the Semantic Type.
String
getSignature()
A SHA-1 hash that reflects the data stream structure.
int
getThreshold()
The percentage when we declare success 0 - 100.
boolean
initialize(AnalysisConfig analysisConfig)
Called to perform any initialization.
abstract boolean
isClosed()
Does the set of members enumerated reflect the entire set.
boolean
isLocaleSensitive()
Is this plugin sensitive to the input locale.
boolean
isRegExpComplete()
Is the returned Regular Expression a true and complete representation of the Semantic Type.
boolean
isValid(String input)
Is the supplied String an instance of this Semantic type? Note: this invokes isValid(String, boolean, long) with detectMode set to false, so it uses validate mode, not detect mode.
abstract boolean
isValid(String input, boolean detectMode, long count)
Is the supplied String an instance of this Semantic type?
void
setThreshold(int threshold)
The percentage when we declare success 0 - 100.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.cobber.fta.LTRandom
nextRandom, seed
-
-
-
-
Field Detail
-
defn
protected PluginDefinition defn
-
analysisConfig
protected AnalysisConfig analysisConfig
-
locale
protected Locale locale
-
localeInfo
protected com.cobber.fta.dates.LocaleInfo localeInfo
-
priority
protected int priority
-
threshold
protected int threshold
-
pluginLocaleEntry
protected PluginLocaleEntry pluginLocaleEntry
-
-
Constructor Detail
-
LogicalType
public LogicalType(PluginDefinition plugin)
LogicalType constructor.
- Parameters:
plugin
- The definition of this plugin.
-
-
Method Detail
-
compareTo
public int compareTo(LogicalType other)
- Specified by:
compareTo
in interface Comparable<LogicalType>
-
initialize
public boolean initialize(AnalysisConfig analysisConfig) throws FTAPluginException
Called to perform any initialization.
- Parameters:
analysisConfig
- The Analysis configuration used for this analysis.
- Returns:
- True if initialization was successful.
- Throws:
FTAPluginException
- Thrown when the plugin is incorrectly configured.
-
getHeaderConfidence
public int getHeaderConfidence(String dataStreamName)
Determine the confidence that the name of the data stream is likely a valid header for this Semantic Type. Positive numbers indicate it could be this Semantic Type, negative numbers indicate it is unlikely to be this Semantic Type, and 0 indicates no opinion.
- Parameters:
dataStreamName
- The name of this data stream.
- Returns:
- An integer between -100 and 100 reflecting the confidence that this stream name is a valid header.
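The sign convention of the return value can be sketched without any FTA classes. The following is a minimal, hypothetical illustration only: the class name HeaderConfidenceSketch, the rule table, and the scores 90 and -50 are all assumptions made for this example and are not taken from the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a header-scoring rule table; not the FTA implementation.
class HeaderConfidenceSketch {
    // Assumed rules for an EMAIL-like type: header pattern -> confidence.
    private static final Map<Pattern, Integer> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("(?i).*e-?mail.*"), 90);       // likely this type
        RULES.put(Pattern.compile("(?i).*(phone|fax).*"), -50);  // unlikely this type
    }

    // Returns the score of the first matching rule, clamped to [-100, 100]; 0 means no opinion.
    static int headerConfidence(String dataStreamName) {
        for (Map.Entry<Pattern, Integer> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(dataStreamName).matches()) {
                return Math.max(-100, Math.min(100, rule.getValue()));
            }
        }
        return 0;
    }
}
```

A header that matches no rule yields 0, which leaves the decision entirely to the data.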
-
getSemanticType
public String getSemanticType()
The name of the Semantic Type. For example, EMAIL for an email address.
- Returns:
- The name of the Semantic Type.
-
getDescription
public String getDescription()
The user-friendly description of the Semantic Type. For example, 'Australian State' for the Semantic Type "STATE_PROVINCE.STATE_AU".
- Returns:
- The user-friendly description of the Semantic Type.
-
getPriority
public int getPriority()
The relative priority of this plugin.
- Returns:
- The relative priority of this plugin.
-
isLocaleSensitive
public boolean isLocaleSensitive()
Is this plugin sensitive to the input locale.
- Returns:
- True if the plugin is sensitive to the input locale.
-
getRegExp
public abstract String getRegExp()
The Regular Expression that most closely matches (see isRegExpComplete()) this Semantic Type. Note: All valid matches will match this RE, but the inverse is not necessarily true.
- Returns:
- The Java Regular Expression that most closely matches this Semantic Type.
-
isRegExpComplete
public boolean isRegExpComplete()
Is the returned Regular Expression a true and complete representation of the Semantic Type. For example, \\d{5} is not complete for US ZIP codes (e.g. 00000 is not a valid ZIP code), whereas (?i)(male|female) could be complete for a Gender.
- Returns:
- True if the Regular Expression is a true and complete representation of the Semantic Type.
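The difference between a complete and an incomplete Regular Expression can be shown with plain java.util.regex, independent of FTA. This is a minimal sketch using the two patterns quoted above; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical, self-contained illustration; it does not use any FTA classes.
class RegExpCompletenessSketch {
    // Incomplete: every US ZIP code matches, but so do invalid strings such as 00000.
    static final Pattern ZIP_SHAPE = Pattern.compile("\\d{5}");
    // Possibly complete: assumed here to describe every valid Gender value.
    static final Pattern GENDER = Pattern.compile("(?i)(male|female)");

    static boolean matchesZipShape(String input) {
        return ZIP_SHAPE.matcher(input).matches();
    }

    static boolean matchesGender(String input) {
        return GENDER.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The shape matches even though 00000 is not a valid ZIP code.
        System.out.println(matchesZipShape("00000"));
        // Case-insensitive match on one of the enumerated values.
        System.out.println(matchesGender("Female"));
    }
}
```

With an incomplete expression, a match only says the input has the right shape, so a further validity check is still needed.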
-
getThreshold
public int getThreshold()
The percentage when we declare success 0 - 100. We use this percentage in the determination of the Semantic Type. When and how it is used varies based on the plugin.
- Returns:
- The threshold percentage.
-
setThreshold
public void setThreshold(int threshold)
The percentage when we declare success 0 - 100. We use this percentage in the determination of the Semantic Type. When and how it is used varies based on the plugin.
- Parameters:
threshold
- The new threshold.
-
getConfidence
public double getConfidence(long matchCount, long realSamples, AnalyzerContext context)
Confidence in the type classification. Typically this will be the number of matches divided by the number of real samples.
- Parameters:
matchCount
- Number of matches (as determined by isValid()).
realSamples
- Number of samples observed; does not include either nulls or blanks.
context
- Context we are operating under (includes data stream name(s)).
- Returns:
- Confidence as a percentage.
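The typical calculation described above (matches divided by real samples) can be sketched as follows. This is an illustration only, not the library source: the class name, the zero-sample guard, and the way the result is compared with a 0 - 100 threshold are assumptions, and individual plugins may compute confidence differently.

```java
// Hypothetical sketch of the typical confidence calculation; not the FTA implementation.
class ConfidenceSketch {
    // Ratio of matches to real (non-null, non-blank) samples.
    // Assumption: returns 0.0 when there are no real samples, to avoid dividing by zero.
    static double confidence(long matchCount, long realSamples) {
        if (realSamples <= 0) {
            return 0.0;
        }
        return (double) matchCount / realSamples;
    }

    // Assumed use of a 0 - 100 threshold: compare in integer arithmetic to avoid rounding issues.
    static boolean meetsThreshold(long matchCount, long realSamples, int thresholdPercent) {
        return realSamples > 0 && matchCount * 100 >= (long) thresholdPercent * realSamples;
    }
}
```

For instance, 95 matches out of 100 real samples would meet a threshold of 95 but 94 matches would not.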
-
getBaseType
public FTAType getBaseType()
The underlying type we are qualifying.
- Returns:
- The underlying type - e.g. STRING, LONG, etc.
-
acceptsBaseType
public boolean acceptsBaseType(FTAType type)
-
getSignature
public String getSignature()
A SHA-1 hash that reflects the data stream structure.
- Returns:
- A String SHA-1 hash that reflects the structure of the data stream.
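Producing a SHA-1 hash as a String can be done with the standard java.security.MessageDigest class. The sketch below is hypothetical: which structural fields the library feeds into the hash and how it encodes the result are not stated here, so the input string and the hexadecimal encoding are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of hashing a structure description; not the FTA implementation.
class SignatureSketch {
    // Returns the SHA-1 digest of the supplied text as a lowercase hexadecimal String.
    static String sha1Hex(String structure) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(structure.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Two streams with the same structure description produce the same signature, which makes the value convenient for comparing structures without comparing the data itself.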
-
isValid
public boolean isValid(String input)
Is the supplied String an instance of this Semantic type? Note: this invokes isValid(String, boolean, long) with detectMode set to false, so it uses validate mode, not detect mode.
- Parameters:
input
- String to check (trimmed for Numeric base Types, un-trimmed for String base Type).
- Returns:
- true iff the supplied String is an instance of this Semantic type.
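The relationship between the two isValid overloads can be sketched with a stand-in class. This is a hypothetical illustration and not the FTA source: the class names, the toy validity rule, and the count value passed by the convenience overload are all assumptions.

```java
// Hypothetical stand-in for the two isValid overloads; not the FTA source.
abstract class ValidatorSketch {
    // Full check: detectMode distinguishes detection from a simple validity check.
    abstract boolean isValid(String input, boolean detectMode, long count);

    // Convenience overload: always validate mode (detectMode == false).
    // Assumption: the count passed along (1 here) is illustrative only.
    boolean isValid(String input) {
        return isValid(input, false, 1);
    }
}

// Toy subclass that records the mode it was last called with.
class RecordingValidator extends ValidatorSketch {
    boolean lastDetectMode = true;

    @Override
    boolean isValid(String input, boolean detectMode, long count) {
        lastDetectMode = detectMode;
        // Toy rule: anything with an '@' after the first character counts as valid.
        return input != null && input.indexOf('@') > 0;
    }
}
```

Calling the single-argument overload on a RecordingValidator leaves lastDetectMode set to false, showing that the check ran in validate mode.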
-
isValid
public abstract boolean isValid(String input, boolean detectMode, long count)
Is the supplied String an instance of this Semantic type?
- Parameters:
input
- String to check (trimmed for Numeric base Types, un-trimmed for String base Type).
detectMode
- If true then we are in the process of detection, otherwise it is a simple validity check.
count
- The number of instances of this sample.
- Returns:
- true iff the supplied String is an instance of this Semantic type.
-
analyzeSet
public abstract PluginAnalysis analyzeSet(AnalyzerContext context, long matchCount, long realSamples, String currentRegExp, Facts facts, FiniteMap cardinality, FiniteMap outliers, TokenStreams tokenStreams, AnalysisConfig analysisConfig)
Given the data to date, as embodied by the arguments, return an analysis. If we think this is an instance of this Semantic type then valid will be true; if not, valid will be false and a new Pattern will be returned.
- Parameters:
context
- The context used to interpret the Data Stream (for example, stream name, date resolution mode, etc.).
matchCount
- Number of samples that match so far (as determined by isValid()).
realSamples
- Number of real (i.e. non-blank and non-null) samples that we have processed so far.
currentRegExp
- The current Regular Expression that we matched against.
facts
- Facts (min, max, sum) for the analysis to date (optional, i.e. may be null).
cardinality
- Cardinality set, up to the maximum maintained.
outliers
- Outlier set, up to the maximum maintained.
tokenStreams
- Shapes observed.
analysisConfig
- The Configuration of the current analysis.
- Returns:
- A PluginAnalysis with valid set to true if we think this is an instance of this Semantic type; otherwise valid is false and a backout pattern is supplied.
-
isClosed
public abstract boolean isClosed()
Does the set of members enumerated reflect the entire set. For example, any of the ISO sets are reference sets and hence complete, compared to FirstName and LastName where the set provided is only of the common names. If isClosed() is false then isValid() returning false does not imply that the input is not valid, just that it is not in the set of 'known' members.
- Returns:
- A boolean indicating if the set is closed.
-
getPluginDefinition
public PluginDefinition getPluginDefinition()
Accessor for the Plugin Definition for this Semantic Type.
- Returns:
- The Plugin Definition.
-
-