Class LogicalType

    • Field Detail

      • locale

        protected Locale locale
      • priority

        protected int priority
      • threshold

        protected int threshold
    • Constructor Detail

      • LogicalType

        public LogicalType(PluginDefinition plugin)
        LogicalType constructor.
        Parameters:
        plugin - The definition of this plugin.
    • Method Detail

      • compareTo

        public int compareTo(LogicalType other)
        Specified by:
        compareTo in interface Comparable<LogicalType>
      • initialize

        public boolean initialize(Locale locale)
                           throws FTAPluginException
        Called to perform any initialization.
        Parameters:
        locale - The locale used for this analysis
        Returns:
        True if initialization was successful.
        Throws:
        FTAPluginException - Thrown when the plugin is incorrectly configured.
      • getHeaderConfidence

        public int getHeaderConfidence(String dataStreamName)
        Determine the confidence that the name of the data stream is a valid header for this Semantic Type.
        Parameters:
        dataStreamName - The name of this data stream
        Returns:
        An integer between 0 and 100 reflecting the confidence that this stream name is a valid header.
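        As an illustration of a 0 to 100 header score, the following is a minimal sketch only. It assumes that a plugin holds a list of header regular expressions, each with its own confidence; the class HeaderConfidenceSketch, its Rule type and the add() helper are hypothetical and are not part of this API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT the library's implementation of getHeaderConfidence().
final class HeaderConfidenceSketch {
    // Assumed configuration entry: a header pattern and the confidence (0-100) it carries.
    private static final class Rule {
        final Pattern pattern;
        final int confidence;

        Rule(String regExp, int confidence) {
            this.pattern = Pattern.compile(regExp);
            this.confidence = confidence;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    HeaderConfidenceSketch add(String regExp, int confidence) {
        if (confidence < 0 || confidence > 100)
            throw new IllegalArgumentException("confidence must be between 0 and 100");
        rules.add(new Rule(regExp, confidence));
        return this;
    }

    // Highest confidence among the rules whose pattern matches the whole stream name, else 0.
    int headerConfidence(String dataStreamName) {
        if (dataStreamName == null)
            return 0;
        int best = 0;
        for (Rule rule : rules)
            if (rule.pattern.matcher(dataStreamName).matches())
                best = Math.max(best, rule.confidence);
        return best;
    }
}
```

        With the rules "(?i).*e-?mail.*" at 90 and "(?i).*contact.*" at 40, the stream name "contact_email" scores 90 and "amount" scores 0.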
      • getQualifier

        public abstract String getQualifier()
        The user-friendly name of the Qualifier. For example, EMAIL for an email address.
        Returns:
        The user-friendly name of the type-qualifier.
      • getDescription

        public String getDescription()
        The user-friendly description of the Qualifier. For example, 'Australian State' for the qualifier "STATE_PROVINCE.STATE_AU".
        Returns:
        The user-friendly description of the type-qualifier.
      • getPriority

        public int getPriority()
        The relative priority of this plugin.
        Returns:
        The relative priority of this plugin.
      • isLocaleSensitive

        public boolean isLocaleSensitive()
        Is this plugin sensitive to the input locale?
        Returns:
        True if the plugin is sensitive to the input locale.
      • getRegExp

        public abstract String getRegExp()
        The Regular Expression that most closely matches this Logical Type (see isRegExpComplete()). Note: every valid instance matches this Regular Expression, but the converse is not necessarily true.
        Returns:
        The Java Regular Expression that most closely matches this Logical Type.
      • isRegExpComplete

        public boolean isRegExpComplete()
        Is the returned Regular Expression a true and complete representation of the Logical Type? For example, \\d{5} is not complete for US ZIP codes (e.g. 00000 matches but is not a valid ZIP code), whereas (?i)(male|female) could be complete for a Gender.
        Returns:
        True if the Regular Expression is a true and complete representation of this Logical Type.
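        The two patterns mentioned for isRegExpComplete() can be tried directly with java.util.regex. This is a small sketch: the class RegExpCompletenessSketch and its method names are made up for illustration, and only the two regular expressions come from the text above.

```java
import java.util.regex.Pattern;

// Illustrative only: shows why a Regular Expression may over-match a Logical Type.
final class RegExpCompletenessSketch {
    // Matches every US ZIP code, but also strings that are not real ZIP codes.
    static final Pattern ZIP = Pattern.compile("\\d{5}");
    // Case-insensitive; could be a complete description of a simple Gender type.
    static final Pattern GENDER = Pattern.compile("(?i)(male|female)");

    static boolean matchesZipRegExp(String input) {
        return ZIP.matcher(input).matches();
    }

    static boolean matchesGenderRegExp(String input) {
        return GENDER.matcher(input).matches();
    }
}
```

        Here matchesZipRegExp("00000") is true even though 00000 is not a valid ZIP code, so a plugin using this pattern would still need isValid() to reject it.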
      • getThreshold

        public int getThreshold()
        The percentage (0 - 100) at which we declare success. This percentage is used in the determination of the Logical Type; when and how it is used varies by plugin.
        Returns:
        The threshold percentage.
      • setThreshold

        public void setThreshold(int threshold)
        Set the percentage (0 - 100) at which we declare success. This percentage is used in the determination of the Logical Type; when and how it is used varies by plugin.
        Parameters:
        threshold - the new threshold.
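        One plausible use of the threshold is a simple comparison of the match rate against it. The sketch below assumes exactly that, plus a default of 95 and range checking in the setter; none of these are stated by this page, and the class ThresholdSketch and its meetsThreshold() method are hypothetical.

```java
// Hypothetical sketch: one possible way a plugin might apply its threshold.
final class ThresholdSketch {
    private int threshold = 95; // assumed default, not taken from the documentation

    int getThreshold() {
        return threshold;
    }

    void setThreshold(int threshold) {
        // Range checking is an assumption of this sketch.
        if (threshold < 0 || threshold > 100)
            throw new IllegalArgumentException("threshold must be between 0 and 100");
        this.threshold = threshold;
    }

    // True if the percentage of matching samples is at least the threshold.
    boolean meetsThreshold(long matchCount, long realSamples) {
        if (realSamples <= 0)
            return false;
        // Integer arithmetic avoids rounding: matchCount / realSamples >= threshold / 100.
        return matchCount * 100 >= (long) threshold * realSamples;
    }
}
```

        With the assumed default, 95 matches out of 100 real samples pass and 94 do not; after setThreshold(90), 94 out of 100 pass.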
      • getConfidence

        public double getConfidence(long matchCount,
                                    long realSamples,
                                    String dataStreamName)
        Confidence in the type classification. Typically this will be the number of matches divided by the number of real samples.
        Parameters:
        matchCount - Number of matches (as determined by isValid())
        realSamples - Number of samples observed - does not include either nulls or blanks
        dataStreamName - Name of the Data Stream
        Returns:
        Confidence as a percentage.
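        The typical case described above, matches divided by real samples, can be sketched as follows. The class ConfidenceSketch is hypothetical, it ignores dataStreamName, and it returns the plain ratio; the scale used by the real method and its handling of an empty stream are assumptions here.

```java
// Hypothetical sketch of the "typical" confidence calculation only.
final class ConfidenceSketch {
    // Ratio of matching samples to real (non-null, non-blank) samples.
    static double confidence(long matchCount, long realSamples) {
        if (realSamples <= 0)
            return 0.0; // assumed behaviour when nothing has been observed
        return (double) matchCount / realSamples;
    }
}
```

        For example, 3 matches out of 4 real samples gives 0.75; multiply by 100 if a 0 to 100 scale is wanted.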
      • getBaseType

        public FTAType getBaseType()
        The underlying type we are qualifying.
        Returns:
        The underlying type - e.g. STRING, LONG, etc.
      • acceptsBaseType

        public boolean acceptsBaseType(FTAType type)
      • getSignature

        public String getSignature()
        A SHA-1 hash that reflects the data stream structure.
        Returns:
        A String SHA-1 hash that reflects the structure of the data stream.
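        For readers unfamiliar with producing a SHA-1 String in Java, the sketch below hashes an arbitrary structure description with java.security.MessageDigest. What the library actually feeds into the hash, and whether it encodes the result as hexadecimal or otherwise, is not stated here; the class SignatureSketch, its input String and the hexadecimal encoding are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: SHA-1 of some textual description of a structure.
final class SignatureSketch {
    static String sha1Hex(String structure) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(structure.getBytes(StandardCharsets.UTF_8));
            // Hex encoding is an assumption; the real method may encode differently.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required of every Java platform implementation.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

        The same input always yields the same 40-character String, which is what makes such a hash usable as a structural signature.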
      • isValid

        public abstract boolean isValid(String input)
        Is the supplied String an instance of this logical type?
        Parameters:
        input - String to check (trimmed for Numeric base Types, un-trimmed for String base Type)
        Returns:
        true iff the supplied String is an instance of this Logical type.
      • analyzeSet

        public abstract PluginAnalysis analyzeSet(AnalyzerContext context,
                                                  long matchCount,
                                                  long realSamples,
                                                  String currentRegExp,
                                                  Facts facts,
                                                  Map<String,Long> cardinality,
                                                  Map<String,Long> outliers,
                                                  TokenStreams tokenStreams,
                                                  AnalysisConfig analysisConfig)
        Given the data to date, as embodied by the arguments, return an analysis. If we think this is an instance of this logical type then valid will be true; otherwise valid will be false and a new Pattern will be returned.
        Parameters:
        context - The context used to interpret the Data Stream (for example, stream name, date resolution mode, etc)
        matchCount - Number of samples that match so far (as determined by isValid())
        realSamples - Number of real (i.e. non-blank and non-null) samples that we have processed so far.
        currentRegExp - The current Regular Expression that we matched against
        facts - Facts (min, max, sum) for the analysis to date (optional - i.e. may be null)
        cardinality - Cardinality set, up to the maximum maintained
        outliers - Outlier set, up to the maximum maintained
        tokenStreams - Shapes observed
        analysisConfig - The Configuration of the current analysis
        Returns:
        An analysis that is valid if we think this is an instance of this logical type, and otherwise invalid and carrying the backout pattern.
      • isClosed

        public abstract boolean isClosed()
        Does the enumerated set of members reflect the entire set? For example, the ISO sets are reference sets and hence complete, whereas for FirstName and LastName the set provided contains only the common names. If isClosed() is false, then isValid() returning false does not imply that the input is invalid, only that it is not in the set of 'known' members.
        Returns:
        A boolean indicating if the set is closed.
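        The difference between a closed and an open member set can be sketched as below. The class ClosedSetSketch and its isDefinitelyInvalid() helper are hypothetical and only illustrate how a caller might combine isClosed() with isValid(); the upper-casing of members is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: set membership where the set may or may not be complete.
final class ClosedSetSketch {
    private final Set<String> members = new HashSet<>();
    private final boolean closed;

    ClosedSetSketch(boolean closed, String... members) {
        this.closed = closed;
        for (String member : members)
            this.members.add(member.toUpperCase(Locale.ROOT));
    }

    boolean isClosed() {
        return closed;
    }

    // True if the input is one of the known members.
    boolean isValid(String input) {
        return members.contains(input.trim().toUpperCase(Locale.ROOT));
    }

    // Absence from the set is conclusive only when the set is closed.
    boolean isDefinitelyInvalid(String input) {
        return closed && !isValid(input);
    }
}
```

        A closed set such as the Australian state codes can reject "XX" outright, whereas an open set of common first names cannot conclude that an unlisted name is invalid.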
      • getPluginDefinition

        public PluginDefinition getPluginDefinition()
        Accessor for the Plugin Definition for this Logical Type.
        Returns:
        The Plugin Definition.