Package org.opensextant.extraction
Extraction Fundamentals
Extraction fundamentals include TextEntity, a span in free text, and TextMatch, a TextEntity generated by an extractor, matcher, or rule. A span is defined by a character start offset and an end offset. A TextEntity provides basic span logic and arithmetic: comparing whether one span falls before, after, or within another, overlaps it, etc.
Beyond that, the extraction helpers here provide specific Solr tagger support, match filtering, match navigation, and match metrics.
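The span comparisons described above can be sketched as follows. This is a minimal illustration, not the TextEntity API: the class name Span, its method names, and the convention that the end offset is exclusive are all assumptions made for the example.

```java
/**
 * Hypothetical illustration only; this is NOT org.opensextant.extraction.TextEntity.
 * Assumed convention: start is inclusive, end is exclusive.
 */
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid span: " + start + ".." + end);
        }
        this.start = start;
        this.end = end;
    }

    /** Number of characters covered by this span. */
    int length() {
        return end - start;
    }

    /** True if this span ends at or before the point where the other begins. */
    boolean isBefore(Span other) {
        return end <= other.start;
    }

    /** True if this span begins at or after the point where the other ends. */
    boolean isAfter(Span other) {
        return start >= other.end;
    }

    /** True if this span lies entirely inside the other (equal bounds count). */
    boolean isWithin(Span other) {
        return start >= other.start && end <= other.end;
    }

    /** True if the two spans share at least one character position. */
    boolean overlaps(Span other) {
        return start < other.end && other.start < end;
    }
}
```

Under these assumptions, new Span(0, 5) overlaps new Span(3, 8) and falls before new Span(10, 12), while new Span(1, 4) is within new Span(0, 5). The offset conventions of the real TextEntity class may differ, so check its Javadoc before relying on boundary behavior.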
Interface Summary
Interface    Description
Extractor    For now, this interface is closer to an AbstractExtractor, where a clean interface might be output = Extractor.extract(input). This interface specifies more
Class Summary
Class                Description
ExtractionMetrics    This is a holder for tracking various common measures: No.
ExtractionResult
MatcherUtils
MatchFilter          The Class MatchFilter.
TextEntity           A very simple struct to hold data useful for post-processing entities once found.
TextMatch            A variation on TextEntity that also records pattern metadata
Exception Summary
Exception                 Description
ExtractionException       An exception to be thrown when place name matching goes awry.
NormalizationException