Package uk.gov.dstl.baleen.annotators.gazetteer
Contains annotators that annotate entities based on a gazetteer.
A gazetteer is essentially a list of terms we are interested in. A gazetteer annotator looks for each of these terms in the document and annotates every occurrence it finds. The annotators in this package all perform largely the same function; the primary difference between them is the source of the gazetteer.
A gazetteer approach gives high precision and recall for the terms the gazetteer contains, but it requires someone to compile a comprehensive gazetteer in the first place, because anything not in the gazetteer will not be extracted. For the same reason, it is sensitive to spelling mistakes, spelling variations, and similar differences between the gazetteer and the text.
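The behaviour described above can be illustrated with a minimal sketch. It is not Baleen's implementation: the annotators in this package are described as RadixTree-backed, whereas this example uses a naive case-insensitive substring scan to keep the idea visible. The class `GazetteerSketch`, the `Match` type, and the `findMatches` method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Baleen's API.
class GazetteerSketch {

    /** A matched gazetteer term with its character offsets in the text. */
    static final class Match {
        final int begin; // inclusive
        final int end;   // exclusive
        final String term;

        Match(int begin, int end, String term) {
            this.begin = begin;
            this.end = end;
            this.term = term;
        }

        @Override
        public String toString() {
            return term + "[" + begin + "," + end + ")";
        }
    }

    /**
     * Naive scan: for each gazetteer term, find every case-insensitive
     * occurrence that sits on word boundaries. Offsets assume that
     * lower-casing does not change the length of the text (true for ASCII).
     */
    static List<Match> findMatches(String text, List<String> gazetteer) {
        List<Match> matches = new ArrayList<>();
        String lowerText = text.toLowerCase(Locale.ROOT);
        for (String term : gazetteer) {
            if (term.isEmpty()) {
                continue; // an empty term would match everywhere
            }
            String lowerTerm = term.toLowerCase(Locale.ROOT);
            int from = 0;
            while (true) {
                int begin = lowerText.indexOf(lowerTerm, from);
                if (begin < 0) {
                    break;
                }
                int end = begin + lowerTerm.length();
                if (isBoundary(lowerText, begin - 1) && isBoundary(lowerText, end)) {
                    matches.add(new Match(begin, end, term));
                }
                from = begin + 1;
            }
        }
        return matches;
    }

    /** True if the index is outside the text or not a letter or digit. */
    private static boolean isBoundary(String text, int index) {
        return index < 0 || index >= text.length()
                || !Character.isLetterOrDigit(text.charAt(index));
    }

    public static void main(String[] args) {
        List<String> gazetteer = Arrays.asList("France", "United Kingdom");
        String text = "She flew from France to the United Kingdom, not Frnace.";
        // The misspelt "Frnace" is not in the gazetteer, so it is not matched.
        System.out.println(findMatches(text, gazetteer));
    }
}
```

In this sketch both listed terms are found, while the misspelt "Frnace" is missed because only exact (case-insensitive) gazetteer entries are matched. This is the sensitivity to spelling described above.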
Class Summary
Class: Description
Country: Gazetteer annotator for countries, using the SharedCountryResource.
File: Generic file-backed RadixTree Gazetteer annotator, that will use a file based gazetteer to find and annotate entities.
List: Generic list-backed RadixTree Gazetteer annotator, that will use a list based gazetteer to find and annotate entities.
Mongo: Generic Mongo-backed RadixTree Gazetteer annotator, that will use a Mongo gazetteer to find and annotate entities.
MongoRegex: Generic Mongo-backed Regex-RadixTree Gazetteer annotator, that will use a Mongo gazetteer to find and annotate entities.
MongoStemming: Generic Mongo-backed Stemming RadixTree Gazetteer annotator, that will use a Mongo gazetteer to find and annotate entities.