Package org.opensextant.data

Xponents Data Model

The key constructs here are GeoBase and Geocoding. GeoBase is a base class for anything that has an ID, a name or label, and a coordinate. Geocoding is an interface for any heuristic that grounds some data to a coordinate while also carrying metadata about the geocoding itself. For example, beyond the coordinate itself, useful geocoding attributes include:
  • precision and confidence
  • country code and province code or name
  • method or source for geocoding, e.g., derivation or rote lookup
  • etc.
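The split between "a located thing" (GeoBase) and "how it was located" (Geocoding) can be sketched as below. This is a minimal illustration only: `SimpleGeoBase`, `SimpleGeocoding`, `ResolvedPlace`, and all their fields and methods are hypothetical names, not the actual Xponents `GeoBase` or `Geocoding` API. The units for precision (meters) and confidence (0 to 100) are likewise assumptions.

```java
/** Hypothetical stand-in for GeoBase: anything with an ID, a name, and a coordinate. */
class SimpleGeoBase {
    private final String id;
    private final String name;
    private final double lat;
    private final double lon;

    SimpleGeoBase(String id, String name, double lat, double lon) {
        // Reject coordinates outside the valid latitude/longitude degree ranges.
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("coordinate out of range: " + lat + "," + lon);
        }
        this.id = id;
        this.name = name;
        this.lat = lat;
        this.lon = lon;
    }

    // Public so these getters can satisfy the (implicitly public) interface methods below.
    public String getId() { return id; }
    public String getName() { return name; }
    public double getLatitude() { return lat; }
    public double getLongitude() { return lon; }
}

/** Hypothetical stand-in for Geocoding: a coordinate plus metadata on how it was obtained. */
interface SimpleGeocoding {
    double getLatitude();
    double getLongitude();
    int getPrecision();      // assumed unit: approximate error radius in meters
    int getConfidence();     // assumed scale: 0 to 100
    String getCountryCode();
    String getProvince();    // province code or name
    String getMethod();      // e.g. "lookup" for a rote gazetteer match, "derived" otherwise
}

/** A place with identity from the base class and geocoding metadata from the interface. */
class ResolvedPlace extends SimpleGeoBase implements SimpleGeocoding {
    private final int precision;
    private final int confidence;
    private final String countryCode;
    private final String province;
    private final String method;

    ResolvedPlace(String id, String name, double lat, double lon, int precision,
                  int confidence, String countryCode, String province, String method) {
        super(id, name, lat, lon);
        this.precision = precision;
        this.confidence = confidence;
        this.countryCode = countryCode;
        this.province = province;
        this.method = method;
    }

    @Override public int getPrecision() { return precision; }
    @Override public int getConfidence() { return confidence; }
    @Override public String getCountryCode() { return countryCode; }
    @Override public String getProvince() { return province; }
    @Override public String getMethod() { return method; }
}

class GeoBaseDemo {
    public static void main(String[] args) {
        ResolvedPlace paris = new ResolvedPlace("P1", "Paris", 48.8566, 2.3522,
                1000, 90, "FR", "Ile-de-France", "lookup");
        System.out.println(paris.getName() + " " + paris.getCountryCode()
                + " via " + paris.getMethod() + ", confidence " + paris.getConfidence());
    }
}
```

Keeping the metadata on an interface rather than in the base class lets any object that already has a coordinate also report how that coordinate was derived, without forcing every GeoBase to carry geocoding fields.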

Country and Place both extend GeoBase. Country is used extensively in place name extraction, reverse geocoding, and general country name/code lookups. See GeonamesUtility for more country metadata tools.
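A country name/code lookup boils down to indexing each country under several keys. The sketch below builds such an index from the JDK's own `java.util.Locale` region data; it does not use the Xponents `Country` class or `GeonamesUtility`, and `CountryRecord` and `CountryIndex` are hypothetical names. JDK display names may not match gazetteer spellings, so treat this as illustrative only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

/** Hypothetical record: ISO 3166 alpha-2 and alpha-3 codes plus an English name. */
final class CountryRecord {
    final String iso2;
    final String iso3;
    final String name;

    CountryRecord(String iso2, String iso3, String name) {
        this.iso2 = iso2;
        this.iso3 = iso3;
        this.name = name;
    }

    @Override public String toString() { return name + " (" + iso2 + "/" + iso3 + ")"; }
}

/** Hypothetical index resolving a country by alpha-2 code, alpha-3 code, or English name. */
final class CountryIndex {
    private final Map<String, CountryRecord> byKey = new HashMap<>();

    CountryIndex() {
        for (String iso2 : Locale.getISOCountries()) {
            Locale region = new Locale.Builder().setRegion(iso2).build();
            String iso3;
            try {
                iso3 = region.getISO3Country();
            } catch (MissingResourceException e) {
                continue; // skip any region without an alpha-3 code
            }
            CountryRecord rec = new CountryRecord(iso2, iso3, region.getDisplayCountry(Locale.ENGLISH));
            // Register the same record under all three keys, normalized to upper case.
            byKey.put(rec.iso2.toUpperCase(Locale.ROOT), rec);
            byKey.put(rec.iso3.toUpperCase(Locale.ROOT), rec);
            byKey.put(rec.name.toUpperCase(Locale.ROOT), rec);
        }
    }

    /** Case-insensitive lookup by code or name; returns null when nothing matches. */
    CountryRecord lookup(String codeOrName) {
        if (codeOrName == null) return null;
        return byKey.get(codeOrName.trim().toUpperCase(Locale.ROOT));
    }
}

class CountryIndexDemo {
    public static void main(String[] args) {
        CountryIndex index = new CountryIndex();
        System.out.println(index.lookup("fr"));
        System.out.println(index.lookup("DEU"));
    }
}
```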

The Language class ties a language code to its name. LangDetect and LangId (in org.opensextant.extractors.langid) provide tools for language detection. A detected language ID does not always match a known Language code literally, because LangID may yield a language plus a locale rather than a bare language code. So there is a need to parse and manage both explicit and inferred language/locale codings.
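Separating a language code from an attached locale can be sketched with the JDK's `Locale.forLanguageTag`, which parses BCP 47 tag syntax and normalizes case. This handles only the syntax of the tag; it says nothing about whether the JDK knows the language. `LangCode` is a hypothetical helper, not the Xponents `Language` class, and some JDK versions rewrite a few codes to legacy forms (for example Hebrew `he` to `iw`), so compare normalized codes with care.

```java
import java.util.Locale;

/** Hypothetical pair of a base language code and an optional region. */
final class LangCode {
    final String language; // lower-case language subtag, e.g. "zh"
    final String region;   // upper-case region subtag, or "" when absent

    private LangCode(String language, String region) {
        this.language = language;
        this.region = region;
    }

    /** Parses forms such as "zh", "zh-TW", "zh_TW", or "ZH-tw" into language + region. */
    static LangCode parse(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("empty language code");
        }
        // forLanguageTag expects hyphen separators, so normalize POSIX-style underscores first.
        Locale loc = Locale.forLanguageTag(raw.trim().replace('_', '-'));
        if (loc.getLanguage().isEmpty()) {
            throw new IllegalArgumentException("unparseable language code: " + raw);
        }
        // Any script subtag (e.g. "Hant") is ignored in this sketch.
        return new LangCode(loc.getLanguage(), loc.getCountry());
    }

    boolean hasRegion() { return !region.isEmpty(); }

    @Override public String toString() { return hasRegion() ? language + "-" + region : language; }
}

class LangCodeDemo {
    public static void main(String[] args) {
        for (String raw : new String[] {"en", "zh_TW", "pt-br"}) {
            LangCode code = LangCode.parse(raw);
            System.out.println(raw + " -> language=" + code.language + " region=" + code.region);
        }
    }
}
```

Keeping the base language separate from the region lets a caller fall back to the bare language code when a language-plus-locale result has no exact entry in a lookup table.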

The Java SDK Locale classes appear to cover only the languages used for computer internationalization, and ICU4J, for example, does not offer a simple, clear API for this. So I created language lookup tables around ISO-639 codes (sourced from the Library of Congress), found in org.opensextant.util.TextUtility: getLanguage(), getLanguageCode(), and getLanguageMap().
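An ISO-639 lookup table of this kind can be sketched as a map keyed by every code and name a language goes by. The sketch assumes the pipe-delimited layout of the Library of Congress ISO 639-2 code list (`bibliographic|terminologic|alpha-2|English name|French name`); it is not how TextUtility actually loads its data, and `IsoLanguage` and `IsoLanguageTable` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical language entry built from one ISO 639-2 row. */
final class IsoLanguage {
    final String iso639_2; // 3-letter bibliographic code
    final String iso639_1; // 2-letter code, or "" if none
    final String name;     // English name

    IsoLanguage(String iso639_2, String iso639_1, String name) {
        this.iso639_2 = iso639_2;
        this.iso639_1 = iso639_1;
        this.name = name;
    }
}

/** Hypothetical table keyed by 2-letter code, 3-letter B/T codes, and English name. */
final class IsoLanguageTable {
    private final Map<String, IsoLanguage> byKey = new HashMap<>();

    IsoLanguageTable(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.replace("\uFEFF", "").trim(); // defensively drop a byte-order mark
                if (line.isEmpty()) continue;
                String[] f = line.split("\\|", -1);
                if (f.length < 4) continue; // skip malformed rows
                IsoLanguage lang = new IsoLanguage(f[0], f[2], f[3]);
                put(f[0], lang); // bibliographic code
                put(f[1], lang); // terminologic code, empty when same as bibliographic
                put(f[2], lang); // 2-letter code, may be empty
                put(f[3], lang); // English name
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void put(String key, IsoLanguage lang) {
        if (!key.isEmpty()) byKey.put(key.toLowerCase(Locale.ROOT), lang);
    }

    /** Case-insensitive lookup by any code or the English name; null when unknown. */
    IsoLanguage lookup(String codeOrName) {
        return codeOrName == null ? null : byKey.get(codeOrName.trim().toLowerCase(Locale.ROOT));
    }
}

class IsoLanguageDemo {
    public static void main(String[] args) {
        String sample = String.join("\n",
                "ara||ar|Arabic|arabe",
                "chi|zho|zh|Chinese|chinois",
                "eng||en|English|anglais",
                "ger|deu|de|German|allemand");
        IsoLanguageTable table = new IsoLanguageTable(new StringReader(sample));
        System.out.println(table.lookup("zho").name + " / " + table.lookup("de").iso639_2);
    }
}
```

Indexing the bibliographic and terminologic 3-letter codes side by side means a lookup succeeds whichever variant (for example `ger` or `deu`) a data source happens to use.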