Package eu.clarin.weblicht.wlfxb.tc.api
Interface TextCorpus
- All Known Implementing Classes:
- TextCorpusStored, TextCorpusStreamed, TextCorpusStreamedWithReplaceableLayers
public interface TextCorpus
Interface TextCorpus represents TCF TextCorpus annotations and corresponds to the TCF TextCorpus specification. These annotations are linguistic annotations on written connected text. They are divided into annotation layers, where each layer represents a specific linguistic aspect. For example, a TextCorpus can contain TokensLayer, PosTagsLayer, ConstituentParsingLayer, etc. In a TextCorpus, annotations from any layer usually annotate (directly or indirectly) Token annotations from TokensLayer. An exception is TextLayer, which is independent of any other layer. See also: TCF Format description.
- Author:
- Yana Panchenko
-
-
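The relationship between layers described above can be pictured with a small stand-alone model. The sketch below does not use the wlfxb classes: `LayerModelSketch`, `SimpleToken`, `SimplePosTag` and `tokenize` are hypothetical stand-ins chosen for illustration. It only shows the idea that a part-of-speech annotation points at a token from the tokens layer, while the text itself is stored independently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; they are NOT the wlfxb classes.
class LayerModelSketch {

    // A token as it might appear in a tokens layer: an ID plus its surface string.
    static final class SimpleToken {
        final String id;
        final String string;

        SimpleToken(String id, String string) {
            this.id = id;
            this.string = string;
        }
    }

    // A part-of-speech annotation that refers to a token instead of copying its text.
    static final class SimplePosTag {
        final String tag;
        final SimpleToken token;

        SimplePosTag(String tag, SimpleToken token) {
            this.tag = tag;
            this.token = token;
        }
    }

    // Builds one token per whitespace-separated word, with IDs t1, t2, ...
    static List<SimpleToken> tokenize(String text) {
        List<SimpleToken> tokens = new ArrayList<>();
        int n = 0;
        for (String word : text.trim().split("\\s+")) {
            n++;
            tokens.add(new SimpleToken("t" + n, word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Karin fliegt";               // "text layer": independent of other layers
        List<SimpleToken> tokens = tokenize(text);  // "tokens layer"

        // "POS layer": each annotation references a token from the tokens layer.
        List<SimplePosTag> tags = new ArrayList<>();
        tags.add(new SimplePosTag("NE", tokens.get(0)));
        tags.add(new SimplePosTag("VVFIN", tokens.get(1)));

        for (SimplePosTag t : tags) {
            System.out.println(t.token.id + " " + t.token.string + " " + t.tag);
        }
    }
}
```

Because the tag holds a reference to the token, a further layer could annotate the same token without duplicating its string.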
Method Summary
Modifier and Type | Method | Description
LexicalSemanticsLayer | createAntonymyLayer() | Creates empty antonymy layer in this TextCorpus.
ChunksLayer | createChunksLayer(String entitiesType) | Creates empty ChunksLayer with the given tagset for chunk annotations in this TextCorpus.
ConstituentParsingLayer | createConstituentParsingLayer(String tagset) | Creates empty ConstituentParsingLayer with the given tagset in this TextCorpus.
DependencyParsingLayer | createDependencyParsingLayer(boolean multipleGovernorsPossible, boolean emptyTokensPossible) | Creates empty DependencyParsingLayer in this TextCorpus.
DependencyParsingLayer | createDependencyParsingLayer(String tagset, boolean multipleGovernorsPossible, boolean emptyTokensPossible) | Creates empty DependencyParsingLayer with the given tagset in this TextCorpus.
DiscourseConnectivesLayer | createDiscourseConnectivesLayer() | Creates empty DiscourseConnectivesLayer in this TextCorpus.
DiscourseConnectivesLayer | createDiscourseConnectivesLayer(String typeTagset) | Creates empty DiscourseConnectivesLayer in this TextCorpus.
GeoLayer | createGeoLayer(String source, GeoLongLatFormat coordFormat) | Creates empty GeoLayer in this TextCorpus.
GeoLayer | createGeoLayer(String source, GeoLongLatFormat coordFormat, GeoContinentFormat conitentFormat, GeoCountryFormat countryFormat, GeoCapitalFormat capitalFormat) | Creates empty GeoLayer in this TextCorpus.
LexicalSemanticsLayer | createHyperonymyLayer() | Creates empty hyperonymy layer in this TextCorpus.
LexicalSemanticsLayer | createHyponymyLayer() | Creates empty hyponymy layer in this TextCorpus.
LemmasLayer | createLemmasLayer() | Creates empty LemmasLayer in this TextCorpus.
MatchesLayer | createMatchesLayer(String queryLanguage, String queryString) | Creates empty MatchesLayer in this TextCorpus, ready to be filled in with the corpus match annotations.
MorphologyLayer | createMorphologyLayer() | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(boolean hasSegmentation) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(boolean hasSegmentation, boolean hasCharOffsets) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(String tagset) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(String tagset, boolean hasSegmentation) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(String tagset, boolean hasSegmentation, boolean hasCharOffsets) | Creates empty MorphologyLayer in this TextCorpus.
NamedEntitiesLayer | createNamedEntitiesLayer(String entitiesType) | Creates empty NamedEntitiesLayer with the given tagset for named entity types in this TextCorpus.
OrthographyLayer | createOrthographyLayer() | Creates empty OrthographyLayer in this TextCorpus.
PhoneticsLayer | createPhotenicsLayer(String alphabet) | Creates empty PhoneticsLayer with the given alphabet for phonetic transcriptions in this TextCorpus.
PosTagsLayer | createPosTagsLayer(String tagset) | Creates empty PosTagsLayer with the given tagset in this TextCorpus.
ReferencesLayer | createReferencesLayer(String typetagset, String reltagset, String externalReferencesSource) | Creates empty references layer in this TextCorpus, ready to be filled in with the references data.
RelationsLayer | createRelationsLayer(String type) |
SentencesLayer | createSentencesLayer() | Creates empty SentencesLayer in this TextCorpus.
SentencesLayer | createSentencesLayer(boolean hasCharOffsets) | Creates empty SentencesLayer in this TextCorpus.
LexicalSemanticsLayer | createSynonymyLayer() | Creates empty synonymy layer in this TextCorpus.
TextLayer | createTextLayer() | Creates empty TextLayer in this TextCorpus.
TextSourceLayer | createTextSourceLayer() | Creates empty TextSourceLayer in this TextCorpus.
TextStructureLayer | createTextStructureLayer() | Creates empty TextStructureLayer in this TextCorpus.
TokensLayer | createTokensLayer() | Creates empty TokensLayer in this TextCorpus.
TokensLayer | createTokensLayer(boolean hasCharOffsets) | Creates empty TokensLayer in this TextCorpus.
TopologicalFieldsLayer | createTopologicalFieldsLayer(String tagset) | Creates empty TopologicalFieldsLayer with the given tagset in this TextCorpus.
WordSensesLayer | createWordSensesLayer(String source) | Creates empty WordSensesLayer in this TextCorpus.
WordSplittingLayer | createWordSplittingLayer(String type) | Creates empty WordSplittingLayer with the given type of the splitting in this TextCorpus.
LexicalSemanticsLayer | getAntonymyLayer() | Gets antonymy layer of this TextCorpus.
ChunksLayer | getChunksLayer() | Gets chunks layer of this TextCorpus.
ConstituentParsingLayer | getConstituentParsingLayer() | Gets constituent parsing layer of this TextCorpus.
DependencyParsingLayer | getDependencyParsingLayer() | Gets dependency parsing layer of this TextCorpus.
DiscourseConnectivesLayer | getDiscourseConnectivesLayer() | Gets discourse connectives layer of this TextCorpus.
GeoLayer | getGeoLayer() | Gets geo layer of this TextCorpus.
LexicalSemanticsLayer | getHyperonymyLayer() | Gets hyperonymy layer of this TextCorpus.
LexicalSemanticsLayer | getHyponymyLayer() | Gets hyponymy layer of this TextCorpus.
String | getLanguage() | Gets the language of the text/tokens in this TextCorpus.
List<TextCorpusLayer> | getLayers() | Gets all annotation layers of this TextCorpus.
LemmasLayer | getLemmasLayer() | Gets lemmas layer of this TextCorpus.
MatchesLayer | getMatchesLayer() | Gets matches layer of this TextCorpus.
MorphologyLayer | getMorphologyLayer() | Gets morphology layer of this TextCorpus.
NamedEntitiesLayer | getNamedEntitiesLayer() | Gets named entities layer of this TextCorpus.
OrthographyLayer | getOrthographyLayer() | Gets orthography layer of this TextCorpus.
PhoneticsLayer | getPhoneticsLayer() | Gets phonetics layer of this TextCorpus.
PosTagsLayer | getPosTagsLayer() | Gets part-of-speech layer of this TextCorpus.
ReferencesLayer | getReferencesLayer() | Gets references layer of this TextCorpus.
RelationsLayer | getRelationsLayer() |
SentencesLayer | getSentencesLayer() | Gets sentences layer of this TextCorpus.
LexicalSemanticsLayer | getSynonymyLayer() | Gets synonymy layer of this TextCorpus.
TextLayer | getTextLayer() | Gets text layer of this TextCorpus.
TextSourceLayer | getTextSourceLayer() | Gets text source layer of this TextCorpus.
TextStructureLayer | getTextStructureLayer() | Gets text structure layer of this TextCorpus.
TokensLayer | getTokensLayer() | Gets tokens layer of this TextCorpus.
TopologicalFieldsLayer | getTopologicalFieldsLayer() | Gets topological fields layer of this TextCorpus.
WordSensesLayer | getWordSensesLayer() | Gets word senses layer of this TextCorpus.
WordSplittingLayer | getWordSplittingLayer() | Gets word splitting layer of this TextCorpus.
-
-
-
Method Detail
-
getLanguage
String getLanguage()
Gets the language of the text/tokens in this TextCorpus.
- Returns:
- language of TextCorpus.
-
getLayers
List<TextCorpusLayer> getLayers()
Gets all annotation layers of this TextCorpus.
- Returns:
- annotation layers.
-
getTextLayer
TextLayer getTextLayer()
Gets text layer of this TextCorpus.
- Returns:
- annotation layer containing text.
-
createTextLayer
TextLayer createTextLayer()
Creates empty TextLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getTokensLayer
TokensLayer getTokensLayer()
Gets tokens layer of this TextCorpus.
- Returns:
- annotation layer containing tokens.
-
createTokensLayer
TokensLayer createTokensLayer()
Creates empty TokensLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
createTokensLayer
TokensLayer createTokensLayer(boolean hasCharOffsets)
Creates empty TokensLayer in this TextCorpus.
- Parameters:
hasCharOffsets - true if the Token objects in this TokensLayer will contain their character offsets in the text, false otherwise.
- Returns:
- annotation layer that has been created.
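To make the notion of character offsets in the text concrete, here is a minimal stand-alone sketch that computes them while splitting on whitespace. It is not the wlfxb implementation: `TokenOffsetsSketch`, `OffsetToken` and `tokenizeWithOffsets` are hypothetical names, and the convention of an inclusive start and exclusive end is an assumption made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; OffsetToken is NOT the wlfxb Token class.
class TokenOffsetsSketch {

    // A token together with its position in the text.
    // Assumption: start is inclusive and end is exclusive, as in String.substring.
    static final class OffsetToken {
        final String string;
        final int start;
        final int end;

        OffsetToken(String string, int start, int end) {
            this.string = string;
            this.start = start;
            this.end = end;
        }
    }

    // Splits on whitespace and records where each token begins and ends in the text.
    static List<OffsetToken> tokenizeWithOffsets(String text) {
        List<OffsetToken> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the characters of one token.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                result.add(new OffsetToken(text.substring(start, i), start, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Karin fliegt nach New York";
        for (OffsetToken t : tokenizeWithOffsets(text)) {
            System.out.println(t.string + " [" + t.start + ", " + t.end + ")");
        }
    }
}
```

With offsets stored, `text.substring(t.start, t.end)` recovers each token string from the original text.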
-
getLemmasLayer
LemmasLayer getLemmasLayer()
Gets lemmas layer of this TextCorpus.
- Returns:
- layer containing lemma annotations on Token objects from TokensLayer.
-
createLemmasLayer
LemmasLayer createLemmasLayer()
Creates empty LemmasLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getPosTagsLayer
PosTagsLayer getPosTagsLayer()
Gets part-of-speech layer of this TextCorpus.
- Returns:
- layer containing part-of-speech annotations on Token objects from TokensLayer.
-
createPosTagsLayer
PosTagsLayer createPosTagsLayer(String tagset)
Creates empty PosTagsLayer with the given tagset in this TextCorpus.
- Parameters:
tagset - tagset of the part-of-speech annotations.
- Returns:
- annotation layer that has been created.
-
getTopologicalFieldsLayer
TopologicalFieldsLayer getTopologicalFieldsLayer()
Gets topological fields layer of this TextCorpus.
- Returns:
- layer containing topological field annotations on Token objects from TokensLayer.
-
createTopologicalFieldsLayer
TopologicalFieldsLayer createTopologicalFieldsLayer(String tagset)
Creates empty TopologicalFieldsLayer with the given tagset in this TextCorpus.
- Parameters:
tagset - tagset of the topological fields.
- Returns:
- annotation layer that has been created.
-
getSentencesLayer
SentencesLayer getSentencesLayer()
Gets sentences layer of this TextCorpus.
- Returns:
- layer containing sentence boundary annotations on Token objects from TokensLayer.
-
createSentencesLayer
SentencesLayer createSentencesLayer()
Creates empty SentencesLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
createSentencesLayer
SentencesLayer createSentencesLayer(boolean hasCharOffsets)
Creates empty SentencesLayer in this TextCorpus.
- Parameters:
hasCharOffsets - true if the Sentence objects in this SentencesLayer will contain their character offsets in the text, false otherwise.
- Returns:
- annotation layer that has been created.
-
getConstituentParsingLayer
ConstituentParsingLayer getConstituentParsingLayer()
Gets constituent parsing layer of this TextCorpus.
- Returns:
- layer containing constituent parsing annotations on Token objects from TokensLayer.
-
createConstituentParsingLayer
ConstituentParsingLayer createConstituentParsingLayer(String tagset)
Creates empty ConstituentParsingLayer with the given tagset in this TextCorpus.
- Parameters:
tagset - tagset of the parsing annotations.
- Returns:
- annotation layer that has been created.
-
getDependencyParsingLayer
DependencyParsingLayer getDependencyParsingLayer()
Gets dependency parsing layer of this TextCorpus.
- Returns:
- layer containing dependency parsing annotations on Token objects from TokensLayer.
-
createDependencyParsingLayer
DependencyParsingLayer createDependencyParsingLayer(boolean multipleGovernorsPossible, boolean emptyTokensPossible)
Creates empty DependencyParsingLayer in this TextCorpus.
- Parameters:
multipleGovernorsPossible - true if a dependent can be governed by more than one governor, false otherwise.
emptyTokensPossible - true if dependency annotations can contain empty tokens.
- Returns:
- annotation layer that has been created.
-
createDependencyParsingLayer
DependencyParsingLayer createDependencyParsingLayer(String tagset, boolean multipleGovernorsPossible, boolean emptyTokensPossible)
Creates empty DependencyParsingLayer with the given tagset in this TextCorpus.
- Parameters:
tagset - tagset of the functions between dependent and governor.
multipleGovernorsPossible - true if a dependent can be governed by more than one governor, false otherwise.
emptyTokensPossible - true if dependency annotations can contain empty tokens.
- Returns:
- annotation layer that has been created.
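The multipleGovernorsPossible flag describes a property of the dependency data itself. As a rough illustration, the stand-alone sketch below checks whether a set of dependency arcs attaches any dependent to more than one governor, which is the situation the flag declares. `GovernorCheckSketch`, `Arc` and `hasMultipleGovernors` are hypothetical names and not part of wlfxb; how the library uses the flag internally is not covered here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; Arc is NOT the wlfxb Dependency class.
class GovernorCheckSketch {

    // One dependency arc, identified by the token IDs of dependent and governor.
    static final class Arc {
        final String dependentId;
        final String governorId;

        Arc(String dependentId, String governorId) {
            this.dependentId = dependentId;
            this.governorId = governorId;
        }
    }

    // Returns true if some dependent is attached to more than one distinct governor,
    // i.e. the data would call for multipleGovernorsPossible = true.
    static boolean hasMultipleGovernors(List<Arc> arcs) {
        Map<String, Set<String>> governorsByDependent = new HashMap<>();
        for (Arc arc : arcs) {
            Set<String> governors =
                    governorsByDependent.computeIfAbsent(arc.dependentId, k -> new HashSet<>());
            governors.add(arc.governorId);
            if (governors.size() > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Arc> arcs = new ArrayList<>();
        arcs.add(new Arc("t1", "t2"));
        arcs.add(new Arc("t3", "t2"));
        System.out.println(hasMultipleGovernors(arcs)); // each dependent has one governor

        arcs.add(new Arc("t1", "t4"));                  // t1 now has a second governor
        System.out.println(hasMultipleGovernors(arcs));
    }
}
```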
-
getMorphologyLayer
MorphologyLayer getMorphologyLayer()
Gets morphology layer of this TextCorpus.
- Returns:
- layer containing morphological analysis annotations on Token objects from TokensLayer.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer()
Creates empty MorphologyLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(String tagset)
Creates empty MorphologyLayer in this TextCorpus.
- Parameters:
tagset - tagset of the morphology annotations.
- Returns:
- annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(boolean hasSegmentation)
Creates empty MorphologyLayer in this TextCorpus.
- Parameters:
hasSegmentation - true if morphology annotations contain segmentation analysis.
- Returns:
- annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(String tagset, boolean hasSegmentation)
Creates empty MorphologyLayer in this TextCorpus.
- Parameters:
tagset - tagset of the morphology annotations.
hasSegmentation - true if morphology annotations contain segmentation analysis.
- Returns:
- annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(boolean hasSegmentation, boolean hasCharOffsets)
Creates empty MorphologyLayer in this TextCorpus.
- Parameters:
hasSegmentation - true if morphology annotations contain segmentation analysis.
hasCharOffsets - true if the MorphologyAnalysis objects in this layer will contain character offsets of the segmentation within the token, false otherwise.
- Returns:
- annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(String tagset, boolean hasSegmentation, boolean hasCharOffsets)
Creates empty MorphologyLayer in this TextCorpus.
- Parameters:
tagset - tagset of the morphology annotations.
hasSegmentation - true if morphology annotations contain segmentation analysis.
hasCharOffsets - true if the MorphologyAnalysis objects in this layer will contain character offsets of the segmentation within the token, false otherwise.
- Returns:
- annotation layer that has been created.
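The six createMorphologyLayer variants differ only in which settings the caller supplies. One common way to organise such a family in Java is to have the shorter overloads delegate to the most complete one. The stand-alone sketch below shows that pattern with a toy type; `MorphologyOverloadsSketch` and `ToyMorphologyLayer` are hypothetical, and the defaults used for omitted arguments (no tagset, both flags false) are assumptions that this documentation does not state.

```java
// Hypothetical sketch of overload delegation; NOT the wlfxb implementation.
class MorphologyOverloadsSketch {

    // Toy stand-in recording the settings a morphology layer was created with.
    static final class ToyMorphologyLayer {
        final String tagset;            // null when no tagset was given
        final boolean hasSegmentation;
        final boolean hasCharOffsets;

        ToyMorphologyLayer(String tagset, boolean hasSegmentation, boolean hasCharOffsets) {
            this.tagset = tagset;
            this.hasSegmentation = hasSegmentation;
            this.hasCharOffsets = hasCharOffsets;
        }
    }

    // Assumed defaults for omitted arguments: tagset = null, flags = false.
    static ToyMorphologyLayer createMorphologyLayer() {
        return createMorphologyLayer(null, false, false);
    }

    static ToyMorphologyLayer createMorphologyLayer(String tagset) {
        return createMorphologyLayer(tagset, false, false);
    }

    static ToyMorphologyLayer createMorphologyLayer(boolean hasSegmentation) {
        return createMorphologyLayer(null, hasSegmentation, false);
    }

    static ToyMorphologyLayer createMorphologyLayer(String tagset, boolean hasSegmentation) {
        return createMorphologyLayer(tagset, hasSegmentation, false);
    }

    static ToyMorphologyLayer createMorphologyLayer(boolean hasSegmentation, boolean hasCharOffsets) {
        return createMorphologyLayer(null, hasSegmentation, hasCharOffsets);
    }

    // The most complete variant; every other overload ends up here.
    static ToyMorphologyLayer createMorphologyLayer(String tagset, boolean hasSegmentation,
                                                    boolean hasCharOffsets) {
        return new ToyMorphologyLayer(tagset, hasSegmentation, hasCharOffsets);
    }

    public static void main(String[] args) {
        // "exampleTagset" is a placeholder value, not a real tagset name.
        ToyMorphologyLayer layer = createMorphologyLayer("exampleTagset", true);
        System.out.println(layer.tagset + " " + layer.hasSegmentation + " " + layer.hasCharOffsets);
    }
}
```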
-
getNamedEntitiesLayer
NamedEntitiesLayer getNamedEntitiesLayer()
Gets named entities layer of this TextCorpus.
- Returns:
- layer containing named entity annotations on Token objects from TokensLayer.
-
createNamedEntitiesLayer
NamedEntitiesLayer createNamedEntitiesLayer(String entitiesType)
Creates empty NamedEntitiesLayer with the given tagset for named entity types in this TextCorpus.
- Parameters:
entitiesType - tagset of the named entity annotations.
- Returns:
- annotation layer that has been created.
-
createChunksLayer
ChunksLayer createChunksLayer(String entitiesType)
Creates empty ChunksLayer with the given tagset for chunk annotations in this TextCorpus.
- Parameters:
entitiesType - tagset of the chunk annotations.
- Returns:
- annotation layer that has been created.
-
getChunksLayer
ChunksLayer getChunksLayer()
Gets chunks layer of this TextCorpus.
- Returns:
- layer containing chunk annotations on Token objects from TokensLayer.
-
getReferencesLayer
ReferencesLayer getReferencesLayer()
Gets references layer of this TextCorpus.
- Returns:
- layer containing reference/coreference annotations on Token objects from TokensLayer.
-
createReferencesLayer
ReferencesLayer createReferencesLayer(String typetagset, String reltagset, String externalReferencesSource)
Creates empty references layer in this TextCorpus, ready to be filled in with the references data.
- Parameters:
typetagset - tagset for the mention type values of the references (should be null if no types are defined).
reltagset - tagset for relation values between the references (should be null if no relations are defined).
externalReferencesSource - name of external source (should be null if entities from the external source are not referenced).
- Returns:
- annotation layer that has been created.
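Since each of the three arguments may be null, a caller chooses which aspects of the references layer are in use by what it passes. The stand-alone sketch below mirrors only the parameter list documented above; `ReferencesArgsSketch` and `ToyReferencesLayer` are hypothetical stand-ins, the `has...` helpers are invented for illustration, and the string values are placeholders, not real tagset names.

```java
// Hypothetical sketch; ToyReferencesLayer is NOT the wlfxb ReferencesLayer.
class ReferencesArgsSketch {

    // Toy holder for the three optional settings; null means "not used".
    static final class ToyReferencesLayer {
        final String typetagset;
        final String reltagset;
        final String externalReferencesSource;

        ToyReferencesLayer(String typetagset, String reltagset, String externalReferencesSource) {
            this.typetagset = typetagset;
            this.reltagset = reltagset;
            this.externalReferencesSource = externalReferencesSource;
        }

        boolean hasTypes() { return typetagset != null; }

        boolean hasRelations() { return reltagset != null; }

        boolean hasExternalSource() { return externalReferencesSource != null; }
    }

    // Same parameter list as the documented method; the body is a toy.
    static ToyReferencesLayer createReferencesLayer(String typetagset, String reltagset,
                                                    String externalReferencesSource) {
        return new ToyReferencesLayer(typetagset, reltagset, externalReferencesSource);
    }

    public static void main(String[] args) {
        // Mention types and relations are defined, but no external source is referenced.
        ToyReferencesLayer refs = createReferencesLayer("exampleTypes", "exampleRelations", null);
        System.out.println(refs.hasTypes() + " " + refs.hasRelations() + " " + refs.hasExternalSource());
    }
}
```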
-
getRelationsLayer
RelationsLayer getRelationsLayer()
-
createRelationsLayer
RelationsLayer createRelationsLayer(String type)
-
getMatchesLayer
MatchesLayer getMatchesLayer()
Gets matches layer of this TextCorpus.
- Returns:
- layer containing match annotations on Token objects from TokensLayer.
-
createMatchesLayer
MatchesLayer createMatchesLayer(String queryLanguage, String queryString)
Creates empty MatchesLayer in this TextCorpus, ready to be filled in with the corpus match annotations.
- Parameters:
queryLanguage - language of the query used to extract corpus matches from a corpus.
queryString - the query used to extract corpus matches from a corpus.
- Returns:
- annotation layer that has been created.
-
getWordSplittingLayer
WordSplittingLayer getWordSplittingLayer()
Gets word splitting layer of this TextCorpus.
- Returns:
- layer containing split annotations (e.g. hyphenation) on Token objects from TokensLayer.
-
createWordSplittingLayer
WordSplittingLayer createWordSplittingLayer(String type)
Creates empty WordSplittingLayer with the given type of the splitting in this TextCorpus.
- Parameters:
type - type of the splitting, e.g. hyphenation.
- Returns:
- annotation layer that has been created.
-
getPhoneticsLayer
PhoneticsLayer getPhoneticsLayer()
Gets phonetics layer of this TextCorpus.
- Returns:
- layer containing phonetic transcriptions of Token objects from TokensLayer.
-
createPhotenicsLayer
PhoneticsLayer createPhotenicsLayer(String alphabet)
Creates empty PhoneticsLayer with the given alphabet for phonetic transcriptions in this TextCorpus.
- Parameters:
alphabet - alphabet of the phonetic transcription annotations.
- Returns:
- annotation layer that has been created.
-
getGeoLayer
GeoLayer getGeoLayer()
Gets geo layer of this TextCorpus.
- Returns:
- layer containing geographical location annotations on Token objects from TokensLayer.
-
createGeoLayer
GeoLayer createGeoLayer(String source, GeoLongLatFormat coordFormat)
Creates empty GeoLayer in this TextCorpus.
- Parameters:
source - source of the geographical coordinates.
coordFormat - format of the geographical coordinates.
- Returns:
- annotation layer that has been created.
-
createGeoLayer
GeoLayer createGeoLayer(String source, GeoLongLatFormat coordFormat, GeoContinentFormat conitentFormat, GeoCountryFormat countryFormat, GeoCapitalFormat capitalFormat)
Creates empty GeoLayer in this TextCorpus.
- Parameters:
source - source of the geographical coordinates.
coordFormat - format of the geographical coordinates.
conitentFormat - format of the continent (should be null if no continent is specified).
countryFormat - format of the country (should be null if no country is specified).
capitalFormat - format of the capital (should be null if no capital is specified).
- Returns:
- annotation layer that has been created.
-
getOrthographyLayer
OrthographyLayer getOrthographyLayer()
Gets orthography layer of this TextCorpus.
- Returns:
- layer containing correct orthographic spellings of misspelled Token objects from TokensLayer.
-
createOrthographyLayer
OrthographyLayer createOrthographyLayer()
Creates empty OrthographyLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getTextStructureLayer
TextStructureLayer getTextStructureLayer()
Gets text structure layer of this TextCorpus.
- Returns:
- layer containing original text structure (such as paragraphs, lines, pages, etc.), anchored on Token objects from TokensLayer.
-
createTextStructureLayer
TextStructureLayer createTextStructureLayer()
Creates empty TextStructureLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getSynonymyLayer
LexicalSemanticsLayer getSynonymyLayer()
Gets synonymy layer of this TextCorpus.
- Returns:
- layer containing synonyms of Lemma objects from LemmasLayer.
-
createSynonymyLayer
LexicalSemanticsLayer createSynonymyLayer()
Creates empty synonymy layer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getAntonymyLayer
LexicalSemanticsLayer getAntonymyLayer()
Gets antonymy layer of this TextCorpus.
- Returns:
- layer containing antonyms of Lemma objects from LemmasLayer.
-
createAntonymyLayer
LexicalSemanticsLayer createAntonymyLayer()
Creates empty antonymy layer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getHyponymyLayer
LexicalSemanticsLayer getHyponymyLayer()
Gets hyponymy layer of this TextCorpus.
- Returns:
- layer containing hyponyms of Lemma objects from LemmasLayer.
-
createHyponymyLayer
LexicalSemanticsLayer createHyponymyLayer()
Creates empty hyponymy layer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getHyperonymyLayer
LexicalSemanticsLayer getHyperonymyLayer()
Gets hyperonymy layer of this TextCorpus.
- Returns:
- layer containing hyperonyms of Lemma objects from LemmasLayer.
-
createHyperonymyLayer
LexicalSemanticsLayer createHyperonymyLayer()
Creates empty hyperonymy layer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
getDiscourseConnectivesLayer
DiscourseConnectivesLayer getDiscourseConnectivesLayer()
Gets discourse connectives layer of this TextCorpus.
- Returns:
- layer containing discourse connectives annotations on Token objects from TokensLayer.
-
createDiscourseConnectivesLayer
DiscourseConnectivesLayer createDiscourseConnectivesLayer()
Creates empty DiscourseConnectivesLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
createDiscourseConnectivesLayer
DiscourseConnectivesLayer createDiscourseConnectivesLayer(String typeTagset)
Creates empty DiscourseConnectivesLayer in this TextCorpus.
- Parameters:
typeTagset - tagset used to label semantic types of the connectives.
- Returns:
- annotation layer that has been created.
-
getWordSensesLayer
WordSensesLayer getWordSensesLayer()
Gets word senses layer of this TextCorpus.
- Returns:
- layer containing word sense annotations on Token objects from TokensLayer.
-
createWordSensesLayer
WordSensesLayer createWordSensesLayer(String source)
Creates empty WordSensesLayer in this TextCorpus.
- Parameters:
source - source from which the word senses are taken.
- Returns:
- annotation layer that has been created.
-
getTextSourceLayer
TextSourceLayer getTextSourceLayer()
Gets text source layer of this TextCorpus.
- Returns:
- annotation layer containing the text source.
-
createTextSourceLayer
TextSourceLayer createTextSourceLayer()
Creates empty TextSourceLayer in this TextCorpus.
- Returns:
- annotation layer that has been created.
-
-