Shallow parsing; modifies the document in place
Discourse parsing; modifies the document in place
Lemmatization; modifies the document in place
Constructs a document of tokens from free text; includes sentence splitting and tokenization
Constructs a document of tokens from an array of untokenized sentences
Constructs a document of tokens from an array of tokenized sentences
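The three constructors above differ only in how much of the work is already done by the caller. A minimal Python sketch of that layering follows. It is not the library's API: `Document`, `Sentence`, `mk_document`, `mk_document_from_sentences`, `mk_document_from_tokens`, and the naive regex-based splitter and tokenizer are all hypothetical names and stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    words: List[str]


@dataclass
class Document:
    sentences: List[Sentence] = field(default_factory=list)


def split_sentences(text: str) -> List[str]:
    # Naive stand-in for a real splitter: break after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    # Naive stand-in for a real tokenizer: word runs or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def mk_document_from_tokens(sentences: List[List[str]]) -> Document:
    # Input is already split and tokenized; just wrap it.
    return Document([Sentence(list(words)) for words in sentences])


def mk_document_from_sentences(sentences: List[str]) -> Document:
    # Input is already split into sentences; only tokenization remains.
    return mk_document_from_tokens([tokenize(s) for s in sentences])


def mk_document(text: str) -> Document:
    # Free text: sentence splitting first, then tokenization.
    return mk_document_from_sentences(split_sentences(text))
```

In this sketch each entry point delegates to the next, so all three produce the same kind of `Document` and downstream annotators do not need to know which one was used.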
Syntactic parsing; modifies the document in place
Named entity recognition (NER); modifies the document in place
Coreference resolution; modifies the document in place
Part-of-speech tagging. This modifies the document in place, which is not too elegant, but there are two reasons for it: (a) some annotators (e.g., Stanford's CoreNLP) require some state (i.e., their Annotation object) to be passed between operations; (b) it is more efficient during annotate(), where all the possible operations are chained.
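The in-place design can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `ToyProcessor`, `backend_state`, and the toy tagging and lemmatization rules are hypothetical and do not reflect the library's or CoreNLP's actual API. The point is that each step mutates the same document, later steps can read state left behind by earlier ones, and `annotate` chains them without copying.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sentence:
    words: List[str]
    tags: Optional[List[str]] = None    # filled in by tag_parts_of_speech
    lemmas: Optional[List[str]] = None  # filled in by lemmatize


@dataclass
class Document:
    sentences: List[Sentence]
    # Hypothetical stand-in for backend state that must survive between steps
    # (the role played by CoreNLP's Annotation object in the text above).
    backend_state: Dict[str, object] = field(default_factory=dict)


class ToyProcessor:
    def tag_parts_of_speech(self, doc: Document) -> None:
        # Toy rule: capitalized words are proper nouns, everything else is "X".
        for s in doc.sentences:
            s.tags = ["NNP" if w[:1].isupper() else "X" for w in s.words]
        doc.backend_state["tagged"] = True

    def lemmatize(self, doc: Document) -> None:
        # Depends on state left in the document by the tagging step.
        if not doc.backend_state.get("tagged"):
            raise ValueError("lemmatize requires part-of-speech tags")
        for s in doc.sentences:
            s.lemmas = [w if t == "NNP" else w.lower()
                        for w, t in zip(s.words, s.tags)]

    def annotate(self, doc: Document) -> Document:
        # All steps run on the same object; nothing is copied between them.
        self.tag_parts_of_speech(doc)
        self.lemmatize(doc)
        return doc
```

A functional design that returned a fresh document from each step would be cleaner, but it would have to carry the backend state along explicitly and would allocate a new document per step in the chained case.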
Runs preprocessText on each sentence
Hook to allow the preprocessing of input text. This is useful for domain-specific corrections, such as the ones in BioNLPProcessor, where we remove Table and Fig references. Note that this is allowed to change character offsets.
The original input text
The preprocessed text
Runs preprocessText on each token
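The preprocessing hook and its per-sentence and per-token wrappers can be sketched in Python as follows. This is a hedged illustration, not the BioNLPProcessor implementation: the class names, the snake_case method names, and the regular expression for figure and table references are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical pattern: parenthesized references such as "(Fig. 2)" or "(Table 1A)",
# together with any whitespace immediately before them.
_REF = re.compile(r"\s*\((?:Fig(?:ure)?\.?|Table)\s*\w+\)")


class Preprocessor:
    def preprocess_text(self, orig_text: str) -> str:
        # Default hook: return the input unchanged. Subclasses override this.
        return orig_text

    def preprocess_sentences(self, sentences: List[str]) -> List[str]:
        # Applies the hook to each untokenized sentence.
        return [self.preprocess_text(s) for s in sentences]

    def preprocess_tokens(self, sentences: List[List[str]]) -> List[List[str]]:
        # Applies the hook to each token of each tokenized sentence.
        return [[self.preprocess_text(t) for t in sent] for sent in sentences]


class FigureStrippingPreprocessor(Preprocessor):
    def preprocess_text(self, orig_text: str) -> str:
        # Domain-specific correction: drop figure and table references.
        # The result is shorter than the input, so character offsets shift.
        return _REF.sub("", orig_text)
```

Because the output can be shorter than the input, any character offsets computed after this step refer to the preprocessed text, not the original.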
User: mihais Date: 3/1/13