Shallow parsing; modifies the document in place
Discourse parsing; modifies the document in place
Semantic role labeling (SRL); modifies the document in place
Lemmatization; modifies the document in place
Constructs a document of tokens from free text; includes sentence splitting and tokenization
Constructs a document of tokens from an array of untokenized sentences
Constructs a document of tokens from an array of tokenized sentences
Syntactic parsing; modifies the document in place
Hook to allow postprocessing of CoreNLP POS tagging *in place*, overwriting the original POS tags. This is useful for domain-specific corrections.
The CoreNLP annotation
Hook to allow postprocessing of CoreNLP tokenization. This is useful for domain-specific corrections, such as the ones in BioNLPProcessor. If you change the tokens, make sure to store them back in the sentence!
Input CoreNLP sentence
The modified tokens
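The "store them back" requirement of the tokenization hook can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Sentence` is a hypothetical stand-in for the CoreNLP sentence object, and the slash-merging rule is an invented correction, not one taken from BioNLPProcessor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    """Hypothetical stand-in for a CoreNLP sentence that holds its tokens."""
    tokens: List[str] = field(default_factory=list)


def postprocess_tokens(sentence: Sentence) -> List[str]:
    """Rejoin tokens that were split around a slash, e.g. 'MEK', '/', 'ERK'.

    The rule itself is invented for this sketch; the point is the last two
    lines, which store the modified tokens back before returning them.
    """
    toks = sentence.tokens
    merged: List[str] = []
    i = 0
    while i < len(toks):
        if i + 2 < len(toks) and toks[i + 1] == "/":
            merged.append(toks[i] + "/" + toks[i + 2])
            i += 3
        else:
            merged.append(toks[i])
            i += 1
    sentence.tokens = merged  # store the modified tokens back in the sentence
    return merged


sentence = Sentence(["MEK", "/", "ERK", "signaling"])
modified = postprocess_tokens(sentence)
```

If the assignment back to `sentence.tokens` were omitted, later stages reading the sentence would still see the original tokenization.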
Runs preprocessText on each sentence
Hook to allow the preprocessing of input text. This is useful for domain-specific corrections, such as the ones in BioNLPProcessor, where we remove Table and Fig references. Note that this is allowed to change character offsets.
The original input text
The preprocessed text
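As a rough illustration of such a preprocessing hook, the sketch below removes parenthesized Table and Fig references from the input text. The regular expression and the function names are assumptions chosen for this example; the actual rules used by BioNLPProcessor are not described here.

```python
import re

# Hypothetical pattern: parenthesized references such as "(Table 1)",
# "(Fig. 2A)" or "(Figure 3)", together with any whitespace before them.
FIGURE_TABLE_REF = re.compile(r"\s*\((?:Table|Fig\.?|Figure)\s+\w+\)")


def preprocess_text(text: str) -> str:
    """Return the text with parenthesized Table/Fig references removed."""
    return FIGURE_TABLE_REF.sub("", text)


def preprocess_sentences(sentences):
    """Apply preprocess_text to each sentence, mirroring the per-sentence variant."""
    return [preprocess_text(s) for s in sentences]


original = "Ras activates Raf (Fig. 2A) in these cells (Table 1)."
cleaned = preprocess_text(original)
# The cleaned text is shorter than the original, so character offsets
# computed on it no longer line up with the original input.
offsets_changed = len(cleaned) != len(original)
```

Because the cleaned text is shorter than the input, any offsets computed downstream refer to the preprocessed text rather than the original.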
Runs preprocessText on each token
NER; modifies the document in place
Coreference resolution; modifies the document in place
Part-of-speech tagging. This modifies the document in place, which is not too elegant, but there are two reasons for it: (a) some annotators (e.g., Stanford's CoreNLP) require some state (i.e., their Annotation object) to be passed between operations; (b) it is more efficient during annotate(), where all the possible operations are chained.
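The in-place design can be sketched with a toy example. All names below (`Document`, `ToyProcessor`, the `annotation` dictionary) are hypothetical, and the tagging and lemmatization rules are deliberately trivial placeholders; the sketch only shows how state carried on the document lets later steps build on earlier ones when they are chained.

```python
class Document:
    """Hypothetical document: token lists plus annotations filled in place."""

    def __init__(self, sentences):
        self.sentences = sentences  # list of token lists
        self.tags = None
        self.lemmas = None
        self.annotation = {}  # stands in for annotator state kept between steps


class ToyProcessor:
    """Toy processor whose steps mutate the document instead of copying it."""

    def tag_parts_of_speech(self, doc):
        # Placeholder rule: capitalized tokens are "NNP", everything else "NN".
        doc.tags = [["NNP" if t[:1].isupper() else "NN" for t in s]
                    for s in doc.sentences]
        doc.annotation["tagged"] = True  # state reused by later steps

    def lemmatize(self, doc):
        # This step depends on state left behind by the tagging step.
        if not doc.annotation.get("tagged"):
            raise RuntimeError("lemmatize requires POS tags")
        doc.lemmas = [[t.lower() for t in s] for s in doc.sentences]

    def annotate(self, doc):
        # Chain the operations on the same object; no intermediate copies.
        self.tag_parts_of_speech(doc)
        self.lemmatize(doc)
        return doc


doc = ToyProcessor().annotate(Document([["Ras", "binds", "Raf"]]))
```

Returning a fresh document from every step would require copying or re-attaching that shared state at each stage, which is the cost the in-place approach avoids.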
A Processor using only shallow analysis: tokenization, lemmatization, POS tagging, and NER, all implemented using Stanford's CoreNLP tools. User: mihais. Date: 2/25/15