The standard label used in the parser (used to be String).
Something we can throw in an AnnotatedLabel
Class that turns BinarizedTrees into normal trees. Should replace unary chains in addition to removing intermediates.
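A minimal sketch of removing intermediates, not the library's actual implementation. The `Node` type and the convention that intermediate symbols start with `@` are assumptions made for illustration.

```scala
object DebinarizeSketch {
  // Hypothetical minimal tree: a label plus children.
  final case class Node(label: String, children: List[Node] = Nil)

  // Assumption: symbols introduced by binarization start with '@'.
  def isIntermediate(label: String): Boolean = label.startsWith("@")

  /** Splices the children of every intermediate node into its parent. */
  def debinarize(n: Node): Node = {
    val kids = n.children.flatMap { c =>
      val d = debinarize(c)
      // An intermediate node disappears; its (already debinarized) children move up.
      if (isIntermediate(d.label)) d.children else List(d)
    }
    n.copy(children = kids)
  }
}
```

Under these assumptions, `S -> NP @S` with `@S -> VP .` becomes `S -> NP VP .`.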
The root's index is words.length.
Based on Aria's comments:
Basically, for each parent -> rule expansion, you look for the head label by searching the children in direction Dir.
Dir specifies whether to search left to right or right to left. Dis determines whether you are looking for the first match of any of the categories, or for any match of the first category, then the second, and so on.
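The search described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the `Rule` shape, the `Dir` values, and the choice that `dis = true` means "first child matching any category" are all hypothetical.

```scala
object HeadRuleSketch {
  sealed trait Dir
  case object LeftToRight extends Dir
  case object RightToLeft extends Dir

  // Hypothetical rule shape: search direction, the dis flag,
  // and candidate head categories in priority order.
  final case class Rule(dir: Dir, dis: Boolean, heads: Seq[String])

  /** Returns the index of the head child, or None if no category matches. */
  def findHead(rule: Rule, children: IndexedSeq[String]): Option[Int] = {
    val order: Seq[Int] = rule.dir match {
      case LeftToRight => children.indices
      case RightToLeft => children.indices.reverse
    }
    if (rule.dis) {
      // Assumed meaning of dis = true: the first child, in search order,
      // whose label is any of the categories.
      order.find(i => rule.heads.contains(children(i)))
    } else {
      // Otherwise: any match of the first category, then the second, and so on.
      rule.heads.iterator
        .map(h => order.find(i => children(i) == h))
        .collectFirst { case Some(i) => i }
    }
  }
}
```

With children `NP VBD NP PP` and categories `VB, VBD, NP`, a left-to-right search picks child 0 when `dis` is true and child 1 when it is false, because `VBD` outranks `NP` in the priority order.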
PennTreeReader due to Adam Pauls.
This reader returns empty categories as leaves of the tree below the -NONE-. These leaves span 0 words.
For example, (TOP (S-IMP (NP-SBJ (-NONE- *PRO*)) (VP (VB Look) (PP-DIR (IN at) (NP (DT that)))) (. /.)))
will return (TOP[0:4] (S-IMP[0:4] (NP-SBJ[0:0] (-NONE-[0:0] (*PRO*[0:0]))) (VP[0:3]...)
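To make the span convention concrete, here is a small sketch that assigns `[begin:end]` spans so that a leaf directly below `-NONE-` covers zero words. It does not use PennTreeReader; the `Node` type and `render` function are hypothetical and the tree is built by hand rather than parsed.

```scala
object EmptySpanSketch {
  // Hypothetical minimal tree: a node without children is a leaf (a word or an empty category).
  final case class Node(label: String, children: List[Node] = Nil)

  /** Renders `n` with spans, starting at word offset `begin`; returns (rendering, end offset). */
  def render(n: Node, begin: Int, underNone: Boolean = false): (String, Int) = {
    if (n.children.isEmpty) {
      // A leaf below -NONE- spans 0 words; any other leaf spans exactly 1.
      val end = if (underNone) begin else begin + 1
      (s"(${n.label}[$begin:$end])", end)
    } else {
      var pos = begin
      val parts = n.children.map { c =>
        val (text, end) = render(c, pos, n.label == "-NONE-")
        pos = end // the next sibling starts where this one ended
        text
      }
      (s"(${n.label}[$begin:$pos] ${parts.mkString(" ")})", pos)
    }
  }
}
```

Calling `render` on a hand-built copy of the example tree with `begin = 0` gives the empty category the span `[0:0]` and the root the span `[0:4]`.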
Represents a treebank with attendant spans, binarization, etc. Used in all the parser trainers.
Can annotate a tree with the head word. Usually you should just use HeadFinder.collinsHeadFinder.
A SimpleTreebank can be specified simply by giving paths to the trees in Penn Treebank format.
A Treebank that uses a small number of training and test sentences.
Removes all traces from the word sequence, deleting all empty categories while it's at it.
Removes traces from the word sequence and gives the corresponding tree nodes empty spans.
A Treebank contains a train set, a test set, and a dev set, which are "Portions". Portions are made up of sections, which have the trees.
Implements HeadFinding as in the Collins parser. You can use HeadFinder.left[L] or right[L] if you do not want to use any head rules.
Based on Aria's code.
Removes unary chains A -> B -> ... -> C, replacing them with A -> C and modifying the tree to record the removed unaries.
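One way to picture the transformation is the sketch below. It is an illustration only: the `Node` type, its `chain` field for remembering the removed labels, and the rule that a node with a single word child (a preterminal) is never removed are assumptions, not the library's design.

```scala
object UnaryChainSketch {
  // Hypothetical tree: `chain` lists the labels removed between this node and its child, top-down.
  final case class Node(label: String, children: List[Node] = Nil, chain: List[String] = Nil)

  // A removable link has exactly one child, and that child is not a bare word.
  def isUnaryLink(n: Node): Boolean = n.children match {
    case List(c) => c.children.nonEmpty
    case _       => false
  }

  /** Collapses A -> B -> ... -> C into A -> C, recording the skipped labels on A. */
  def collapse(n: Node): Node = n.children match {
    case List(only) if isUnaryLink(only) =>
      // Skip `only`, keep collapsing from the same parent, then record the skipped label.
      val rest = collapse(n.copy(children = only.children))
      rest.copy(chain = only.label :: rest.chain)
    case kids =>
      n.copy(children = kids.map(collapse))
  }
}
```

For a tree `A -> B -> C` where `C` has two children, `collapse` returns `A -> C` with `chain == List("B")`, so the original chain can be restored later.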
The standard label used in the parser (used to be String).
Useful for Klein-and-Manning-style annotated labels and other explicit-annotation strategies.