public class POSTagger extends Object
Modifier and Type | Field and Description |
---|---|
String[][] | lexBuff |
protected Map<String,List<Rule>> | rules |
String[] | tagBuff |
String[] | wordBuff |
Constructor and Description |
---|
POSTagger(URL lexiconURL, URL rulesURL): Constructs a POS tagger using the platform's native encoding to read the lexicon and rules files. |
POSTagger(URL lexiconURL, URL rulesURL, String encoding): Constructs a POS tagger using the specified encoding to read the lexicon and rules files. |
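The two constructors differ only in which charset is used to decode the files behind the URLs. The following is a minimal sketch of that distinction using standard `java.io` and `java.nio` calls. The class `EncodingSketch`, the helper `readLines`, and the one-line file content are hypothetical names and data chosen for illustration; they are not part of `POSTagger`, whose own reading code is not shown on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the POSTagger API.
class EncodingSketch {

    // Reads all lines behind a URL. A null encoding falls back to the
    // platform default charset, analogous to the two-argument constructor.
    static List<String> readLines(URL url, String encoding) throws IOException {
        Charset charset = (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny stand-in file in UTF-8 (the content format is made up).
        Path tmp = Files.createTempFile("lexicon", ".txt");
        Files.write(tmp, "caf\u00e9 NN\n".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL();

        // Explicit encoding: decodes the same way on every platform.
        System.out.println(readLines(url, "UTF-8"));
        // Platform default: the result depends on the JVM's default charset.
        System.out.println(readLines(url, null));

        Files.delete(tmp);
    }
}
```

When the lexicon or rules files contain non-ASCII characters, passing the encoding explicitly avoids results that vary from one machine to another.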
Modifier and Type | Method and Description |
---|---|
protected String[] | classifyWord(String wd): Attempts to classify an unknown word. |
Rule | createNewRule(String ruleId): Creates a new rule of the required type according to the provided ID. |
static void | main(String[] args): Main method. |
protected boolean | oneStep(String word, List<String[]> taggedSentence): Adds a new word to the last position of the 7-word window and tags the word currently in the middle (i.e. at position 3, counting from 0). |
void | readRules(URL rulesURL): Reads the rules from the rules input file. |
List<List<String[]>> | runTagger(List<List<String>> sentences): Runs the tagger over a set of sentences. |
void | showRules() |
public POSTagger(URL lexiconURL, URL rulesURL) throws InvalidRuleException, IOException
Throws:
InvalidRuleException
IOException
public POSTagger(URL lexiconURL, URL rulesURL, String encoding) throws InvalidRuleException, IOException
Throws:
InvalidRuleException
IOException
public Rule createNewRule(String ruleId) throws InvalidRuleException
Parameters:
ruleId - the ID for the rule to be created
Throws:
InvalidRuleException
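One common way to create an object "of the required type according to the provided ID" is a registry that maps each ID to a constructor and rejects unknown IDs. The sketch below illustrates that pattern only. Everything in it is hypothetical: the rule IDs, the `DemoRule` interface and `DemoInvalidRuleException` are stand-ins for `Rule` and `InvalidRuleException`, and nothing here is taken from the actual `createNewRule` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of an ID-to-type factory; not the POSTagger code.
class RuleFactorySketch {

    // Stand-in for the Rule type.
    interface DemoRule {
        String describe();
    }

    // Stand-in for InvalidRuleException.
    static class DemoInvalidRuleException extends Exception {
        DemoInvalidRuleException(String message) {
            super(message);
        }
    }

    // Registry from rule ID to a supplier of the matching rule type.
    // The IDs below are made up for this example.
    private static final Map<String, Supplier<DemoRule>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("DEMO_PREV", () -> () -> "looks at the previous tag");
        REGISTRY.put("DEMO_NEXT", () -> () -> "looks at the next tag");
    }

    // Returns a fresh rule for a known ID and fails loudly for an unknown one.
    static DemoRule createNewRule(String ruleId) throws DemoInvalidRuleException {
        Supplier<DemoRule> supplier = REGISTRY.get(ruleId);
        if (supplier == null) {
            throw new DemoInvalidRuleException("Unknown rule ID: " + ruleId);
        }
        return supplier.get();
    }

    public static void main(String[] args) throws DemoInvalidRuleException {
        System.out.println(createNewRule("DEMO_PREV").describe());
    }
}
```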
public List<List<String[]>> runTagger(List<List<String>> sentences)
Parameters:
sentences - a List of Lists of words to be tagged. Each list is a sentence represented as a list of words.
Returns:
a List of Lists of String[]. A list of tagged sentences, each sentence being itself a list whose elements are pairs of strings, with the word in the first position and the tag in the second.
protected boolean oneStep(String word, List<String[]> taggedSentence)
Parameters:
word - the new word
taggedSentence - a List of pairs of strings representing the results of tagging the current sentence so far.
public void readRules(URL rulesURL) throws IOException, InvalidRuleException
Throws:
IOException
InvalidRuleException
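The window behaviour described for `oneStep` and the word/tag pair format returned by `runTagger` can be sketched as follows. This is an illustration under stated assumptions, not the tagger's implementation: the class `WindowSketch`, the `tagSentence` helper, the use of `null` as padding, the three flushing steps at the end of a sentence, and the `placeholderTag` rule are all invented here. The real tagger consults the lexicon and rules instead, and the meaning of the `boolean` returned by the real `oneStep` is not documented on this page, so the sketch returns nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: padding, flushing and the tag rule are assumptions.
class WindowSketch {
    static final int WINDOW = 7; // window size stated in the description
    static final int MIDDLE = 3; // middle slot, counting from 0

    private final String[] words = new String[WINDOW];

    // Shifts the window left by one, stores the new word in the last slot,
    // then emits a [word, tag] pair for the middle slot if it holds a word.
    void oneStep(String word, List<String[]> taggedSentence) {
        System.arraycopy(words, 1, words, 0, WINDOW - 1);
        words[WINDOW - 1] = word;
        String middle = words[MIDDLE];
        if (middle != null) {
            taggedSentence.add(new String[] { middle, placeholderTag(middle) });
        }
    }

    // Stand-in for lexicon lookup plus rule application; the tags are arbitrary.
    static String placeholderTag(String word) {
        boolean capitalised = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        return capitalised ? "NNP" : "NN";
    }

    // Tags one sentence and returns it as a list of [word, tag] pairs.
    static List<String[]> tagSentence(List<String> sentence) {
        WindowSketch window = new WindowSketch();
        List<String[]> tagged = new ArrayList<>();
        for (String word : sentence) {
            window.oneStep(word, tagged);
        }
        // Push empty slots so that the final words reach the middle position.
        for (int i = 0; i < WINDOW - 1 - MIDDLE; i++) {
            window.oneStep(null, tagged);
        }
        return tagged;
    }

    public static void main(String[] args) {
        for (String[] pair : tagSentence(Arrays.asList("Mary", "reads", "books"))) {
            System.out.println(pair[0] + "/" + pair[1]);
        }
    }
}
```

Tagging the middle slot rather than the newest word means each word is processed with up to three words of context on either side, which is why extra steps are needed at the end of a sentence in this sketch.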
public void showRules()
protected String[] classifyWord(String wd)
Parameters:
wd - the word to be classified