public class NGramTokenizer extends CharacterDelimitedTokenizer
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
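To make the effect of `-min`, `-max`, and `-delimiters` concrete, here is a minimal, self-contained sketch of word n-gram generation. It is not Weka's implementation: the class name `NGramSketch`, the method `ngrams`, the single-space join, and the emission order (longest n-gram first at each start position) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in, not part of Weka.
class NGramSketch {

    /** Splits text on the delimiter characters, then joins runs of min..max adjacent words. */
    static List<String> ngrams(String text, String delimiters, int min, int max) {
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }

        List<String> out = new ArrayList<>();
        // Assumed order: for each start position, emit the longest n-gram first.
        for (int start = 0; start < words.size(); start++) {
            for (int n = max; n >= min; n--) {
                if (start + n <= words.size()) {
                    // Assumption: words in an n-gram are joined by a single space.
                    out.add(String.join(" ", words.subList(start, start + n)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Uses the documented default delimiter set with min = 1 and max = 2.
        System.out.println(ngrams("the quick brown fox", " \r\n\t.,;:'\"()?!", 1, 2));
    }
}
```

With `min = 1` and `max = 2`, the sketch produces unigrams and bigrams only; raising `min` to 2 would drop the single words.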
Constructor and Description |
---|
NGramTokenizer() |
Modifier and Type | Method and Description |
---|---|
int |
getNGramMaxSize()
Gets the max N of the NGram.
|
int |
getNGramMinSize()
Gets the min N of the NGram.
|
String[] |
getOptions()
Gets the current option settings for the OptionHandler.
|
String |
getRevision()
Returns the revision string.
|
String |
globalInfo()
Returns a string describing the tokenizer.
|
boolean |
hasMoreElements()
Returns true if there are more elements available.
|
Enumeration<Option> |
listOptions()
Returns an enumeration of all the available options.
|
static void |
main(String[] args)
Runs the tokenizer with the given options and strings to tokenize.
|
String |
nextElement()
Returns N-grams and also (N-1)-grams and so on, down to the minimum size.
|
String |
NGramMaxSizeTipText()
Returns the tip text for this property.
|
String |
NGramMinSizeTipText()
Returns the tip text for this property.
|
void |
setNGramMaxSize(int value)
Sets the max size of the Ngram.
|
void |
setNGramMinSize(int value)
Sets the min size of the Ngram.
|
void |
setOptions(String[] options)
Parses a given list of options.
|
void |
tokenize(String s)
Sets the string to tokenize.
|
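Methods inherited from class CharacterDelimitedTokenizer: delimitersTipText, getDelimiters, setDelimiters
Methods inherited from class Tokenizer: runTokenizer, tokenize
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface OptionHandler: makeCopy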
public String globalInfo()
globalInfo
in class Tokenizer
public Enumeration<Option> listOptions()
listOptions
in interface OptionHandler
listOptions
in class CharacterDelimitedTokenizer
public String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class CharacterDelimitedTokenizer
public void setOptions(String[] options) throws Exception
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
setOptions
in interface OptionHandler
setOptions
in class CharacterDelimitedTokenizer
options
- the list of options as an array of strings
Exception
- if an option is not supported
public int getNGramMaxSize()
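The `setOptions` contract described above (flag/value pairs, documented defaults, an exception for unsupported options) can be sketched as follows. This is an illustrative stand-in rather than Weka's parsing code: the class `NGramOptionsSketch` and its helper `valueAt` are hypothetical, and the sketch takes the delimiter value literally without any unescaping.

```java
// Hypothetical stand-in, not part of Weka.
class NGramOptionsSketch {
    int min = 1;                                // documented default for -min
    int max = 3;                                // documented default for -max
    String delimiters = " \r\n\t.,;:'\"()?!";   // documented default for -delimiters

    /** Reads "-flag value" pairs; throws if a flag is unknown or its value is missing. */
    void setOptions(String[] options) throws Exception {
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            switch (flag) {
                case "-min":
                    min = Integer.parseInt(valueAt(options, ++i, flag));
                    break;
                case "-max":
                    max = Integer.parseInt(valueAt(options, ++i, flag));
                    break;
                case "-delimiters":
                    delimiters = valueAt(options, ++i, flag);
                    break;
                default:
                    throw new Exception("Unsupported option: " + flag);
            }
        }
    }

    /** Returns the current settings in the same flag/value form that setOptions accepts. */
    String[] getOptions() {
        return new String[] {
            "-max", String.valueOf(max),
            "-min", String.valueOf(min),
            "-delimiters", delimiters
        };
    }

    // Fetches the value following a flag, failing clearly when it is absent.
    private static String valueAt(String[] options, int index, String flag) throws Exception {
        if (index >= options.length) {
            throw new Exception("Missing value for " + flag);
        }
        return options[index];
    }
}
```

Feeding the array returned by `getOptions()` back into `setOptions` leaves the settings unchanged, which is the round-trip behavior an option handler is generally expected to have.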
public void setNGramMaxSize(int value)
value
- the size of the NGram.
public String NGramMaxSizeTipText()
public void setNGramMinSize(int value)
value
- the size of the NGram.
public int getNGramMinSize()
public String NGramMinSizeTipText()
public boolean hasMoreElements()
hasMoreElements
in interface Enumeration<String>
hasMoreElements
in class Tokenizer
public String nextElement()
nextElement
in interface Enumeration<String>
nextElement
in class Tokenizer
public void tokenize(String s)
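Because the class exposes `hasMoreElements()` and `nextElement()` through `Enumeration<String>`, tokens are typically consumed in a loop after `tokenize(String s)` has supplied the input. The sketch below shows that loop against any `Enumeration<String>`; the class `TokenDrainSketch` and the method `drain` are hypothetical names, and a plain `Collections.enumeration` stands in for the tokenizer so the example runs without Weka on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Weka.
class TokenDrainSketch {

    /** Collects every remaining element of an Enumeration<String> into a list. */
    static List<String> drain(Enumeration<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasMoreElements()) {
            out.add(tokens.nextElement());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in source. With Weka available, an NGramTokenizer that has
        // already received its input via tokenize(s) could be passed instead.
        Enumeration<String> standIn =
            Collections.enumeration(Arrays.asList("a b", "a", "b"));
        System.out.println(drain(standIn));
    }
}
```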
public String getRevision()
public static void main(String[] args)
args
- the commandline options and strings to tokenize

Copyright © 2019 University of Waikato, Hamilton, NZ. All rights reserved.