Package org.predict4all.nlp.words
Class WordDictionary
java.lang.Object
  org.predict4all.nlp.words.WordDictionary
public class WordDictionary extends Object
Represents a word dictionary.
This dictionary identifies each sequence of characters as a unique "word" and keeps information about that word.
Each word is identified by a single int ID to save memory and space.
The dictionary itself is identified by a UUID, which is used to verify consistency when a user dictionary is loaded.
Note that a Word added to a WordDictionary cannot be removed: its ID must stay consistent, because it may already be used in an AbstractNGramDictionary. However, you can disable a word with Word.setForceInvalid(boolean, boolean).
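To illustrate why words are disabled rather than removed, here is a minimal, hypothetical sketch of an append-only mapping from text to int IDs. It is not the Predict4All implementation: the class name AppendOnlyWordIds and its methods are assumptions. The point is that an ID, once assigned, is never reused, so any other structure storing IDs (such as an n-gram dictionary) keeps referring to the same word.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Predict4All implementation.
// Each distinct text gets a stable int ID equal to its insertion index.
// There is no remove method: a word can only be flagged invalid, so IDs
// stored elsewhere keep pointing at the same word.
final class AppendOnlyWordIds {
    private final Map<String, Integer> idsByText = new HashMap<>();
    private final List<String> textsById = new ArrayList<>();
    private final List<Boolean> forcedInvalid = new ArrayList<>();

    /** Returns the existing ID for the text, or assigns the next free ID. */
    int putWord(String text) {
        Integer existing = idsByText.get(text);
        if (existing != null) {
            return existing;
        }
        int id = textsById.size();
        idsByText.put(text, id);
        textsById.add(text);
        forcedInvalid.add(false);
        return id;
    }

    /** Disables (or re-enables) a word without freeing its ID. */
    void setForceInvalid(int id, boolean invalid) {
        forcedInvalid.set(id, invalid);
    }

    boolean isValid(int id) {
        return !forcedInvalid.get(id);
    }

    String text(int id) {
        return textsById.get(id);
    }

    int size() {
        return textsById.size();
    }
}
```

Removing an entry would either leave a hole in the ID space or, if IDs were compacted, silently change what previously stored IDs refer to; a per-word validity flag avoids both problems.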
Constructor Summary
Constructor | Description
WordDictionary(LanguageModel languageModel, String dictionaryId) |
Method Summary
Modifier and Type | Method | Description
Collection<Word> | getAllWords() | Returns all the existing words in this dictionary.
SortedMap<String,Word> | getExactWordsWithPrefixExist(String prefix) |
Map<BiIntegerKey,NextWord> | getValidWordForPredictionByPrefix(String wordPrefix, PredictionParameter predictionParameter, int wantedWordCount, Set<Integer> wordIdsToExclude) | Returns the words that start with a given prefix and are valid for prediction.
Word | getWord(int wordId) | Gets a word entity from its ID; returns null if there is no word for this ID.
Word | getWord(String wordStr) | Gets the word entity from its text; never returns null.
int | getWordId(String wordStr) | Gets a word ID, or the Tag.UNKNOWN ID if the text is not in the dictionary.
void | incrementUserWord(int wordId) |
boolean | isTokenValidToCreateUserWord(Token token) |
static WordDictionary | loadDictionary(LanguageModel languageModel, File dictionaryFile) | Creates a word dictionary from a word dictionary data file previously created by the training algorithm.
void | loadUserDictionary(File userDictionaryFile) | Loads a user dictionary on top of an existing trained dictionary.
Word | putUserWord(String wordStr) | Manually adds a user word to this dictionary.
int | putUserWord(Token token) |
int | putWordTraining(String word) |
void | saveUserDictionary(File userDictionaryFile) | Saves the user words and modified words of this dictionary.
int | size() | Returns the word count stored in this dictionary.
Constructor Detail
-
WordDictionary
public WordDictionary(LanguageModel languageModel, String dictionaryId)
Method Detail
-
getWordId
public int getWordId(String wordStr)
Gets a word ID.
Note that this method always returns a usable ID: if there is no word in the dictionary for the given text, it returns the ID of Tag.UNKNOWN.
Parameters:
wordStr - the word content
Returns:
the word ID if found in the dictionary, otherwise the Tag.UNKNOWN ID
-
getWord
public Word getWord(String wordStr)
Gets the word entity from its text.
Note that this method never returns null: if there is no word in the dictionary for the given text, it returns the TagWord for Tag.UNKNOWN.
Parameters:
wordStr - the word content
Returns:
the word if found in the dictionary, otherwise the TagWord with Tag.UNKNOWN
-
getWord
public Word getWord(int wordId)
Gets a word entity from its ID.
Unlike getWord(String), this method returns null if there is no word for the given ID.
Parameters:
wordId - the word ID
Returns:
the word associated with the given ID, or null if there is none
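The three lookups above follow two different contracts: text lookups fall back to an "unknown" value, while the ID lookup returns null. The hypothetical sketch below models both with plain maps so that callers can see which check each one needs. The class LookupContracts and the constant UNKNOWN_ID are assumptions; the real ID of Tag.UNKNOWN is not documented on this page, so -1 is only a stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two lookup contracts; names and the
// UNKNOWN_ID value are assumptions, not Predict4All's actual code.
final class LookupContracts {
    /** Stand-in for the ID of Tag.UNKNOWN (real value not documented here). */
    static final int UNKNOWN_ID = -1;

    private final Map<String, Integer> idsByText = new HashMap<>();
    private final Map<Integer, String> textsById = new HashMap<>();

    void add(int id, String text) {
        idsByText.put(text, id);
        textsById.put(id, text);
    }

    /** Like getWordId(String): unknown text maps to the UNKNOWN ID, never fails. */
    int getWordId(String text) {
        return idsByText.getOrDefault(text, UNKNOWN_ID);
    }

    /** Like getWord(int): returns null when no word has this ID. */
    String getWord(int id) {
        return textsById.get(id);
    }

    /** Caller-side check for text lookups: compare with the sentinel, not null. */
    boolean isKnown(String text) {
        return getWordId(text) != UNKNOWN_ID;
    }
}
```

In short, results of getWordId(String) and getWord(String) should be compared against the unknown value, while results of getWord(int) need a null check.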
-
getAllWords
public Collection<Word> getAllWords()
Returns all the existing words in this dictionary.
Words can be special words such as TagWord, EquivalenceClassWord, etc.
They can also be SimpleWord instances from a trained model, and UserWord instances for words "learned" while using the predictor.
Note that if you only want the words that can be suggested to the end user, you should use Word.isValidToBePredicted(PredictionParameter) to filter out invalid words.
Returns:
all possible words. The returned Collection is read-only.
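The following hypothetical sketch shows the two properties described for getAllWords(): the collection is a read-only view, and callers filter it themselves before showing words to the end user. The Entry type, its special flag and the predictable helper are assumptions; the predicate stands in for Word.isValidToBePredicted(PredictionParameter), whose actual rules are not described here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: a word list exposed read-only, then filtered by a
// predicate standing in for Word.isValidToBePredicted(PredictionParameter).
final class ReadOnlyWords {
    /** Minimal stand-in for a dictionary entry; fields are assumptions. */
    static final class Entry {
        final String text;
        final boolean special; // e.g. a tag or equivalence-class word

        Entry(String text, boolean special) {
            this.text = text;
            this.special = special;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(Entry entry) {
        entries.add(entry);
    }

    /** Read-only view: mutating calls throw UnsupportedOperationException. */
    Collection<Entry> getAllWords() {
        return Collections.unmodifiableCollection(entries);
    }

    /** Keeps the text of every entry accepted by the validity predicate. */
    static List<String> predictable(Collection<Entry> all, Predicate<Entry> isValid) {
        return all.stream()
                .filter(isValid)
                .map(e -> e.text)
                .collect(Collectors.toList());
    }
}
```

Returning an unmodifiable view lets the dictionary expose its contents without copying them while still guaranteeing that callers cannot add or remove words behind its back.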
-
size
public int size()
Returns the word count stored in this dictionary.
Returns:
the stored word count
-
putUserWord
public Word putUserWord(String wordStr)
Manually adds a user word to this dictionary.
This creates the associated word entity.
This method doesn't check whether the dictionary already contains a word with the same text: you should check it before calling this method (use getWord(String)).
Parameters:
wordStr - the word to add
Returns:
the created user word
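Because putUserWord(String) does not deduplicate, the check has to happen on the caller side. This hypothetical sketch models that "look up first, then create" pattern; UserWordGuard, getOrCreate and the UNKNOWN_ID sentinel (standing in for the Tag.UNKNOWN ID) are assumptions, and integer IDs replace the real Word entities.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the caller-side check recommended before
// putUserWord(String); names and the sentinel value are assumptions.
final class UserWordGuard {
    static final int UNKNOWN_ID = -1; // stand-in for the Tag.UNKNOWN ID

    private final Map<String, Integer> ids = new HashMap<>();
    private int nextId = 0;
    private int putCalls = 0;

    int getWordId(String text) {
        return ids.getOrDefault(text, UNKNOWN_ID);
    }

    /** Like putUserWord(String): always creates a new entry, no duplicate check. */
    int putUserWord(String text) {
        putCalls++;
        int id = nextId++;
        ids.put(text, id);
        return id;
    }

    /** Only creates a user word when the text is not already known. */
    int getOrCreate(String text) {
        int id = getWordId(text);
        return id != UNKNOWN_ID ? id : putUserWord(text);
    }

    int putCalls() {
        return putCalls;
    }
}
```

Calling putUserWord directly for a text that already exists would create a second entry with a new ID, which is exactly the situation the documentation asks callers to avoid.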
-
getValidWordForPredictionByPrefix
public Map<BiIntegerKey,NextWord> getValidWordForPredictionByPrefix(String wordPrefix, PredictionParameter predictionParameter, int wantedWordCount, Set<Integer> wordIdsToExclude)
Returns the words that start with a given prefix and are valid for prediction.
The search uses wantedWordCount to stop early, so the result may not contain every matching word. The returned map is not sorted.
Parameters:
wordPrefix - the prefix
predictionParameter - prediction parameters (used to check that a word is valid to predict)
wantedWordCount - the number of words wanted (used to stop the search)
wordIdsToExclude - IDs of words to exclude from the result
Returns:
the words starting with the given prefix
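A prefix search of this kind can be sketched with a sorted map, because all keys sharing a prefix form one contiguous range starting at the prefix itself. The hypothetical sketch below combines that range scan with the exclusion set, a validity check and the early stop on wantedWordCount. PrefixSearch, validIdsByPrefix and the IntPredicate are assumptions: the real method takes a PredictionParameter and returns Map<BiIntegerKey,NextWord>, while this one returns plain IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.IntPredicate;

// Hypothetical sketch of a prefix search with exclusions, a validity check
// and an early stop; it is not the Predict4All implementation.
final class PrefixSearch {
    private final TreeMap<String, Integer> idsByText = new TreeMap<>();

    void put(String text, int id) {
        idsByText.put(text, id);
    }

    List<Integer> validIdsByPrefix(String prefix, IntPredicate isValid,
                                   int wantedWordCount, Set<Integer> idsToExclude) {
        List<Integer> result = new ArrayList<>();
        if (wantedWordCount <= 0) {
            return result;
        }
        // tailMap starts at the first key >= prefix; keys sharing the prefix follow contiguously
        for (Map.Entry<String, Integer> e : idsByText.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the prefix range: no later key can match
            }
            int id = e.getValue();
            if (idsToExclude.contains(id) || !isValid.test(id)) {
                continue; // excluded by caller or not valid to predict
            }
            result.add(id);
            if (result.size() >= wantedWordCount) {
                break; // enough candidates found: stop the search
            }
        }
        return result;
    }
}
```

This sketch happens to return IDs in lexicographic order of their text, but the documented method gives no ordering guarantee, so callers should rank the candidates themselves (for example by probability).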
-
getExactWordsWithPrefixExist
public SortedMap<String,Word> getExactWordsWithPrefixExist(String prefix)
-
putUserWord
public int putUserWord(Token token)
-
incrementUserWord
public void incrementUserWord(int wordId)
-
isTokenValidToCreateUserWord
public boolean isTokenValidToCreateUserWord(Token token)
-
putWordTraining
public int putWordTraining(String word)
-
loadDictionary
public static WordDictionary loadDictionary(LanguageModel languageModel, File dictionaryFile) throws IOException
Creates a word dictionary from a word dictionary data file previously created by the training algorithm.
This method should not be called on a user dictionary file; use loadUserDictionary(File) instead.
Parameters:
languageModel - the language model contained in this dictionary
dictionaryFile - the dictionary data file
Returns:
the loaded dictionary
Throws:
IOException - if the data file doesn't exist or if an I/O error occurs
-
loadUserDictionary
public void loadUserDictionary(File userDictionaryFile) throws IOException, WordDictionaryMatchingException
Loads a user dictionary on top of an existing trained dictionary.
This supplements this dictionary with custom words from the user, and with existing words whose parameters were modified.
This should be called on a file previously saved with saveUserDictionary(File).
Parameters:
userDictionaryFile - the user dictionary data file
Throws:
IOException - if the data file doesn't exist or if an I/O error occurs
WordDictionaryMatchingException - if the loaded user dictionary doesn't match this dictionary: a user dictionary must always be loaded on the same trained dictionary that was used to save it
-
saveUserDictionary
public void saveUserDictionary(File userDictionaryFile) throws IOException
Saves the words of this dictionary that were added or modified by the user.
The given file will contain the UserWord instances added to the dictionary, and also every Word that was modified (e.g. if Word.setProbFactor(double, boolean), Word.setForceInvalid(boolean, boolean), etc. was called).
This file can later be loaded with loadUserDictionary(File).
Parameters:
userDictionaryFile - the user dictionary data file
Throws:
IOException - if saving failed
See Also:
to see exactly how it works
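The save/load pair relies on the dictionary identifier to reject a user dictionary saved against a different trained dictionary. The hypothetical sketch below shows that consistency check with in-memory streams: the identifier is written first, and loading refuses data whose identifier does not match. The class UserDictionarySketch, its byte format and MismatchException (standing in for WordDictionaryMatchingException) are all assumptions; the real file format is not documented on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of the save/load consistency check; the format and
// all names are assumptions, not Predict4All's actual code.
final class UserDictionarySketch {
    /** Stand-in for WordDictionaryMatchingException. */
    static final class MismatchException extends Exception {
        MismatchException(String message) {
            super(message);
        }
    }

    private final UUID dictionaryId;
    private final List<String> userWords = new ArrayList<>();

    UserDictionarySketch(UUID dictionaryId) {
        this.dictionaryId = dictionaryId;
    }

    void addUserWord(String word) {
        userWords.add(word);
    }

    List<String> userWords() {
        return userWords;
    }

    /** Writes the owning dictionary ID first, then the user words. */
    byte[] save() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(dictionaryId.toString());
            out.writeInt(userWords.size());
            for (String word : userWords) {
                out.writeUTF(word);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads user words only if they were saved against this same dictionary. */
    void load(byte[] data) throws IOException, MismatchException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            UUID savedFor = UUID.fromString(in.readUTF());
            if (!savedFor.equals(dictionaryId)) {
                throw new MismatchException("User data was saved for dictionary " + savedFor);
            }
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                userWords.add(in.readUTF());
            }
        }
    }
}
```

Checking the identifier before reading any words means a mismatched file is rejected without partially modifying the dictionary.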