Package org.predict4all.nlp.words.model
Interface Word
-
- All Known Implementing Classes:
AbstractWord, EquivalenceClassWord, SimpleWord, TagWord, UserWord
public interface Word
Represents a word stored in a WordDictionary.
- Words are stored with an int ID to optimize memory usage. A word can be special (e.g. EquivalenceClassWord or TagWord) or simple. SimpleWord instances are first loaded from trained data, then UserWord instances can be added from the user vocabulary.
- The Word type is not specialized: it declares the methods of every word type (e.g. getNGramTag(), even for SimpleWord). This is done to optimize runtime performance by avoiding instanceof checks.
A word can be modified using setProbFactor(double, boolean), setForceInvalid(boolean, boolean), etc.
When a word is modified, the modifier should indicate whether it is a user or a system modification. This does not change how the word is handled, as every modified word is saved with WordDictionary.saveUserDictionary(File), but it is a convenient way for library users to know whether the word was modified programmatically or by the user (e.g. to filter such words out).
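To illustrate the "not specialized" design, here is a minimal sketch built on a hypothetical, heavily simplified MiniWord interface (not the predict4all API): every implementation answers every query, and callers branch on a flag backed by a byte type code instead of using instanceof. The type code values are assumptions; the real ones are the TYPE_* constants of Word.

```java
// Hypothetical, simplified stand-in for Word (not the predict4all API).
// Every implementation answers every query, so callers never need instanceof.
interface MiniWord {
    byte TYPE_SIMPLE = 0;     // assumed values, for illustration only
    byte TYPE_NGRAM_TAG = 2;

    byte getType();

    // May be null when the word is a concept (e.g. a tag) rather than text.
    String getWord();

    default boolean isNGramTag() {
        return getType() == TYPE_NGRAM_TAG;
    }

    // Meaningless unless isNGramTag() is true.
    default byte getNGramTagId() {
        return (byte) -1;
    }
}

final class MiniSimpleWord implements MiniWord {
    private final String word;

    MiniSimpleWord(String word) {
        this.word = word;
    }

    @Override
    public byte getType() {
        return TYPE_SIMPLE;
    }

    @Override
    public String getWord() {
        return word;
    }
}

final class MiniTagWord implements MiniWord {
    private final byte tagId;

    MiniTagWord(byte tagId) {
        this.tagId = tagId;
    }

    @Override
    public byte getType() {
        return TYPE_NGRAM_TAG;
    }

    @Override
    public String getWord() {
        return null; // a tag represents a concept, not a real word
    }

    @Override
    public byte getNGramTagId() {
        return tagId;
    }
}

final class MiniWordDisplay {
    // Branches on a flag backed by the type byte instead of "w instanceof MiniTagWord".
    static String describe(MiniWord w) {
        return w.isNGramTag() ? "<tag " + w.getNGramTagId() + ">" : w.getWord();
    }
}
```

The cost of this choice is that some methods return meaningless values on some types (getNGramTagId() on a simple word), which is why the guards such as isNGramTag() matter.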
-
-
Field Summary
Fields (Modifier and Type, Field):
- static byte TYPE_EQUIVALENCE_CLASS
- static byte TYPE_NGRAM_TAG
- static byte TYPE_SIMPLE
- static byte TYPE_USER_WORD
-
Method Summary
Methods (Modifier and Type, Method, Description):
- Word clone(int newId): Creates a clone of this word. This allows duplicating an existing word; a new ID must be provided.
- EquivalenceClass getEquivalenceClass()
- byte getEquivalenceClassId()
- int getID()
- long getLastUseDate()
- Tag getNGramTag()
- byte getNGramTagId()
- double getProbFactor(): This factor can be used to modify the final probabilities of the predictions. It is applied once probabilities are computed, to influence the result list. It is mainly used as a multiplier on the original probability (the result list is then normalized). To rely on probabilities alone, the value should be 1.0.
- byte getType()
- int getUsageCount()
- String getWord()
- void incrementUsageCount(): Increases the "usage" count of this word.
- boolean isEquivalenceClass()
- boolean isForceInvalid(): Whether this word is forced to be invalid. This effectively removes a word from prediction results: words can't be removed from the dictionary because they can be used in n-grams, but setting forceInvalid to true has the same effect as removing the word.
- boolean isForceValid(): Whether this word is forced to be valid, bypassing validation. Mostly used on UserWord.
- boolean isModifiedBySystem(): Indicates that this word was modified by the system (e.g. by calling a modification method with the modificationByUser parameter set to false).
- boolean isModifiedByUser(): Indicates that this word was modified by the user (e.g. by calling a modification method with the modificationByUser parameter set to true).
- boolean isModifiedByUserOrSystem()
- boolean isNGramTag()
- boolean isUserWord()
- boolean isValidForSaving()
- boolean isValidToBePredicted(PredictionParameter predictionParameter): Checks whether this word can be displayed as a prediction result. This typically returns true for original words, but for user words it depends on a computed check. The result can also be forced by isForceInvalid() or isForceValid(). User words are valid for prediction depending on PredictionParameter.getMinUseCountToValidateNewWord().
- void setForceInvalid(boolean forceInvalid, boolean modificationByUser): Forces this word to be invalid. This effectively removes a word from prediction results: words can't be removed from the dictionary because they can be used in n-grams, but setting forceInvalid to true has the same effect as removing the word.
- void setForceValid(boolean forceValid, boolean modificationByUser): Forces this word to be valid, bypassing validation. Mostly used on UserWord.
- void setModifiedBySystem(boolean modifiedBySystem): Manually sets the modification-by-system flag.
- void setModifiedByUser(boolean modifiedByUser): Manually sets the modification-by-user flag.
- void setProbFactor(double factor, boolean modificationByUser): This factor can be used to modify the final probabilities of the predictions. It is applied once probabilities are computed, to influence the result list. It is mainly used as a multiplier on the original probability (the result list is then normalized). To rely on probabilities alone, the value should be 1.0.
-
-
-
Field Detail
-
TYPE_EQUIVALENCE_CLASS
static final byte TYPE_EQUIVALENCE_CLASS
- See Also:
- Constant Field Values
-
TYPE_NGRAM_TAG
static final byte TYPE_NGRAM_TAG
- See Also:
- Constant Field Values
-
TYPE_SIMPLE
static final byte TYPE_SIMPLE
- See Also:
- Constant Field Values
-
TYPE_USER_WORD
static final byte TYPE_USER_WORD
- See Also:
- Constant Field Values
-
-
Method Detail
-
getID
int getID()
- Returns:
- this word's ID: an int is used because it is the primitive with the smallest memory footprint that can still identify enough words
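As an illustration of why an int ID is enough: structures that refer to words, such as n-grams, can store small arrays of primitive IDs instead of strings or object references. The sketch below uses a hypothetical IdDictionary class; it is not necessarily how WordDictionary is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary that hands out dense int IDs (illustration only).
final class IdDictionary {
    private final List<String> byId = new ArrayList<>();
    private final Map<String, Integer> idByText = new HashMap<>();

    // Returns the existing ID for the text, or assigns the next free one.
    int idOf(String text) {
        Integer id = idByText.get(text);
        if (id == null) {
            id = byId.size();
            byId.add(text);
            idByText.put(text, id);
        }
        return id;
    }

    String textOf(int id) {
        return byId.get(id);
    }

    // An n-gram is just a small array of primitive IDs: 4 bytes per word.
    int[] encode(String... words) {
        int[] ids = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            ids[i] = idOf(words[i]);
        }
        return ids;
    }
}
```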
-
getWord
String getWord()
- Returns:
- the text of this word. Can be null when the word represents a concept rather than a real word (e.g. EquivalenceClassWord or TagWord)
-
getType
byte getType()
- Returns:
- the byte used to save this word's type (TYPE_EQUIVALENCE_CLASS, TYPE_NGRAM_TAG, etc.). Used in WordFileInputStream and WordFileOutputStream.
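The type byte lets a serializer tag each record so that the reader knows which implementation to rebuild. The sketch below uses the standard java.io.DataOutputStream and DataInputStream classes; the record layout (type byte, int ID, UTF text), the WordRecordCodec name and the type values are assumptions for illustration, not the actual WordFileOutputStream format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record layout: [type byte][int id][UTF text]. NOT the real predict4all file format.
final class WordRecordCodec {
    static final byte TYPE_SIMPLE = 0;     // assumed values, for illustration only
    static final byte TYPE_USER_WORD = 3;

    static byte[] write(byte type, int id, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(type); // written first so the reader can dispatch on it
            out.writeInt(id);
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads one record back and reports which kind of word it would rebuild.
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte type = in.readByte();
            int id = in.readInt();
            String text = in.readUTF();
            switch (type) {
                case TYPE_SIMPLE:
                    return "simple#" + id + ":" + text;
                case TYPE_USER_WORD:
                    return "user#" + id + ":" + text;
                default:
                    throw new IllegalArgumentException("Unknown word type: " + type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```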
-
isNGramTag
boolean isNGramTag()
- Returns:
- true if this word is a TagWord instance
-
isEquivalenceClass
boolean isEquivalenceClass()
- Returns:
- true if this word is an EquivalenceClassWord instance
-
isUserWord
boolean isUserWord()
- Returns:
- true if this word is a UserWord instance
-
getEquivalenceClass
EquivalenceClass getEquivalenceClass()
- Returns:
- the equivalence class represented by this word (only if isEquivalenceClass() is true)
-
getEquivalenceClassId
byte getEquivalenceClassId()
- Returns:
- the equivalence class ID represented by this word (only if isEquivalenceClass() is true)
-
getNGramTagId
byte getNGramTagId()
- Returns:
- the n-gram tag ID represented by this word (only if isNGramTag() is true)
-
getNGramTag
Tag getNGramTag()
- Returns:
- the n-gram tag represented by this word (only if isNGramTag() is true)
-
isValidForSaving
boolean isValidForSaving()
- Returns:
- true if this word should be saved (in both the original and the user dictionary)
-
isValidToBePredicted
boolean isValidToBePredicted(PredictionParameter predictionParameter)
Checks whether this word can be displayed as a prediction result.
This typically returns true for original words, but for user words it depends on a computed check.
The result can also be forced by isForceInvalid() or isForceValid().
User words are valid for prediction depending on PredictionParameter.getMinUseCountToValidateNewWord().
- Parameters:
predictionParameter
- the prediction parameter, which can be used to validate the word
- Returns:
- true if the word can be displayed in the prediction result
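One plausible reading of these rules, written as a standalone function. The precedence (force invalid first, then force valid, then the usage-count threshold for user words), the use of >= for the threshold, and the PredictionValidity name are all assumptions; minUseCount stands in for PredictionParameter.getMinUseCountToValidateNewWord().

```java
// Hypothetical sketch of the validity rules; the real precedence and threshold may differ.
final class PredictionValidity {
    static boolean isValidToBePredicted(boolean forceInvalid, boolean forceValid,
                                        boolean userWord, int usageCount, int minUseCount) {
        if (forceInvalid) {
            return false; // explicitly hidden from prediction results
        }
        if (forceValid) {
            return true; // bypasses any further validation
        }
        if (userWord) {
            // Assumed: a new user word appears once it has been used often enough.
            return usageCount >= minUseCount;
        }
        return true; // original (trained) words are valid by default
    }
}
```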
-
getProbFactor
double getProbFactor()
This factor can be used to modify the final probabilities of the predictions.
It is applied once probabilities are computed, to influence the result list.
It is mainly used as a multiplier on the original probability (the result list is then normalized).
To rely on probabilities alone, the value should be 1.0.
- Returns:
- the prob factor
-
setProbFactor
void setProbFactor(double factor, boolean modificationByUser)
This factor can be used to modify the final probabilities of the predictions.
It is applied once probabilities are computed, to influence the result list.
It is mainly used as a multiplier on the original probability (the result list is then normalized).
To rely on probabilities alone, the value should be 1.0.
- Parameters:
factor
- the prob factor
modificationByUser
- true indicates that the modification was done by the user and not the system
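A minimal sketch of "multiply, then normalize" on a toy candidate list. The ProbFactorDemo name and the map-based representation are hypothetical; only the arithmetic follows the description above. With probabilities a = 0.5, b = 0.3, c = 0.2 and a factor of 2.0 on b, the weighted values are 0.5, 0.6 and 0.2 (sum 1.3), so b's share rises from 0.3 to 0.6 / 1.3, about 0.46.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of per-word prob factors: multiply, then renormalize.
final class ProbFactorDemo {
    static Map<String, Double> apply(Map<String, Double> probs, Map<String, Double> factors) {
        Map<String, Double> weighted = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            // A missing factor behaves like 1.0, i.e. "rely on probabilities alone".
            double w = e.getValue() * factors.getOrDefault(e.getKey(), 1.0);
            weighted.put(e.getKey(), w);
            total += w;
        }
        if (total == 0.0) {
            return weighted; // nothing to normalize (every candidate was zeroed out)
        }
        for (Map.Entry<String, Double> e : weighted.entrySet()) {
            e.setValue(e.getValue() / total); // normalize so the list sums to 1
        }
        return weighted;
    }
}
```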
-
isForceValid
boolean isForceValid()
Whether this word is forced to be valid, bypassing validation. Mostly used on UserWord.
- Returns:
- true if force valid is enabled
-
setForceValid
void setForceValid(boolean forceValid, boolean modificationByUser)
Forces this word to be valid, bypassing validation. Mostly used on UserWord.
- Parameters:
forceValid
- true to enable force valid
modificationByUser
- true indicates that the modification was done by the user and not the system
-
isForceInvalid
boolean isForceInvalid()
Whether this word is forced to be invalid.
This effectively removes a word from prediction results: words can't be removed from the dictionary because they can be used in n-grams, but setting forceInvalid to true has the same effect as removing the word.
- Returns:
- true if force invalid is enabled
-
setForceInvalid
void setForceInvalid(boolean forceInvalid, boolean modificationByUser)
Forces this word to be invalid.
This effectively removes a word from prediction results: words can't be removed from the dictionary because they can be used in n-grams, but setting forceInvalid to true has the same effect as removing the word.
- Parameters:
forceInvalid
- true to enable force invalid
modificationByUser
- true indicates that the modification was done by the user and not the system
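Why masking instead of deleting: n-grams refer to words by ID, so a deleted entry would leave dangling references. The sketch below, using a hypothetical MaskedResults class, keeps every entry in the backing list and only filters force-invalid entries out of the visible results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: entries stay stored (n-grams may still reference their IDs),
// but force-invalid entries are filtered out of the prediction results.
final class MaskedResults {
    static final class Entry {
        final int id;
        final String text;
        boolean forceInvalid;

        Entry(int id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    static List<String> visible(List<Entry> ranked) {
        List<String> out = new ArrayList<>();
        for (Entry e : ranked) {
            if (!e.forceInvalid) {
                out.add(e.text);
            }
        }
        return out;
    }
}
```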
-
setModifiedByUser
void setModifiedByUser(boolean modifiedByUser)
Manually sets the modification-by-user flag.
- Parameters:
modifiedByUser
- the modification-by-user flag
-
setModifiedBySystem
void setModifiedBySystem(boolean modifiedBySystem)
Manually sets the modification-by-system flag.
- Parameters:
modifiedBySystem
- the modification-by-system flag
-
isModifiedBySystem
boolean isModifiedBySystem()
Indicates whether this word was modified by the system (e.g. by calling a modification method with the modificationByUser parameter set to false).
- Returns:
- the modification-by-system flag
-
isModifiedByUser
boolean isModifiedByUser()
Indicates whether this word was modified by the user (e.g. by calling a modification method with the modificationByUser parameter set to true).
- Returns:
- the modification-by-user flag
-
isModifiedByUserOrSystem
boolean isModifiedByUserOrSystem()
- Returns:
- true if isModifiedByUser() or isModifiedBySystem() returns true
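A minimal sketch of how a modification method might record who made the change, based on the modificationByUser convention described above. TrackedWord is a hypothetical class, and whether the real implementations set the flags exactly this way is an assumption.

```java
// Hypothetical sketch: each modifier records whether the user or the system made the change.
final class TrackedWord {
    private double probFactor = 1.0;
    private boolean modifiedByUser;
    private boolean modifiedBySystem;

    void setProbFactor(double factor, boolean modificationByUser) {
        this.probFactor = factor;
        if (modificationByUser) {
            modifiedByUser = true;
        } else {
            modifiedBySystem = true;
        }
    }

    double getProbFactor() {
        return probFactor;
    }

    boolean isModifiedByUser() {
        return modifiedByUser;
    }

    boolean isModifiedBySystem() {
        return modifiedBySystem;
    }

    boolean isModifiedByUserOrSystem() {
        return modifiedByUser || modifiedBySystem;
    }
}
```

A client could then, for example, list only the words the user changed by filtering on isModifiedByUser().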
-
getUsageCount
int getUsageCount()
- Returns:
- the number of times this word was seen "used" in user text. This count is updated by WordPredictor when training the dynamic model.
-
incrementUsageCount
void incrementUsageCount()
Increases the "usage" count of this word.
-
getLastUseDate
long getLastUseDate()
- Returns:
- the timestamp of the last usage (typically the time of the last call to incrementUsageCount())
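A minimal sketch of how the usage count and the last-use timestamp could stay in sync. UsageTrackedWord is hypothetical; the clock is injected as a java.util.function.LongSupplier so the example is deterministic, and the timestamp unit (e.g. epoch milliseconds from System::currentTimeMillis) is an assumption.

```java
import java.util.function.LongSupplier;

// Hypothetical: incrementUsageCount() bumps the counter and records the current time.
final class UsageTrackedWord {
    private final LongSupplier clock; // e.g. System::currentTimeMillis (assumed unit)
    private int usageCount;
    private long lastUseDate;

    UsageTrackedWord(LongSupplier clock) {
        this.clock = clock;
    }

    void incrementUsageCount() {
        usageCount++;
        lastUseDate = clock.getAsLong();
    }

    int getUsageCount() {
        return usageCount;
    }

    long getLastUseDate() {
        return lastUseDate;
    }
}
```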
-
clone
Word clone(int newId)
Creates a clone of this word.
This allows duplicating an existing word; a new ID must be provided.
- Parameters:
newId
- the new ID of the word
- Returns:
- a clone of this word, with the new ID
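A minimal sketch of clone(int newId) semantics: the copy carries the original's attributes under a new ID, and later changes to the copy do not affect the original. ClonableWord is hypothetical, and which attributes the real implementations copy is an assumption.

```java
// Hypothetical: clone(newId) copies the word's attributes but assigns a new ID.
// Which attributes the real implementations copy is an assumption here.
final class ClonableWord {
    private final int id;
    private final String word;
    private double probFactor = 1.0;

    ClonableWord(int id, String word) {
        this.id = id;
        this.word = word;
    }

    ClonableWord clone(int newId) {
        ClonableWord copy = new ClonableWord(newId, word);
        copy.probFactor = probFactor;
        return copy;
    }

    int getID() {
        return id;
    }

    String getWord() {
        return word;
    }

    double getProbFactor() {
        return probFactor;
    }

    void setProbFactor(double factor) {
        this.probFactor = factor;
    }
}
```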
-
-