Package com.yahoo.prelude.querytransform
Class PhraseMatcher
- java.lang.Object
-
- com.yahoo.prelude.querytransform.PhraseMatcher
-
public class PhraseMatcher extends java.lang.Object
Detects query phrases using an automaton. This class is thread safe.
- Author:
- bratseth
-
-
Nested Class Summary
Nested Classes
static class PhraseMatcher.Phrase
Points to a collection of word items (one or more) which matches a complete listing in an automaton
-
Constructor Summary
Constructors
PhraseMatcher(com.yahoo.fsa.FSA phraseAutomatonFSA, boolean ignorePluralForm)
Creates a phrase matcher.
PhraseMatcher(java.lang.String phraseAutomatonFile)
Creates a phrase matcher.
PhraseMatcher(java.lang.String phraseAutomatonFile, boolean ignorePluralForm)
Creates a phrase matcher.
-
Method Summary
Methods
static PhraseMatcher getNullMatcher()
Returns a phrase matcher which (quickly) never matches anything.
boolean isEmpty()
java.util.List<PhraseMatcher.Phrase> matchPhrases(Item queryItem)
Finds all phrases (word sequences of length 1 or more) in the same index, excluding the negative items of a notitem, that constitute a complete entry in the automaton of this matcher.
void setIgnorePluralForm(boolean ignorePluralForm)
Sets whether we should ignore plural/singular form when matching.
void setMatchAll(boolean matchAll)
Sets whether to return the longest matching phrase when there are overlapping matches (default), or all matching phrases.
void setMatchPhraseItems(boolean matchPhraseItems)
Sets whether to match words contained in phrase items as well.
void setMatchSingleItems(boolean matchSingleItems)
Sets whether single items should be matched and returned as phrase matches.
-
-
-
Constructor Detail
-
PhraseMatcher
public PhraseMatcher(java.lang.String phraseAutomatonFile)
Creates a phrase matcher. This will not ignore plural/singular form differences when matching.
- Parameters:
phraseAutomatonFile
- the file containing phrases to match
- Throws:
java.lang.IllegalArgumentException
- if the file is not found
-
PhraseMatcher
public PhraseMatcher(java.lang.String phraseAutomatonFile, boolean ignorePluralForm)
Creates a phrase matcher.
- Parameters:
phraseAutomatonFile
- the file containing phrases to match
ignorePluralForm
- whether to ignore plural/singular form differences when matching
- Throws:
java.lang.IllegalArgumentException
- if the file is not found
-
PhraseMatcher
public PhraseMatcher(com.yahoo.fsa.FSA phraseAutomatonFSA, boolean ignorePluralForm)
Creates a phrase matcher.
- Parameters:
phraseAutomatonFSA
- the FSA containing phrases to match
ignorePluralForm
- whether to ignore plural/singular form differences when matching
- Throws:
java.lang.IllegalArgumentException
- if FSA is null
-
-
Method Detail
-
isEmpty
public boolean isEmpty()
-
setMatchPhraseItems
public void setMatchPhraseItems(boolean matchPhraseItems)
Sets whether to match words contained in phrase items as well. Default is false: words contained in phrase items are not matched.
-
setMatchSingleItems
public void setMatchSingleItems(boolean matchSingleItems)
Sets whether single items should be matched and returned as phrase matches. Default is false.
-
setIgnorePluralForm
public void setIgnorePluralForm(boolean ignorePluralForm)
Sets whether we should ignore plural/singular form when matching
-
setMatchAll
public void setMatchAll(boolean matchAll)
Sets whether to return the longest matching phrase when there are overlapping matches (default), or all matching phrases
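The difference between the two modes can be illustrated with a toy matcher. The following is a minimal sketch, not the Vespa implementation: it assumes a plain Set of strings in place of the automaton, and the class OverlapSketch and its match method are hypothetical names introduced only for this example. How PhraseMatcher itself resolves overlaps internally is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy stand-in for phrase matching. NOT the Vespa implementation:
// a Set<String> plays the role of the phrase automaton.
class OverlapSketch {

    /**
     * Scans the words left to right and looks up every word sequence
     * starting at the current position.
     * matchAll == false: keep only the longest match at each position,
     *                    then resume after that match.
     * matchAll == true:  keep every matching sequence, overlaps included.
     */
    static List<String> match(List<String> words, Set<String> phrases, boolean matchAll) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < words.size()) {
            String longest = null;
            int longestEnd = start;
            for (int end = start + 1; end <= words.size(); end++) {
                String candidate = String.join(" ", words.subList(start, end));
                if (phrases.contains(candidate)) {
                    if (matchAll) result.add(candidate);
                    longest = candidate;   // later hits are longer, so this ends as the longest
                    longestEnd = end;
                }
            }
            if (!matchAll && longest != null) {
                result.add(longest);
                start = longestEnd;        // skip the words consumed by the longest match
            } else {
                start++;
            }
        }
        return result;
    }
}
```

With the words "new york city hall" and the phrases "new york", "new york city" and "city hall", this sketch yields only "new york city" in longest-match mode, and all three phrases when matchAll is true.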
-
matchPhrases
public java.util.List<PhraseMatcher.Phrase> matchPhrases(Item queryItem)
Finds all phrases (word sequences of length 1 or more) in the same index, excluding the negative items of a notitem, that constitute a complete entry in the automaton of this matcher.
- Parameters:
queryItem
- the root query item in which to match phrases
- Returns:
- the matched phrases, or null if there were no matches
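Because the documented return value is null rather than an empty list when nothing matches, callers need a null check before iterating. A minimal sketch of one way to handle this follows; NullSafeSketch and orEmpty are hypothetical names and are not part of PhraseMatcher.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical caller-side helper, not part of the PhraseMatcher API.
class NullSafeSketch {

    /** Returns the given list, or an immutable empty list if it is null. */
    static <T> List<T> orEmpty(List<T> listOrNull) {
        return listOrNull == null ? Collections.emptyList() : listOrNull;
    }
}
```

A caller could then wrap the call, for example orEmpty(matcher.matchPhrases(queryItem)), and iterate over the result without a separate null branch.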
-
getNullMatcher
public static PhraseMatcher getNullMatcher()
Returns a phrase matcher which (quickly) never matches anything
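A matcher that never matches is an instance of the null object pattern: code can always hold a matcher and call it, instead of checking for a missing one. The sketch below illustrates the pattern in general terms only. The Matcher interface, NULL_MATCHER and countMatches are hypothetical, and for simplicity the sketch returns an empty list, whereas the documented matchPhrases returns null when there are no matches.

```java
import java.util.List;

// General illustration of the null object pattern; none of these
// names belong to the Vespa API.
class NullObjectSketch {

    /** Hypothetical minimal matcher interface. */
    interface Matcher {
        List<String> match(List<String> words);
    }

    /** Shared instance that returns immediately without inspecting its input. */
    static final Matcher NULL_MATCHER = words -> List.of();

    /** Uses whichever matcher it is given, with no special case for "no matcher". */
    static int countMatches(Matcher matcher, List<String> words) {
        return matcher.match(words).size();
    }
}
```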
-
-