Class HyphenationTree
java.lang.Object
org.apache.lucene.analysis.compound.hyphenation.TernaryTree
org.apache.lucene.analysis.compound.hyphenation.HyphenationTree
All Implemented Interfaces:
Cloneable, PatternConsumer
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word.
This class has been taken from the Apache FOP project (http://xmlgraphics.apache.org/fop/). It has been slightly modified.
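Lookup in such a tree walks the key one character at a time, branching lower, equal, or higher at each node. The minimal ternary search tree below, mapping pattern letters to their value strings, illustrates the idea. It is a sketch with a hypothetical class name (TstSketch) that uses one object per node; the inherited TernaryTree uses a more compact array-based layout, so this is not its code.

```java
import java.util.Objects;

/** Hypothetical node-based ternary search tree mapping String keys to String values. */
final class TstSketch {
  private static final class Node {
    final char ch;   // character stored at this node
    Node lo, eq, hi; // smaller char, next char of the key, larger char
    String value;    // non-null if a key ends at this node
    Node(char ch) { this.ch = ch; }
  }

  private Node root;

  /** Inserts or overwrites the value stored for a non-empty key. */
  void insert(String key, String value) {
    if (key.isEmpty()) throw new IllegalArgumentException("empty key");
    root = insert(root, key, 0, Objects.requireNonNull(value));
  }

  private Node insert(Node node, String key, int i, String value) {
    char c = key.charAt(i);
    if (node == null) node = new Node(c);
    if (c < node.ch) {
      node.lo = insert(node.lo, key, i, value);
    } else if (c > node.ch) {
      node.hi = insert(node.hi, key, i, value);
    } else if (i + 1 < key.length()) {
      node.eq = insert(node.eq, key, i + 1, value); // descend to the next character
    } else {
      node.value = value; // last character of the key
    }
    return node;
  }

  /** Returns the value stored for key, or null if the key is absent. */
  String find(String key) {
    if (key.isEmpty()) return null;
    Node node = root;
    int i = 0;
    while (node != null) {
      char c = key.charAt(i);
      if (c < node.ch) {
        node = node.lo;
      } else if (c > node.ch) {
        node = node.hi;
      } else if (++i == key.length()) {
        return node.value;
      } else {
        node = node.eq;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    TstSketch tree = new TstSketch();
    tree.insert("hyph", "00300"); // pattern letters -> interletter values
    tree.insert("hy", "010");
    System.out.println(tree.find("hyph")); // 00300
    System.out.println(tree.find("hyp"));  // null: a prefix only, no stored value
  }
}
```

Shared prefixes such as "hy" and "hyph" reuse the same chain of equal links, which is what keeps a large pattern set compact.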
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.hyphenation.TernaryTree
TernaryTree.Iterator
-
Constructor Summary
HyphenationTree()
-
Method Summary
Modifier and Type | Method | Description
void | addClass(String chargroup) | Add a character class to the tree.
void | addException(String word, ArrayList<Object> hyphenatedword) | Add an exception to the tree.
void | addPattern(String pattern, String ivalue) | Add a pattern to the tree.
String | findPattern(String pat)
Hyphenation | hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount) | Hyphenate a word and return its hyphenation points as a Hyphenation object.
Hyphenation | hyphenate(String word, int remainCharCount, int pushCharCount) | Hyphenate a word and return a Hyphenation object.
void | loadPatterns(File f) | Read hyphenation patterns from an XML file.
void | loadPatterns(InputSource source) | Read hyphenation patterns from an XML file.
void | printStats(PrintStream out)
-
Constructor Details
-
HyphenationTree
public HyphenationTree()
-
-
Method Details
-
loadPatterns
public void loadPatterns(File f) throws IOException
Read hyphenation patterns from an XML file.
Parameters:
f - the XML pattern file
Throws:
IOException - if parsing fails
-
loadPatterns
public void loadPatterns(InputSource source) throws IOException
Read hyphenation patterns from an XML file.
Parameters:
source - the InputSource for the file
Throws:
IOException - if parsing fails
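As a rough picture of what these methods read, the sketch below uses only the JDK DOM parser to collect the patterns from an InputSource. It assumes the FOP-style layout in which patterns appear as whitespace-separated tokens inside a <patterns> element, and it ignores character classes and exceptions. The names PatternXmlSketch and readPatterns are hypothetical; this is not how PatternParser is implemented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader that extracts raw patterns from an FOP-style XML file (assumed layout). */
final class PatternXmlSketch {
  static List<String> readPatterns(InputSource source) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(source);
    List<String> patterns = new ArrayList<>();
    NodeList sections = doc.getElementsByTagName("patterns");
    for (int i = 0; i < sections.getLength(); i++) {
      // Assumption: patterns are separated by whitespace inside each <patterns> element.
      for (String token : sections.item(i).getTextContent().trim().split("\\s+")) {
        if (!token.isEmpty()) patterns.add(token);
      }
    }
    return patterns;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<hyphenation-info><classes>aA bB</classes>"
        + "<patterns>\n .ach4\n hy3ph\n</patterns></hyphenation-info>";
    System.out.println(readPatterns(new InputSource(new StringReader(xml)))); // [.ach4, hy3ph]
  }
}
```

A File can be fed to the same sketch by wrapping it, for example as new InputSource(f.toURI().toString()).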
-
findPattern
-
hyphenate
public Hyphenation hyphenate(String word, int remainCharCount, int pushCharCount)
Hyphenate a word and return a Hyphenation object.
Parameters:
word - the word to be hyphenated
remainCharCount - the minimum number of characters that must precede a hyphenation point
pushCharCount - the minimum number of characters that must follow a hyphenation point
Returns:
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
-
hyphenate
public Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Hyphenate a word and return its hyphenation points as a Hyphenation object.
Parameters:
w - the char array that contains the word
offset - the offset of the word's first character in w
len - the length of the word
remainCharCount - the minimum number of characters that must precede a hyphenation point
pushCharCount - the minimum number of characters that must follow a hyphenation point
Returns:
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
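Both hyphenate methods are based on Liang's pattern technique, the one TeX uses: every pattern found in the word (with '.' marking its edges) contributes interletter values, the highest value wins at each position, odd values mark allowed breaks, and remainCharCount and pushCharCount then drop breaks too close to either end. The sketch below assumes patterns already split into letters and an ivalue string, keeps them in a HashMap instead of a ternary tree, skips character classes and exceptions, and returns break offsets as an int[] (empty rather than null). The class name LiangSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Liang-style hyphenator; pattern letters may use '.' to mark a word edge. */
final class LiangSketch {
  private final Map<String, int[]> patterns = new HashMap<>();

  /** Stores a pattern with its interletter values, e.g. ("hyph", "00300"). */
  void addPattern(String pattern, String ivalue) {
    if (ivalue.length() != pattern.length() + 1) {
      throw new IllegalArgumentException("ivalue needs one digit per gap: " + pattern);
    }
    int[] values = new int[ivalue.length()];
    for (int i = 0; i < values.length; i++) {
      values[i] = ivalue.charAt(i) - '0';
    }
    patterns.put(pattern, values);
  }

  /** Returns break offsets within w[offset, offset + len); a break at i follows i characters. */
  int[] hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount) {
    String word = "." + new String(w, offset, len) + "."; // '.' marks both word edges
    int[] il = new int[word.length() + 1];                 // il[k]: value before word char k
    for (int start = 0; start < word.length(); start++) {
      for (int end = start + 1; end <= word.length(); end++) {
        int[] values = patterns.get(word.substring(start, end));
        if (values == null) continue;
        for (int j = 0; j < values.length; j++) {
          il[start + j] = Math.max(il[start + j], values[j]); // the highest value wins
        }
      }
    }
    List<Integer> points = new ArrayList<>();
    for (int i = Math.max(1, remainCharCount); i <= Math.min(len - 1, len - pushCharCount); i++) {
      if (il[i + 1] % 2 == 1) points.add(i); // odd value: a break is allowed before letter i
    }
    return points.stream().mapToInt(Integer::intValue).toArray();
  }

  int[] hyphenate(String word, int remainCharCount, int pushCharCount) {
    return hyphenate(word.toCharArray(), 0, word.length(), remainCharCount, pushCharCount);
  }

  public static void main(String[] args) {
    LiangSketch h = new LiangSketch();
    String[][] data = {
      {"hyph", "00300"}, {"hen", "0020"}, {"hena", "00004"}, {"henat", "000500"},
      {"na", "100"}, {"nat", "0200"}, {"tio", "1000"}, {"io", "200"}, {"on", "020"}
    };
    for (String[] d : data) h.addPattern(d[0], d[1]);
    System.out.println(Arrays.toString(h.hyphenate("hyphenation", 2, 2))); // [2, 6]: hy-phen-ation
  }
}
```

With remainCharCount = 3 the same word yields only [6], since a break after "hy" would leave just two characters before it. The char[] overload only looks at w[offset] through w[offset + len - 1], which suits a word held inside a larger buffer.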
-
addClass
public void addClass(String chargroup)
Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character not defined in any of the classes, it is not hyphenated. A class also defines how characters are normalized before they are compared with the stored patterns. Pattern files usually use only lowercase characters; in that case the class for the letter 'a', for example, should be defined as "aA", where the first character is the normalization character.
Specified by:
addClass in interface PatternConsumer
Parameters:
chargroup - character group
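A minimal sketch of the character-class rule described above: each class string maps all of its characters to its first character, and a word with a character outside every class is rejected. The names CharClassSketch and normalize are hypothetical, and a HashMap stands in for whatever structure HyphenationTree uses internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical character-class table: every char in a class maps to the class's first char. */
final class CharClassSketch {
  private final Map<Character, Character> classes = new HashMap<>();

  /** Registers a class such as "aA"; the first character is the normalized form. */
  void addClass(String chargroup) {
    if (chargroup.isEmpty()) return;
    char normalized = chargroup.charAt(0);
    for (char c : chargroup.toCharArray()) {
      classes.put(c, normalized);
    }
  }

  /** Returns the normalized word, or null if some character belongs to no class. */
  String normalize(String word) {
    StringBuilder out = new StringBuilder(word.length());
    for (char c : word.toCharArray()) {
      Character n = classes.get(c);
      if (n == null) return null; // not a word character: the word is not hyphenated
      out.append(n.charValue());
    }
    return out.toString();
  }

  public static void main(String[] args) {
    CharClassSketch cc = new CharClassSketch();
    for (char c = 'a'; c <= 'z'; c++) {
      cc.addClass("" + c + Character.toUpperCase(c)); // "aA", "bB", ...
    }
    System.out.println(cc.normalize("Table")); // table
    System.out.println(cc.normalize("tab1e")); // null: '1' is in no class
  }
}
```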
-
addException
public void addException(String word, ArrayList<Object> hyphenatedword)
Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
Specified by:
addException in interface PatternConsumer
Parameters:
word - normalized word
hyphenatedword - a list of alternating strings and Hyphen objects.
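The alternating list can be read as text fragments separated by break markers, so each marker's offset is the total length of the fragments before it. The sketch below records those offsets under the normalized word. HyphenMarker is a hypothetical stand-in for the Hyphen class and carries no extra text, so any pre-break or post-break text a real Hyphen object may hold is ignored; ExceptionSketch and lookup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical exception table built from lists of alternating fragments and hyphen markers. */
final class ExceptionSketch {
  /** Hypothetical stand-in for a hyphen object; carries no pre/post-break text here. */
  enum HyphenMarker { HYPHEN }

  private final Map<String, int[]> exceptions = new HashMap<>();

  /** Stores the offsets encoded by e.g. ["ta", HYPHEN, "ble"] under the normalized word. */
  void addException(String word, List<Object> hyphenatedword) {
    List<Integer> points = new ArrayList<>();
    int length = 0; // characters covered by the fragments seen so far
    for (Object part : hyphenatedword) {
      if (part instanceof String) {
        length += ((String) part).length();
      } else if (part == HyphenMarker.HYPHEN) {
        points.add(length); // break after the text collected so far
      } else {
        throw new IllegalArgumentException("unexpected element: " + part);
      }
    }
    exceptions.put(word, points.stream().mapToInt(Integer::intValue).toArray());
  }

  /** Returns the stored break offsets, or null if the word is not an exception. */
  int[] lookup(String word) {
    return exceptions.get(word);
  }

  public static void main(String[] args) {
    ExceptionSketch ex = new ExceptionSketch();
    ex.addException("table", Arrays.<Object>asList("ta", HyphenMarker.HYPHEN, "ble"));
    System.out.println(Arrays.toString(ex.lookup("table"))); // [2]
  }
}
```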
-
addPattern
public void addPattern(String pattern, String ivalue)
Add a pattern to the tree. It is mainly used by the PatternParser class as a callback to add a pattern to the tree.
Specified by:
addPattern in interface PatternConsumer
Parameters:
pattern - the hyphenation pattern
ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
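To see where pattern and ivalue can come from, take a TeX-style pattern such as "hy3ph": the letters give the pattern "hyph", and the digits, with '0' filled in for every gap that has none, give the ivalue "00300", one digit longer than the pattern. The helper below performs that split. Its name (PatternSplitSketch) is hypothetical, and it assumes at most one digit per gap, as in TeX pattern files.

```java
/** Hypothetical splitter turning a TeX-style pattern into (pattern, ivalue) strings. */
final class PatternSplitSketch {
  /** Returns {letters, values}; values holds one digit per gap, including both edges. */
  static String[] split(String raw) {
    StringBuilder letters = new StringBuilder();
    StringBuilder values = new StringBuilder();
    boolean digitWritten = false; // true if the current gap already has a digit
    for (char c : raw.toCharArray()) {
      if (c >= '0' && c <= '9') {
        values.append(c);
        digitWritten = true;
      } else {
        if (!digitWritten) values.append('0'); // no digit before this letter
        letters.append(c);
        digitWritten = false;
      }
    }
    if (!digitWritten) values.append('0'); // gap after the last letter
    return new String[] {letters.toString(), values.toString()};
  }

  public static void main(String[] args) {
    String[] parts = split("hy3ph");
    System.out.println(parts[0] + " " + parts[1]); // hyph 00300
  }
}
```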
-
printStats
public void printStats(PrintStream out)
Overrides:
printStats in class TernaryTree
-