Class HyphenationTree

  • All Implemented Interfaces:
    Cloneable, PatternConsumer

    public class HyphenationTree
    extends TernaryTree
    implements PatternConsumer
    This tree structure stores the hyphenation patterns in an efficient way for fast lookup and provides the method to hyphenate a word. This class has been taken from the Apache FOP project (http://xmlgraphics.apache.org/fop/) and has been slightly modified.
    • Constructor Detail

      • HyphenationTree

        public HyphenationTree()
    • Method Detail

      • loadPatterns

        public void loadPatterns(File f)
                          throws IOException
        Read hyphenation patterns from an XML file.
        Parameters:
        f - the XML file containing the patterns
        Throws:
        IOException - In case the parsing fails
      • loadPatterns

        public void loadPatterns(InputSource source)
                          throws IOException
        Read hyphenation patterns from an XML file.
        Parameters:
        source - the InputSource for the file
        Throws:
        IOException - In case the parsing fails
      • hyphenate

        public Hyphenation hyphenate(String word,
                                     int remainCharCount,
                                     int pushCharCount)
        Hyphenate word and return a Hyphenation object.
        Parameters:
        word - the word to be hyphenated
        remainCharCount - Minimum number of characters allowed before the hyphenation point.
        pushCharCount - Minimum number of characters allowed after the hyphenation point.
        Returns:
        a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
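        The effect of remainCharCount and pushCharCount can be pictured as a filter over candidate break positions. The following is a minimal sketch, not this class's implementation: the class HyphenationLimitsSketch and the method filterBreaks are hypothetical names, the candidate positions are chosen by hand for illustration, and treating both limits as inclusive minimums is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the HyphenationTree API.
final class HyphenationLimitsSketch {

    /**
     * Keeps only the candidate break positions that leave at least
     * remainCharCount characters before the break and at least
     * pushCharCount characters after it (both limits assumed inclusive).
     */
    static List<Integer> filterBreaks(int wordLength, int[] candidates,
                                      int remainCharCount, int pushCharCount) {
        List<Integer> kept = new ArrayList<>();
        for (int pos : candidates) {
            int before = pos;              // characters that stay before the hyphen
            int after = wordLength - pos;  // characters pushed after the hyphen
            if (before >= remainCharCount && after >= pushCharCount) {
                kept.add(pos);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "hyphenation" has 11 characters; hand-picked candidates hy-phen-a-tion.
        int[] candidates = {2, 6, 7};
        // With both limits set to 3, the break after "hy" is rejected.
        System.out.println(filterBreaks(11, candidates, 3, 3)); // prints [6, 7]
    }
}
```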
      • hyphenate

        public Hyphenation hyphenate(char[] w,
                                     int offset,
                                     int len,
                                     int remainCharCount,
                                     int pushCharCount)
        Hyphenate the word stored in a char array and return a Hyphenation object.
        Parameters:
        w - char array that contains the word
        offset - Offset to first character in word
        len - Length of word
        remainCharCount - Minimum number of characters allowed before the hyphenation point.
        pushCharCount - Minimum number of characters allowed after the hyphenation point.
        Returns:
        a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
      • addClass

        public void addClass(String chargroup)
        Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character that is not defined in any of the classes, it is not hyphenated. A class also defines how its characters are normalized so that they can be compared with the stored patterns. Pattern files usually use only lower case characters. In that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.
        Specified by:
        addClass in interface PatternConsumer
        Parameters:
        chargroup - character group
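        The normalization rule described above (first character of the group is the normalized form) can be sketched as a simple lookup table. This is an assumption-laden illustration, not the storage this class actually uses: CharClassSketch and its normalize method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the HyphenationTree API.
final class CharClassSketch {
    // Maps every character of a class to the first character of that class.
    private final Map<Character, Character> normalized = new HashMap<>();

    /** Registers a group such as "aA": 'a' and 'A' both normalize to 'a'. */
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char norm = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            normalized.put(chargroup.charAt(i), norm);
        }
    }

    /**
     * Returns the normalized word, or null if any character belongs to no
     * class, which mirrors the rule that such a word is not hyphenated.
     */
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character norm = normalized.get(word.charAt(i));
            if (norm == null) {
                return null;
            }
            sb.append(norm.charValue());
        }
        return sb.toString();
    }
}
```

        With the classes "aA" and "bB" registered, normalize("AbBa") yields "abba", while a word containing a character outside every class yields null.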
      • addException

        public void addException(String word,
                                 ArrayList<Object> hyphenatedword)
        Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
        Specified by:
        addException in interface PatternConsumer
        Parameters:
        word - normalized word
        hyphenatedword - a list of alternating strings and hyphen objects.
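        To make the shape of hyphenatedword concrete, the sketch below builds a list of alternating strings and hyphen markers from a hyphenated spelling. Everything here is hypothetical: ExceptionListSketch, split, key and the HYPHEN marker are stand-ins, the "-" input notation is an assumption, and the real hyphen objects of the library are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the HyphenationTree API.
final class ExceptionListSketch {
    /** Stand-in marker for a hyphen object between two word fragments. */
    static final Object HYPHEN = new Object() {
        @Override
        public String toString() {
            return "<hyphen>";
        }
    };

    /** Splits a spelling such as "ta-ble" into ["ta", HYPHEN, "ble"]. */
    static ArrayList<Object> split(String hyphenated) {
        ArrayList<Object> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < hyphenated.length(); i++) {
            char c = hyphenated.charAt(i);
            if (c == '-') {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add(HYPHEN);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    /** Joins the string fragments to recover the word without hyphens. */
    static String key(List<Object> parts) {
        StringBuilder sb = new StringBuilder();
        for (Object p : parts) {
            if (p instanceof String) {
                sb.append((String) p);
            }
        }
        return sb.toString();
    }
}
```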
      • addPattern

        public void addPattern(String pattern,
                               String ivalue)
        Add a pattern to the tree. It is mainly intended to be used by the PatternParser class as a callback to add a pattern to the tree.
        Specified by:
        addPattern in interface PatternConsumer
        Parameters:
        pattern - the hyphenation pattern
        ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
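        The relationship between pattern and ivalue is easier to see with an example. The sketch below assumes the TeX-style notation in which digits are interleaved with letters (for example "a1bc3d4") and splits it into the letters and one weight per interletter position, with '0' filled in where no digit is given. PatternSplitSketch and its methods are hypothetical names, and whether the parser performs the split exactly this way is an assumption.

```java
// Hypothetical illustration; not part of the HyphenationTree API.
final class PatternSplitSketch {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns the raw pattern with all digits removed, e.g. "a1bc3d4" -> "abcd". */
    static String letters(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (!isDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Returns one digit per interletter position (before the first letter,
     * between letters, after the last letter), using '0' where the raw
     * pattern gives no weight. The result is one character longer than
     * letters(raw).
     */
    static String interletterValues(String raw) {
        StringBuilder values = new StringBuilder();
        boolean slotFilled = false; // true if the current position already has a digit
        for (char c : raw.toCharArray()) {
            if (isDigit(c)) {
                values.append(c);
                slotFilled = true;
            } else {
                if (!slotFilled) {
                    values.append('0'); // no explicit weight before this letter
                }
                slotFilled = false;
            }
        }
        if (!slotFilled) {
            values.append('0'); // no explicit weight after the last letter
        }
        return values.toString();
    }
}
```

        For the raw pattern "a1bc3d4" this gives the letters "abcd" and the values "01034", which have the shape of the pattern and ivalue arguments described above.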