Class Util


  • public final class Util
    extends Object
    Static helper methods.
    • Method Detail

      • get

        public static <T> T get​(FST<T> fst,
                                IntsRef input)
                         throws IOException
        Looks up the output for this input, or null if the input is not accepted.
        Throws:
        IOException
      • get

        public static <T> T get​(FST<T> fst,
                                BytesRef input)
                         throws IOException
        Looks up the output for this input, or null if the input is not accepted
        Throws:
        IOException
      • getByOutput

        public static IntsRef getByOutput​(FST<Long> fst,
                                          long targetOutput)
                                   throws IOException
        Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist.

        NOTE: this only works with FST<Long>, only works when the outputs are ascending in order with the inputs. For example, simple ordinals (0, 1, 2, ...), or file offets (when appending to a file) fit this.

        Throws:
        IOException
      • shortestPaths

        public static <T> Util.MinResult<T>[] shortestPaths​(FST<T> fst,
                                                            FST.Arc<T> fromNode,
                                                            T startOutput,
                                                            Comparator<T> comparator,
                                                            int topN,
                                                            boolean allowEmptyString)
                                                     throws IOException
        Starting from node, find the top N min cost completions to a final node.
        Throws:
        IOException
      • toDot

        public static <T> void toDot​(FST<T> fst,
                                     Writer out,
                                     boolean sameRank,
                                     boolean labelStates)
                              throws IOException
        Dumps an FST to a GraphViz's dot language description for visualization. Example of use:
         PrintWriter pw = new PrintWriter("out.dot");
         Util.toDot(fst, pw, true, true);
         pw.close();
         
        and then, from command line:
         dot -Tpng -o out.png out.dot
         

        Note: larger FSTs (a few thousand nodes) won't even render, don't bother. If the FST is > 2.1 GB in size then this method will throw strange exceptions.

        Parameters:
        sameRank - If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.
        labelStates - If true states will have labels equal to their offsets in their binary format. Expands the graph considerably.
        Throws:
        IOException
        See Also:
        "http://www.graphviz.org/"
      • toUTF16

        public static IntsRef toUTF16​(CharSequence s,
                                      IntsRef scratch)
        Just maps each UTF16 unit (char) to the ints in an IntsRef.
      • toUTF32

        public static IntsRef toUTF32​(CharSequence s,
                                      IntsRef scratch)
        Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
      • toUTF32

        public static IntsRef toUTF32​(char[] s,
                                      int offset,
                                      int length,
                                      IntsRef scratch)
        Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
      • toIntsRef

        public static IntsRef toIntsRef​(BytesRef input,
                                        IntsRef scratch)
        Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
      • toBytesRef

        public static BytesRef toBytesRef​(IntsRef input,
                                          BytesRef scratch)
        Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
      • readCeilArc

        public static <T> FST.Arc<T> readCeilArc​(int label,
                                                 FST<T> fst,
                                                 FST.Arc<T> follow,
                                                 FST.Arc<T> arc,
                                                 FST.BytesReader in)
                                          throws IOException
        Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise return null.
        Parameters:
        label - the label to ceil on
        fst - the fst to operate on
        follow - the arc to follow reading the label from
        arc - the arc to read into in place
        in - the fst's FST.BytesReader
        Throws:
        IOException