Class RegExpGenerator


  • public class RegExpGenerator
    extends Object
    Analyze a set of strings and return a suitable Regular Expression that accepts all of them. The result is unlikely to be an optimal Regular Expression.

    Typical usage is:

        RegExpGenerator generator = new RegExpGenerator();

        generator.train("janv.");
        generator.train("oct");
        generator.train("dec.");
        // ... call train() once for each remaining string in the set

        String result = generator.getResult();
     
    • Method Summary

      Modifier and Type   Method and Description
      String              getResult()
                          Given the set of Strings trained (see train(String)), return a Regular Expression that accepts every String in the training set.
      Set<String>         getValues()
                          Get the set of Strings (in upper case) used to train the Generator.
      boolean             isDigit()
      boolean             isOther()
      static boolean      isSpecial(char ch)
                          Does the supplied character have a special meaning in Regular Expressions? Note: '-' is not declared as a special character, so this method should not be relied on for characters inside a Character Class.
      static String       merge(String firstRE, String secondRE)
      static String       slosh(char ch)
      static String       slosh(String input)
                          Return an escaped String (similar to Pattern.quote, but escaping only where required rather than unconditionally).
      static String       toAutomatonRE(String regExp, boolean onlyASCII)
                          Map a set of "well-known" regular expressions to Unicode Character Classes that the Automaton package supports.
      void                train(String input)
                          Call this method once for each string in the set.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • RegExpGenerator

        public RegExpGenerator()
      • RegExpGenerator

        public RegExpGenerator(int maxSetSize,
                               Locale locale)
    • Method Detail

      • isSpecial

        public static boolean isSpecial(char ch)
        Does the supplied character have a special meaning in Regular Expressions? Note: '-' is not declared as a special character, so this method should not be relied on for characters inside a Character Class.
        Parameters:
        ch - The character to test.
        Returns:
        true if the character has a special meaning.
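The check described above can be sketched as a lookup in a fixed set of metacharacters. This is a minimal illustration, not the library's implementation: the class IsSpecialSketch, the method isSpecialSketch and the exact character set (taken from java.util.regex syntax outside a character class) are assumptions. As in the note above, '-' is deliberately left out of the set.

```java
/** Hypothetical sketch of a "special character" test; not the library's own code. */
public class IsSpecialSketch {
    // Assumed set: java.util.regex metacharacters outside a character class.
    // '-' is omitted on purpose, mirroring the note in the documentation.
    private static final String SPECIALS = "\\^$.|?*+()[]{}";

    static boolean isSpecialSketch(char ch) {
        return SPECIALS.indexOf(ch) != -1;
    }

    public static void main(String[] args) {
        System.out.println(isSpecialSketch('.'));  // true
        System.out.println(isSpecialSketch('-'));  // false
        System.out.println(isSpecialSketch('a'));  // false
    }
}
```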
      • slosh

        public static String slosh(char ch)
      • merge

        public static String merge(String firstRE,
                                   String secondRE)
      • slosh

        public static String slosh(String input)
        Return an escaped String (similar to Pattern.quote, but escaping only where required rather than unconditionally).
        Parameters:
        input - The String to be protected.
        Returns:
        An escaped String.
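As a rough illustration of the difference from Pattern.quote, the sketch below escapes only an assumed set of regex metacharacters and leaves every other character untouched, whereas Pattern.quote always wraps its whole input in \Q...\E. SloshSketch, its two sloshSketch overloads and the metacharacter set are hypothetical; the real slosh may escape a different set of characters.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of conditional escaping; not the library's own code. */
public class SloshSketch {
    // Assumed metacharacter set (java.util.regex, outside a character class).
    private static final String SPECIALS = "\\^$.|?*+()[]{}";

    // Escape a single character only if it is a metacharacter.
    static String sloshSketch(char ch) {
        return SPECIALS.indexOf(ch) != -1 ? "\\" + ch : String.valueOf(ch);
    }

    // Escape each metacharacter individually, leaving ordinary text as-is.
    static String sloshSketch(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            sb.append(sloshSketch(input.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sloshSketch("janv."));                   // janv\.
        System.out.println(sloshSketch("oct"));                     // oct
        System.out.println(Pattern.quote("oct"));                   // \Qoct\E
        System.out.println("janv.".matches(sloshSketch("janv.")));  // true
        System.out.println("janvX".matches(sloshSketch("janv.")));  // false
    }
}
```

Escaping per character keeps the output readable (janv\. rather than \Qjanv.\E), which matters when the result is later merged into a larger expression.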
      • isOther

        public boolean isOther()
      • isDigit

        public boolean isDigit()
      • train

        public void train(String input)
        Call this method once for each string in the set.
        Parameters:
        input - The String to be used as part of the set.
      • getResult

        public String getResult()
        Given the set of Strings trained (see train(String)), return a Regular Expression that accepts every String in the training set.
        Returns:
        A regular expression matching the training set.
      • getValues

        public Set<String> getValues()
        Get the set of Strings (in upper case) used to train the Generator.
        Returns:
        The set of Strings (in upper case).
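To make the train()/getValues()/getResult() contract concrete, here is a deliberately naive, hypothetical stand-in (NaiveRegExpGenerator is not part of the library). It assumes that training values are upper-cased with the supplied Locale and that the result only needs to accept every training string, so it emits a case-insensitive alternation of quoted values. The real generator presumably produces a more compact expression, though, as the class description says, not necessarily an optimal one.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical, naive stand-in illustrating the train()/getResult() contract. */
public class NaiveRegExpGenerator {
    private final Locale locale;
    private final Set<String> values = new TreeSet<>();  // sorted for a stable result

    public NaiveRegExpGenerator(Locale locale) {
        this.locale = locale;
    }

    // Assumption: values are stored upper-cased, as getValues() documents.
    public void train(String input) {
        values.add(input.toUpperCase(locale));
    }

    public Set<String> getValues() {
        return values;
    }

    public String getResult() {
        // (?iu) = case-insensitive, Unicode-aware, so the original mixed-case inputs still match.
        StringJoiner alternation = new StringJoiner("|", "(?iu)(?:", ")");
        for (String value : values) {
            alternation.add(Pattern.quote(value));
        }
        return alternation.toString();
    }

    public static void main(String[] args) {
        NaiveRegExpGenerator generator = new NaiveRegExpGenerator(Locale.FRENCH);
        generator.train("janv.");
        generator.train("oct");
        generator.train("dec.");
        String result = generator.getResult();
        System.out.println(generator.getValues());  // [DEC., JANV., OCT]
        System.out.println(result);                 // (?iu)(?:\QDEC.\E|\QJANV.\E|\QOCT\E)
        for (String s : new String[] {"janv.", "oct", "dec."}) {
            System.out.println(s + " -> " + s.matches(result));  // each prints true
        }
    }
}
```

Sorting the values with a TreeSet keeps the output stable across runs; with no training input, this sketch returns an expression that matches only the empty string.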
      • toAutomatonRE

        public static String toAutomatonRE(String regExp,
                                           boolean onlyASCII)
        Map a set of "well-known" regular expressions to Unicode Character Classes that the Automaton package supports.
        Parameters:
        regExp - A Java regular expression, as a String.
        onlyASCII - If true, generate simple ASCII-only regular expressions; otherwise use Unicode Character Classes.
        Returns:
        An Automaton-friendly regular expression.
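A minimal sketch of the kind of mapping toAutomatonRE describes, assuming a simple table of textual substitutions. The two entries, their ASCII and Unicode replacements, and the names ToAutomatonRESketch and toAutomatonRESketch are illustrative assumptions; the documentation above does not say which expressions the real method recognises or which classes the Automaton package accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of mapping shorthand classes; not the library's own code. */
public class ToAutomatonRESketch {
    // Assumed table: Java shorthand -> {ASCII-only form, Unicode form}.
    private static final Map<String, String[]> WELL_KNOWN = new LinkedHashMap<>();
    static {
        WELL_KNOWN.put("\\d", new String[] {"[0-9]", "\\p{Nd}"});
        WELL_KNOWN.put("\\p{Alpha}", new String[] {"[a-zA-Z]", "\\p{L}"});
    }

    // Naive literal substitution; it does not handle an escaped backslash followed by 'd'.
    static String toAutomatonRESketch(String regExp, boolean onlyASCII) {
        String result = regExp;
        for (Map.Entry<String, String[]> e : WELL_KNOWN.entrySet()) {
            result = result.replace(e.getKey(), e.getValue()[onlyASCII ? 0 : 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toAutomatonRESketch("\\d{3}-\\p{Alpha}+", true));   // [0-9]{3}-[a-zA-Z]+
        System.out.println(toAutomatonRESketch("\\d{3}-\\p{Alpha}+", false));  // \p{Nd}{3}-\p{L}+
    }
}
```

A robust version would need to tokenize the expression rather than substitute text, so that escaped sequences and character-class contents are not rewritten by accident.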