object Normalizer
Value Members
- final def !=(arg0: AnyRef): Boolean
- final def !=(arg0: Any): Boolean
- final def ##(): Int
- final def ==(arg0: AnyRef): Boolean
- final def ==(arg0: Any): Boolean
- def alphabet(ast: Node): Set[Char]
- final def asInstanceOf[T0]: T0
- def clone(): AnyRef
- final def eq(arg0: AnyRef): Boolean
- def equals(arg0: Any): Boolean
- def finalize(): Unit
- final def getClass(): Class[_]
- def hashCode(): Int
- final def isInstanceOf[T0]: Boolean
- final def ne(arg0: AnyRef): Boolean
- def normalize(tree: Node, alphabet: Set[SglChar]): Node
- final def notify(): Unit
- final def notifyAll(): Unit
- final def synchronized[T0](arg0: ⇒ T0): T0
- def toString(): String
- final def wait(): Unit
- final def wait(arg0: Long, arg1: Int): Unit
- final def wait(arg0: Long): Unit
Regular expressions can contain character classes and wildcards. To produce an NFA, these must be expanded into disjunctions. For wildcards and negated character classes, the complete alphabet must also be known to produce the expansion:
Example transformations with the alphabet abcdefgh:
[abc] -> a|b|c
[^abc] -> d|e|f|g|h
def[^abc] -> def(d|e|f|g|h)
. -> a|b|c|d|e|f|g|h
abc. -> abc(a|b|c|d|e|f|g|h)
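The expansion over a fully known alphabet can be sketched in Scala as follows. This is a minimal sketch, not the library's implementation: the AST (`Re`, `Lit`, `CharClass`, `Wildcard`, `Concat`, `Alt`) and the `expand`/`show` helpers are hypothetical stand-ins, because the definitions of the library's `Node` and `SglChar` types are not shown on this page.

```scala
// Minimal sketch over a HYPOTHETICAL simplified regex AST (not the library's Node type).
object FullAlphabetExpansion {
  sealed trait Re
  final case class Lit(c: Char) extends Re                                  // a single character
  final case class CharClass(chars: Set[Char], negated: Boolean) extends Re // [abc] or [^abc]
  case object Wildcard extends Re                                           // .
  final case class Concat(parts: List[Re]) extends Re                       // r1 r2 ...
  final case class Alt(options: List[Re]) extends Re                        // r1|r2|...

  // Replace every character class and wildcard by an explicit disjunction
  // over the complete alphabet; literals are kept unchanged.
  def expand(re: Re, alphabet: Set[Char]): Re = re match {
    case l: Lit               => l
    case Wildcard             => alt(alphabet)
    case CharClass(cs, false) => alt(cs)
    case CharClass(cs, true)  => alt(alphabet -- cs)
    case Concat(ps)           => Concat(ps.map(expand(_, alphabet)))
    case Alt(os)              => Alt(os.map(expand(_, alphabet)))
  }

  // Characters are sorted only so that the output is deterministic.
  private def alt(cs: Set[Char]): Re = Alt(cs.toList.sorted.map(Lit(_)))

  // Render an AST back to regex syntax, parenthesizing disjunctions inside concatenations.
  def show(re: Re): String = re match {
    case Lit(c)             => c.toString
    case Wildcard           => "."
    case CharClass(cs, neg) => cs.toList.sorted.mkString(if (neg) "[^" else "[", "", "]")
    case Alt(os)            => os.map(show).mkString("|")
    case Concat(ps) =>
      ps.map {
        case a @ Alt(os) if os.size > 1 => "(" + show(a) + ")"
        case p                          => show(p)
      }.mkString
  }
}
```

For instance, with `val sigma = "abcdefgh".toSet`, evaluating `show(expand(Concat("def".toList.map(Lit(_)) :+ CharClass("abc".toSet, negated = true)), sigma))` yields `def(d|e|f|g|h)`, matching the third example. Note that the length of each disjunction grows with the alphabet, which is the problem addressed next.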
Because the alphabet can be huge (as Unicode is), the number of disjunctions must somehow be reduced:
[abc] -> a|b|c
[^abc] -> <other_char>
def[^abc] -> def(d|e|f|<other_char>)
. -> <other_char>
abc. -> abc(a|b|c|<other_char>)
Here <other_char> is a special metacharacter that matches any character of the alphabet that is not present in the regex. With this technique, the whole alphabet need not be known explicitly.
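The `<other_char>` technique can be sketched by expanding only over the characters that the regex itself mentions. This is a hypothetical sketch: `OtherCharExpansion`, `mentioned`, `expand`, `expandAlone` and `show` are illustrative names, and `OtherChar` is an assumed AST node standing for `<other_char>`. The `mentioned` helper is loosely analogous to the listed `alphabet(ast: Node): Set[Char]` member, but that member's actual behavior is not documented here.

```scala
// Minimal sketch over a HYPOTHETICAL AST; OtherChar stands for <other_char>.
object OtherCharExpansion {
  sealed trait Re
  final case class Lit(c: Char) extends Re
  case object OtherChar extends Re                                          // <other_char>
  final case class CharClass(chars: Set[Char], negated: Boolean) extends Re // [abc] or [^abc]
  case object Wildcard extends Re                                           // .
  final case class Concat(parts: List[Re]) extends Re
  final case class Alt(options: List[Re]) extends Re

  // Characters written explicitly in the regex (literals and class members).
  def mentioned(re: Re): Set[Char] = re match {
    case Lit(c)               => Set(c)
    case CharClass(cs, _)     => cs
    case Wildcard | OtherChar => Set.empty
    case Concat(ps)           => ps.flatMap(mentioned).toSet
    case Alt(os)              => os.flatMap(mentioned).toSet
  }

  // Expand classes and wildcards using only the `present` characters;
  // every other character of the alphabet collapses into OtherChar.
  def expand(re: Re, present: Set[Char]): Re = re match {
    case Wildcard             => alt(present, withOther = true)
    case CharClass(cs, false) => alt(cs, withOther = false)
    case CharClass(cs, true)  => alt(present -- cs, withOther = true)
    case Concat(ps)           => Concat(ps.map(expand(_, present)))
    case Alt(os)              => Alt(os.map(expand(_, present)))
    case other                => other // Lit and OtherChar are already atomic
  }

  // A single regex on its own: "present" means "mentioned in this regex".
  def expandAlone(re: Re): Re = expand(re, mentioned(re))

  private def alt(cs: Set[Char], withOther: Boolean): Re = {
    val lits: List[Re] = cs.toList.sorted.map(Lit(_))
    Alt(if (withOther) lits :+ OtherChar else lits)
  }

  // Render back to regex-like syntax, parenthesizing disjunctions inside concatenations.
  def show(re: Re): String = re match {
    case Lit(c)             => c.toString
    case OtherChar          => "<other_char>"
    case Wildcard           => "."
    case CharClass(cs, neg) => cs.toList.sorted.mkString(if (neg) "[^" else "[", "", "]")
    case Alt(os)            => os.map(show).mkString("|")
    case Concat(ps) =>
      ps.map {
        case a @ Alt(os) if os.size > 1 => "(" + show(a) + ")"
        case p                          => show(p)
      }.mkString
  }
}
```

With this sketch, `show(expandAlone(Concat("def".toList.map(Lit(_)) :+ CharClass("abc".toSet, negated = true))))` yields `def(d|e|f|<other_char>)`, and a lone `Wildcard` yields just `<other_char>`, as in the examples above. The size of each disjunction now depends on the regex, not on the alphabet.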
Care must be taken when the regex is to be combined with another regex (for example, by intersection or difference). In that case, <other_char> must match only the characters that appear in neither regex. Example:
Regexes: [abc] and [^cd]
Characters present in either regex: abcd
[abc] -> a|b|c
[^cd] -> a|b|<other_char>
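The need for a shared set of present characters can be illustrated on character classes alone. In this hypothetical sketch, `SharedAlphabet`, `CharClass`, `sharedPresent`, `expand` and `show` are illustrative names, not part of the documented API; `OtherChar` stands for `<other_char>`.

```scala
// Minimal sketch with HYPOTHETICAL names: expanding character classes
// against the characters present in all regexes taking part in an operation.
object SharedAlphabet {
  sealed trait Sym
  final case class Ch(c: Char) extends Sym
  case object OtherChar extends Sym // <other_char>

  // A character class such as [abc] (negated = false) or [^cd] (negated = true).
  final case class CharClass(chars: Set[Char], negated: Boolean)

  // Characters mentioned by any of the regexes involved in the operation.
  def sharedPresent(classes: CharClass*): Set[Char] = classes.flatMap(_.chars).toSet

  // Expand a class into a disjunction over `present`; a negated class also
  // gets OtherChar, which covers every character outside `present`.
  def expand(cls: CharClass, present: Set[Char]): List[Sym] =
    if (cls.negated) (present -- cls.chars).toList.sorted.map(Ch(_)) :+ OtherChar
    else cls.chars.toList.sorted.map(Ch(_))

  def show(syms: List[Sym]): String = syms.map {
    case Ch(c)     => c.toString
    case OtherChar => "<other_char>"
  }.mkString("|")
}
```

Here `sharedPresent(r1, r2)` for `[abc]` and `[^cd]` is `{a, b, c, d}`, giving `a|b|c` and `a|b|<other_char>` as in the example. With only its own characters, `[^cd]` would expand to just `<other_char>`, which would then silently stand for `a` and `b` as well; a symbol-by-symbol intersection with `a|b|c` would come out empty instead of `a|b`.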