Module io.github.mmm.scanner


module io.github.mmm.scanner
Provides scanners that help to parse character sequences efficiently and easily.

Scanner API

For efficient parsers of complex grammars, it is best practice to use a parser generator such as JavaCC or ANTLR.
However, in some situations it is more suitable to write a handwritten parser. The tradeoff is that this can result in ugly monolithic code that is hard to maintain.
The CharStreamScanner is an interface that covers typical tasks when parsing strings or streams and therefore makes your life a lot easier. You can concentrate on the syntax you want to parse and do NOT need to check over and over again whether the end has already been reached. For parsing entire streams (e.g. from a Reader) there is the implementation CharReaderScanner, while for simple Strings there is the implementation CharSequenceScanner. In either case, the entire data and state (parsing position) are encapsulated, so you can easily delegate a step to another method or class. Otherwise you would need to pass the current position to that method and return the new one from it, which is tricky if the method should already return something else.
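To illustrate why encapsulating the position helps with delegation, here is a small sketch in plain Java. It does NOT use the io.github.mmm.scanner API. The class Cursor and the helpers readWord, skipSeparators and readWords are hypothetical names made up for this example. The point is that readWord can return a String while the advanced position stays inside the shared cursor object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the io.github.mmm.scanner API: a tiny cursor that
// keeps the input and the current position together in one object.
public class DelegationSketch {

  static final class Cursor {
    private final char[] buffer;
    private int pos;

    Cursor(String input) {
      this.buffer = input.toCharArray();
    }

    boolean hasNext() {
      return pos < buffer.length;
    }

    char peek() {
      return buffer[pos];
    }

    char next() {
      return buffer[pos++];
    }
  }

  // Helper that returns a parsed value; the advanced position lives in the
  // cursor, so it does not also have to return the new index.
  static String readWord(Cursor cursor) {
    StringBuilder sb = new StringBuilder();
    while (cursor.hasNext() && Character.isLetter(cursor.peek())) {
      sb.append(cursor.next());
    }
    return sb.toString();
  }

  // Skips everything that is not a letter.
  static void skipSeparators(Cursor cursor) {
    while (cursor.hasNext() && !Character.isLetter(cursor.peek())) {
      cursor.next();
    }
  }

  public static List<String> readWords(String input) {
    Cursor cursor = new Cursor(input);
    List<String> words = new ArrayList<>();
    skipSeparators(cursor);
    while (cursor.hasNext()) {
      words.add(readWord(cursor)); // delegate a step and get a result back
      skipSeparators(cursor);      // the position is shared via the cursor
    }
    return words;
  }

  public static void main(String[] args) {
    System.out.println(readWords("  alpha, beta;gamma ")); // [alpha, beta, gamma]
  }
}
```

With a bare int index instead of a cursor, readWord would have to return both the word and the new index, which is exactly the awkwardness described above.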
As a motivation and anti-pattern, here is a little example of an entirely handwritten parser:
 String input = getInputString();
 int i = 0;
 boolean colonFound = false;
 while (i < input.length()) {
   char c = input.charAt(i++);
   if (c == ':') {
     colonFound = true;
     break;
   }
 }
 if (!colonFound) {
   throw new IllegalArgumentException("Expected character ':' not found!");
 }
 String key = input.substring(0, i - 1);
 String value = null;
 if (i < input.length()) {
   while ((i < input.length()) && (input.charAt(i) == ' ')) {
     i++;
   }
   int start = i;
   while (i < input.length()) {
     char c = input.charAt(i);
     if ((c < '0') || (c > '9')) {
       break;
     }
     i++;
   }
   value = input.substring(start, i);
 }
 
Here is the same thing when using CharSequenceScanner:
 String input = getInputString();
 CharStreamScanner scanner = new CharSequenceScanner(input);
 String key = scanner.readUntil(':', false);
 if (key == null) {
   throw new IllegalArgumentException("Expected character ':' not found!");
 }
 scanner.skipWhile(' ');
 String value = scanner.readWhile(CharFilter.LATIN_DIGIT);
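The following is a minimal sketch of what the three calls above might do when working directly on a char[]. It is NOT the real CharSequenceScanner. The class name MiniScanner is hypothetical, and the semantics are assumptions. In particular, it assumes that readUntil(stop, false) returns null and leaves the position unchanged when stop is missing, which matches the null check in the example but is not confirmed here. CharFilter is replaced by the JDK type java.util.function.IntPredicate to keep the sketch self-contained.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch, NOT the real CharSequenceScanner: a minimal scanner that
// works directly on a char[] and mirrors the three calls from the example above.
public class MiniScanner {

  private final char[] buffer;
  private int pos;

  public MiniScanner(CharSequence input) {
    this.buffer = input.toString().toCharArray();
  }

  // Reads up to (excluding) stop and consumes stop. If stop is missing, returns
  // the rest when acceptEnd is true, otherwise null (assumed semantics).
  public String readUntil(char stop, boolean acceptEnd) {
    int start = pos;
    for (int i = pos; i < buffer.length; i++) {
      if (buffer[i] == stop) {
        pos = i + 1;
        return new String(buffer, start, i - start);
      }
    }
    if (!acceptEnd) {
      return null; // position stays unchanged
    }
    pos = buffer.length;
    return new String(buffer, start, pos - start);
  }

  // Skips all consecutive occurrences of c and returns how many were skipped.
  public int skipWhile(char c) {
    int start = pos;
    while (pos < buffer.length && buffer[pos] == c) {
      pos++;
    }
    return pos - start;
  }

  // Reads the longest run of characters accepted by the filter (may be empty).
  public String readWhile(IntPredicate filter) {
    int start = pos;
    while (pos < buffer.length && filter.test(buffer[pos])) {
      pos++;
    }
    return new String(buffer, start, pos - start);
  }

  public static void main(String[] args) {
    MiniScanner scanner = new MiniScanner("port:  8080;");
    String key = scanner.readUntil(':', false);
    if (key == null) {
      throw new IllegalArgumentException("Expected character ':' not found!");
    }
    scanner.skipWhile(' ');
    String value = scanner.readWhile(ch -> ch >= '0' && ch <= '9');
    System.out.println(key + " = " + value); // port = 8080
  }
}
```

Every end-of-input check lives inside the scanner methods, so the calling code never compares an index against the length itself.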
 
This is just a simple example. The API covers the real-life scenarios you will typically encounter when parsing your data. The implementations are highly efficient and operate directly on char[] internally. Streaming implementations use optimized lookahead buffers that can even be configured at construction time.
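To show what a buffer that is configured at construction time can look like, here is a sketch of a Reader-backed scanner. It is NOT the real CharReaderScanner, whose buffering strategy this document does not describe. The class name BufferedReaderScanner and its capacity parameter are hypothetical. The sketch offers only a single character of lookahead (peek) inside a fixed-capacity buffer that is refilled from the Reader on demand.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;

// Hypothetical sketch, NOT the real CharReaderScanner: a Reader-backed scanner
// whose buffer capacity is chosen at construction time.
public class BufferedReaderScanner {

  private final Reader reader;
  private final char[] buffer;
  private int pos;   // index of the next char to consume in buffer
  private int limit; // number of valid chars currently in buffer

  public BufferedReaderScanner(Reader reader, int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive: " + capacity);
    }
    this.reader = reader;
    this.buffer = new char[capacity];
  }

  // Ensures at least one char is available; returns false at end of stream.
  private boolean fill() {
    if (pos < limit) {
      return true;
    }
    try {
      int n;
      do {
        n = reader.read(buffer, 0, buffer.length);
      } while (n == 0);
      if (n < 0) {
        return false;
      }
      pos = 0;
      limit = n;
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public boolean hasNext() {
    return fill();
  }

  // Single-char lookahead: returns the next char without consuming it.
  public char peek() {
    if (!fill()) {
      throw new NoSuchElementException();
    }
    return buffer[pos];
  }

  public char next() {
    if (!fill()) {
      throw new NoSuchElementException();
    }
    return buffer[pos++];
  }

  // Reads chars accepted by the filter, refilling the buffer as needed.
  public String readWhile(IntPredicate filter) {
    StringBuilder sb = new StringBuilder();
    while (fill() && filter.test(buffer[pos])) {
      sb.append(buffer[pos++]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A capacity of 4 forces several refills for this input.
    BufferedReaderScanner scanner =
        new BufferedReaderScanner(new StringReader("12345678abc"), 4);
    System.out.println(scanner.readWhile(Character::isDigit)); // 12345678
    System.out.println(scanner.readWhile(ch -> true));         // abc
  }
}
```

A larger capacity means fewer read() calls on the underlying Reader at the cost of more memory per scanner. Making the capacity a constructor argument lets the caller choose that tradeoff.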