Class CharSequenceScanner

java.lang.Object
io.github.mmm.scanner.AbstractCharStreamScanner
io.github.mmm.scanner.CharSequenceScanner
All Implemented Interfaces:
TextFormatProcessor, TextPosition, CharStreamScanner, AutoCloseable

public class CharSequenceScanner extends AbstractCharStreamScanner
This class represents a string, or more precisely a sequence of characters (char[]), together with a position in that sequence.
It provides various methods for scanning the sequence. The scanner is designed to be fast on long sequences and therefore internally converts a String to a char array instead of calling String.charAt(int) repeatedly.
ATTENTION:
This implementation is NOT thread-safe (by design).
Since:
1.0.0
  • Constructor Details

    • CharSequenceScanner

      public CharSequenceScanner(CharSequence charSequence)
      The constructor.
      Parameters:
      charSequence - is the string to scan.
    • CharSequenceScanner

      public CharSequenceScanner(CharSequence charSequence, TextFormatMessageHandler messageHandler)
      The constructor.
      Parameters:
      charSequence - is the string to scan.
      messageHandler - the TextFormatMessageHandler.
    • CharSequenceScanner

      public CharSequenceScanner(String string)
      The constructor.
      Parameters:
      string - is the string to parse.
    • CharSequenceScanner

      public CharSequenceScanner(String string, TextFormatMessageHandler messageHandler)
      The constructor.
      Parameters:
      string - is the string to parse.
      messageHandler - the TextFormatMessageHandler.
    • CharSequenceScanner

      public CharSequenceScanner(String string, TextFormatMessageHandler messageHandler, int line, int column)
      The constructor.
      Parameters:
      string - is the string to parse.
      messageHandler - the TextFormatMessageHandler.
      line - the initial line.
      column - the initial column.
    • CharSequenceScanner

      public CharSequenceScanner(char[] characters)
      The constructor.
      Parameters:
      characters - is an array containing the characters to scan.
    • CharSequenceScanner

      public CharSequenceScanner(char[] characters, TextFormatMessageHandler messageHandler)
      The constructor.
      Parameters:
      characters - is an array containing the characters to scan.
      messageHandler - the TextFormatMessageHandler.
    • CharSequenceScanner

      public CharSequenceScanner(char[] characters, TextFormatMessageHandler messageHandler, int line, int column)
      The constructor.
      Parameters:
      characters - is an array containing the characters to scan.
      messageHandler - the TextFormatMessageHandler.
      line - the initial line.
      column - the initial column.
    • CharSequenceScanner

      public CharSequenceScanner(char[] characters, int offset, int length)
      The constructor.
      Parameters:
      characters - is an array containing the characters to scan.
      offset - is the index of the first char to scan in characters (typically 0 to start at the beginning of the array).
      length - is the number of characters to scan from characters starting at offset (typically characters.length - offset).
    • CharSequenceScanner

      public CharSequenceScanner(char[] characters, int offset, int length, TextFormatMessageHandler messageHandler)
      The constructor.
      Parameters:
      characters - is an array containing the characters to scan.
      offset - is the index of the first char to scan in characters (typically 0 to start at the beginning of the array).
      length - is the number of characters to scan from characters starting at offset (typically characters.length - offset).
      messageHandler - the TextFormatMessageHandler.
    • CharSequenceScanner

      public CharSequenceScanner(char[] characters, int offset, int length, TextFormatMessageHandler messageHandler, int line, int column)
      The constructor.
      Parameters:
      characters - is an array containing the characters to scan.
      offset - is the index of the first char to scan in characters (typically 0 to start at the beginning of the array).
      length - is the number of characters to scan from characters starting at offset (typically characters.length - offset).
      messageHandler - the TextFormatMessageHandler.
      line - the initial line.
      column - the initial column.
  • Method Details

    • charAt

      public char charAt(int index)
      Parameters:
      index - is the index of the requested character.
      Returns:
      the character at the given index.
    • getPosition

      public int getPosition()
      Returns:
      the position in the sequence to scan, in other words the number of characters that have been read. Will initially be 0. Please note that this API is designed for scanning textual content (for parsers). Because the position is an int, it is limited to Integer.MAX_VALUE (about 2.1 billion characters), which we consider a suitable limit.
    • getLength

      public int getLength()
      Returns:
      the total length of the string to parse.
    • substring

      public String substring(int start, int end)
      Parameters:
      start - the start index, inclusive.
      end - the end index, exclusive.
      Returns:
      the specified substring.
    • getReplaced

      public String getReplaced(String substitute, int start, int end)
      This method gets the original string where the substring specified by start and end is replaced by substitute.
      Parameters:
      substitute - is the string used as replacement.
      start - is the inclusive start index of the substring to replace.
      end - is the exclusive end index of the substring to replace.
      Returns:
      the original string with the specified substring replaced by substitute.
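      The replacement described above can be pictured with plain JDK strings. The following is a minimal sketch of the documented semantics, not the library's implementation; the class name ReplacedSketch and its method are hypothetical and used only for illustration.

```java
// Hypothetical stand-in illustrating the documented semantics of getReplaced.
class ReplacedSketch {

  // Returns original with the range [start, end) replaced by substitute.
  static String replaced(String original, String substitute, int start, int end) {
    StringBuilder sb = new StringBuilder();
    sb.append(original, 0, start);               // text before the replaced range
    sb.append(substitute);                       // the replacement itself
    sb.append(original, end, original.length()); // text after the replaced range
    return sb.toString();
  }
}
```

      For example, replacing the range 6 to 11 of "Hello World" with "Java" yields "Hello Java", because the start index is inclusive and the end index is exclusive.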
    • appendSubstring

      public void appendSubstring(StringBuilder appendable, int start, int end)
      This method appends the substring specified by start and end to the given buffer.
      This avoids the overhead of creating a new string and copying the char array.
      Parameters:
      appendable - is the buffer where to append the substring to.
      start - the start index, inclusive.
      end - the end index, exclusive.
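      To illustrate why appending a range directly is cheaper than building a substring first, here is a minimal sketch using only the JDK. The class AppendSketch is a hypothetical stand-in that assumes the scanner keeps its data in a char[]; it is not the library's code.

```java
// Hypothetical stand-in: appends a range of a char[] to a StringBuilder
// without creating an intermediate String.
class AppendSketch {

  // Appends chars[start, end) to target.
  static void appendSubstring(char[] chars, StringBuilder target, int start, int end) {
    // StringBuilder.append(char[], int offset, int len) copies the range directly.
    target.append(chars, start, end - start);
  }
}
```

      Note that StringBuilder.append(char[], int, int) takes an offset and a length, so the exclusive end index has to be converted with end - start.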
    • getCurrentIndex

      public int getCurrentIndex()
      This method gets the current position in the stream to scan. It will initially be 0. In other words this method returns the number of characters that have already been consumed.
      Returns:
      the current index position.
    • setCurrentIndex

      public void setCurrentIndex(int index)
      This method sets the current index.
      Parameters:
      index - is the next index position to set. The value has to be greater than or equal to 0 and less than or equal to getLength().
    • hasNext

      public boolean hasNext()
      Description copied from interface: CharStreamScanner
      This method determines if there is at least one more character available.
      Specified by:
      hasNext in interface CharStreamScanner
      Overrides:
      hasNext in class AbstractCharStreamScanner
      Returns:
      true if there is at least one character available, false if the end of data has been reached.
    • next

      public char next()
      Description copied from interface: CharStreamScanner
      This method reads the current character from the stream and increments the index, stepping to the next character. You should check if a character is available before calling this method. Otherwise, if your stream may contain the NUL character ('\0'), you cannot distinguish whether the end of the stream was reached or you actually read the NUL character.
      Specified by:
      next in interface CharStreamScanner
      Overrides:
      next in class AbstractCharStreamScanner
      Returns:
      the next character or CharStreamScanner.EOS if none is available.
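      The advice to check for an available character first can be shown with a small self-contained model. This is a sketch under assumptions, not the library class: NextSketch is a hypothetical stand-in, and its EOS constant is assumed to be '\0' as the note on NUL characters suggests.

```java
// Hypothetical stand-in modelling hasNext() and next() on a char[] buffer.
class NextSketch {

  static final char EOS = '\0'; // assumed end-of-stream marker

  private final char[] buffer;
  private int index;

  NextSketch(String text) {
    this.buffer = text.toCharArray();
  }

  boolean hasNext() {
    return this.index < this.buffer.length;
  }

  // Returns the current character and advances, or EOS at the end.
  char next() {
    if (this.index < this.buffer.length) {
      return this.buffer[this.index++];
    }
    return EOS;
  }

  // Counts characters by asking hasNext() first, so an embedded NUL
  // character is not mistaken for the end of the data.
  static int countChars(String text) {
    NextSketch scanner = new NextSketch(text);
    int count = 0;
    while (scanner.hasNext()) {
      scanner.next();
      count++;
    }
    return count;
  }
}
```

      A loop that instead stopped when next() returned EOS would end early on input such as "a\0b", which is the ambiguity the description warns about.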
    • peek

      public char peek()
      Description copied from interface: CharStreamScanner
      This method reads the current character without consuming characters and will therefore not change the state of this scanner.
      Specified by:
      peek in interface CharStreamScanner
      Overrides:
      peek in class AbstractCharStreamScanner
      Returns:
      the current character or CharStreamScanner.EOS if none is available.
    • peek

      public char peek(int lookaheadOffset)
      Description copied from interface: CharStreamScanner
      Like CharStreamScanner.peek() but with further lookahead.
      Attention:
      This method requires lookahead. For implementations that are backed by an underlying stream (or reader) the given lookaheadOffset shall not exceed the available lookahead size (buffer capacity given at construction time). Otherwise the method may fail.
      Parameters:
      lookaheadOffset - the lookahead offset. If 0 this method will behave like CharStreamScanner.peek(). In case of 1 it will return the character after the next one and so forth.
      Returns:
      the peeked character at the given lookaheadOffset or CharStreamScanner.EOS if no such character exists.
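      For a scanner backed by a complete char array, the lookahead reduces to an index calculation. The following sketch assumes exactly that; PeekSketch and its EOS constant ('\0') are hypothetical names for illustration, and a non-negative lookaheadOffset is assumed.

```java
// Hypothetical stand-in showing lookahead on an in-memory buffer.
class PeekSketch {

  static final char EOS = '\0'; // assumed end-of-stream marker

  // Returns the character lookaheadOffset positions after index without
  // consuming anything, or EOS if that position is past the end.
  static char peek(char[] buffer, int index, int lookaheadOffset) {
    int i = index + lookaheadOffset;
    return (i < buffer.length) ? buffer[i] : EOS;
  }
}
```

      With the buffer "abc" and index 0, an offset of 0 gives 'a' and an offset of 1 gives 'b'; any offset that points past the last character gives EOS.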
    • peekString

      public String peekString(int count)
      This method peeks the number of next characters given by count and returns them as a string. If fewer characters are available, the returned string will be shorter than count and only contain the available characters. Unlike AbstractCharStreamScanner.read(int) this method does NOT consume the characters and will therefore NOT change the state of this scanner.
      Parameters:
      count - is the number of characters to peek. You may use Integer.MAX_VALUE to peek until the end of text (EOT) if the data-size is suitable.
      Returns:
      a string with the given number of characters or all available characters if less than count. Will be the empty string if no character is available at all.
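      The clipping of count to the available characters can be sketched as follows. PeekStringSketch is a hypothetical stand-in that assumes a char[] buffer and a current index; it only illustrates the documented behaviour.

```java
// Hypothetical stand-in showing how count is clipped to the remaining data.
class PeekStringSketch {

  // Returns up to count characters starting at index, without consuming them.
  static String peekString(char[] buffer, int index, int count) {
    int available = buffer.length - index;
    int len = Math.min(count, available); // also safe for Integer.MAX_VALUE
    return new String(buffer, index, len);
  }
}
```

      Because the length is clipped before the string is created, passing Integer.MAX_VALUE simply returns everything from the current index to the end.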
    • peekWhile

      public String peekWhile(CharFilter filter, int maxLen)
      Parameters:
      filter - the CharFilter accepting only the characters to peek.
      maxLen - the maximum number of characters to peek (to get as lookahead without modifying this stream).
      Returns:
      a String with the peeked characters of the given maxLen or less if a character was hit that is not accepted by the given filter or the end-of-text has been reached before. The state of this stream remains unchanged.
    • readUntil

      public String readUntil(CharFilter filter, boolean acceptEot)
      Description copied from interface: CharStreamScanner
      This method reads all next characters until the first character accepted by the given filter or the end is reached.
      After the call of this method, the current index will point to the first accepted stop character or to the end if NO such character exists.
      Specified by:
      readUntil in interface CharStreamScanner
      Overrides:
      readUntil in class AbstractCharStreamScanner
      Parameters:
      filter - is used to decide where to stop.
      acceptEot - true if the end of data should be treated like the stop character so that the rest of the text is returned, false otherwise (to return null if the end of data was reached and the scanner has been consumed).
      Returns:
      the string with all read characters not accepted by the given CharFilter or null if there was no accepted character and acceptEot is false.
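      The effect of acceptEot is easiest to see in a small model. The sketch below is an assumption-laden stand-in, not the library code: ReadUntilSketch is a hypothetical class, and java.util.function.IntPredicate is used in place of CharFilter so that the example needs only the JDK.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in modelling readUntil on a char[] buffer.
class ReadUntilSketch {

  private final char[] buffer;
  private int index;

  ReadUntilSketch(String text) {
    this.buffer = text.toCharArray();
  }

  int getCurrentIndex() {
    return this.index;
  }

  // Reads up to (not including) the first character accepted by stopFilter.
  String readUntil(IntPredicate stopFilter, boolean acceptEot) {
    int start = this.index;
    while (this.index < this.buffer.length) {
      if (stopFilter.test(this.buffer[this.index])) {
        // index now points at the stop character
        return new String(this.buffer, start, this.index - start);
      }
      this.index++;
    }
    // End reached without a stop character: the data is consumed either way.
    return acceptEot ? new String(this.buffer, start, this.index - start) : null;
  }
}
```

      On "key=value", reading until '=' returns "key" and leaves the index at 3, pointing at the stop character. A following read until ';' with acceptEot set to false returns null and leaves the scanner at the end.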
    • expectRestWithLookahead

      protected boolean expectRestWithLookahead(char[] stopChars, boolean ignoreCase, Runnable appender, boolean skip)
      Specified by:
      expectRestWithLookahead in class AbstractCharStreamScanner
      Parameters:
      stopChars - the stop String as char[]. If ignoreCase is true, it has to be in lower case.
      ignoreCase - true to (also) compare chars in lower case, false otherwise.
      appender - an optional lambda to run before shifting buffers to append data.
      skip - true to update buffers and offset such that on success this scanner points after the expected stop String, false otherwise (to not consume any character in any case).
      Returns:
      true if the stop String (stopChars) was found and consumed, false otherwise (and no data consumed).
    • expect

      public boolean expect(String expected, boolean ignoreCase, boolean lookahead, int off)
      Description copied from interface: CharStreamScanner
      This method determines if the given expected String is completely present at the current position. It will only consume characters and change the state if lookahead is false and the expected String was found (entirely).
      Attention:
      This method requires lookahead. For implementations that are backed by an underlying stream (or reader) the length of the expected String shall not exceed the available lookahead size (buffer capacity given at construction time). Otherwise the method may fail.
      Parameters:
      expected - the expected String to search for.
      ignoreCase - if true the case of the characters is ignored when compared, false otherwise.
      lookahead - if true the state of the scanner remains unchanged even if the expected String has been found, false otherwise (expected String is consumed on match).
      off - the number of characters that have already been peeked and after which the given String is expected. Will typically be 0. If lookahead is false and the expected String was found these characters will be skipped together with the expected String.
      Returns:
      true if the expected string was successfully found, false otherwise.
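      The interplay of lookahead and off can be sketched with String.regionMatches from the JDK. ExpectSketch is a hypothetical stand-in and not the library's implementation; in particular, regionMatches uses its own case-insensitive comparison, which may differ in detail from the scanner's.

```java
// Hypothetical stand-in modelling expect(expected, ignoreCase, lookahead, off).
class ExpectSketch {

  private final String text;
  private int index;

  ExpectSketch(String text) {
    this.text = text;
  }

  int getCurrentIndex() {
    return this.index;
  }

  boolean expect(String expected, boolean ignoreCase, boolean lookahead, int off) {
    // regionMatches returns false (no exception) if the region is out of bounds.
    boolean found = this.text.regionMatches(ignoreCase, this.index + off, expected, 0, expected.length());
    if (found && !lookahead) {
      // Consume the already peeked characters plus the expected string.
      this.index += off + expected.length();
    }
    return found;
  }
}
```

      On "SELECT * FROM t", a case-insensitive expect of "select" with lookahead set to true matches but leaves the index at 0. The same call with lookahead set to false moves the index to 6. A subsequent expect of "*" with off set to 1 skips the blank as well and moves the index to 8.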
    • getTail

      protected String getTail()
      This method gets the tail of this scanner without changing the state.
      Returns:
      the tail of this scanner.
    • getTail

      protected String getTail(int maximum)
      This method gets the tail of this scanner limited (truncated) to the given maximum number of characters without changing the state.
      Parameters:
      maximum - is the maximum number of characters to return from the tail.
      Returns:
      the tail of this scanner.
    • require

      public void require(String expected, boolean ignoreCase)
      Description copied from interface: CharStreamScanner
      This method verifies that the expected string gets consumed from this scanner with respect to ignoreCase. Otherwise an exception is thrown indicating the problem.
      This method behaves functionally equivalent to the following code:
       if (!scanner.expectUnsafe(expected, ignoreCase)) {
         throw new IllegalStateException(...);
       }
       
      Specified by:
      require in interface CharStreamScanner
      Overrides:
      require in class AbstractCharStreamScanner
      Parameters:
      expected - is the expected string.
      ignoreCase - if true the case of the characters is ignored during comparison.
    • readWhile

      public String readWhile(CharFilter filter, int min, int max)
      Description copied from interface: CharStreamScanner
      This method reads all next characters that are accepted by the given filter.
      After the call of this method, the current index will point to the next character that was NOT accepted by the given filter. If the next max characters or the characters left until the end of this scanner are accepted, only that amount of characters are skipped.
      Specified by:
      readWhile in interface CharStreamScanner
      Overrides:
      readWhile in class AbstractCharStreamScanner
      Parameters:
      filter - used to decide which characters should be accepted.
      min - the minimum number of characters expected.
      max - the maximum number of characters that should be read.
      Returns:
      a string with all characters accepted by the given filter limited to the length of max and the end of this scanner. Will be the empty string if no character was accepted.
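      How max and the end of the data bound the result can be shown in a small model. This is a sketch under assumptions: ReadWhileSketch is a hypothetical class, java.util.function.IntPredicate stands in for CharFilter, and the min parameter is left out because the text above does not say how a shortfall is reported.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in modelling readWhile with an upper bound only.
class ReadWhileSketch {

  private final char[] buffer;
  private int index;

  ReadWhileSketch(String text) {
    this.buffer = text.toCharArray();
  }

  int getCurrentIndex() {
    return this.index;
  }

  // Reads characters while filter accepts them, at most max characters.
  String readWhile(IntPredicate filter, int max) {
    int start = this.index;
    // long arithmetic avoids overflow when max is Integer.MAX_VALUE
    int end = (int) Math.min((long) start + max, this.buffer.length);
    while (this.index < end && filter.test(this.buffer[this.index])) {
      this.index++;
    }
    return new String(this.buffer, start, this.index - start);
  }
}
```

      On "12345abc" with a digit filter, a call with max set to 3 returns "123" even though more digits follow. The next call with a large max returns "45" and stops at 'a', the first character that is not accepted.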
    • getOriginalString

      public String getOriginalString()
      This method gets the original string to parse.
      Returns:
      the original string.
    • close

      public void close()