Package io.debezium.text
Class TokenStream.BasicTokenizer
java.lang.Object
  io.debezium.text.TokenStream.BasicTokenizer
All Implemented Interfaces: TokenStream.Tokenizer
Enclosing class: TokenStream
public static class TokenStream.BasicTokenizer extends Object implements TokenStream.Tokenizer
A basic TokenStream.Tokenizer implementation that ignores whitespace but includes tokens for individual symbols, the period ('.'), single-quoted strings, double-quoted strings, whitespace-delimited words, and optionally comments.
Note that this Tokenizer may not be appropriate in many situations; it is provided merely as a convenience for those situations that happen to be able to use it.
-
Field Summary
Fields
Modifier and Type | Field | Description
static int | COMMENT | The token type for tokens that consist of all the characters between "/*" and "*/" or between "//" and the next line terminator (e.g., '\n', '\r' or "\r\n").
static int | DECIMAL | The token type for tokens that consist of an individual '.' character.
static int | DOUBLE_QUOTED_STRING | The token type for tokens that consist of all the characters within double-quotes.
static int | SINGLE_QUOTED_STRING | The token type for tokens that consist of all the characters within single-quotes.
static int | SYMBOL | The token type for tokens that consist of an individual "symbol" character.
private boolean | useComments |
static int | WORD | The token type for tokens that represent an unquoted string containing a character sequence made up of non-whitespace and non-symbol characters.
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | BasicTokenizer(boolean useComments) |
-
Method Summary
Methods
Modifier and Type | Method | Description
void | tokenize(TokenStream.CharacterStream input, TokenStream.Tokens tokens) | Process the supplied characters and construct the appropriate TokenStream.Token objects.
-
Field Detail
-
WORD
public static final int WORD
The token type for tokens that represent an unquoted string containing a character sequence made up of non-whitespace and non-symbol characters.
See Also: Constant Field Values
-
SYMBOL
public static final int SYMBOL
The token type for tokens that consist of an individual "symbol" character. The set of characters includes: -(){}*,;+%?$[]!<>|=:
See Also: Constant Field Values
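The relationship between WORD, SYMBOL, and DECIMAL tokens can be illustrated with a small standalone sketch. This is not the Debezium implementation: the class `SymbolSplitSketch` and its methods are hypothetical, tokens are represented as plain "TYPE:text" strings, and quotes and '/' are treated as ordinary word characters to keep the example short. Only the symbol set is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified sketch; not Debezium's implementation. */
final class SymbolSplitSketch {
    // Symbol characters as listed in the SYMBOL field description.
    static final String SYMBOLS = "-(){}*,;+%?$[]!<>|=:";

    /** Splits the input into "TYPE:text" entries, ignoring whitespace. */
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (c == '.') {
                tokens.add("DECIMAL:.");
                i++;
            } else if (SYMBOLS.indexOf(c) >= 0) {
                tokens.add("SYMBOL:" + c);
                i++;
            } else {
                // A word runs until the next whitespace, period, or symbol.
                int start = i;
                while (i < input.length() && !isDelimiter(input.charAt(i))) {
                    i++;
                }
                tokens.add("WORD:" + input.substring(start, i));
            }
        }
        return tokens;
    }

    static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '.' || SYMBOLS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(split("total = price*qty;"));
    }
}
```

Because every symbol is a token of its own, a word never contains a symbol character, which is why WORD is defined in terms of "non-symbol" characters.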
-
DECIMAL
public static final int DECIMAL
The token type for tokens that consist of an individual '.' character.
See Also: Constant Field Values
-
SINGLE_QUOTED_STRING
public static final int SINGLE_QUOTED_STRING
The token type for tokens that consist of all the characters within single-quotes. Single quote characters are included if they are preceded (escaped) by a '\' character.
See Also: Constant Field Values
-
DOUBLE_QUOTED_STRING
public static final int DOUBLE_QUOTED_STRING
The token type for tokens that consist of all the characters within double-quotes. Double quote characters are included if they are preceded (escaped) by a '\' character.
See Also: Constant Field Values
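The escape rule for quoted strings can be sketched as a scan for the matching closing quote that skips any character preceded by a backslash. The class `QuotedStringSketch` and its methods are hypothetical and are not part of Debezium. The sketch returns only the characters between the quotes, which is an assumption, since this page does not say whether the delimiting quotes belong to the token value. It also throws IllegalArgumentException for an unclosed quote so that it stays self-contained, where the real tokenizer reports a ParsingException.

```java
/** Hypothetical, simplified sketch; not Debezium's implementation. */
final class QuotedStringSketch {
    /**
     * Returns the index of the closing quote that matches the opening quote
     * at position {@code open}, skipping any character preceded by a backslash.
     */
    static int findClosingQuote(String input, int open) {
        char quote = input.charAt(open);
        for (int i = open + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, whatever it is
            } else if (c == quote) {
                return i;
            }
        }
        throw new IllegalArgumentException("No matching quote for position " + open);
    }

    /** Returns the raw characters between the quotes, with escapes left in place. */
    static String quotedContent(String input, int open) {
        return input.substring(open + 1, findClosingQuote(input, open));
    }

    public static void main(String[] args) {
        // The escaped single quote does not terminate the string.
        System.out.println(quotedContent("'it\\'s here' rest", 0));
    }
}
```

The same method serves both quote styles because the closing character is taken from the opening one.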
-
COMMENT
public static final int COMMENT
The token type for tokens that consist of all the characters between "/*" and "*/" or between "//" and the next line terminator (e.g., '\n', '\r' or "\r\n").
See Also: Constant Field Values
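The two comment forms can be sketched as a function that, given the position where a comment starts, returns the position just past it. The class `CommentScanSketch` and the method `endOfComment` are hypothetical and are not part of Debezium. Two choices here are assumptions rather than documented behavior: the returned span includes the "/*", "*/", and "//" delimiters but not the line terminator, and an unterminated block comment is taken to run to the end of the input.

```java
/** Hypothetical, simplified sketch; not Debezium's implementation. */
final class CommentScanSketch {
    /**
     * Returns the index just past the comment starting at {@code start},
     * or {@code start} itself if no comment begins there.
     */
    static int endOfComment(String input, int start) {
        if (input.startsWith("/*", start)) {
            int close = input.indexOf("*/", start + 2);
            // Assumption: an unterminated block comment runs to the end of input.
            return close < 0 ? input.length() : close + 2;
        }
        if (input.startsWith("//", start)) {
            int i = start + 2;
            while (i < input.length() && input.charAt(i) != '\n' && input.charAt(i) != '\r') {
                i++;
            }
            return i; // assumption: the line terminator is not part of the comment
        }
        return start;
    }

    public static void main(String[] args) {
        String text = "a /* note */ b // tail\nc";
        int start = text.indexOf("/*");
        System.out.println(text.substring(start, endOfComment(text, start)));
    }
}
```

A tokenizer built on such a scan could either emit the span as a COMMENT token or skip it, which is presumably what the useComments flag selects.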
-
useComments
private final boolean useComments
-
Method Detail
-
tokenize
public void tokenize(TokenStream.CharacterStream input, TokenStream.Tokens tokens) throws ParsingException
Description copied from interface: TokenStream.Tokenizer
Process the supplied characters and construct the appropriate TokenStream.Token objects.
Specified by: tokenize in interface TokenStream.Tokenizer
Parameters:
input - the character input stream; never null
tokens - the factory for TokenStream.Token objects, which records the order in which the tokens are created
Throws:
ParsingException - if there is an error while processing the character stream (e.g., a quote is not closed)