Package io.debezium.text
Class TokenStream.BasicTokenizer
java.lang.Object
io.debezium.text.TokenStream.BasicTokenizer
- All Implemented Interfaces:
TokenStream.Tokenizer
- Enclosing class:
TokenStream
A basic TokenStream.Tokenizer implementation that ignores whitespace but includes tokens for individual symbols, the period ('.'), single-quoted strings, double-quoted strings, whitespace-delimited words, and optionally comments.
Note that this tokenizer may not suit many situations; it is provided as a convenience for the cases where its rules happen to fit.
-
Field Summary
Fields
Modifier and Type | Field | Description
static final int | COMMENT | The token type for tokens that consist of all the characters between "/*" and "*/" or between "//" and the next line terminator (e.g., '\n', '\r' or "\r\n").
static final int | DECIMAL | The token type for tokens that consist of an individual '.' character.
static final int | DOUBLE_QUOTED_STRING | The token type for tokens that consist of all the characters within double-quotes.
static final int | SINGLE_QUOTED_STRING | The token type for tokens that consist of all the characters within single-quotes.
static final int | SYMBOL | The token type for tokens that consist of an individual "symbol" character.
private final boolean | useComments |
static final int | WORD | The token type for tokens that represent an unquoted string containing a character sequence made up of non-whitespace and non-symbol characters.
-
Constructor Summary
Constructors
Modifier | Constructor
protected | BasicTokenizer(boolean useComments)
-
Method Summary
Modifier and Type | Method | Description
void | tokenize(TokenStream.CharacterStream input, TokenStream.Tokens tokens) | Process the supplied characters and construct the appropriate TokenStream.Token objects.
-
Field Details
-
WORD
public static final int WORD
The token type for tokens that represent an unquoted string containing a character sequence made up of non-whitespace and non-symbol characters.
-
SYMBOL
public static final int SYMBOL
The token type for tokens that consist of an individual "symbol" character. The set of characters includes: -(){}*,;+%?$[]!<>|=:
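The WORD, SYMBOL, and DECIMAL descriptions together imply a simple splitting rule, which can be sketched as below. This is a minimal stand-alone illustration, not Debezium's implementation: the class name WordSymbolSketch, the method split, and the "TYPE:text" string format are all hypothetical, and quoted strings and comments are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Debezium's implementation.
final class WordSymbolSketch {
    // Symbol characters as listed in the SYMBOL description.
    static final String SYMBOLS = "-(){}*,;+%?$[]!<>|=:";

    // Splits text into entries such as "WORD:price".
    // Assumption: quotes and comments are not handled in this sketch.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but produces none
            } else if (c == '.') {
                tokens.add("DECIMAL:.");
                i++;
            } else if (SYMBOLS.indexOf(c) >= 0) {
                tokens.add("SYMBOL:" + c);
                i++;
            } else {
                // A word runs until the next whitespace, period, or symbol.
                int start = i;
                while (i < text.length() && !isBoundary(text.charAt(i))) {
                    i++;
                }
                tokens.add("WORD:" + text.substring(start, i));
            }
        }
        return tokens;
    }

    private static boolean isBoundary(char c) {
        return Character.isWhitespace(c) || c == '.' || SYMBOLS.indexOf(c) >= 0;
    }
}
```

With this sketch, WordSymbolSketch.split("a.b = (c1, 42)") yields WORD:a, DECIMAL:., WORD:b, SYMBOL:=, SYMBOL:(, WORD:c1, SYMBOL:,, WORD:42, SYMBOL:) in that order.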
-
DECIMAL
public static final int DECIMAL
The token type for tokens that consist of an individual '.' character.
-
SINGLE_QUOTED_STRING
public static final int SINGLE_QUOTED_STRING
The token type for tokens that consist of all the characters within single-quotes. Single quote characters are included if they are preceded (escaped) by a '\' character.
-
DOUBLE_QUOTED_STRING
public static final int DOUBLE_QUOTED_STRING
The token type for tokens that consist of all the characters within double-quotes. Double quote characters are included if they are preceded (escaped) by a '\' character.
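The escaping rule for both quoted-string types can be sketched as a scan for the closing quote that skips over backslash-escaped characters. This is a hedged stand-alone illustration, not Debezium's code: the class QuotedStringSketch and the method endOfQuoted are hypothetical names, and the sketch assumes a backslash escapes whatever character follows it and that an unclosed quote is reported with an IllegalArgumentException rather than a ParsingException.

```java
// Hypothetical illustration only; this is not Debezium's implementation.
final class QuotedStringSketch {
    // Returns the index just past the closing quote of the string that opens at 'start'.
    // The character at 'start' (either ' or ") decides which closing quote is expected.
    static int endOfQuoted(String text, int start) {
        char quote = text.charAt(start);
        int i = start + 1;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\') {
                i += 2; // assumption: skip the backslash and the character it escapes
            } else if (c == quote) {
                return i + 1;
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("No matching quote for the one at index " + start);
    }
}
```

For the nine-character input 'it\'s' x (written in Java source as "'it\\'s' x"), endOfQuoted(text, 0) returns 7, so the escaped quote stays inside the string and the scan stops at the real closing quote.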
-
COMMENT
public static final int COMMENT
The token type for tokens that consist of all the characters between "/*" and "*/" or between "//" and the next line terminator (e.g., '\n', '\r' or "\r\n").
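The two comment forms described above can be located with a small scan, sketched here under stated assumptions. The class CommentSketch and the method endOfComment are hypothetical and not part of Debezium. The sketch also assumes that the line terminator is left out of a "//" comment and that an unterminated block comment runs to the end of the input; the actual tokenizer may behave differently in both cases.

```java
// Hypothetical illustration only; this is not Debezium's implementation.
final class CommentSketch {
    // Returns the index just past the comment that starts at 'start',
    // or -1 if no comment starts there.
    static int endOfComment(String text, int start) {
        if (text.startsWith("//", start)) {
            int i = start + 2;
            while (i < text.length() && text.charAt(i) != '\n' && text.charAt(i) != '\r') {
                i++;
            }
            return i; // assumption: the line terminator is not part of the comment
        }
        if (text.startsWith("/*", start)) {
            int close = text.indexOf("*/", start + 2);
            // Assumption: an unterminated block comment extends to the end of the input.
            return close < 0 ? text.length() : close + 2;
        }
        return -1;
    }
}
```

A caller that mirrors the useComments flag would either record the span from start to the returned index as a COMMENT token or simply skip past it.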
-
useComments
private final boolean useComments
-
-
Constructor Details
-
BasicTokenizer
protected BasicTokenizer(boolean useComments)
-
-
Method Details
-
tokenize
public void tokenize(TokenStream.CharacterStream input, TokenStream.Tokens tokens) throws ParsingException
Description copied from interface: TokenStream.Tokenizer
Process the supplied characters and construct the appropriate TokenStream.Token objects.
- Specified by:
tokenize in interface TokenStream.Tokenizer
- Parameters:
input - the character input stream; never null
tokens - the factory for TokenStream.Token objects, which records the order in which the tokens are created
- Throws:
ParsingException - if there is an error while processing the character stream (e.g., a quote is not closed)
-