F - the format supported by this parser.
public abstract class CommonParserSettings<F extends Format> extends CommonSettings<F>
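This is the parent class for all configuration classes used by parsers (AbstractParser).
By default, all parsers work with, at least, the following configuration options in addition to the ones provided by CommonSettings:
rowProcessor: a callback implementation of the interface RowProcessor which handles the life cycle of the parsing process and processes each record extracted from the input.
headerExtractionEnabled: indicates whether or not the first valid record parsed from the input should be considered as the row containing the names of each column.
columnReorderingEnabled (defaults to true): indicates whether fields selected using the field selection methods (defined by the parent class CommonSettings) should be reordered.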
When disabled, each parsed record will contain values for all columns, in the order they occur in the input. Fields which were not selected will not be parsed, and the record will contain empty values in their place.
When enabled, each parsed record will contain values only for the selected columns. The values will be ordered according to the selection.
readInputOnSeparateThread (defaults to true if the number of available processors at runtime is greater than 1):
When enabled, a reading thread (in input.concurrent.ConcurrentCharInputReader) will be started and load characters from the input while the parser is processing its input buffer. This yields better performance, especially when reading from big inputs (greater than 100 MB).
When disabled, the parsing process will briefly pause so the buffer can be replenished every time it is exhausted (in DefaultCharInputReader). This is not as bad or slow as it sounds, and can even be (slightly) more efficient if your input is small.
See Also: RowProcessor, CsvParserSettings, FixedWidthParserSettings
Modifier and Type | Field and Description |
---|---|
protected Boolean | headerExtractionEnabled |
Constructor and Description |
---|
CommonParserSettings() |
Modifier and Type | Method and Description |
---|---|
protected void | addConfiguration(Map<String,Object> out) |
void | addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess): Provides a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc. |
protected void | clearInputSpecificSettings(): Clears settings that are likely to be specific to a given input. |
protected CommonParserSettings | clone(): Clones this configuration object. |
protected CommonParserSettings | clone(boolean clearInputSpecificSettings): Clones this configuration object to reuse user-provided settings. |
protected void | configureFromAnnotations(Class<?> beanClass): Configures the parser based on the annotations provided in a given class. |
List<InputAnalysisProcess> | getInputAnalysisProcesses(): Returns the sequence of InputAnalysisProcess to be used for analyzing the input buffer and potentially discovering configuration options such as column separators in CSV, data formats, etc. |
int | getInputBufferSize(): Returns the number of characters held by the parser's buffer when processing the input (defaults to 1024*1024 characters). |
long | getNumberOfRecordsToRead(): Returns the number of valid records to be parsed before the process is stopped. |
long | getNumberOfRowsToSkip(): Returns the number of rows to skip from the input before the parser can begin to execute. |
<T extends Context> Processor<T> | getProcessor(): Returns the callback implementation of the interface Processor which handles the lifecycle of the parsing process and processes each record extracted from the input. |
boolean | getReadInputOnSeparateThread(): Indicates whether or not a separate thread will be used to read characters from the input while parsing (defaults to true if the number of available processors at runtime is greater than 1). |
RowProcessor | getRowProcessor(): Deprecated. Use the getProcessor() method, as it allows format-specific processors to be built to work with different implementations of Context. Implementations based on RowProcessor allow only parsers that provide a ParsingContext to be used. |
boolean | isAutoClosingEnabled(): Indicates whether automatic closing of the input (reader, stream, etc) is enabled. |
boolean | isColumnReorderingEnabled(): Indicates whether fields selected using the field selection methods (defined by the parent class CommonSettings) should be reordered (defaults to true). |
boolean | isCommentCollectionEnabled(): Indicates that comments found in the input must be collected (disabled by default). |
boolean | isCommentProcessingEnabled(): Indicates whether checking for comment lines in the input is enabled. |
boolean | isHeaderExtractionEnabled(): Indicates whether or not the first valid record parsed from the input should be considered as the row containing the names of each column. |
boolean | isLineSeparatorDetectionEnabled(): Indicates whether the parser should detect the line separator automatically. |
protected CharAppender | newCharAppender(): Returns an instance of CharAppender with the configured limit of maximum characters per column and the default value used to represent a null value (when the String parsed from the input is empty). |
protected CharInputReader | newCharInputReader(int whitespaceRangeStart): An implementation of CharInputReader which loads the parser buffer in parallel or sequentially, as defined by the readInputOnSeparateThread property. |
void | setAutoClosingEnabled(boolean autoClosingEnabled): Configures whether the parser should always close the input (reader, stream, etc) automatically when all records have been parsed or when an error occurs. |
void | setColumnReorderingEnabled(boolean columnReorderingEnabled): Defines whether fields selected using the field selection methods (defined by the parent class CommonSettings) should be reordered (defaults to true). |
void | setCommentCollectionEnabled(boolean commentCollectionEnabled): Enables collection of comments found in the input (disabled by default). |
void | setCommentProcessingEnabled(boolean commentProcessingEnabled): Configures whether the parser will check for comment lines in the input. Defaults to true. |
void | setHeaderExtractionEnabled(boolean headerExtractionEnabled): Defines whether or not the first valid record parsed from the input should be considered as the row containing the names of each column. |
void | setInputBufferSize(int inputBufferSize): Defines the number of characters held by the parser's buffer when processing the input (defaults to 1024*1024 characters). |
void | setLineSeparatorDetectionEnabled(boolean lineSeparatorDetectionEnabled): Defines whether the parser should detect the line separator automatically. |
void | setNumberOfRecordsToRead(long numberOfRecordsToRead): Defines the number of valid records to be parsed before the process is stopped. |
void | setNumberOfRowsToSkip(long numberOfRowsToSkip): Defines a number of rows to skip from the input before the parser can begin to execute. |
void | setProcessor(Processor<? extends Context> processor): Defines the callback implementation of the interface Processor which handles the lifecycle of the parsing process and processes each record extracted from the input. |
void | setReadInputOnSeparateThread(boolean readInputOnSeparateThread): Defines whether or not a separate thread will be used to read characters from the input while parsing (defaults to true if the number of available processors at runtime is greater than 1). |
void | setRowProcessor(RowProcessor processor): Deprecated. Use the setProcessor(Processor) method, as it allows format-specific processors to be built to work with different implementations of Context. Implementations based on RowProcessor allow only parsers that provide a ParsingContext to be used. |
Methods inherited from class CommonSettings: createDefaultFormat, excludeFields, excludeFields, excludeIndexes, getErrorContentLength, getFormat, getHeaders, getIgnoreLeadingWhitespaces, getIgnoreTrailingWhitespaces, getMaxCharsPerColumn, getMaxColumns, getNullValue, getProcessorErrorHandler, getRowProcessorErrorHandler, getSkipBitsAsWhitespace, getSkipEmptyLines, getWhitespaceRangeStart, isAutoConfigurationEnabled, isProcessorErrorHandlerDefined, selectFields, selectFields, selectIndexes, setAutoConfigurationEnabled, setErrorContentLength, setFormat, setHeaders, setIgnoreLeadingWhitespaces, setIgnoreTrailingWhitespaces, setMaxCharsPerColumn, setMaxColumns, setNullValue, setProcessorErrorHandler, setRowProcessorErrorHandler, setSkipBitsAsWhitespace, setSkipEmptyLines, toString, trimValues
protected Boolean headerExtractionEnabled
public boolean getReadInputOnSeparateThread()
Indicates whether or not a separate thread will be used to read characters from the input while parsing (defaults to true if the number of available processors at runtime is greater than 1).
When enabled, a reading thread (in com.univocity.parsers.common.input.concurrent.ConcurrentCharInputReader) will be started and load characters from the input while the parser is processing its input buffer. This yields better performance, especially when reading from big inputs (greater than 100 MB).
When disabled, the parsing process will briefly pause so the buffer can be replenished every time it is exhausted (in DefaultCharInputReader). This is not as bad or slow as it sounds, and can even be (slightly) more efficient if your input is small.
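The idea of loading characters on a second thread while the parser works on the current buffer can be sketched with plain JDK classes. This is only an illustration of the technique and not the code of ConcurrentCharInputReader; the class name BackgroundReaderSketch, the method countLines and the END marker are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackgroundReaderSketch {

    // Hypothetical end-of-input marker, compared by identity.
    private static final char[] END = new char[0];

    // Counts line feeds in the input while a second thread keeps loading buffers.
    public static long countLines(Reader input, int bufferSize) {
        // At most two filled buffers wait in the queue at any time.
        BlockingQueue<char[]> filled = new ArrayBlockingQueue<>(2);

        Thread reader = new Thread(() -> {
            try {
                char[] buffer = new char[bufferSize];
                int length;
                while ((length = input.read(buffer)) != -1) {
                    // put() blocks while the consumer is still busy with earlier buffers
                    filled.put(Arrays.copyOf(buffer, length));
                }
                filled.put(END);
            } catch (IOException | InterruptedException e) {
                // Unblock the consumer; a real reader would also report the error.
                filled.clear();
                filled.offer(END);
            }
        });
        reader.setDaemon(true);
        reader.start();

        long lines = 0;
        try {
            char[] chunk;
            // "Parsing" happens here while the reader thread loads the next buffer.
            while ((chunk = filled.take()) != END) {
                for (char c : chunk) {
                    if (c == '\n') {
                        lines++;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for input", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(countLines(new StringReader("a,b\n1,2\n3,4\n"), 4)); // prints 3
    }
}
```

With the flag disabled, the equivalent sequential version would simply call input.read(buffer) in the consuming loop, which avoids the cost of thread hand-off and is why small inputs may not benefit from the extra thread.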
public void setReadInputOnSeparateThread(boolean readInputOnSeparateThread)
Defines whether or not a separate thread will be used to read characters from the input while parsing (defaults to true if the number of available processors at runtime is greater than 1).
When enabled, a reading thread (in com.univocity.parsers.common.input.concurrent.ConcurrentCharInputReader) will be started and load characters from the input while the parser is processing its input buffer. This yields better performance, especially when reading from big inputs (greater than 100 MB).
When disabled, the parsing process will briefly pause so the buffer can be replenished every time it is exhausted (in DefaultCharInputReader). This is not as bad or slow as it sounds, and can even be (slightly) more efficient if your input is small.
readInputOnSeparateThread - the flag indicating whether or not the input should be read on a separate thread
public boolean isHeaderExtractionEnabled()
Indicates whether or not the first valid record parsed from the input should be considered as the row containing the names of each column.
public void setHeaderExtractionEnabled(boolean headerExtractionEnabled)
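Defines whether or not the first valid record parsed from the input should be considered as the row containing the names of each column.
headerExtractionEnabled - a flag indicating whether the first valid record parsed from the input should be considered as the row containing the names of each column
@Deprecated public RowProcessor getRowProcessor()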
Deprecated. Use the getProcessor() method, as it allows format-specific processors to be built to work with different implementations of Context. Implementations based on RowProcessor allow only parsers that provide a ParsingContext to be used.
Returns the callback implementation of the interface RowProcessor which handles the lifecycle of the parsing process and processes each record extracted from the input.
See Also: ObjectRowProcessor, ObjectRowListProcessor, MasterDetailProcessor, MasterDetailListProcessor, BeanProcessor, BeanListProcessor
@Deprecated public void setRowProcessor(RowProcessor processor)
Deprecated. Use the setProcessor(Processor) method, as it allows format-specific processors to be built to work with different implementations of Context. Implementations based on RowProcessor allow only parsers that provide a ParsingContext to be used.
Defines the callback implementation of the interface RowProcessor which handles the lifecycle of the parsing process and processes each record extracted from the input.
processor - the RowProcessor instance which should be used by the parser to handle each record
See Also: ObjectRowProcessor, ObjectRowListProcessor, MasterDetailProcessor, MasterDetailListProcessor, BeanProcessor, BeanListProcessor
public <T extends Context> Processor<T> getProcessor()
Returns the callback implementation of the interface Processor which handles the lifecycle of the parsing process and processes each record extracted from the input.
T - the context type supported by the parser implementation.
Returns: the Processor used by the parser to handle each record
See Also: AbstractObjectProcessor, AbstractObjectListProcessor, AbstractMasterDetailProcessor, AbstractMasterDetailListProcessor, AbstractBeanProcessor, AbstractBeanListProcessor
public void setProcessor(Processor<? extends Context> processor)
Defines the callback implementation of the interface Processor which handles the lifecycle of the parsing process and processes each record extracted from the input.
processor - the Processor instance which should be used by the parser to handle each record
See Also: AbstractObjectProcessor, AbstractObjectListProcessor, AbstractMasterDetailProcessor, AbstractMasterDetailListProcessor, AbstractBeanProcessor, AbstractBeanListProcessor, AbstractColumnProcessor
protected CharInputReader newCharInputReader(int whitespaceRangeStart)
An implementation of CharInputReader which loads the parser buffer in parallel or sequentially, as defined by the readInputOnSeparateThread property.
whitespaceRangeStart - starting range of characters considered to be whitespace.
public long getNumberOfRecordsToRead()
Returns the number of valid records to be parsed before the process is stopped.
public void setNumberOfRecordsToRead(long numberOfRecordsToRead)
Defines the number of valid records to be parsed before the process is stopped.
numberOfRecordsToRead - the number of records to read before stopping the parsing process.
public boolean isColumnReorderingEnabled()
Indicates whether fields selected using the field selection methods (defined by the parent class CommonSettings) should be reordered (defaults to true).
When disabled, each parsed record will contain values for all columns, in the order they occur in the input. Fields which were not selected will not be parsed, and the record will contain empty values in their place.
When enabled, each parsed record will contain values only for the selected columns. The values will be ordered according to the selection.
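The difference between the two modes can be illustrated with a small standalone sketch. It does not use the parser itself: ColumnReorderingSketch and applySelection are hypothetical names, and representing an unselected (empty) value as null is an assumption made for this example.

```java
import java.util.Arrays;

public class ColumnReorderingSketch {

    // Hypothetical helper: applies a selection of column indexes to one row of input values.
    public static String[] applySelection(String[] row, int[] selectedIndexes, boolean reorder) {
        if (reorder) {
            // Only the selected columns, in the order of the selection.
            String[] out = new String[selectedIndexes.length];
            for (int i = 0; i < selectedIndexes.length; i++) {
                out[i] = row[selectedIndexes[i]];
            }
            return out;
        }
        // All columns in input order; unselected columns stay empty (null here).
        String[] out = new String[row.length];
        for (int index : selectedIndexes) {
            out[index] = row[index];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] row = {"1", "Ann", "Oslo"};
        int[] selection = {2, 0}; // select the third column, then the first

        System.out.println(Arrays.toString(applySelection(row, selection, true)));  // [Oslo, 1]
        System.out.println(Arrays.toString(applySelection(row, selection, false))); // [1, null, Oslo]
    }
}
```

Disabling reordering keeps column positions stable, which is convenient when downstream code addresses values by their original index.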
public void setColumnReorderingEnabled(boolean columnReorderingEnabled)
Defines whether fields selected using the field selection methods (defined by the parent class CommonSettings) should be reordered (defaults to true).
When disabled, each parsed record will contain values for all columns, in the order they occur in the input. Fields which were not selected will not be parsed, and the record will contain empty values in their place.
When enabled, each parsed record will contain values only for the selected columns. The values will be ordered according to the selection.
columnReorderingEnabled - the flag indicating whether or not selected fields should be reordered and returned by the parser
public int getInputBufferSize()
Returns the number of characters held by the parser's buffer when processing the input (defaults to 1024*1024 characters).
public void setInputBufferSize(int inputBufferSize)
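Defines the number of characters held by the parser's buffer when processing the input (defaults to 1024*1024 characters).
inputBufferSize - the new input buffer size (in number of characters)
protected CharAppender newCharAppender()
Returns an instance of CharAppender with the configured limit of maximum characters per column and the default value used to represent a null value (when the String parsed from the input is empty).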
public final boolean isLineSeparatorDetectionEnabled()
Indicates whether the parser should detect the line separator automatically.
Returns: true if the first line of the input should be used to search for common line separator sequences (the matching sequence will be used as the line separator for parsing). Otherwise false.
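A minimal sketch of what such a detection step might look like is given below. It is not the parser's implementation: the class and method names are hypothetical, and the set of "common" sequences checked here ("\r\n", "\r" and "\n") is an assumption.

```java
public class LineSeparatorDetectionSketch {

    // Scans the input up to the end of its first line and returns the separator found there,
    // or the given fallback when the input contains no line separator at all.
    public static String detect(String input, String fallback) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                boolean followedByLineFeed = i + 1 < input.length() && input.charAt(i + 1) == '\n';
                return followedByLineFeed ? "\r\n" : "\r";
            }
            if (c == '\n') {
                return "\n";
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String separator = detect("a,b\r\n1,2\r\n", "\n");
        System.out.println(separator.equals("\r\n")); // prints true
    }
}
```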
public final void setLineSeparatorDetectionEnabled(boolean lineSeparatorDetectionEnabled)
Defines whether the parser should detect the line separator automatically.
lineSeparatorDetectionEnabled - a flag indicating whether the first line of the input should be used to search for common line separator sequences (the matching sequence will be used as the line separator for parsing).
public final long getNumberOfRowsToSkip()
Returns the number of rows to skip from the input before the parser can begin to execute.
public final void setNumberOfRowsToSkip(long numberOfRowsToSkip)
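Defines a number of rows to skip from the input before the parser can begin to execute.
numberOfRowsToSkip - number of rows to skip before parsing
protected void addConfiguration(Map<String,Object> out)
Overrides: addConfiguration in class CommonSettings<F extends Format>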
public boolean isCommentCollectionEnabled()
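Indicates that comments found in the input must be collected (disabled by default). Collected comments are made available via AbstractParser.getContext().comments() and AbstractParser.getContext().lastComment().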
public void setCommentCollectionEnabled(boolean commentCollectionEnabled)
Enables collection of comments found in the input (disabled by default). Collected comments are made available via AbstractParser.getContext().comments() and AbstractParser.getContext().lastComment().
commentCollectionEnabled - flag indicating whether or not to enable collection of comments.
protected void configureFromAnnotations(Class<?> beanClass)
Configures the parser based on the annotations provided in a given class.
beanClass - the class whose annotations will be processed to derive configurations for parsing
protected CommonParserSettings clone(boolean clearInputSpecificSettings)
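Description copied from class: CommonSettings
Clones this configuration object to reuse user-provided settings. Properties that are specific to a given input, such as header names and selection of fields, are reset if the clearInputSpecificSettings flag is set to true.
Overrides: clone in class CommonSettings<F extends Format>
clearInputSpecificSettings - flag indicating whether to clear settings that are likely to be associated with a given input.
protected CommonParserSettings clone()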
Description copied from class: CommonSettings
Clones this configuration object. Use the CommonSettings.clone(boolean) method to reset properties that are specific to a given input, such as header names and selection of fields.
Overrides: clone in class CommonSettings<F extends Format>
protected void clearInputSpecificSettings()
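Description copied from class: CommonSettings
Clears settings that are likely to be specific to a given input.
Overrides: clearInputSpecificSettings in class CommonSettings<F extends Format>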
public boolean isAutoClosingEnabled()
Indicates whether automatic closing of the input (reader, stream, etc) is enabled. If true, the parser will always close the input automatically when all records have been parsed or when an error occurs.
Defaults to true.
public void setAutoClosingEnabled(boolean autoClosingEnabled)
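Configures whether the parser should always close the input (reader, stream, etc) automatically when all records have been parsed or when an error occurs. Defaults to true.
autoClosingEnabled - flag determining whether automatic input closing should be enabled.
public boolean isCommentProcessingEnabled()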
Indicates whether checking for comment lines in the input is enabled. If true, the parser will always check for comment lines. The default comment character is #.
Defaults to true.
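The effect of the flag can be sketched as a simple line filter. This is an illustration only, with hypothetical names (CommentProcessingSketch, dataLines); it assumes a comment line is one whose first character is the comment character, and it does not reproduce the parser's actual handling of whitespace or quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommentProcessingSketch {

    // Returns the lines that would be treated as data rather than as comments.
    public static List<String> dataLines(List<String> lines, char comment, boolean commentProcessingEnabled) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            boolean isComment = commentProcessingEnabled
                    && !line.isEmpty()
                    && line.charAt(0) == comment;
            if (!isComment) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("#generated file", "id,name", "#1,Ann");

        System.out.println(dataLines(input, '#', true));  // [id,name]
        System.out.println(dataLines(input, '#', false)); // [#generated file, id,name, #1,Ann]
    }
}
```

Disabling the check is useful when legitimate values may begin with the comment character, as in the "#1,Ann" line above.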
public void setCommentProcessingEnabled(boolean commentProcessingEnabled)
Configures whether the parser will check for comment lines in the input. Defaults to true.
commentProcessingEnabled - flag determining whether the comment line check should be performed. If disabled (false), the parser won't treat any line as a comment line, including lines that start with the default comment character (#). This setting supersedes the comment character.
public void addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess)
Provides a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc. The process will be executed only once.
inputAnalysisProcess - a custom process to analyze the contents of the first input buffer loaded when the parsing starts.
public List<InputAnalysisProcess> getInputAnalysisProcesses()
Returns the sequence of InputAnalysisProcess to be used for analyzing the input buffer and potentially discovering configuration options such as column separators in CSV, data formats, etc. Each process will be executed only once.
Copyright © 2021 Univocity Software Pty Ltd. All rights reserved.