Package org.apache.tika.parser.csv
Class TextAndCSVParser
java.lang.Object
org.apache.tika.parser.AbstractEncodingDetectorParser
org.apache.tika.parser.csv.TextAndCSVParser
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
public class TextAndCSVParser
extends org.apache.tika.parser.AbstractEncodingDetectorParser
Unless the TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE is set, this parser tries to assess whether the file is a text file, csv or tsv. If the detector detects regularity in column numbers and/or encapsulated cells, this parser will apply the CSVParser; otherwise, it will treat the contents as text.
If there is a csv parse exception during detection, the parser sets the HttpHeaders.CONTENT_TYPE to MediaType.TEXT_PLAIN and treats the file as MediaType.TEXT_PLAIN.
If there is a csv parse exception during the parse, the parser writes what's left of the stream as if it were text and then throws an exception. As of this writing, the content that was buffered by the underlying CSVParser is lost.
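The column-regularity check mentioned in the description can be pictured with a small, library-free sketch. This is not Tika's detection code: the class ColumnRegularitySketch and its methods countColumns, looksDelimited, and classify are hypothetical names introduced for illustration, and the rule used here (every non-empty line must have the same number of cells, and more than one) is an assumed simplification of "regularity in column numbers and/or encapsulated cells".

```java
// Illustrative only: a simplified stand-in for a column-regularity check.
// It does not reproduce Tika's actual detector.
class ColumnRegularitySketch {

    // Counts the cells in one line, treating delimiters inside
    // double-quoted (encapsulated) cells as ordinary content.
    static int countColumns(String line, char delimiter) {
        int columns = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }

    // Assumed rule: every non-empty line has the same column count,
    // and that count is greater than one.
    static boolean looksDelimited(String text, char delimiter) {
        int expected = -1;
        for (String line : text.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            int columns = countColumns(line, delimiter);
            if (expected == -1) {
                expected = columns;
            } else if (columns != expected) {
                return false;
            }
        }
        return expected > 1;
    }

    // Hypothetical classification into "csv", "tsv", or "text".
    static String classify(String text) {
        if (looksDelimited(text, ',')) {
            return "csv";
        }
        if (looksDelimited(text, '\t')) {
            return "tsv";
        }
        return "text";
    }
}
```

Under these assumptions, classify("name,age\n\"Smith, J\",42\n") returns "csv" because the comma inside the quoted cell is not counted as a delimiter, while ordinary prose with an occasional comma has uneven column counts and falls through to "text".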
Field Summary
Fields
Modifier and Type | Field | Description
static final org.apache.tika.metadata.Property | DELIMITER_PROPERTY |
static final org.apache.tika.metadata.Property | NUM_COLUMNS | If the file is detected as a csv/tsv, this is the number of columns in the first row.
static final org.apache.tika.metadata.Property | NUM_ROWS | If the file is detected as a csv/tsv, this is the number of rows if the file is successfully read (e.g. no encapsulation exceptions, etc).
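Constructor Summary
Constructors
Constructor | Description
TextAndCSVParser() |
TextAndCSVParser(org.apache.tika.detect.EncodingDetector encodingDetector) |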
Method Summary
Modifier and Type | Method | Description
Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext context) |
void | parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) |
void | setNameToDelimiterMap(Map<String, String> map) |
Methods inherited from class org.apache.tika.parser.AbstractEncodingDetectorParser
getEncodingDetector, getEncodingDetector, setEncodingDetector
Field Details
DELIMITER_PROPERTY
public static final org.apache.tika.metadata.Property DELIMITER_PROPERTY
NUM_COLUMNS
public static final org.apache.tika.metadata.Property NUM_COLUMNS
If the file is detected as a csv/tsv, this is the number of columns in the first row.
NUM_ROWS
public static final org.apache.tika.metadata.Property NUM_ROWS
If the file is detected as a csv/tsv, this is the number of rows if the file is successfully read (e.g. no encapsulation exceptions, etc).
Constructor Details
TextAndCSVParser
public TextAndCSVParser()
TextAndCSVParser
public TextAndCSVParser(org.apache.tika.detect.EncodingDetector encodingDetector)
Method Details
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
parse
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
- Throws:
IOException
SAXException
org.apache.tika.exception.TikaException
setNameToDelimiterMap
@Field public void setNameToDelimiterMap(Map<String, String> map) throws org.apache.tika.exception.TikaConfigException
- Throws:
org.apache.tika.exception.TikaConfigException