Package org.apache.tika.parser.csv
Class TextAndCSVParser
java.lang.Object
org.apache.tika.parser.AbstractEncodingDetectorParser
org.apache.tika.parser.csv.TextAndCSVParser
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
public class TextAndCSVParser
extends org.apache.tika.parser.AbstractEncodingDetectorParser
Unless the
TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE is set,
this parser tries to assess whether the file is a text file, csv or tsv.
If the detector detects regularity in column numbers and/or encapsulated cells,
this parser will apply the CSVParser;
otherwise, it will treat the contents as text.
If a csv parse exception occurs during detection, the parser sets
HttpHeaders.CONTENT_TYPE to MediaType.TEXT_PLAIN
and treats the file as plain text.
If there is a csv parse exception during the parse, the parser
writes what's left of the stream as if it were text and then throws
an exception. As of this writing, the content that was buffered by the underlying
CSVParser is lost.
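The "regularity in column numbers and/or encapsulated cells" test described above can be pictured with a small, self-contained sketch. This is not Tika's detector: the class `ColumnRegularitySketch`, its method names, the thresholds, and the returned media type strings are all hypothetical, chosen only to illustrate the idea. It also assumes that quoted cells do not span lines, which a real CSV reader has to handle.

```java
/** Hypothetical illustration only; not part of Tika and not its actual heuristic. */
final class ColumnRegularitySketch {

    /** Counts fields in one line; delimiters inside double quotes are treated as cell content. */
    static int countFields(String line, char delimiter) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged afterwards.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    /** True when at least two non-empty lines exist and all share the same field count above 1. */
    static boolean looksDelimited(String text, char delimiter) {
        int expected = -1;
        int rows = 0;
        for (String line : text.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int n = countFields(line, delimiter);
            if (expected == -1) {
                expected = n;
            } else if (n != expected) {
                return false; // irregular column count: treat as free text
            }
            rows++;
        }
        return rows >= 2 && expected > 1;
    }

    /** Returns an illustrative media type label: csv, tsv, or plain text as the fallback. */
    static String guessMediaType(String text) {
        if (looksDelimited(text, ',')) {
            return "text/csv";
        }
        if (looksDelimited(text, '\t')) {
            return "text/tab-separated-values";
        }
        return "text/plain";
    }

    public static void main(String[] args) {
        System.out.println(guessMediaType("a,b,c\n1,2,3\n4,5,6\n"));
        System.out.println(guessMediaType("Hello, world.\nThis is prose.\n"));
    }
}
```

In this sketch, prose containing an occasional comma falls back to plain text because its per-line field counts differ, which mirrors the "otherwise, it will treat the contents as text" behavior described above.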
Field Summary
Fields
Modifier and Type | Field | Description
static final org.apache.tika.metadata.Property | DELIMITER_PROPERTY |
static final org.apache.tika.metadata.Property | NUM_COLUMNS | If the file is detected as a csv/tsv, this is the number of columns in the first row.
static final org.apache.tika.metadata.Property | NUM_ROWS | If the file is detected as a csv/tsv, this is the number of rows if the file is successfully read (e.g. no encapsulation exceptions, etc).
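Constructor Summary
Constructors
Constructor | Description
TextAndCSVParser() |
TextAndCSVParser(org.apache.tika.detect.EncodingDetector encodingDetector) |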
Method Summary
Modifier and Type | Method | Description
Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext context) |
void | parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) |
void | setNameToDelimiterMap(Map<String, String> map) |
Methods inherited from class org.apache.tika.parser.AbstractEncodingDetectorParser
getEncodingDetector, getEncodingDetector, setEncodingDetector
Field Details
-
DELIMITER_PROPERTY
public static final org.apache.tika.metadata.Property DELIMITER_PROPERTY
-
NUM_COLUMNS
public static final org.apache.tika.metadata.Property NUM_COLUMNS
If the file is detected as a csv/tsv, this is the number of columns in the first row.
-
NUM_ROWS
public static final org.apache.tika.metadata.Property NUM_ROWS
If the file is detected as a csv/tsv, this is the number of rows if the file is successfully read (e.g. no encapsulation exceptions, etc).
Constructor Details
-
TextAndCSVParser
public TextAndCSVParser()
-
TextAndCSVParser
public TextAndCSVParser(org.apache.tika.detect.EncodingDetector encodingDetector)
Method Details
-
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
-
parse
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
- Throws:
IOException
SAXException
org.apache.tika.exception.TikaException
-
setNameToDelimiterMap
@Field public void setNameToDelimiterMap(Map<String, String> map) throws org.apache.tika.exception.TikaConfigException
- Throws:
org.apache.tika.exception.TikaConfigException