Class TextAndCSVParser

java.lang.Object
org.apache.tika.parser.AbstractEncodingDetectorParser
org.apache.tika.parser.csv.TextAndCSVParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class TextAndCSVParser extends org.apache.tika.parser.AbstractEncodingDetectorParser
Unless TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE is set, this parser tries to determine whether the file is plain text, csv or tsv. If the detector finds regularity in column numbers and/or encapsulated cells, this parser applies the CSVParser; otherwise, it treats the contents as text.

If a csv parse exception occurs during detection, the parser sets HttpHeaders.CONTENT_TYPE to MediaType.TEXT_PLAIN and treats the file as plain text.

If a csv parse exception occurs during the parse itself, the parser writes the rest of the stream as if it were text and then throws an exception. As of this writing, any content that was already buffered by the underlying CSVParser is lost.
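To make "regularity in column numbers" and "encapsulated cells" concrete, the following is a simplified, self-contained sketch of that kind of check. It is not Tika's actual detector: the class name `ColumnRegularitySketch`, the methods `countColumns` and `looksDelimited`, and the rule "every non-empty line has the same column count, greater than one" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; Tika's real detection logic is more involved.
public class ColumnRegularitySketch {

    /** Counts cells in one line, treating delimiters inside double quotes as literal text. */
    public static int countColumns(String line, char delimiter) {
        int columns = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }

    /** Returns true if all non-empty lines share one column count and that count exceeds 1. */
    public static boolean looksDelimited(List<String> lines, char delimiter) {
        int expected = -1;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int n = countColumns(line, delimiter);
            if (expected == -1) {
                expected = n;
            } else if (n != expected) {
                return false; // irregular column count: treat as plain text
            }
        }
        return expected > 1;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("id,name,note", "1,alice,\"likes a, b\"", "2,bob,none");
        List<String> prose = Arrays.asList("Hello, world.", "This is plain text");
        System.out.println(looksDelimited(csv, ','));   // prints "true"
        System.out.println(looksDelimited(prose, ',')); // prints "false"
    }
}
```

In this sketch the quoted cell `"likes a, b"` counts as one column, which is why encapsulation matters when judging regularity.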

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final org.apache.tika.metadata.Property
    DELIMITER_PROPERTY
     
    static final org.apache.tika.metadata.Property
    NUM_COLUMNS
    If the file is detected as a csv/tsv, this is the number of columns in the first row.
    static final org.apache.tika.metadata.Property
    NUM_ROWS
    If the file is detected as a csv/tsv, this is the number of rows, provided the file is read successfully (e.g. no encapsulation exceptions).
  • Constructor Summary

    Constructors
    Constructor
    Description
    TextAndCSVParser()
     
    TextAndCSVParser(org.apache.tika.detect.EncodingDetector encodingDetector)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    Set<org.apache.tika.mime.MediaType>
    getSupportedTypes(org.apache.tika.parser.ParseContext context)
     
    void
    parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
     
    void
    setNameToDelimiterMap(Map<String,String> map)
     

    Methods inherited from class org.apache.tika.parser.AbstractEncodingDetectorParser

    getEncodingDetector, getEncodingDetector, setEncodingDetector

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DELIMITER_PROPERTY

      public static final org.apache.tika.metadata.Property DELIMITER_PROPERTY
    • NUM_COLUMNS

      public static final org.apache.tika.metadata.Property NUM_COLUMNS
      If the file is detected as a csv/tsv, this is the number of columns in the first row.
    • NUM_ROWS

      public static final org.apache.tika.metadata.Property NUM_ROWS
      If the file is detected as a csv/tsv, this is the number of rows, provided the file is read successfully (e.g. no encapsulation exceptions).
  • Constructor Details

    • TextAndCSVParser

      public TextAndCSVParser()
    • TextAndCSVParser

      public TextAndCSVParser(org.apache.tika.detect.EncodingDetector encodingDetector)
  • Method Details

    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
    • parse

      public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Throws:
      IOException
      SAXException
      org.apache.tika.exception.TikaException
    • setNameToDelimiterMap

      @Field public void setNameToDelimiterMap(Map<String,String> map) throws org.apache.tika.exception.TikaConfigException
      Throws:
      org.apache.tika.exception.TikaConfigException
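
The members documented above can be combined roughly as follows. This is a minimal usage sketch, not an official example: it assumes the Tika module that contains this parser, along with its dependencies, is on the classpath, and the class name `TextAndCsvUsageSketch` and the helper `extract` are hypothetical. `BodyContentHandler` is Tika's plain-text content handler. The `DELIMITER_PROPERTY` description is empty on this page, so the reading of it as the detected delimiter is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.csv.TextAndCSVParser;
import org.apache.tika.sax.BodyContentHandler;

public class TextAndCsvUsageSketch {

    /** Parses the content and returns the extracted text; metadata is filled in place. */
    public static String extract(String content, Metadata metadata) throws Exception {
        BodyContentHandler handler = new BodyContentHandler();
        try (InputStream in =
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8))) {
            // No user override is set, so the parser decides between text, csv and tsv.
            new TextAndCSVParser().parse(in, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        String text = extract("id,name,score\n1,alice,90\n2,bob,85\n", metadata);
        System.out.println(text);

        // The csv-specific properties are only meaningful if the content was
        // detected as csv/tsv; otherwise get() returns null.
        System.out.println("content type: " + metadata.get(HttpHeaders.CONTENT_TYPE));
        // Assumption: DELIMITER_PROPERTY holds the detected delimiter.
        System.out.println("delimiter:    " + metadata.get(TextAndCSVParser.DELIMITER_PROPERTY));
        System.out.println("columns:      " + metadata.get(TextAndCSVParser.NUM_COLUMNS));
        System.out.println("rows:         " + metadata.get(TextAndCSVParser.NUM_ROWS));
    }
}
```

Because parse can throw after part of the stream has already been written as text, callers that want the partial output may prefer to read the handler's contents in a finally block or after catching the exception.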