Package org.apache.tika.parser.strings
Class Latin1StringsParser
java.lang.Object
org.apache.tika.parser.strings.Latin1StringsParser
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
Parser that extracts printable Latin1 strings from arbitrary files in pure Java,
without running any external process. It is useful as a fallback parser for
binary or unknown files, for files without a specific parser, and for corrupted
files that cause a TikaException. To enable parsing of unknown files, or of
files without a specific parser, with AutoDetectParser:
AutoDetectParser parser = new AutoDetectParser();
parser.setFallback(new Latin1StringsParser());
Currently the parser makes a best effort to extract Latin1 strings, as used by Western European languages, encoded with the ISO-8859-1, UTF-8 or UTF-16 charsets, which may be mixed within the same file. The implementation is optimized for fast parsing and makes only one pass over the input.
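The general idea of single-pass string extraction with a minimum size can be sketched as follows. This is a simplified illustration, not the actual Latin1StringsParser implementation: it handles only single-byte ISO-8859-1 input (no UTF-8 or UTF-16 detection), and the class name Latin1RunSketch, the helper names, and the choice of which byte values count as printable are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class; not part of Apache Tika.
class Latin1RunSketch {

    // Assumption: printable means ASCII 0x20-0x7E, tab, or Latin1 0xA0-0xFF.
    static boolean isPrintable(int b) {
        return (b >= 0x20 && b <= 0x7E) || b == 0x09 || (b >= 0xA0 && b <= 0xFF);
    }

    // Scans the bytes once and returns every printable run of at least minSize bytes.
    static List<String> extract(byte[] data, int minSize) {
        List<String> out = new ArrayList<>();
        int start = -1; // index where the current printable run began, or -1 if none
        // Iterate one step past the end so a run reaching the last byte is flushed too.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintable(data[i] & 0xFF);
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minSize) {
                    out.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x01, 'c', 'a', 'f', (byte) 0xE9, 0x00,
                       'o', 'k', 0x02, 'T', 'i', 'k', 'a'};
        for (String s : extract(data, 4)) {
            System.out.println(s);
        }
    }
}
```

With a minimum size of 4, the sample input yields two runs: the four bytes `c a f 0xE9` and the four bytes `T i k a`. The two-byte run `ok` is discarded because it is shorter than the minimum.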
Constructor Summary
Constructors
Latin1StringsParser()
Method Summary
Modifier and Type | Method | Description
int | getMinSize() | Returns the minimum size of a character sequence to be extracted.
Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext arg0) |
void | parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) |
void | setMinSize(int minSize) | Sets the minimum size of a character sequence to be extracted.
-
Constructor Details
-
Latin1StringsParser
public Latin1StringsParser()
-
-
Method Details
-
getMinSize
public int getMinSize()
Returns the minimum size of a character sequence to be extracted.
- Returns:
the minimum size of a character sequence
-
setMinSize
public void setMinSize(int minSize)
Sets the minimum size of a character sequence to be extracted.
- Parameters:
minSize - the minimum size of a character sequence
-
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext arg0)
- Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser
-
parse
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException
- Specified by:
parse in interface org.apache.tika.parser.Parser
- Throws:
IOException
SAXException
- See Also:
Parser.parse(java.io.InputStream, org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext)
-