Index

B C D E G H I L N O P S T U V 
All Classes and Interfaces|All Packages|Serialized Form

B

BIGENDIAN_16_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
 
BIGENDIAN_32_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
 
BOMDetector - Class in org.apache.tika.parser.txt
 
BOMDetector() - Constructor for class org.apache.tika.parser.txt.BOMDetector
 
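BOMDetector has no description in this index. Its name suggests that it infers the encoding from a byte-order mark (BOM). The sketch below shows that general technique using only the JDK. The class name BomSniffer and the exact set of marks checked are assumptions; this is not Tika's implementation. The 4-byte UTF-32LE mark is checked before the 2-byte UTF-16LE mark because FF FE is a prefix of FF FE 00 00.

```java
import java.util.Optional;

/** Hypothetical helper: maps a leading byte-order mark to a charset name. */
final class BomSniffer {

    // Longest marks first: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
    private static final int[][] MARKS = {
        {0x00, 0x00, 0xFE, 0xFF}, // UTF-32BE
        {0xFF, 0xFE, 0x00, 0x00}, // UTF-32LE
        {0xEF, 0xBB, 0xBF},       // UTF-8
        {0xFE, 0xFF},             // UTF-16BE
        {0xFF, 0xFE},             // UTF-16LE
    };
    private static final String[] NAMES = {"UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"};

    /** Returns the charset name implied by a BOM at the start of data, if there is one. */
    static Optional<String> detect(byte[] data) {
        for (int i = 0; i < MARKS.length; i++) {
            if (startsWith(data, MARKS[i])) {
                return Optional.of(NAMES[i]);
            }
        }
        return Optional.empty(); // no BOM: leave the decision to statistical detectors
    }

    private static boolean startsWith(byte[] data, int[] mark) {
        if (data.length < mark.length) {
            return false;
        }
        for (int i = 0; i < mark.length; i++) {
            if ((data[i] & 0xFF) != mark[i]) { // compare as unsigned byte values
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(utf8WithBom).orElse("none"));
    }
}
```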

C

CharsetDetector - Class in org.apache.tika.parser.txt
CharsetDetector provides a facility for detecting the charset or encoding of character data in an unknown format.
CharsetDetector() - Constructor for class org.apache.tika.parser.txt.CharsetDetector
Constructor
CharsetDetector(int) - Constructor for class org.apache.tika.parser.txt.CharsetDetector
 
CharsetMatch - Class in org.apache.tika.parser.txt
This class represents a charset that has been identified by a CharsetDetector as a possible encoding for a set of input data.
checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.strings.StringsParser
 
compareTo(CSVResult) - Method in class org.apache.tika.parser.csv.CSVResult
Sorts in descending order of confidence.
compareTo(CharsetMatch) - Method in class org.apache.tika.parser.txt.CharsetMatch
Compare to other CharsetMatch objects.
CSVParams - Class in org.apache.tika.parser.csv
 
CSVResult - Class in org.apache.tika.parser.csv
 
CSVResult(double, MediaType, Character) - Constructor for class org.apache.tika.parser.csv.CSVResult
 
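CSVResult.compareTo is documented as sorting in descending order of confidence, and CharsetMatch objects are also compared by confidence. The sketch below shows that ordering with a hypothetical Candidate class, not either Tika class. compareTo swaps its arguments to Double.compare, so a standard sort places the most confident candidate first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a scored detection result; not a Tika class. */
final class Candidate implements Comparable<Candidate> {
    final String label;
    final double confidence;

    Candidate(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }

    /** Descending order: the higher confidence sorts first. */
    @Override
    public int compareTo(Candidate other) {
        // Double.compare avoids the overflow and rounding pitfalls of subtraction.
        return Double.compare(other.confidence, this.confidence);
    }

    /** Returns a sorted copy of the input with the best candidate at index 0. */
    static List<Candidate> ranked(List<Candidate> input) {
        List<Candidate> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Candidate> all = new ArrayList<>();
        all.add(new Candidate("text/plain", 0.40));
        all.add(new Candidate("text/csv", 0.95));
        all.add(new Candidate("text/tab-separated-values", 0.10));
        for (Candidate c : ranked(all)) {
            System.out.println(c.label + " " + c.confidence);
        }
    }
}
```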

D

DELIMITER_PROPERTY - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
 
detect() - Method in class org.apache.tika.parser.txt.CharsetDetector
Return the charset that best matches the supplied input data.
detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.BOMDetector
 
detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
 
detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
 
detectAll() - Method in class org.apache.tika.parser.txt.CharsetDetector
Return an array of all charsets that appear to be plausible matches with the input data.
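The detect(InputStream, Metadata) methods above examine only a prefix of the stream. The setMarkLimit(int) methods, listed under S, control how long that prefix is. A common pattern for this is to mark the stream, read up to the limit, and then reset the stream so the parser that runs next still sees every byte. The sketch below assumes that pattern and uses a hypothetical helper name, readPrefix. It requires a stream that supports mark/reset, such as BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixReader {

    /**
     * Hypothetical helper: returns at most markLimit bytes from the start of the
     * stream, then resets the stream so later readers see the same bytes again.
     */
    static byte[] readPrefix(InputStream in, int markLimit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(markLimit);
        try {
            byte[] buf = new byte[markLimit];
            int total = 0;
            int n;
            // read() may return fewer bytes than requested, so loop until the buffer is full or EOF.
            while (total < markLimit && (n = in.read(buf, total, markLimit - total)) != -1) {
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream("hello, world".getBytes(StandardCharsets.US_ASCII)));
        byte[] prefix = readPrefix(in, 5);
        System.out.println(new String(prefix, StandardCharsets.US_ASCII)); // the first five bytes
        System.out.println((char) in.read()); // the first byte again, because the stream was reset
    }
}
```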

E

enableInputFilter(boolean) - Method in class org.apache.tika.parser.txt.CharsetDetector
Enable filtering of input text.
equals(Object) - Method in class org.apache.tika.parser.csv.CSVResult
 
equals(Object) - Method in class org.apache.tika.parser.txt.CharsetMatch
Compare this CharsetMatch to another based on the confidence value.
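enableInputFilter(boolean) turns on filtering of the input text. In ICU4J, from which this CharsetDetector is derived, the filter removes text between angle brackets. This keeps HTML or XML markup, which is mostly ASCII, from dominating the byte statistics. The sketch below shows a naive filter of that kind, written under that assumption and with a hypothetical class name. It omits any safeguards a production filter would need, for example for input that is not markup at all.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class MarkupFilter {

    /**
     * Hypothetical filter: drops every byte from '<' through the next '>'
     * so that tag names and attributes are not fed to a charset detector.
     * An unterminated '<' drops the rest of the input.
     */
    static byte[] stripTags(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        boolean inTag = false;
        for (byte b : input) {
            if (b == '<') {
                inTag = true;
            } else if (b == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.write(b); // text outside tags is kept, including stray '>'
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] html = "<p class=\"x\">caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(stripTags(html), StandardCharsets.ISO_8859_1));
    }
}
```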

G

get() - Method in enum org.apache.tika.parser.strings.StringsEncoding
 
getAllDetectableCharsets() - Static method in class org.apache.tika.parser.txt.CharsetDetector
Get the names of all charsets supported by CharsetDetector class.
getCharset() - Method in class org.apache.tika.parser.csv.CSVParams
 
getConfidence() - Method in class org.apache.tika.parser.csv.CSVResult
 
getConfidence() - Method in class org.apache.tika.parser.txt.CharsetMatch
Get an indication of the confidence in the charset detected.
getDelimiter() - Method in class org.apache.tika.parser.csv.CSVParams
 
getDelimiter() - Method in class org.apache.tika.parser.csv.CSVResult
 
getDelimiterToNameMap() - Method in class org.apache.tika.parser.csv.TextAndCSVConfig
 
getDetectableCharsets() - Method in class org.apache.tika.parser.txt.CharsetDetector
Deprecated.
This API is ICU internal only.
getEncoding() - Method in class org.apache.tika.parser.strings.StringsConfig
Returns the character encoding of the strings that are to be found.
getIgnoreCharsets() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
 
getLanguage() - Method in class org.apache.tika.parser.txt.CharsetMatch
Get the ISO code for the language of the detected charset.
getMarkLimit() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
 
getMarkLimit() - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
 
getMarkLimt() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
 
getMediaType() - Method in class org.apache.tika.parser.csv.CSVParams
 
getMediaType() - Method in class org.apache.tika.parser.csv.CSVResult
 
getMinLength() - Method in class org.apache.tika.parser.strings.StringsConfig
Returns the minimum sequence length (characters) to print.
getMinLength() - Method in class org.apache.tika.parser.strings.StringsParser
 
getMinSize() - Method in class org.apache.tika.parser.strings.Latin1StringsParser
Returns the minimum size of a character sequence to be extracted.
getName() - Method in class org.apache.tika.parser.txt.CharsetMatch
Get the name of the detected charset.
getNameToDelimiterMap() - Method in class org.apache.tika.parser.csv.TextAndCSVConfig
 
getNormalizedName() - Method in class org.apache.tika.parser.txt.CharsetMatch
strips e.g.
getReader() - Method in class org.apache.tika.parser.txt.CharsetMatch
Create a java.io.Reader for reading the Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
getReader(InputStream, String) - Method in class org.apache.tika.parser.txt.CharsetDetector
Autodetect the charset of an inputStream, and return a Java Reader to access the converted input data.
getString() - Method in class org.apache.tika.parser.txt.CharsetMatch
Create a Java String from Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
getString(byte[], String) - Method in class org.apache.tika.parser.txt.CharsetDetector
Autodetect the charset of a byte array, and return a String containing the converted input data.
getString(int) - Method in class org.apache.tika.parser.txt.CharsetMatch
Create a Java String from Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
getStringsEncoding() - Method in class org.apache.tika.parser.strings.StringsParser
 
getStringsPath() - Method in class org.apache.tika.parser.strings.StringsParser
 
getStringsProg() - Static method in class org.apache.tika.parser.strings.StringsParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.strings.StringsParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.txt.TXTParser
 
getTimeoutSeconds() - Method in class org.apache.tika.parser.strings.StringsConfig
Returns the maximum time (in seconds) to wait for the "strings" command to terminate.
getTimeoutSeconds() - Method in class org.apache.tika.parser.strings.StringsParser
 
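CharsetMatch.getString() and getReader() decode the original bytes with the detected charset. If only the charset name is available, as returned by CharsetMatch.getName(), the equivalent step using just the JDK looks roughly like the sketch below. The helper name and the UTF-8 fallback for names the JVM does not recognize are assumptions of this sketch, not documented Tika behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

final class DecodeWithName {

    /**
     * Hypothetical helper: decodes bytes using a detector-supplied charset name,
     * falling back to UTF-8 when the JVM does not know that name. new String(byte[], Charset)
     * replaces malformed or unmappable input with the charset's default replacement.
     */
    static String decode(byte[] data, String charsetName) {
        Charset cs;
        try {
            cs = Charset.forName(charsetName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            cs = StandardCharsets.UTF_8;
        }
        return new String(data, cs);
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9}; // "café" encoded as ISO-8859-1
        System.out.println(decode(latin1, "ISO-8859-1"));
    }
}
```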

H

hashCode() - Method in class org.apache.tika.parser.csv.CSVResult
 
hashCode() - Method in class org.apache.tika.parser.txt.CharsetMatch
Generates a hashCode based on the confidence value.

I

Icu4jEncodingDetector - Class in org.apache.tika.parser.txt
 
Icu4jEncodingDetector() - Constructor for class org.apache.tika.parser.txt.Icu4jEncodingDetector
 
initialize(Map<String, Param>) - Method in class org.apache.tika.parser.strings.StringsParser
 
inputFilterEnabled() - Method in class org.apache.tika.parser.txt.CharsetDetector
Test whether or not input filtering is enabled.
isComplete() - Method in class org.apache.tika.parser.csv.CSVParams
 
isEmpty() - Method in class org.apache.tika.parser.csv.CSVParams
 
isStripMarkup() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
 

L

Latin1StringsParser - Class in org.apache.tika.parser.strings
Parser that extracts printable Latin-1 strings from arbitrary files in pure Java, without running any external process.
Latin1StringsParser() - Constructor for class org.apache.tika.parser.strings.Latin1StringsParser
 
LITTLEENDIAN_16_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
 
LITTLEENDIAN_32_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
 
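Latin1StringsParser extracts printable Latin-1 strings in pure Java. getMinSize() and setMinSize(int) set the shortest sequence that is reported. The sketch below shows only the general technique. Its definition of "printable" (tab, 0x20 to 0x7E, and 0xA0 to 0xFF) and the class name are assumptions, and Tika's parser may classify bytes differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Latin1Strings {

    /** Assumed notion of "printable": tab, ASCII 0x20..0x7E, and Latin-1 0xA0..0xFF. */
    static boolean isPrintable(int b) {
        return b == '\t' || (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b <= 0xFF);
    }

    /** Returns every run of at least minSize consecutive printable bytes. */
    static List<String> extract(byte[] data, int minSize) {
        List<String> out = new ArrayList<>();
        int start = -1; // start index of the current run, or -1 if not inside a run
        // Iterate one past the end so a run that reaches EOF is flushed too.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintable(data[i] & 0xFF);
            if (printable && start < 0) {
                start = i;
            } else if (!printable && start >= 0) {
                if (i - start >= minSize) {
                    out.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] blob = {0x00, 'a', 'b', 0x01, 'h', 'e', 'l', 'l', 'o', (byte) 0xE9, 0x02};
        System.out.println(extract(blob, 4));
    }
}
```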

N

NUM_COLUMNS - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
If the file is detected as CSV or TSV, this is the number of columns in the first row.
NUM_ROWS - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
If the file is detected as CSV or TSV, this is the number of rows, provided the file is read successfully (e.g., without encapsulation exceptions).
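NUM_COLUMNS is the number of columns in the first row, and NUM_ROWS is the number of rows read successfully. For a known delimiter, both figures can be computed roughly as in the hypothetical sketch below. It handles double-quoted fields, including doubled "" escapes and delimiters or line breaks inside quotes, but it assumes well-formed input. Tika's parser relies on a full CSV parser and its own error handling instead.

```java
/** Hypothetical counter for rows and first-row columns in delimited text. */
final class DelimitedStats {
    final int columnsInFirstRow;
    final int rows;

    private DelimitedStats(int columnsInFirstRow, int rows) {
        this.columnsInFirstRow = columnsInFirstRow;
        this.rows = rows;
    }

    static DelimitedStats count(String text, char delimiter) {
        int rows = 0;
        int fields = 1;            // fields seen so far in the current row
        int firstRowColumns = 0;
        boolean inQuotes = false;
        boolean rowStarted = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    i++;               // "" is an escaped quote; stay inside the field
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                }
                continue;              // delimiters and line breaks inside quotes are data
            }
            if (c == '"') {
                inQuotes = true;
                rowStarted = true;
            } else if (c == delimiter) {
                fields++;
                rowStarted = true;
            } else if (c == '\n') {    // blank lines count as rows in this sketch
                if (rows == 0) {
                    firstRowColumns = fields;
                }
                rows++;
                fields = 1;
                rowStarted = false;
            } else if (c != '\r') {
                rowStarted = true;
            }
        }
        if (rowStarted) {              // last row without a trailing newline
            if (rows == 0) {
                firstRowColumns = fields;
            }
            rows++;
        }
        return new DelimitedStats(firstRowColumns, rows);
    }

    public static void main(String[] args) {
        DelimitedStats s = count("name,note\n\"Smith, J\",\"said \"\"hi\"\"\"\n", ',');
        System.out.println(s.columnsInFirstRow + " columns, " + s.rows + " rows");
    }
}
```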

O

org.apache.tika.parser.csv - package org.apache.tika.parser.csv
 
org.apache.tika.parser.strings - package org.apache.tika.parser.strings
 
org.apache.tika.parser.txt - package org.apache.tika.parser.txt
 

P

parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
 
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
 
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.strings.StringsParser
 
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.txt.TXTParser
 

S

setCharset(Charset) - Method in class org.apache.tika.parser.csv.CSVParams
 
setDeclaredEncoding(String) - Method in class org.apache.tika.parser.txt.CharsetDetector
Set the declared encoding for charset detection.
setDelimiter(Character) - Method in class org.apache.tika.parser.csv.CSVParams
 
setDetectableCharset(String, boolean) - Method in class org.apache.tika.parser.txt.CharsetDetector
Deprecated.
This API is ICU internal only.
setEncoding(String) - Method in class org.apache.tika.parser.strings.StringsParser
 
setEncoding(StringsEncoding) - Method in class org.apache.tika.parser.strings.StringsConfig
Sets the character encoding of the strings that are to be found.
setIgnoreCharsets(List<String>) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
 
setMarkLimit(int) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
How far into the stream to read for charset detection.
setMarkLimit(int) - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
How far into the stream to read for charset detection.
setMediaType(MediaType) - Method in class org.apache.tika.parser.csv.CSVParams
 
setMinLength(int) - Method in class org.apache.tika.parser.strings.StringsConfig
Sets the minimum sequence length (characters) to print.
setMinLength(int) - Method in class org.apache.tika.parser.strings.StringsParser
 
setMinSize(int) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
Sets the minimum size of a character sequence to be extracted.
setNameToDelimiterMap(Map<String, Character>) - Method in class org.apache.tika.parser.csv.TextAndCSVConfig
 
setNameToDelimiterMap(Map<String, String>) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
 
setStringsPath(String) - Method in class org.apache.tika.parser.strings.StringsParser
Sets the "strings" installation folder.
setStripMarkup(boolean) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
Whether to attempt to strip HTML-like markup from the stream before sending it to the underlying detector.
setText(byte[]) - Method in class org.apache.tika.parser.txt.CharsetDetector
Set the input text (byte) data whose charset is to be detected.
setText(InputStream) - Method in class org.apache.tika.parser.txt.CharsetDetector
Set the input text (byte) data whose charset is to be detected.
setTimeoutSeconds(int) - Method in class org.apache.tika.parser.strings.StringsConfig
Sets the maximum time (in seconds) to wait for the "strings" command to terminate.
setTimeoutSeconds(int) - Method in class org.apache.tika.parser.strings.StringsParser
 
SINGLE_7_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
 
SINGLE_8_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
 
StringsConfig - Class in org.apache.tika.parser.strings
Configuration for the "strings" (or strings-alternative) command.
StringsConfig() - Constructor for class org.apache.tika.parser.strings.StringsConfig
 
StringsEncoding - Enum in org.apache.tika.parser.strings
Character encoding of the strings that are to be found using the "strings" command.
StringsParser - Class in org.apache.tika.parser.strings
Parser that uses the "strings" (or strings-alternative) command to find the printable strings in an object file or other binary file (application/octet-stream).
StringsParser() - Constructor for class org.apache.tika.parser.strings.StringsParser
 
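StringsParser runs the external "strings" command, configured through StringsConfig: minimum length, encoding, and timeout. The six StringsEncoding constants appear to correspond to the single-character values accepted by the -e option of GNU strings (s, S, b, l, B, L). The sketch below assumes that correspondence and GNU-style -n and -e flags. The enum and method names are hypothetical, and this is not how Tika itself builds or runs the command.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class StringsCommand {

    /** Hypothetical mirror of StringsEncoding, mapped to the assumed GNU strings -e letters. */
    enum Encoding {
        SINGLE_7_BIT('s'), SINGLE_8_BIT('S'),
        BIGENDIAN_16_BIT('b'), LITTLEENDIAN_16_BIT('l'),
        BIGENDIAN_32_BIT('B'), LITTLEENDIAN_32_BIT('L');

        final char flag;

        Encoding(char flag) {
            this.flag = flag;
        }
    }

    /** Builds an argument list such as [strings, -n, 4, -e, s, /path/to/file]. */
    static List<String> buildArgs(String stringsProgram, int minLength, Encoding encoding, String file) {
        if (minLength < 1) {
            throw new IllegalArgumentException("minLength must be positive");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add(stringsProgram);
        cmd.add("-n");
        cmd.add(Integer.toString(minLength));
        cmd.add("-e");
        cmd.add(String.valueOf(encoding.flag));
        cmd.add(file);
        return cmd;
    }

    /** Runs the command, killing it if it exceeds timeoutSeconds; returns the exit code. */
    static int run(List<String> cmd, int timeoutSeconds) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("strings timed out after " + timeoutSeconds + " s");
        }
        return p.exitValue();
    }

    public static void main(String[] args) {
        List<String> cmd = buildArgs("strings", 4, Encoding.SINGLE_8_BIT, "/bin/ls");
        System.out.println(cmd);
        // run(cmd, 120) would execute it, provided a "strings" binary is on the PATH.
    }
}
```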

T

TextAndCSVConfig - Class in org.apache.tika.parser.csv
 
TextAndCSVConfig() - Constructor for class org.apache.tika.parser.csv.TextAndCSVConfig
 
TextAndCSVParser - Class in org.apache.tika.parser.csv
Unless TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE is set, this parser tries to determine whether the file is plain text, CSV, or TSV.
TextAndCSVParser() - Constructor for class org.apache.tika.parser.csv.TextAndCSVParser
 
TextAndCSVParser(EncodingDetector) - Constructor for class org.apache.tika.parser.csv.TextAndCSVParser
 
toString() - Method in class org.apache.tika.parser.csv.CSVResult
 
toString() - Method in enum org.apache.tika.parser.strings.StringsEncoding
 
toString() - Method in class org.apache.tika.parser.txt.CharsetMatch
 
TXTParser - Class in org.apache.tika.parser.txt
Plain text parser.
TXTParser() - Constructor for class org.apache.tika.parser.txt.TXTParser
 
TXTParser(EncodingDetector) - Constructor for class org.apache.tika.parser.txt.TXTParser
 
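TextAndCSVParser tries to decide whether a file is plain text, CSV, or TSV. One simple way to make that decision is to try each candidate delimiter and prefer the one that splits every line into the same number of fields, where that number is greater than one. The sketch below illustrates this idea; it is not Tika's actual scoring. The candidate set, the consistency score, and the 0.9 threshold are all assumptions, and quoting is ignored for brevity.

```java
/** Hypothetical delimiter sniffer; returns null when the text looks like plain text. */
final class DelimiterSniffer {

    // Assumed candidate delimiters, tried in this order; ties go to the earlier one.
    private static final char[] CANDIDATES = {',', '\t', ';', '|'};

    /**
     * Score = fraction of non-empty lines whose field count equals the first
     * non-empty line's count; a delimiter is considered only if it splits that
     * first line into at least two fields.
     */
    static Character guess(String text) {
        String[] lines = text.split("\r?\n");
        Character best = null;
        double bestScore = 0.0;
        for (char d : CANDIDATES) {
            int expected = -1;
            int nonEmpty = 0;
            int matching = 0;
            for (String line : lines) {
                if (line.isEmpty()) {
                    continue;
                }
                int fields = countFields(line, d);
                if (expected < 0) {
                    expected = fields;
                }
                nonEmpty++;
                if (fields == expected) {
                    matching++;
                }
            }
            if (expected < 2) {
                continue; // the delimiter does not split the first line: not a candidate
            }
            double score = (double) matching / nonEmpty;
            if (score > bestScore) {
                bestScore = score;
                best = d;
            }
        }
        return bestScore >= 0.9 ? best : null;
    }

    private static int countFields(String line, char d) {
        int n = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == d) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Character d = guess("id\tname\n1\tAda\n2\tGrace\n");
        System.out.println(d == null ? "text/plain" : "delimiter code point " + (int) d.charValue());
    }
}
```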

U

UniversalEncodingDetector - Class in org.apache.tika.parser.txt
 
UniversalEncodingDetector() - Constructor for class org.apache.tika.parser.txt.UniversalEncodingDetector
 

V

valueOf(String) - Static method in enum org.apache.tika.parser.strings.StringsEncoding
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.apache.tika.parser.strings.StringsEncoding
Returns an array containing the constants of this enum type, in the order they are declared.