Index
All Classes and Interfaces|All Packages|Serialized Form
B
- BIGENDIAN_16_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
- BIGENDIAN_32_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
- BOMDetector - Class in org.apache.tika.parser.txt
- BOMDetector() - Constructor for class org.apache.tika.parser.txt.BOMDetector
C
- CharsetDetector - Class in org.apache.tika.parser.txt
-
CharsetDetector provides a facility for detecting the charset or encoding of character data in an unknown format.
- CharsetDetector() - Constructor for class org.apache.tika.parser.txt.CharsetDetector
-
Constructor
- CharsetDetector(int) - Constructor for class org.apache.tika.parser.txt.CharsetDetector
- CharsetMatch - Class in org.apache.tika.parser.txt
-
This class represents a charset that has been identified by a CharsetDetector as a possible encoding for a set of input data.
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.strings.StringsParser
- compareTo(CSVResult) - Method in class org.apache.tika.parser.csv.CSVResult
-
Sorts in descending order of confidence
- compareTo(CharsetMatch) - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Compare to other CharsetMatch objects.
- CSVParams - Class in org.apache.tika.parser.csv
- CSVResult - Class in org.apache.tika.parser.csv
- CSVResult(double, MediaType, Character) - Constructor for class org.apache.tika.parser.csv.CSVResult
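
The compareTo(CSVResult) entry above says that results sort in descending order of confidence. A minimal sketch of that ordering follows, using only the JDK. ScoredGuess and ConfidenceOrderingSketch are hypothetical names introduced for illustration; this is not the Tika CSVResult source, only one common way to get a descending natural order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a scored detection result; not the Tika CSVResult class.
final class ScoredGuess implements Comparable<ScoredGuess> {
    final String label;
    final double confidence;

    ScoredGuess(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }

    // Passing "other" first reverses the usual comparison, which gives
    // descending order of confidence when the list is sorted.
    @Override
    public int compareTo(ScoredGuess other) {
        return Double.compare(other.confidence, this.confidence);
    }
}

class ConfidenceOrderingSketch {
    // Returns the labels ordered so that the most confident guess comes first.
    static List<String> rank(List<ScoredGuess> guesses) {
        List<ScoredGuess> sorted = new ArrayList<>(guesses);
        Collections.sort(sorted); // uses compareTo, i.e. descending confidence
        List<String> labels = new ArrayList<>();
        for (ScoredGuess g : sorted) {
            labels.add(g.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<ScoredGuess> guesses = List.of(
                new ScoredGuess("plain", 0.2),
                new ScoredGuess("csv", 0.9),
                new ScoredGuess("tsv", 0.5));
        System.out.println(rank(guesses));
    }
}
```

With a natural order like this, callers can take the first element of a sorted list as the best candidate without supplying a separate Comparator.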
D
- DELIMITER_PROPERTY - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
- detect() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Return the charset that best matches the supplied input data.
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.BOMDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
- detectAll() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Return an array of all charsets that appear to be plausible matches with the input data.
E
- enableInputFilter(boolean) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Enable filtering of input text.
- equals(Object) - Method in class org.apache.tika.parser.csv.CSVResult
- equals(Object) - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Compares this CharsetMatch to another based on confidence value.
G
- get() - Method in enum org.apache.tika.parser.strings.StringsEncoding
- getAllDetectableCharsets() - Static method in class org.apache.tika.parser.txt.CharsetDetector
-
Get the names of all charsets supported by the CharsetDetector class.
- getCharset() - Method in class org.apache.tika.parser.csv.CSVParams
- getConfidence() - Method in class org.apache.tika.parser.csv.CSVResult
- getConfidence() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Get an indication of the confidence in the charset detected.
- getDelimiter() - Method in class org.apache.tika.parser.csv.CSVParams
- getDelimiter() - Method in class org.apache.tika.parser.csv.CSVResult
- getDelimiterToNameMap() - Method in class org.apache.tika.parser.csv.TextAndCSVConfig
- getDetectableCharsets() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Deprecated. This API is ICU internal only.
- getEncoding() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the character encoding of the strings that are to be found.
- getIgnoreCharsets() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- getLanguage() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Get the ISO code for the language of the detected charset.
- getMarkLimit() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- getMarkLimit() - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
- getMarkLimt() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- getMediaType() - Method in class org.apache.tika.parser.csv.CSVParams
- getMediaType() - Method in class org.apache.tika.parser.csv.CSVResult
- getMinLength() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the minimum sequence length (characters) to print.
- getMinLength() - Method in class org.apache.tika.parser.strings.StringsParser
- getMinSize() - Method in class org.apache.tika.parser.strings.Latin1StringsParser
-
Returns the minimum size of a character sequence to be extracted.
- getName() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Get the name of the detected charset.
- getNameToDelimiterMap() - Method in class org.apache.tika.parser.csv.TextAndCSVConfig
- getNormalizedName() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
strips e.g.
- getReader() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Create a java.io.Reader for reading the Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
- getReader(InputStream, String) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Autodetect the charset of an inputStream, and return a Java Reader to access the converted input data.
- getString() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Create a Java String from Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
- getString(byte[], String) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Autodetect the charset of an inputStream, and return a String containing the converted input data.
- getString(int) - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Create a Java String from Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
- getStringsEncoding() - Method in class org.apache.tika.parser.strings.StringsParser
- getStringsPath() - Method in class org.apache.tika.parser.strings.StringsParser
- getStringsProg() - Static method in class org.apache.tika.parser.strings.StringsParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.strings.StringsParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.txt.TXTParser
- getTimeoutSeconds() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the maximum time (in seconds) to wait for the "strings" command to terminate.
- getTimeoutSeconds() - Method in class org.apache.tika.parser.strings.StringsParser
H
- hashCode() - Method in class org.apache.tika.parser.csv.CSVResult
- hashCode() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Generates a hashCode based on the confidence value.
I
- Icu4jEncodingDetector - Class in org.apache.tika.parser.txt
- Icu4jEncodingDetector() - Constructor for class org.apache.tika.parser.txt.Icu4jEncodingDetector
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.strings.StringsParser
- inputFilterEnabled() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Test whether or not input filtering is enabled.
- isComplete() - Method in class org.apache.tika.parser.csv.CSVParams
- isEmpty() - Method in class org.apache.tika.parser.csv.CSVParams
- isStripMarkup() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
L
- Latin1StringsParser - Class in org.apache.tika.parser.strings
-
Parser to extract printable Latin1 strings from arbitrary files with pure java without running any external process.
- Latin1StringsParser() - Constructor for class org.apache.tika.parser.strings.Latin1StringsParser
- LITTLEENDIAN_16_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
- LITTLEENDIAN_32_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
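
The Latin1StringsParser entry above describes extracting printable Latin1 strings in pure Java, and getMinSize()/setMinSize(int) refer to a minimum size for an extracted character sequence. The idea can be sketched as below using only the JDK. PrintableLatin1Sketch, isPrintable, and extract are hypothetical names, and the definition of "printable" used here (0x20-0x7E and 0xA0-0xFF) is an assumption of this sketch, not necessarily the rule the Tika parser applies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PrintableLatin1Sketch {
    // Assumption for this sketch: a byte is printable if it falls in
    // 0x20-0x7E or 0xA0-0xFF when read as ISO-8859-1.
    static boolean isPrintable(int b) {
        return (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b <= 0xFF);
    }

    // Collects every run of printable bytes that is at least minSize bytes long.
    static List<String> extract(byte[] data, int minSize) {
        List<String> found = new ArrayList<>();
        int start = -1; // index where the current run began, or -1 if no run is open
        // Iterate one position past the end so a run that reaches the end is flushed.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintable(data[i] & 0xFF);
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minSize) {
                    found.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 'T', 'i', 'k', 'a', 0x01, 'o', 'k', 0x02, 'p', 'a', 'r', 's', 'e', 'r'};
        System.out.println(extract(data, 4));
    }
}
```

Raising minSize filters out short accidental runs in binary data, at the cost of dropping genuinely short strings.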
N
- NUM_COLUMNS - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
-
If the file is detected as a csv/tsv, this is the number of columns in the first row.
- NUM_ROWS - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
-
If the file is detected as a csv/tsv, this is the number of rows if the file is successfully read (e.g. no encapsulation exceptions, etc).
O
- org.apache.tika.parser.csv - package org.apache.tika.parser.csv
- org.apache.tika.parser.strings - package org.apache.tika.parser.strings
- org.apache.tika.parser.txt - package org.apache.tika.parser.txt
P
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.strings.StringsParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.txt.TXTParser
S
- setCharset(Charset) - Method in class org.apache.tika.parser.csv.CSVParams
- setDeclaredEncoding(String) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Set the declared encoding for charset detection.
- setDelimiter(Character) - Method in class org.apache.tika.parser.csv.CSVParams
- setDetectableCharset(String, boolean) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Deprecated. This API is ICU internal only.
- setEncoding(String) - Method in class org.apache.tika.parser.strings.StringsParser
- setEncoding(StringsEncoding) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the character encoding of the strings that are to be found.
- setIgnoreCharsets(List<String>) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- setMarkLimit(int) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
-
How far into the stream to read for charset detection.
- setMarkLimit(int) - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
-
How far into the stream to read for charset detection.
- setMediaType(MediaType) - Method in class org.apache.tika.parser.csv.CSVParams
- setMinLength(int) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the minimum sequence length (characters) to print.
- setMinLength(int) - Method in class org.apache.tika.parser.strings.StringsParser
- setMinSize(int) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
-
Sets the minimum size of a character sequence to be extracted.
- setNameToDelimiterMap(Map<String, Character>) - Method in class org.apache.tika.parser.csv.TextAndCSVConfig
- setNameToDelimiterMap(Map<String, String>) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
- setStringsPath(String) - Method in class org.apache.tika.parser.strings.StringsParser
-
Sets the "strings" installation folder.
- setStripMarkup(boolean) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
-
Whether or not to attempt to strip html-ish markup from the stream before sending it to the underlying detector.
- setText(byte[]) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Set the input text (byte) data whose charset is to be detected.
- setText(InputStream) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Set the input text (byte) data whose charset is to be detected.
- setTimeoutSeconds(int) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the maximum time (in seconds) to wait for the "strings" command to terminate.
- setTimeoutSeconds(int) - Method in class org.apache.tika.parser.strings.StringsParser
- SINGLE_7_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
- SINGLE_8_BIT - Enum constant in enum org.apache.tika.parser.strings.StringsEncoding
- StringsConfig - Class in org.apache.tika.parser.strings
-
Configuration for the "strings" (or strings-alternative) command.
- StringsConfig() - Constructor for class org.apache.tika.parser.strings.StringsConfig
- StringsEncoding - Enum in org.apache.tika.parser.strings
-
Character encoding of the strings that are to be found using the "strings" command.
- StringsParser - Class in org.apache.tika.parser.strings
-
Parser that uses the "strings" (or strings-alternative) command to find the printable strings in an object, or other binary, file (application/octet-stream).
- StringsParser() - Constructor for class org.apache.tika.parser.strings.StringsParser
T
- TextAndCSVConfig - Class in org.apache.tika.parser.csv
- TextAndCSVConfig() - Constructor for class org.apache.tika.parser.csv.TextAndCSVConfig
- TextAndCSVParser - Class in org.apache.tika.parser.csv
-
Unless the TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE is set, this parser tries to assess whether the file is a text file, csv or tsv.
- TextAndCSVParser() - Constructor for class org.apache.tika.parser.csv.TextAndCSVParser
- TextAndCSVParser(EncodingDetector) - Constructor for class org.apache.tika.parser.csv.TextAndCSVParser
- toString() - Method in class org.apache.tika.parser.csv.CSVResult
- toString() - Method in enum org.apache.tika.parser.strings.StringsEncoding
- toString() - Method in class org.apache.tika.parser.txt.CharsetMatch
- TXTParser - Class in org.apache.tika.parser.txt
-
Plain text parser.
- TXTParser() - Constructor for class org.apache.tika.parser.txt.TXTParser
- TXTParser(EncodingDetector) - Constructor for class org.apache.tika.parser.txt.TXTParser
U
- UniversalEncodingDetector - Class in org.apache.tika.parser.txt
- UniversalEncodingDetector() - Constructor for class org.apache.tika.parser.txt.UniversalEncodingDetector
V
- valueOf(String) - Static method in enum org.apache.tika.parser.strings.StringsEncoding
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.apache.tika.parser.strings.StringsEncoding
-
Returns an array containing the constants of this enum type, in the order they are declared.