Index
A
- addOtherTesseractConfig(String, String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Add a key-value pair to pass to Tesseract using its -c command line option.
- alpha - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
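The addOtherTesseractConfig(String, String) entry above says each key-value pair is passed to Tesseract through its -c command line option. A minimal sketch of what that could look like on the command line follows. The class TesseractArgsSketch and the helper buildCommand are hypothetical names used only for illustration and are not part of Tika; the file names are placeholders as well.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TesseractArgsSketch {

    // Hypothetical helper (not Tika code): appends one "-c key=value" pair
    // per map entry to a basic "tesseract <input> <outputbase>" command line.
    public static List<String> buildCommand(String input, String outputBase,
                                            Map<String, String> otherConfig) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(input);
        cmd.add(outputBase);
        for (Map.Entry<String, String> e : otherConfig.entrySet()) {
            cmd.add("-c");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the pairs in insertion order.
        Map<String, String> other = new LinkedHashMap<>();
        other.put("tessedit_char_whitelist", "0123456789");
        // prints: tesseract page.png out -c tessedit_char_whitelist=0123456789
        System.out.println(String.join(" ", buildCommand("page.png", "out", other)));
    }
}
```

How Tika itself assembles the command is not shown in this index, so treat the ordering of arguments here as an assumption.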
C
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- cloneAndUpdate(TesseractOCRConfig) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- count - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
D
- d - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
G
- getAlpha(int) - Method in class org.apache.tika.parser.ocr.tess4j.ImageDeskew
- getColorspace() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getColorspace() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getDefaultConfig() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getDensity() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getDensity() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getDepth() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getDepth() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getFilter() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getFilter() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getImageMagickPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getImageMagickProg() - Static method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getLangs() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getLangs(String, Set<String>, Set<String>) - Static method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Parses a language string and bins the individual languages into valid or invalid, based on regexes against the language codes.
- getLanguage() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getLanguage() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getMaxFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getMaxFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getMinFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getMinFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getOtherTesseractConfig() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getOtherTesseractSettings() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getOutputType() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getOutputType() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getPageSegMode() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getPageSegMode() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getPageSeparator() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getResize() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getResize() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getSkewAngle() - Method in class org.apache.tika.parser.ocr.tess4j.ImageDeskew
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getTessdataPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getTesseractPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getTesseractProg() - Static method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getTimeout() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getTimeoutSeconds() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
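The getLangs(String, Set<String>, Set<String>) entry above describes splitting a language string and sorting each code into a valid or an invalid set by regex. The sketch below illustrates that idea under two stated assumptions: languages are joined with "+", as in Tesseract's -l option (for example eng+fra), and the validity pattern shown is an illustrative guess, not the regex Tika uses. LangBinningSketch and binLangs are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class LangBinningSketch {

    // Assumed pattern, for illustration only: letters, digits,
    // underscores, slashes and hyphens. Tika's actual regex may differ.
    private static final Pattern VALID_LANG = Pattern.compile("^[A-Za-z0-9_/-]+$");

    // Hypothetical helper mirroring the parameter shape of getLangs:
    // results are written into the two caller-supplied sets.
    public static void binLangs(String languageString,
                                Set<String> validLangs,
                                Set<String> invalidLangs) {
        if (languageString == null || languageString.trim().isEmpty()) {
            return;
        }
        for (String lang : languageString.split("\\+")) {
            String trimmed = lang.trim();
            if (VALID_LANG.matcher(trimmed).matches()) {
                validLangs.add(trimmed);
            } else {
                invalidLangs.add(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> valid = new LinkedHashSet<>();
        Set<String> invalid = new LinkedHashSet<>();
        binLangs("eng+fra+../etc", valid, invalid);
        System.out.println("valid=" + valid + " invalid=" + invalid);
    }
}
```

Passing the output sets in, rather than returning them, lets a caller report the rejected codes separately from the accepted ones.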
H
- hasTesseract() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- hasWarned() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- HOCR - Enum constant in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
- HoughLine() - Constructor for class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
I
- IMAGE_MAGICK - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- IMAGE_ROTATION - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- ImageDeskew - Class in org.apache.tika.parser.ocr.tess4j
- ImageDeskew(BufferedImage) - Constructor for class org.apache.tika.parser.ocr.tess4j.ImageDeskew
- ImageDeskew.HoughLine - Class in org.apache.tika.parser.ocr.tess4j
- ImageUtil - Class in org.apache.tika.parser.ocr.tess4j
- ImageUtil() - Constructor for class org.apache.tika.parser.ocr.tess4j.ImageUtil
- index - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- isApplyRotation() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- isApplyRotation() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- isBlack(BufferedImage, int, int) - Static method in class org.apache.tika.parser.ocr.tess4j.ImageUtil
- isBlack(BufferedImage, int, int, int) - Static method in class org.apache.tika.parser.ocr.tess4j.ImageUtil
- isEnableImagePreprocessing() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- isEnableImagePreprocessing() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- isPreloadLangs() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- isPreserveInterwordSpacing() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- isPreserveInterwordSpacing() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- isSkipOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- isSkipOCR() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
O
- org.apache.tika.parser.ocr - package org.apache.tika.parser.ocr
- org.apache.tika.parser.ocr.tess4j - package org.apache.tika.parser.ocr.tess4j
P
- parse(Image, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- PSM0_ORIENTATION - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- PSM0_ORIENTATION_CONFIDENCE - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- PSM0_PAGE_NUMBER - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- PSM0_ROTATE - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- PSM0_SCRIPT - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- PSM0_SCRIPT_CONFIDENCE - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
R
- rotate(BufferedImage, double, int, int) - Static method in class org.apache.tika.parser.ocr.tess4j.ImageUtil
S
- setApplyRotation(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Sets whether or not a rotation value should be calculated and passed to ImageMagick.
- setApplyRotation(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setColorspace(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setColorspace(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setDensity(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setDensity(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setDepth(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setDepth(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setEnableImagePreprocessing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set to true to enable image preprocessing.
- setEnableImagePreprocessing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setFilter(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setFilter(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setImageMagickPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
  Set the path to the ImageMagick executable directory, needed if it is not on the system path.
- setLanguage(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set the Tesseract language dictionary to be used.
- setLanguage(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setMaxFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set the maximum size of a file to submit to OCR.
- setMaxFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setMinFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set the minimum size of a file to submit to OCR.
- setMinFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setOtherTesseractSettings(List<String>) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setOutputType(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setOutputType(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setOutputType(TesseractOCRConfig.OUTPUT_TYPE) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set the output type of the OCR process.
- setPageSegMode(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set the Tesseract page segmentation mode.
- setPageSegMode(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setPageSeparator(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  The page separator to use in plain text output.
- setPreloadLangs(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
  If set to true and if tesseract is found, this will load the langs that result from --list-langs.
- setPreserveInterwordSpacing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Whether or not to maintain interword spacing.
- setPreserveInterwordSpacing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setResize(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setResize(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setSkipOcr(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  If you want to turn off OCR at run time for a specific file, set this to true.
- setSkipOCR(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setTessdataPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
  Set the path to the 'tessdata' folder, which contains language files and config files.
- setTesseractPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
  Set the path to the Tesseract executable's directory, needed if it is not on the system path.
- setTimeout(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
  Set the default timeout in seconds.
- setTimeoutSeconds(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Set the maximum time (in seconds) to wait for the OCR process to terminate.
- setTrustedPageSeparator(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
  Same as TesseractOCRConfig.setPageSeparator(String) but does not perform any checks on the string.
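The setSkipOcr(boolean), setMinFileSizeToOcr(long) and setMaxFileSizeToOcr(long) entries above together describe a gate that decides whether a file is submitted to OCR at all. A minimal sketch of such a gate is shown here. OcrSizeGateSketch and shouldOcr are hypothetical names, and treating both size bounds as inclusive is an assumption, since this index does not say how Tika handles the boundary values.

```java
public class OcrSizeGateSketch {

    // Hypothetical helper (not Tika code): returns true only when OCR is
    // not skipped and the file size lies within [minSize, maxSize].
    // Inclusive bounds are an assumption made for this illustration.
    public static boolean shouldOcr(boolean skipOcr, long fileSize,
                                    long minSize, long maxSize) {
        if (skipOcr) {
            return false;
        }
        return fileSize >= minSize && fileSize <= maxSize;
    }

    public static void main(String[] args) {
        System.out.println(shouldOcr(false, 1024L, 0L, 2048L)); // prints: true
        System.out.println(shouldOcr(true, 1024L, 0L, 2048L));  // prints: false
        System.out.println(shouldOcr(false, 4096L, 0L, 2048L)); // prints: false
    }
}
```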
T
- TESS_META - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
- TesseractOCRConfig - Class in org.apache.tika.parser.ocr
  Configuration for TesseractOCRParser.
- TesseractOCRConfig() - Constructor for class org.apache.tika.parser.ocr.TesseractOCRConfig
- TesseractOCRConfig.OUTPUT_TYPE - Enum in org.apache.tika.parser.ocr
- TesseractOCRParser - Class in org.apache.tika.parser.ocr
  TesseractOCRParser, powered by the tesseract-ocr engine.
- TesseractOCRParser() - Constructor for class org.apache.tika.parser.ocr.TesseractOCRParser
- TXT - Enum constant in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
V
- valueOf(String) - Static method in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
  Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
  Returns an array containing the constants of this enum type, in the order they are declared.
W
- warn() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser