Index

A C D G H I O P R S T V W 
All Classes and Interfaces|All Packages|Constant Field Values|Serialized Form

A

addOtherTesseractConfig(String, String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Add a key-value pair to pass to Tesseract using its -c command line option.
alpha - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
 
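The `addOtherTesseractConfig(String, String)` entry above says each key-value pair is passed to Tesseract through its `-c` command-line option. As a rough illustration of that mapping, the sketch below turns stored pairs into `-c key=value` arguments. The class `OtherConfigArgs` and its methods are hypothetical and not part of Tika; only the `-c key=value` form comes from Tesseract itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Tika: shows how key-value pairs
// could be mapped onto Tesseract's "-c key=value" option.
class OtherConfigArgs {
    // Insertion-ordered so the generated arguments are predictable.
    private final Map<String, String> settings = new LinkedHashMap<>();

    // Store one key-value pair; a repeated key overwrites the old value.
    void add(String key, String value) {
        settings.put(key, value);
    }

    // Emit "-c" followed by "key=value" for every stored pair.
    List<String> toArgs() {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            args.add("-c");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] unused) {
        OtherConfigArgs cfg = new OtherConfigArgs();
        cfg.add("preserve_interword_spaces", "1");
        cfg.add("tessedit_char_whitelist", "0123456789");
        // prints [-c, preserve_interword_spaces=1, -c, tessedit_char_whitelist=0123456789]
        System.out.println(cfg.toArgs());
    }
}
```

How Tika validates or orders these settings internally is not stated in this index, so treat the ordering and overwrite behaviour here as assumptions.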

C

checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
cloneAndUpdate(TesseractOCRConfig) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
count - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
 

D

d - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
 

G

getAlpha(int) - Method in class org.apache.tika.parser.ocr.tess4j.ImageDeskew
 
getColorspace() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getColorspace() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getDefaultConfig() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getDensity() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getDensity() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getDepth() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getDepth() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getFilter() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getFilter() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getImageMagickPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getImageMagickProg() - Static method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getLangs() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getLangs(String, Set<String>, Set<String>) - Static method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Parses a language string and sorts the individual languages into valid or invalid sets, based on regular expressions applied to the language codes.
getLanguage() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getLanguage() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getMaxFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getMaxFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getMinFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getMinFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getOtherTesseractConfig() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getOtherTesseractSettings() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getOutputType() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getOutputType() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getPageSegMode() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getPageSegMode() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getPageSeparator() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getResize() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
getResize() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getSkewAngle() - Method in class org.apache.tika.parser.ocr.tess4j.ImageDeskew
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getTessdataPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getTesseractPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getTesseractProg() - Static method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getTimeout() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
getTimeoutSeconds() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
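The `getLangs(String, Set<String>, Set<String>)` entry above describes splitting a language string and sorting each code into a valid or an invalid set by regular expression. The sketch below shows one way that binning could look. It assumes the Tesseract convention of joining languages with `+` (as in `eng+fra`), and the pattern `CODE` is an illustrative guess; the class `LangBinner` is hypothetical and Tika's actual regexes may differ.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Tika's implementation.
class LangBinner {
    // Assumed pattern for illustration only: letters and underscores,
    // starting with a letter (matches codes such as "eng" or "chi_sim").
    private static final Pattern CODE = Pattern.compile("[A-Za-z][A-Za-z_]*");

    // Split on '+' and add each non-empty piece to the valid or invalid set.
    static void bin(String languageString, Set<String> valid, Set<String> invalid) {
        if (languageString == null || languageString.trim().isEmpty()) {
            return;
        }
        for (String piece : languageString.split("\\+")) {
            String lang = piece.trim();
            if (lang.isEmpty()) {
                continue;
            }
            if (CODE.matcher(lang).matches()) {
                valid.add(lang);
            } else {
                invalid.add(lang);
            }
        }
    }

    public static void main(String[] unused) {
        Set<String> valid = new LinkedHashSet<>();
        Set<String> invalid = new LinkedHashSet<>();
        bin("eng+chi_sim+fr;a", valid, invalid);
        System.out.println("valid=" + valid + " invalid=" + invalid);
    }
}
```

Passing the two sets in as output parameters mirrors the signature listed in the index; a caller can then decide whether any invalid entry should be treated as an error.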

H

hasTesseract() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
hasWarned() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
HOCR - Enum constant in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
 
HoughLine() - Constructor for class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
 

I

IMAGE_MAGICK - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
IMAGE_ROTATION - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
ImageDeskew - Class in org.apache.tika.parser.ocr.tess4j
 
ImageDeskew(BufferedImage) - Constructor for class org.apache.tika.parser.ocr.tess4j.ImageDeskew
 
ImageDeskew.HoughLine - Class in org.apache.tika.parser.ocr.tess4j
 
ImageUtil - Class in org.apache.tika.parser.ocr.tess4j
 
ImageUtil() - Constructor for class org.apache.tika.parser.ocr.tess4j.ImageUtil
 
index - Variable in class org.apache.tika.parser.ocr.tess4j.ImageDeskew.HoughLine
 
initialize(Map<String, Param>) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
isApplyRotation() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
isApplyRotation() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
isBlack(BufferedImage, int, int) - Static method in class org.apache.tika.parser.ocr.tess4j.ImageUtil
 
isBlack(BufferedImage, int, int, int) - Static method in class org.apache.tika.parser.ocr.tess4j.ImageUtil
 
isEnableImagePreprocessing() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
isEnableImagePreprocessing() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
isPreloadLangs() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
isPreserveInterwordSpacing() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
isPreserveInterwordSpacing() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
isSkipOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
isSkipOCR() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 

O

org.apache.tika.parser.ocr - package org.apache.tika.parser.ocr
 
org.apache.tika.parser.ocr.tess4j - package org.apache.tika.parser.ocr.tess4j
 

P

parse(Image, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
PSM0_ORIENTATION - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
PSM0_ORIENTATION_CONFIDENCE - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
PSM0_PAGE_NUMBER - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
PSM0_ROTATE - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
PSM0_SCRIPT - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
PSM0_SCRIPT_CONFIDENCE - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 

R

rotate(BufferedImage, double, int, int) - Static method in class org.apache.tika.parser.ocr.tess4j.ImageUtil
 

S

setApplyRotation(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Sets whether or not a rotation value should be calculated and passed to ImageMagick.
setApplyRotation(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setColorspace(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
setColorspace(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setDensity(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
setDensity(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setDepth(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
setDepth(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setEnableImagePreprocessing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set to true to enable image preprocessing.
setEnableImagePreprocessing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setFilter(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
setFilter(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setImageMagickPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
Set the path to the ImageMagick executable directory, needed if it is not on the system path.
setLanguage(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set the Tesseract language dictionary to be used.
setLanguage(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setMaxFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set the maximum size of a file that will be submitted to OCR.
setMaxFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setMinFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set the minimum size of a file that will be submitted to OCR.
setMinFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setOtherTesseractSettings(List<String>) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setOutputType(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
setOutputType(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setOutputType(TesseractOCRConfig.OUTPUT_TYPE) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set the output type of the OCR process.
setPageSegMode(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set the Tesseract page segmentation mode.
setPageSegMode(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setPageSeparator(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
The page separator to use in plain text output.
setPreloadLangs(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
If set to true and Tesseract is found, this loads the languages reported by --list-langs.
setPreserveInterwordSpacing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Whether or not to maintain interword spacing.
setPreserveInterwordSpacing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setResize(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
 
setResize(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setSkipOcr(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
If you want to turn off OCR at run time for a specific file, set this to true.
setSkipOCR(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 
setTessdataPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
Set the path to the 'tessdata' folder, which contains language files and config files.
setTesseractPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
Set the path to the Tesseract executable's directory, needed if it is not on the system path.
setTimeout(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
Set the default timeout in seconds.
setTimeoutSeconds(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Set the maximum time (in seconds) to wait for the OCR process to terminate.
setTrustedPageSeparator(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
Same as TesseractOCRConfig.setPageSeparator(String) but does not perform any checks on the string.
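
The `setMinFileSizeToOcr(long)`, `setMaxFileSizeToOcr(long)` and `setSkipOcr(boolean)` entries above together decide whether a file is handed to OCR at all. A minimal sketch of that gate follows. The class `OcrGate` is hypothetical, and both the inclusive bounds and the use of bytes as the unit are assumptions, since this index does not state them.

```java
// Hypothetical sketch, NOT Tika's implementation: combines the
// size limits and the skip flag into a single yes/no decision.
class OcrGate {
    private final long minFileSizeToOcr; // assumed to be in bytes
    private final long maxFileSizeToOcr; // assumed to be in bytes
    private final boolean skipOcr;

    OcrGate(long minFileSizeToOcr, long maxFileSizeToOcr, boolean skipOcr) {
        this.minFileSizeToOcr = minFileSizeToOcr;
        this.maxFileSizeToOcr = maxFileSizeToOcr;
        this.skipOcr = skipOcr;
    }

    // True only when OCR is not skipped and the size lies within
    // [min, max]; treating both bounds as inclusive is an assumption.
    boolean shouldOcr(long fileSizeBytes) {
        if (skipOcr) {
            return false;
        }
        return fileSizeBytes >= minFileSizeToOcr && fileSizeBytes <= maxFileSizeToOcr;
    }

    public static void main(String[] unused) {
        OcrGate gate = new OcrGate(1_024L, 10_000_000L, false);
        System.out.println(gate.shouldOcr(500L));
        System.out.println(gate.shouldOcr(50_000L));
    }
}
```

Checking the skip flag first keeps the per-file opt-out independent of the configured size limits.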

T

TESS_META - Static variable in class org.apache.tika.parser.ocr.TesseractOCRParser
 
TesseractOCRConfig - Class in org.apache.tika.parser.ocr
Configuration for TesseractOCRParser.
TesseractOCRConfig() - Constructor for class org.apache.tika.parser.ocr.TesseractOCRConfig
 
TesseractOCRConfig.OUTPUT_TYPE - Enum in org.apache.tika.parser.ocr
 
TesseractOCRParser - Class in org.apache.tika.parser.ocr
TesseractOCRParser, powered by the tesseract-ocr engine.
TesseractOCRParser() - Constructor for class org.apache.tika.parser.ocr.TesseractOCRParser
 
TXT - Enum constant in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
 

V

valueOf(String) - Static method in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
Returns an array containing the constants of this enum type, in the order they are declared.

W

warn() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
 