Package io.xpdf.api.pdftext.options
Enum PdfTextFormat
- All Implemented Interfaces:
Serializable, Comparable<PdfTextFormat>, java.lang.constant.Constable
Defines how text extracted from a PDF file should be formatted or structured.
- Since:
- 1.0.0
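A minimal sketch of how the constants could correspond to pdftotext command-line flags. The PdfTextFormat enum below is a local stand-in with the same constant names, so the snippet compiles without the library. PdfTextFormatFlags and cliFlag are hypothetical, and the flag strings assume the xpdf 4.x pdftotext command line. This page does not say whether the library builds commands this way.

```java
// Stand-in mirroring the constants of io.xpdf.api.pdftext.options.PdfTextFormat so this
// sketch compiles without the library; with the library available, import the real enum instead.
enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

final class PdfTextFormatFlags {
    private PdfTextFormatFlags() {}

    // Hypothetical mapping from each format to a pdftotext flag.
    // Flag names assume the xpdf 4.x pdftotext command line.
    static String cliFlag(PdfTextFormat format) {
        switch (format) {
            case LAYOUT:       return "-layout";
            case SIMPLE:       return "-simple";
            case SIMPLE_2:     return "-simple2";
            case TABLE:        return "-table";
            case LINE_PRINTER: return "-lineprinter";
            case RAW:          return "-raw";
            default:           throw new IllegalArgumentException("Unhandled format: " + format);
        }
    }
}
```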
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Enum
Enum.EnumDesc<E extends Enum<E>>
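Enum Constant Summary
Enum Constants
Enum Constant | Description
LAYOUT | Maintain (as best as possible) the original physical layout of the text.
SIMPLE | Similar to -layout, but optimized for simple one-column pages.
SIMPLE_2 | Similar to -simple, but handles slightly rotated text (e.g., OCR output) better.
TABLE | Similar to physical layout mode, but optimized for tabular data, keeping rows and columns aligned.
LINE_PRINTER | Strict fixed-character-pitch and -height layout.
RAW | Keep the text in content stream order.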
Method Summary
Modifier and Type | Method | Description
static PdfTextFormat | valueOf(String name) | Returns the enum constant of this type with the specified name.
static PdfTextFormat[] | values() | Returns an array containing the constants of this enum type, in the order they are declared.
Enum Constant Details
LAYOUT
From pdftotext documentation: Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order. If the -fixed option is given, character spacing within each line will be determined by the specified character pitch.
- Since:
- 1.0.0
SIMPLE
From pdftotext documentation: Similar to -layout, but optimized for simple one-column pages. This mode will do a better job of maintaining horizontal spacing, but it will only work properly with a single column of text.
- Since:
- 1.0.0
SIMPLE_2
From pdftotext documentation: Similar to -simple, but handles slightly rotated text (e.g., OCR output) better. Only works for pages with a single column of text.
- Since:
- 1.0.0
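SIMPLE and SIMPLE_2 only work properly with a single column of text. A hypothetical helper that encodes that constraint might look like the following. The column count and rotation hint are assumed to come from the caller's own page analysis, and falling back to LAYOUT for multi-column pages is a choice of this sketch, not a documented default. PdfTextFormat is again a local stand-in for the real enum.

```java
// Local stand-in for io.xpdf.api.pdftext.options.PdfTextFormat (same constant names).
enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

final class FormatChooser {
    private FormatChooser() {}

    // Hypothetical heuristic. columnCount and mayBeRotated are supplied by the caller;
    // nothing here inspects a PDF.
    static PdfTextFormat choose(int columnCount, boolean mayBeRotated) {
        if (columnCount < 1) {
            throw new IllegalArgumentException("columnCount must be >= 1");
        }
        if (columnCount > 1) {
            // SIMPLE and SIMPLE_2 only work properly with a single column, so keep the physical layout.
            return PdfTextFormat.LAYOUT;
        }
        // Single column: SIMPLE_2 handles slightly rotated text (e.g. OCR output) better than SIMPLE.
        return mayBeRotated ? PdfTextFormat.SIMPLE_2 : PdfTextFormat.SIMPLE;
    }
}
```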
TABLE
From pdftotext documentation: Table mode is similar to physical layout mode, but optimized for tabular data, with the goal of keeping rows and columns aligned (at the expense of inserting extra whitespace). If the -fixed option is given, character spacing within each line will be determined by the specified character pitch.
- Since:
- 1.0.0
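LAYOUT and TABLE both accept the -fixed option to set the character pitch. Assuming the formats end up as a pdftotext invocation (this page does not confirm how the library runs it), the equivalent argument list could be built as below. PdftotextArgs and physicalLayout are hypothetical names, and PdfTextFormat is a local stand-in for the real enum.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for io.xpdf.api.pdftext.options.PdfTextFormat (same constant names).
enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

final class PdftotextArgs {
    private PdftotextArgs() {}

    // Hypothetical helper: builds a pdftotext argument list for the two formats whose
    // documentation mentions -fixed. Pass null as fixedPitch to omit the option.
    static List<String> physicalLayout(PdfTextFormat format, Double fixedPitch,
                                       String pdfPath, String textPath) {
        if (format != PdfTextFormat.LAYOUT && format != PdfTextFormat.TABLE) {
            throw new IllegalArgumentException("This sketch only covers LAYOUT and TABLE, got " + format);
        }
        List<String> args = new ArrayList<>();
        args.add("pdftotext");
        args.add(format == PdfTextFormat.LAYOUT ? "-layout" : "-table");
        if (fixedPitch != null) {
            // Character pitch that determines the spacing within each line.
            args.add("-fixed");
            args.add(fixedPitch.toString());
        }
        args.add(pdfPath);
        args.add(textPath);
        return args;
    }
}
```

The resulting list can be handed to new ProcessBuilder(args).start(), provided a pdftotext executable is on the PATH.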
LINE_PRINTER
From pdftotext documentation: Line printer mode uses a strict fixed-character-pitch and -height layout. That is, the page is broken into a grid, and characters are placed into that grid. If the grid spacing is too small for the actual characters, the result is extra whitespace. If the grid spacing is too large, the result is missing whitespace. The grid spacing can be specified using the -fixed and -linespacing options. If one or both are not given on the command line, pdftotext will attempt to compute appropriate value(s).
- Since:
- 1.0.0
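The grid behaviour described for LINE_PRINTER can be illustrated with a simplified model in which each glyph goes into the grid column nearest to its x position divided by the character pitch. This is not xpdf's implementation. LinePrinterGrid and placeOnGrid are hypothetical, and positions are assumed to be non-negative values in points.

```java
import java.util.Arrays;

final class LinePrinterGrid {
    private LinePrinterGrid() {}

    // Simplified model of a fixed-pitch line: glyph i lands in column round(x[i] / pitch).
    // Assumes non-negative x positions, one per glyph.
    static String placeOnGrid(String glyphs, double[] xPositions, double pitch) {
        if (glyphs.length() != xPositions.length) {
            throw new IllegalArgumentException("one x position per glyph");
        }
        int[] cols = new int[glyphs.length()];
        int width = 0;
        for (int i = 0; i < cols.length; i++) {
            cols[i] = (int) Math.round(xPositions[i] / pitch);
            width = Math.max(width, cols[i] + 1);
        }
        char[] row = new char[width];
        Arrays.fill(row, ' '); // unused grid cells become spaces
        for (int i = 0; i < cols.length; i++) {
            row[cols[i]] = glyphs.charAt(i); // a later glyph in the same cell overwrites an earlier one
        }
        return new String(row);
    }
}
```

For glyphs a, b, c at x = 0, 6 and 18 points, a pitch of 6 gives "ab c". A pitch of 3 (too small) gives "a b   c" with extra whitespace, and a pitch of 12 (too large) gives "abc", losing the gap.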
RAW
From pdftotext documentation: Keep the text in content stream order. Depending on how the PDF file was generated, this may or may not be useful.
- Since:
- 1.0.0
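To show why content stream order may or may not be useful, the sketch below contrasts keeping text fragments in stream order (as RAW does) with a naive top-to-bottom, left-to-right sort. The naive sort is only a stand-in for reading order, since pdftotext's default mode also undoes columns and hyphenation. TextOrderSketch and Fragment are hypothetical, and y is assumed to grow downward from the top of the page (PDF user space normally has y growing upward).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TextOrderSketch {
    private TextOrderSketch() {}

    // A text fragment and its position; y grows downward in this sketch.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // RAW-style: keep fragments in the order the content stream produced them.
    static String contentStreamOrder(List<Fragment> fragments) {
        return fragments.stream().map(f -> f.text).collect(Collectors.joining(" "));
    }

    // Naive reading order: sort by line (y), then by horizontal position (x).
    static String naiveReadingOrder(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y).thenComparingDouble(f -> f.x));
        return sorted.stream().map(f -> f.text).collect(Collectors.joining(" "));
    }
}
```

If a generator wrote "world" before "Hello" on the same line, contentStreamOrder returns "world Hello Second" while naiveReadingOrder returns "Hello world Second".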
Method Details
values
public static PdfTextFormat[] values()
Returns an array containing the constants of this enum type, in the order they are declared.
- Returns:
- an array containing the constants of this enum type, in the order they are declared
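A short usage sketch for values(), for example to list the accepted formats in a help message. PdfTextFormat here is a local stand-in declared in the order of the Enum Constant Details section, which is assumed to match the real declaration order; ListFormats is hypothetical.

```java
// Local stand-in for io.xpdf.api.pdftext.options.PdfTextFormat, declared in the order
// of the Enum Constant Details section (assumed to be the real declaration order).
enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

final class ListFormats {
    private ListFormats() {}

    // Joins every constant name in declaration order, e.g. for a usage or help message.
    static String namesInDeclarationOrder() {
        StringBuilder sb = new StringBuilder();
        for (PdfTextFormat format : PdfTextFormat.values()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(format.name());
        }
        return sb.toString();
    }
}
```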
valueOf
public static PdfTextFormat valueOf(String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
- Parameters:
- name - the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
- IllegalArgumentException - if this enum type has no constant with the specified name
- NullPointerException - if the argument is null
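Because valueOf requires an exact, case-sensitive match and throws on unknown names or null, user-facing input usually needs normalizing first. A minimal sketch, using a hypothetical FormatParser and a local stand-in for the enum, that trims, upper-cases and maps '-' to '_', returning an empty Optional instead of throwing:

```java
import java.util.Locale;
import java.util.Optional;

// Local stand-in for io.xpdf.api.pdftext.options.PdfTextFormat (same constant names).
enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

final class FormatParser {
    private FormatParser() {}

    // Hypothetical lenient parser for input such as " simple-2 ".
    // valueOf itself is strict: exact, case-sensitive match with no surrounding whitespace.
    static Optional<PdfTextFormat> parse(String input) {
        if (input == null) {
            return Optional.empty(); // valueOf(null) would throw NullPointerException
        }
        String normalized = input.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        try {
            return Optional.of(PdfTextFormat.valueOf(normalized));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // no constant with that name
        }
    }
}
```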