Enum PdfTextFormat

java.lang.Object
java.lang.Enum<PdfTextFormat>
io.xpdf.api.pdftext.options.PdfTextFormat
All Implemented Interfaces:
Serializable, Comparable<PdfTextFormat>, java.lang.constant.Constable

public enum PdfTextFormat extends Enum<PdfTextFormat>
Defines how text extracted from a PDF file should be formatted or structured.
Since:
1.0.0
  • Enum Constant Details

    • LAYOUT

      public static final PdfTextFormat LAYOUT
      From pdftotext documentation:
        Maintain (as best as possible) the original physical layout of the text.
        The default is to 'undo' physical layout (columns, hyphenation, etc.) and
        output the text in reading order. If the -fixed option is given, character
        spacing within each line will be determined by the specified character pitch.
       
      Since:
      1.0.0
    • SIMPLE

      public static final PdfTextFormat SIMPLE
      From pdftotext documentation:
        Similar to -layout, but optimized for simple one-column pages. This mode will do
        a better job of maintaining horizontal spacing, but it will only work properly
        with a single column of text.
       
      Since:
      1.0.0
    • SIMPLE_2

      public static final PdfTextFormat SIMPLE_2
      From pdftotext documentation:
        Similar to -simple, but handles slightly rotated text (e.g., OCR output) better.
        Only works for pages with a single column of text.
       
      Since:
      1.0.0
    • TABLE

      public static final PdfTextFormat TABLE
      From pdftotext documentation:
        Table mode is similar to physical layout mode, but optimized for tabular data, with the
        goal of keeping rows and columns aligned (at the expense of inserting extra whitespace).
        If the -fixed option is given, character spacing within each line will be determined by
        the specified character pitch.
       
      Since:
      1.0.0
    • LINE_PRINTER

      public static final PdfTextFormat LINE_PRINTER
      From pdftotext documentation:
        Line printer mode uses a strict fixed-character-pitch and -height layout. That is,
        the page is broken into a grid, and characters are placed into that grid. If the
        grid spacing is too small for the actual characters, the result is extra whitespace.
        If the grid spacing is too large, the result is missing whitespace. The grid spacing
        can be specified using the -fixed and -linespacing options. If one or both are not
        given on the command line, pdftotext will attempt to compute appropriate value(s).
       
      Since:
      1.0.0
    • RAW

      public static final PdfTextFormat RAW
      From pdftotext documentation:
        Keep the text in content stream order. Depending on how the PDF file was generated,
        this may or may not be useful.
       
      Since:
      1.0.0
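Each constant above quotes the description of one pdftotext switch. The following is a minimal sketch of how that correspondence could look, not the library's actual implementation. The nested enum is a stand-in for io.xpdf.api.pdftext.options.PdfTextFormat so the example compiles on its own, and the class name and the helpers flagFor and command are hypothetical. The switch names are those of the Xpdf pdftotext command line.

```java
import java.util.ArrayList;
import java.util.List;

final class PdfTextFormatFlagSketch {

    // Stand-in for io.xpdf.api.pdftext.options.PdfTextFormat (same constants, same order).
    enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

    // Assumed mapping: each constant to the pdftotext switch its description quotes.
    static String flagFor(PdfTextFormat format) {
        switch (format) {
            case LAYOUT:       return "-layout";
            case SIMPLE:       return "-simple";
            case SIMPLE_2:     return "-simple2";
            case TABLE:        return "-table";
            case LINE_PRINTER: return "-lineprinter";
            case RAW:          return "-raw";
            default:
                throw new IllegalArgumentException("Unknown format: " + format);
        }
    }

    // Builds an argument list; a null format leaves pdftotext in its default reading-order mode.
    static List<String> command(PdfTextFormat format, String pdfPath, String txtPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftotext");
        if (format != null) {
            cmd.add(flagFor(format));
        }
        cmd.add(pdfPath);
        cmd.add(txtPath);
        return cmd;
    }

    public static void main(String[] args) {
        // Prints: pdftotext -table in.pdf out.txt
        System.out.println(String.join(" ", command(PdfTextFormat.TABLE, "in.pdf", "out.txt")));
    }
}
```

The returned list could be handed to java.lang.ProcessBuilder. The sketch stops short of launching a process so that it does not depend on pdftotext being installed.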
  • Method Details

    • values

      public static PdfTextFormat[] values()
      Returns an array containing the constants of this enum type, in the order they are declared.
      Returns:
      an array containing the constants of this enum type, in the order they are declared
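As an illustration of values(), the sketch below walks the constants in declaration order, for example to build a help string. The nested enum is a stand-in for io.xpdf.api.pdftext.options.PdfTextFormat, and the class name and the helper describeAll are hypothetical. The sketch also relies on the standard behavior of compiler-generated enums that every call to values() returns a fresh array, so modifying the result does not affect later calls.

```java
final class PdfTextFormatValuesSketch {

    // Stand-in for io.xpdf.api.pdftext.options.PdfTextFormat (same constants, same order).
    enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

    // Joins the constant names in declaration order, separated by ", ".
    static String describeAll() {
        StringBuilder sb = new StringBuilder();
        for (PdfTextFormat format : PdfTextFormat.values()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(format.name());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW
        System.out.println(describeAll());

        // Overwriting an element of the returned array changes only that copy.
        PdfTextFormat[] copy = PdfTextFormat.values();
        copy[0] = PdfTextFormat.RAW;
        System.out.println(PdfTextFormat.values()[0]); // still LAYOUT
    }
}
```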
    • valueOf

      public static PdfTextFormat valueOf(String name)
      Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum type has no constant with the specified name
      NullPointerException - if the argument is null
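Because valueOf requires an exact identifier match, callers that accept user input (a configuration value or a command-line argument, say) may want to normalize the string first. The following is one possible sketch under that assumption. The nested enum is a stand-in for io.xpdf.api.pdftext.options.PdfTextFormat, and the class name and the helper parse are hypothetical, not part of this API.

```java
import java.util.Locale;
import java.util.Optional;

final class PdfTextFormatParseSketch {

    // Stand-in for io.xpdf.api.pdftext.options.PdfTextFormat (same constants, same order).
    enum PdfTextFormat { LAYOUT, SIMPLE, SIMPLE_2, TABLE, LINE_PRINTER, RAW }

    // Lenient lookup: trims, upper-cases, and treats '-' as '_' before calling valueOf.
    // Returns an empty Optional instead of throwing for null or unknown names.
    static Optional<PdfTextFormat> parse(String raw) {
        if (raw == null) {
            return Optional.empty(); // valueOf(null) would throw NullPointerException
        }
        String normalized = raw.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        try {
            return Optional.of(PdfTextFormat.valueOf(normalized));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // no constant with that name
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" line-printer ")); // Optional[LINE_PRINTER]
        System.out.println(parse("columns"));        // Optional.empty
    }
}
```

Returning an Optional keeps the two failure cases documented above (unknown name, null argument) out of the caller's control flow. Code that prefers to fail fast can call valueOf directly.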