
Class Itext5TextExtractor

  • All Implemented Interfaces:
    io.mfj.textricator.extractor.TextExtractor , java.lang.AutoCloseable

    public final class Itext5TextExtractor
     implements TextExtractor

    Class to extract text from a PDF.

    Create an instance and call extract for each page.
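
    The usage pattern above can be sketched as follows. This is a minimal illustration, not the library's own code: PageExtractor is a hypothetical stand-in for the three documented methods, List<String> stands in for List<Text>, and the 1-based page numbering is an assumption that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractAllPages {

    // Hypothetical stand-in mirroring the documented methods
    // (getPageCount, extract, close). The real class returns List<Text>;
    // List<String> is used here only to keep the sketch self-contained.
    interface PageExtractor extends AutoCloseable {
        int getPageCount();
        List<String> extract(int pageNumber);
        @Override
        void close();
    }

    // Calls extract once per page and collects the results in page order.
    // Assumption: page numbers start at 1.
    static List<String> extractAll(PageExtractor extractor) {
        List<String> all = new ArrayList<>();
        // try-with-resources closes the extractor even if extract throws
        try (PageExtractor e = extractor) {
            for (int page = 1; page <= e.getPageCount(); page++) {
                all.addAll(e.extract(page));
            }
        }
        return all;
    }

    // In-memory fake so the sketch runs without a PDF or the real library.
    static PageExtractor fake(List<List<String>> pages) {
        return new PageExtractor() {
            @Override
            public int getPageCount() {
                return pages.size();
            }

            @Override
            public List<String> extract(int pageNumber) {
                return pages.get(pageNumber - 1);
            }

            @Override
            public void close() {
                // nothing to release in the fake
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("Title", "Body"),
                Arrays.asList("Footer"));
        System.out.println(extractAll(fake(pages))); // prints [Title, Body, Footer]
    }
}
```

    With the real class, the same loop would presumably wrap an Itext5TextExtractor built from an InputStream; the try-with-resources form relies only on the class implementing java.lang.AutoCloseable, as listed above.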

    • Method Summary

      Modifier and Type Method Description
      Integer getPageCount() Get the number of pages.
      Unit close() Close the extractor (from java.lang.AutoCloseable).
      List<Text> extract(Integer pageNumber) Extract the text blocks from the given page of the PDF.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Itext5TextExtractor

        Itext5TextExtractor(InputStream input, Float boxPrecision, Set<String> boxIgnoreColors)
        Create an instance for the supplied PDF.