PDFParser (Apache PDFBox 1.7.0 API)

org.apache.pdfbox.pdfparser
Class PDFParser

java.lang.Object
  org.apache.pdfbox.pdfparser.BaseParser
      org.apache.pdfbox.pdfparser.PDFParser

Direct Known Subclasses:: NonSequentialPDFParser

public class PDFParser
extends BaseParser

This class will handle the parsing of the PDF document.

Version:: $Revision: 1.53 $
Author:: Ben Litchfield

Field Summary
`protected XrefTrailerResolver`	`xrefTrailerResolver` Collects all Xref/trailer objects and resolves them into a single object using the startxref reference.

Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
`DEF, document, ENDOBJ, ENDSTREAM, FORCE_PARSING, forceParsing, pdfSource`

Constructor Summary
`PDFParser(InputStream input)` Constructor.
`PDFParser(InputStream input, RandomAccess rafi)` Constructor to allow control over RandomAccessFile.
`PDFParser(InputStream input, RandomAccess rafi, boolean force)` Constructor to allow control over RandomAccessFile.

Method Summary
`COSDocument`	`getDocument()` This will get the document that was parsed.
`FDFDocument`	`getFDFDocument()` This will get the FDF document that was parsed.
`PDDocument`	`getPDDocument()` This will get the PD document that was parsed.
`protected boolean`	`isContinueOnError(Exception e)` Returns true if parsing should be continued.
`void`	`parse()` This will parse the stream and populate the COSDocument object.
`protected boolean`	`parseStartXref()` This will parse the startxref section from the stream.
`protected boolean`	`parseTrailer()` This will parse the trailer from the stream and add it to the state.
`void`	`parseXrefStream(COSStream stream, long objByteOffset)` Fills XRefTrailerResolver with the data of the given stream.
`protected boolean`	`parseXrefTable(long startByteOffset)` This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored.
`void`	`setTempDirectory(File tmpDir)` Sets the directory in which PDFBox creates a temporary file for storing the PDF document stream.

Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
`isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

xrefTrailerResolver

protected XrefTrailerResolver xrefTrailerResolver

Collects all Xref/trailer objects and resolves them into a single object using the startxref reference.

Constructor Detail

PDFParser

public PDFParser(InputStream input)
          throws IOException

Constructor.

Parameters:: input - The input stream that contains the PDF document.
Throws:: IOException - If there is an error initializing the stream.

PDFParser

public PDFParser(InputStream input,
                 RandomAccess rafi)
          throws IOException

Constructor to allow control over RandomAccessFile.

Parameters:: input - The input stream that contains the PDF document.; rafi - The RandomAccessFile to be used in internal COSDocument
Throws:: IOException - If there is an error initializing the stream.

PDFParser

public PDFParser(InputStream input,
                 RandomAccess rafi,
                 boolean force)
          throws IOException

Constructor to allow control over RandomAccessFile. Also enables the parser to skip corrupt objects in an attempt to force parsing.

Parameters:: input - The input stream that contains the PDF document.; rafi - The RandomAccessFile to be used in internal COSDocument; force - When true, the parser will skip corrupt pdf objects and will continue parsing at the next object in the file
Throws:: IOException - If there is an error initializing the stream.

Method Detail

setTempDirectory

public void setTempDirectory(File tmpDir)

Sets the directory in which PDFBox creates a temporary file for storing the PDF document stream. By default, this directory is the value of the system property java.io.tmpdir.

Parameters:: tmpDir - The directory to create scratch files needed to store pdf document streams.
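A dedicated scratch directory can be prepared before it is handed to `setTempDirectory`. The following is a minimal sketch using only the Java standard library (Java 8 or later is assumed); the class name `ScratchDirs`, the method `createScratchDir`, and the directory prefix are hypothetical and not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper, not part of PDFBox.
final class ScratchDirs {

    /** Creates a fresh, empty directory under the default temp location (java.io.tmpdir). */
    static File createScratchDir(String prefix) {
        try {
            File dir = Files.createTempDirectory(prefix).toFile();
            dir.deleteOnExit(); // only succeeds at JVM exit if the directory is empty by then
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File scratch = createScratchDir("pdfbox-scratch-");
        System.out.println("Scratch directory: " + scratch.getAbsolutePath());
        // Assumed call site, given a PDFParser instance named parser:
        // parser.setTempDirectory(scratch);
    }
}
```

Using a directory of its own keeps the parser's scratch files separate from other temporary files, which makes cleanup after a failed parse easier to reason about.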

isContinueOnError

protected boolean isContinueOnError(Exception e)

Returns true if parsing should be continued. By default, forceParsing is returned. This can be overridden to add application-specific handling (for example, to stop parsing when the number of exceptions thrown exceeds a certain number).

Parameters:: e - The exception if available. Can be null if there is no exception available
Returns:: true if parsing could be continued, otherwise false
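The "stop after a certain number of exceptions" policy mentioned above could be sketched as a small counter that a `PDFParser` subclass delegates to. The class `ErrorBudget` and its method `shouldContinue` are hypothetical names introduced for illustration; the override shown in the comment is an assumption about how a subclass might wire it in.

```java
import java.io.IOException;

// Hypothetical helper, not part of PDFBox: allows a fixed number of errors.
final class ErrorBudget {
    private final int maxErrors;
    private int seen;

    ErrorBudget(int maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Records one error and reports whether the caller should keep going. */
    boolean shouldContinue(Exception e) {
        seen++; // e may be null; the error is still counted
        return seen <= maxErrors;
    }

    public static void main(String[] args) {
        ErrorBudget budget = new ErrorBudget(2);
        System.out.println(budget.shouldContinue(new IOException("corrupt object")));
        System.out.println(budget.shouldContinue(null));
        System.out.println(budget.shouldContinue(new IOException("corrupt object")));
    }

    // Assumed wiring inside a PDFParser subclass (needs PDFBox on the classpath):
    //
    //   @Override
    //   protected boolean isContinueOnError(Exception e) {
    //       return super.isContinueOnError(e) && budget.shouldContinue(e);
    //   }
}
```

Combining the budget with `super.isContinueOnError(e)` keeps the documented default: parsing never continues unless force parsing was requested in the first place.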

parse

public void parse()
           throws IOException

This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.

Throws:: IOException - If there is an error reading from the stream or corrupt data is found.

getDocument

public COSDocument getDocument()
                        throws IOException

This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.

Returns:: The document that was parsed.
Throws:: IOException - If there is an error getting the document.

getPDDocument

public PDDocument getPDDocument()
                         throws IOException

This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.

Returns:: The document at the PD layer.
Throws:: IOException - If there is an error getting the document.

getFDFDocument

public FDFDocument getFDFDocument()
                           throws IOException

This will get the FDF document that was parsed. When you are done with this document you must call close() on it to release resources.

Returns:: The document at the FDF layer.
Throws:: IOException - If there is an error getting the document.

parseStartXref

protected boolean parseStartXref()
                          throws IOException

This will parse the startxref section from the stream. The startxref value is ignored.

Returns:: false on parsing error
Throws:: IOException - If an IO error occurs.
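For orientation, the startxref section of a PDF file consists of the keyword `startxref` followed by a byte offset, placed just before the `%%EOF` marker. The sketch below only illustrates that layout with the standard library; it is not how `PDFParser` is implemented (which, as noted above, ignores the value), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the startxref layout; not PDFBox code.
final class StartXrefSketch {

    /** Returns the number after the last "startxref" keyword, or -1 if the keyword or number is missing. */
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++; // skip the end-of-line after the keyword
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++; // consume the decimal offset
        }
        return i == start ? -1 : Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) {
        byte[] tail = "trailer\n<< /Size 3 >>\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(tail));
    }
}
```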

parseXrefTable

protected boolean parseXrefTable(long startByteOffset)
                          throws IOException

This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored.

Parameters:: startByteOffset - the offset to start at
Returns:: false on parsing error
Throws:: IOException - If an IO error occurs.

parseTrailer

protected boolean parseTrailer()
                        throws IOException

This will parse the trailer from the stream and add it to the state.

Returns:: false on parsing error
Throws:: IOException - If an IO error occurs.

parseXrefStream

public void parseXrefStream(COSStream stream,
                            long objByteOffset)
                     throws IOException

Fills XRefTrailerResolver with the data of the given stream. The stream must be of type XRef.

Parameters:: stream - the stream to be read; objByteOffset - the offset to start at
Throws:: IOException - if there is an error parsing the stream
