java.lang.Object
  org.apache.pdfbox.util.PDFStreamEngine
    org.apache.pdfbox.util.PDFTextStripper
      org.apache.pdfbox.util.PDFText2HTML
public class PDFText2HTML extends PDFTextStripper
Wrap stripped text in simple HTML, trying to form HTML paragraphs. Paragraphs broken by pages, columns, or figures are not mended.
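
To illustrate what "forming HTML paragraphs" can mean, here is a minimal standalone sketch. It does not use PDFBox and is not this class's actual implementation. The class name `ParagraphSketch` and the rule that a blank line ends a paragraph are assumptions made for illustration only, and the input text is assumed to be HTML-escaped already.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not PDFBox code.
final class ParagraphSketch {

    // Joins consecutive non-blank lines into one <p> element. A blank line
    // (the assumed paragraph boundary) closes the current paragraph.
    static String toHtmlParagraphs(List<String> lines) {
        StringBuilder html = new StringBuilder();
        StringBuilder paragraph = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                flush(paragraph, html);
            } else {
                if (paragraph.length() > 0) {
                    paragraph.append(' ');
                }
                paragraph.append(trimmed);
            }
        }
        flush(paragraph, html);
        return html.toString();
    }

    // Emits the pending paragraph, if any, and resets the buffer.
    private static void flush(StringBuilder paragraph, StringBuilder html) {
        if (paragraph.length() > 0) {
            html.append("<p>").append(paragraph).append("</p>\n");
            paragraph.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "First line", "continues here.", "", "Second paragraph.");
        System.out.print(toHtmlParagraphs(lines));
    }
}
```

In this sketch a paragraph is only as good as the line grouping it receives, which is consistent with the note above that paragraphs split by pages, columns, or figures are not rejoined.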
Field Summary |
---|
Fields inherited from class org.apache.pdfbox.util.PDFTextStripper |
---|
charactersByArticle, document, lineSeparator, output, outputEncoding |
Constructor Summary | |
---|---|
PDFText2HTML(String encoding) | Constructor. |
Method Summary | |
---|---|
protected void | endArticle() - Write out the article separator. |
void | endDocument(PDDocument pdf) - This method is available for subclasses of this class. |
protected String | getTitle() - This method will attempt to guess the title of the document using either the document properties or the first lines of text. |
protected void | startArticle(boolean isltr) - Write out the article separator (div tag) with proper text direction information. |
protected void | writeHeader() - Write the header to the output document. |
protected void | writePage() - This will print the text of the processed page to "output". |
protected void | writeString(String chars) - Write a string to the output stream and escape some HTML characters. |
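
The `startArticle(boolean isltr)` and `endArticle()` entries in the method summary describe bracketing each article with a div tag that carries text direction information. A minimal standalone sketch of that idea follows. The class `ArticleDivSketch`, its method names, and the exact markup (`<div dir="ltr">` and a closing `</div>`) are assumptions for illustration and may differ from what PDFBox actually emits.

```java
// Hypothetical illustration only; not PDFBox code.
final class ArticleDivSketch {

    // Returns an opening div whose dir attribute reflects the text direction.
    static String openArticle(boolean isLtr) {
        return "<div dir=\"" + (isLtr ? "ltr" : "rtl") + "\">";
    }

    // Returns the tag that closes the div produced by openArticle.
    static String closeArticle() {
        return "</div>";
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        out.append(openArticle(false));
        out.append("...article text...");
        out.append(closeArticle());
        System.out.println(out);
    }
}
```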
Methods inherited from class org.apache.pdfbox.util.PDFStreamEngine |
---|
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PDFText2HTML(String encoding) throws IOException
Parameters:
encoding - The encoding to be used
Throws:
IOException - If there is an error during initialization.

Method Detail |
---|
protected void writeHeader() throws IOException
Throws:
IOException - If there is a problem writing out the header to the document.

protected void writePage() throws IOException
Overrides:
writePage in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.

public void endDocument(PDDocument pdf) throws IOException
Overrides:
endDocument in class PDFTextStripper
Parameters:
pdf - The PDF document that is being processed.
Throws:
IOException - If an IO error occurs.

protected String getTitle()
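
The method summary for `getTitle()` describes a fallback: use the document properties if they provide a title, otherwise guess from the first lines of text. A minimal standalone sketch of such a fallback, independent of PDFBox, is shown here. `TitleGuessSketch`, its parameters, and the choice of the first non-blank line as the guess are assumptions for illustration, not the library's actual heuristic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not PDFBox code.
final class TitleGuessSketch {

    // Prefers the metadata title when present; otherwise falls back to the
    // first non-blank line of text. Returns an empty string if neither exists.
    static String guessTitle(String metadataTitle, List<String> firstLines) {
        if (metadataTitle != null && !metadataTitle.trim().isEmpty()) {
            return metadataTitle.trim();
        }
        for (String line : firstLines) {
            if (!line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // No metadata title here, so the first non-blank line is used.
        System.out.println(guessTitle(null, Arrays.asList("", "  Annual Report ", "Page 1")));
    }
}
```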
protected void startArticle(boolean isltr) throws IOException
Overrides:
startArticle in class PDFTextStripper
Parameters:
isltr - true if direction of text is left to right
Throws:
IOException - If there is an error writing to the stream.

protected void endArticle() throws IOException
Overrides:
endArticle in class PDFTextStripper
Throws:
IOException - If there is an error writing to the stream.

protected void writeString(String chars) throws IOException
Overrides:
writeString in class PDFTextStripper
Parameters:
chars - String to be written to the stream
Throws:
IOException - If there is an error writing to the stream.
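
`writeString` is documented as escaping "some HTML characters" before writing. Which characters it covers is not stated on this page, so the standalone sketch below assumes a common set (`&`, `<`, `>`, `"`). `HtmlEscapeSketch` is a hypothetical name, and the code is an illustration of the technique rather than the PDFBox implementation.

```java
// Hypothetical illustration only; not PDFBox code.
final class HtmlEscapeSketch {

    // Replaces the assumed set of HTML-significant characters with
    // character references and copies everything else unchanged.
    static String escape(String chars) {
        StringBuilder out = new StringBuilder(chars.length());
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp; &quot;c&quot;
        System.out.println(escape("a < b & \"c\""));
    }
}
```

Escaping `&` in the same single pass as the other characters avoids double-escaping, since each input character is examined exactly once.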