public class GoogleHTMLOutputHandler extends OutputHandler
This class is an example OutputHandler
implementation that builds an XHTML document to
mimic the HTML view that Google offers for indexed PDF documents.
Source for this class is included in every PDFxStream bundle.
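The callback style described above can be sketched with the JDK's own DOM API. This is a minimal, hypothetical illustration, not the bundled source: `SketchOutputHandler` and `SketchHtmlHandler` are invented stand-ins, the callbacks take an `int` page number and a `String` instead of PDFxStream's `Page` and `TextUnit`, and the callback order in `main` is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XhtmlHandlerSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for OutputHandler: no-op callbacks that a
    // subclass overrides selectively. Not PDFxStream's actual signatures.
    abstract static class SketchOutputHandler {
        void startPDF(String pdfName) {}
        void startPage(int pageNumber) {}
        void textUnit(String chars) {}
        void endPage(int pageNumber) {}
    }

    // Accumulates an XHTML DOM: <title> from the PDF name, one <div> per page.
    static class SketchHtmlHandler extends SketchOutputHandler {
        private final Document doc;
        private Element body;
        private Element currentPage;

        SketchHtmlHandler() {
            try {
                doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }

        private Element el(String name) {
            return doc.createElementNS(XHTML_NS, name);
        }

        @Override
        void startPDF(String pdfName) {
            // Assumes one PDF per handler: a DOM Document allows a single root.
            Element html = el("html");
            Element head = el("head");
            Element title = el("title");
            title.setTextContent(pdfName);
            head.appendChild(title);
            body = el("body");
            html.appendChild(head);
            html.appendChild(body);
            doc.appendChild(html);
        }

        @Override
        void startPage(int pageNumber) {
            currentPage = el("div");
            currentPage.setAttribute("id", "page-" + pageNumber);
            body.appendChild(currentPage);
        }

        @Override
        void textUnit(String chars) {
            currentPage.appendChild(doc.createTextNode(chars));
        }

        @Override
        void endPage(int pageNumber) {
            // A textUnit call outside a page now throws NullPointerException.
            currentPage = null;
        }

        Document getHTMLDocument() {
            return doc;
        }
    }

    public static void main(String[] args) {
        SketchHtmlHandler h = new SketchHtmlHandler();
        // Simulated callback sequence standing in for PDFxStream's driver.
        h.startPDF("example.pdf");
        for (int p = 1; p <= 2; p++) {
            h.startPage(p);
            h.textUnit("Text on page " + p);
            h.endPage(p);
        }
        Document doc = h.getHTMLDocument();
        System.out.println(doc.getElementsByTagName("div").getLength()); // prints 2
    }
}
```

Keeping the document as a DOM tree, rather than concatenating strings, is what lets an accessor like getHTMLDocument() hand back a ready-to-query org.w3c.dom.Document.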
| Constructor and Description |
|---|
| GoogleHTMLOutputHandler() |
| Modifier and Type | Method and Description |
|---|---|
| void | endPage(Page page): Invoked when PDFxStream has finished processing a page. |
| org.w3c.dom.Document | getHTMLDocument(): Returns the XHTML document that is built up by this OutputHandler. |
| static void | main(java.lang.String[] args): Deprecated. Command-line usage of this class may be moved or removed in future PDFxStream releases. |
| void | startPage(Page page): Invoked when a page is about to be processed. |
| void | startPDF(java.lang.String pdfName, java.io.File pdfFile): Invoked when a new PDF is about to be processed. |
| void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine
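public static void main(java.lang.String[] args)
Deprecated. Command-line usage of this class may be moved or removed in future PDFxStream releases.
Usage:
java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]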
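A command-line wrapper of this shape has to check its two arguments and eventually serialize a DOM document to [output_html_path]. The sketch below shows only that JDK-side plumbing, under stated assumptions: `HtmlCliSketch`, `placeholderDocument` and `writeDocument` are hypothetical names, and the PDFxStream step that would actually process [input_pdf_file] is deliberately left out and replaced by a placeholder document.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HtmlCliSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical placeholder for the document getHTMLDocument() would return.
    static Document placeholderDocument(String title) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElementNS(XHTML_NS, "html");
            Element head = doc.createElementNS(XHTML_NS, "head");
            Element t = doc.createElementNS(XHTML_NS, "title");
            t.setTextContent(title);
            head.appendChild(t);
            html.appendChild(head);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a DOM document to the given file as indented UTF-8 XML.
    static void writeDocument(Document doc, File out) {
        try (OutputStream os = new FileOutputStream(out)) {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.transform(new DOMSource(doc), new StreamResult(os));
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java HtmlCliSketch [input_pdf_file] [output_html_path]");
            System.exit(1);
            return;
        }
        // A real wrapper would run PDFxStream over args[0]; this sketch only
        // writes a placeholder titled with the input file's name.
        Document doc = placeholderDocument(new File(args[0]).getName());
        writeDocument(doc, new File(args[1]));
    }
}
```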
public org.w3c.dom.Document getHTMLDocument()
Returns the XHTML document that is built up by this OutputHandler.
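Because the return type is a standard org.w3c.dom.Document, the result can be queried with plain DOM calls such as getElementsByTagName and getTextContent. The sketch below assumes, purely for illustration, that page text ends up in `<div>` elements; the actual markup this class generates is not described here, so inspect a real result before relying on any element name. `HtmlDomReaderSketch`, `divTexts` and `sampleDocument` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HtmlDomReaderSketch {
    // Collects the text content of every <div> in document order.
    // The one-<div>-per-page layout is an assumption, not documented behavior.
    static List<String> divTexts(Document doc) {
        NodeList divs = doc.getElementsByTagName("div");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < divs.getLength(); i++) {
            texts.add(divs.item(i).getTextContent());
        }
        return texts;
    }

    // Builds a small sample document in place of getHTMLDocument()'s result.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element body = doc.createElement("body");
            doc.appendChild(body);
            for (String s : new String[] {"first page", "second page"}) {
                Element div = doc.createElement("div");
                div.setTextContent(s);
                body.appendChild(div);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(divTexts(sampleDocument())); // prints [first page, second page]
    }
}
```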
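public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed.
public void endPage(Page page)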
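Description copied from class: OutputHandler
Invoked when PDFxStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed.
public void startPDF(java.lang.String pdfName, java.io.File pdfFile)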
Description copied from class: OutputHandler
Invoked when a new PDF is about to be processed.
Overrides: startPDF in class OutputHandler
Parameters:
pdfName - the 'name' of the PDF document, as provided by Document.getName()
pdfFile - the file reference PDFxStream is about to begin processing. This reference may be null if the source Document is not reading from a File or InputStream.
public void textUnit(TextUnit tu)
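Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler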
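A run of characters delivered to a callback like textUnit may contain markup-significant characters such as `<` or `&`. One way to keep the resulting XHTML well-formed, sketched here as an assumption about technique rather than a description of this class's source, is to append each run as a DOM text node and let the serializer escape it. `TextRunSketch`, `appendRun`, `toXml` and `demo` are hypothetical names, and a plain `String` stands in for the TextUnit's characters.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextRunSketch {
    // Appends one run of characters to a page element as a DOM text node;
    // no manual escaping is done here.
    static void appendRun(Element page, String chars) {
        page.appendChild(page.getOwnerDocument().createTextNode(chars));
    }

    // Serializes a DOM node to a string without an XML declaration.
    // The serializer escapes '<' and '&' inside text nodes.
    static String toXml(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a one-page document from two runs containing markup characters.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element page = doc.createElement("div");
            doc.appendChild(page);
            appendRun(page, "if a < b ");
            appendRun(page, "& c > d");
            return toXml(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```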