public class GoogleHTMLOutputHandler extends OutputHandler
This class is an example OutputHandler implementation that builds an XHTML document to
mimic the HTML view that Google offers for indexed PDF documents.
Source for this class is included in every PDFxStream bundle.
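The callback pattern described above can be illustrated with a small, self-contained sketch. It is not the PDFxStream source. `SketchHtmlHandler` is a hypothetical class that does not extend the real `OutputHandler`, and PDFxStream's `Page` and `TextUnit` are replaced by an `int` page number and a `String` run of text. Its method names mirror the callbacks listed below. The page-per-`<div>` layout and the title fallback are illustrative assumptions, not documented behavior of `GoogleHTMLOutputHandler`.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for an OutputHandler subclass: NOT the PDFxStream API.
// Page and TextUnit are replaced by an int page number and a String run of text.
class SketchHtmlHandler {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    private final Document doc = newDocument();
    private final Element title = doc.createElementNS(XHTML_NS, "title");
    private final Element body = doc.createElementNS(XHTML_NS, "body");
    private Element currentPage; // <div> for the page in progress, or null

    SketchHtmlHandler() {
        // Skeleton: <html><head><title/></head><body/></html>
        Element html = doc.createElementNS(XHTML_NS, "html");
        Element head = doc.createElementNS(XHTML_NS, "head");
        doc.appendChild(html);
        html.appendChild(head);
        head.appendChild(title);
        html.appendChild(body);
    }

    private static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Like startPDF: pdfFile may be null, so fall back to pdfName for the title.
    void startPDF(String pdfName, File pdfFile) {
        title.setTextContent(pdfFile != null ? pdfFile.getName() : pdfName);
    }

    // Like startPage: open a new <div> for this page.
    void startPage(int pageNumber) {
        currentPage = doc.createElementNS(XHTML_NS, "div");
        currentPage.setAttribute("id", "page-" + pageNumber);
    }

    // Like textUnit: append the run as a DOM text node.
    void textUnit(String run) {
        if (currentPage != null) {
            currentPage.appendChild(doc.createTextNode(run));
        }
    }

    // Like endPage: attach the finished page <div> to <body>.
    void endPage() {
        if (currentPage != null) {
            body.appendChild(currentPage);
            currentPage = null;
        }
    }

    // Like getHTMLDocument: expose the DOM built up so far.
    Document getHTMLDocument() {
        return doc;
    }
}
```

Buffering each page in its own `<div>` and attaching it only in `endPage` keeps the body limited to completed pages. A driver would call `startPDF`, then `startPage`/`textUnit`/`endPage` once per page, and finally read `getHTMLDocument()`.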
| Constructor and Description |
|---|
| GoogleHTMLOutputHandler() |
| Modifier and Type | Method and Description |
|---|---|
| void | endPage(Page page): Invoked when PDFxStream has finished processing a page. |
| org.w3c.dom.Document | getHTMLDocument(): Returns the XHTML document that is built up by this OutputHandler. |
| static void | main(java.lang.String[] args): Deprecated. Command-line usage of this class may be moved or removed in future PDFxStream releases. |
| void | startPage(Page page): Invoked when a page is about to be processed. |
| void | startPDF(java.lang.String pdfName, java.io.File pdfFile): Invoked when a new PDF is about to be processed. |
| void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine

public static void main(java.lang.String[] args)
Usage: java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]
public org.w3c.dom.Document getHTMLDocument()
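Because `getHTMLDocument()` returns a standard `org.w3c.dom.Document`, the JDK's identity `Transformer` can write it out as markup. The sketch below assumes nothing about PDFxStream itself. `DomWriter` and its methods are hypothetical helpers, `toFile` merely mirrors the `[output_html_path]` argument of `main`, and `sampleDocument()` builds a tiny stand-in document so the example runs without the library.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helpers: serialize any DOM Document (for example, the one
// getHTMLDocument() returns) with the JDK's identity Transformer.
class DomWriter {
    private static void write(Document doc, Result result) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.transform(new DOMSource(doc), result);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the serialized document, including the XML declaration.
    static String toXhtmlString(Document doc) {
        StringWriter out = new StringWriter();
        write(doc, new StreamResult(out));
        return out.toString();
    }

    // Writes to a file path; the stream is closed by try-with-resources.
    static void toFile(Document doc, Path outputHtml) throws IOException {
        try (OutputStream out = Files.newOutputStream(outputHtml)) {
            write(doc, new StreamResult(out));
        }
    }

    // Small stand-in document so the sketch runs without PDFxStream.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("hi");
            doc.appendChild(html);
            html.appendChild(body);
            body.appendChild(p);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Using the `xml` output method keeps the result well-formed XHTML. The `html` method would emit HTML 4 style markup instead.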
public void startPage(Page page)
Description copied from class: OutputHandler (startPage)
Parameters: page - a reference to the Page that is about to be processed

public void endPage(Page page)
Description copied from class: OutputHandler (endPage)
Parameters: page - a reference to the Page that has been processed

public void startPDF(java.lang.String pdfName, java.io.File pdfFile)
Description copied from class: OutputHandler (startPDF)
Parameters:
pdfName - the 'name' of the PDF document, as provided by Document.getName()
pdfFile - the file reference PDFxStream is about to begin processing. This reference may be null if the source Document is not being read from a File or InputStream.

public void textUnit(TextUnit tu)
Description copied from class: OutputHandler (textUnit)
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
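A handler that turns text runs into XHTML has to keep characters such as `<` and `&` from breaking the markup. This page does not say how `GoogleHTMLOutputHandler` handles that. The sketch below assumes runs are appended as DOM text nodes, which the serializer escapes automatically. `TextRunDemo` is hypothetical, and a plain `String` stands in for `TextUnit`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: each String stands in for the characters of one TextUnit.
class TextRunDemo {
    // Appends each run as a raw DOM text node under one <span>, then serializes it.
    static String renderRuns(String... runs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element span = doc.createElement("span");
            doc.appendChild(span);
            for (String run : runs) {
                // No manual escaping: the serializer escapes '<' and '&' in text nodes.
                span.appendChild(doc.createTextNode(run));
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because escaping happens at serialization time, the DOM always holds the original characters. Code that inspects the document returned by `getHTMLDocument()` therefore sees unescaped text.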