public class GoogleHTMLOutputHandler extends OutputHandler
This example captures PDF text content, and builds an XHTML document to mimic the HTML view that Google offers for indexed PDF documents.
| Constructor and Description |
|---|
| GoogleHTMLOutputHandler() |
| Modifier and Type | Method and Description |
|---|---|
| void | endPage(Page page): Invoked when PDFTextStream has finished processing a page. |
| org.w3c.dom.Document | getHTMLDocument(): Returns the XHTML document that is built up by this OutputHandler. |
| static void | main(java.lang.String[] args): Main method for command-line execution. |
| void | startPage(Page page): Invoked when a page is about to be processed. |
| void | startPDF(java.lang.String pdfName, java.io.File pdfFile): Invoked when a new PDF is about to be processed. |
| void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |

Methods inherited from class OutputHandler: endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine

public GoogleHTMLOutputHandler()
    throws javax.xml.parsers.ParserConfigurationException,
           javax.xml.parsers.FactoryConfigurationError
Throws:
javax.xml.parsers.ParserConfigurationException
javax.xml.parsers.FactoryConfigurationError

public static void main(java.lang.String[] args)
    throws java.lang.Exception
Main method for command-line execution. Usage:
java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]
Throws:
java.lang.Exception

public org.w3c.dom.Document getHTMLDocument()
Returns the XHTML document that is built up by this OutputHandler.
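
Because getHTMLDocument() returns a plain org.w3c.dom.Document, the result can be written out with the JDK's own XML transformer. The following is a minimal sketch under stated assumptions: the class name XhtmlSerializerSketch and the sampleDocument() helper are hypothetical and stand in for a document obtained from a real handler, and this is not necessarily how main() writes its output file.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; it is not part of PDFTextStream.
class XhtmlSerializerSketch {

    /** Serializes a DOM document to a string using the JDK's identity transformer. */
    static String serialize(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Builds a tiny stand-in document; a real one would come from getHTMLDocument(). */
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element paragraph = doc.createElement("p");
        paragraph.setTextContent("Hello, PDF");
        body.appendChild(paragraph);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Print the serialized markup to standard output.
        System.out.println(serialize(sampleDocument()));
    }
}
```

To write to a file instead of a string, a StreamResult can be constructed from a java.io.File rather than a StringWriter.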
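
public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed

public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed

public void startPDF(java.lang.String pdfName,
                     java.io.File pdfFile)
Description copied from class: OutputHandler
Invoked when a new PDF is about to be processed.
Overrides: startPDF in class OutputHandler
Parameters:
pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName()
pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.

public void textUnit(TextUnit tu)
Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler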
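
The callback sequence described above (a page starts, runs of characters arrive, the page ends) can be illustrated with a minimal sketch that uses only JDK classes. Everything here is an assumption made for illustration: PageDomBuilderSketch is a hypothetical stand-in that takes plain strings instead of Page and TextUnit instances, and the one-div-per-page layout is a guess, not the markup this handler actually produces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for an event-driven handler; not a PDFTextStream class.
class PageDomBuilderSketch {
    private final Document doc;
    private final Element body;
    private Element currentPage;      // <div> for the page being processed
    private StringBuilder pageText;   // text accumulated for the current page
    private int pageCount = 0;

    // Creating a DocumentBuilder may fail, which is why a constructor like
    // this one declares ParserConfigurationException.
    PageDomBuilderSketch() throws ParserConfigurationException {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        body = doc.createElement("body");
        html.appendChild(body);
        doc.appendChild(html);
    }

    /** Called before a page's content arrives: open a new container element. */
    void startPage() {
        pageCount++;
        currentPage = doc.createElement("div");
        currentPage.setAttribute("id", "page-" + pageCount);
        pageText = new StringBuilder();
    }

    /** Called once per run of characters: buffer the text for the current page. */
    void textUnit(String chars) {
        pageText.append(chars);
    }

    /** Called after a page's content is complete: attach the page to the body. */
    void endPage() {
        currentPage.setTextContent(pageText.toString());
        body.appendChild(currentPage);
        currentPage = null;
        pageText = null;
    }

    /** Returns the document built up so far. */
    Document getDocument() {
        return doc;
    }

    public static void main(String[] args) throws Exception {
        PageDomBuilderSketch handler = new PageDomBuilderSketch();
        handler.startPage();
        handler.textUnit("First ");
        handler.textUnit("page");
        handler.endPage();
        int pages = handler.getDocument().getElementsByTagName("div").getLength();
        System.out.println("pages built: " + pages);
    }
}
```

Buffering text until endPage keeps each page's element out of the tree until it is complete, so a reader of the document never sees a half-built page.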