public class GoogleHTMLOutputHandler extends OutputHandler
This example captures PDF text content, and builds an XHTML document to mimic the HTML view that Google offers for indexed PDF documents.
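The event-driven approach described above can be illustrated with JDK classes alone. The following is a minimal sketch, not the source of GoogleHTMLOutputHandler: the class name `XhtmlBuilderSketch`, the `div`/`span` layout, and the use of an `int` page number and a `String` text run in place of the library's `Page` and `TextUnit` types are all assumptions made so that the example compiles without PDFTextStream on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for an OutputHandler subclass; it does not use PDFTextStream.
class XhtmlBuilderSketch {
    private final Document doc;
    private final Element body;
    private Element currentPage;

    XhtmlBuilderSketch() throws ParserConfigurationException {
        // Create an empty DOM document with an <html><body> skeleton.
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        doc.appendChild(html);
        body = doc.createElement("body");
        html.appendChild(body);
    }

    // Assumed analogue of startPage(Page): open a container for the page.
    void startPage(int pageNumber) {
        currentPage = doc.createElement("div");
        currentPage.setAttribute("class", "page");
        currentPage.setAttribute("id", "page-" + pageNumber);
    }

    // Assumed analogue of textUnit(TextUnit): append one run of characters.
    void text(String run) {
        Element span = doc.createElement("span");
        span.setTextContent(run);
        currentPage.appendChild(span);
    }

    // Assumed analogue of endPage(Page): attach the finished page to the body.
    void endPage() {
        body.appendChild(currentPage);
        currentPage = null;
    }

    Document getHTMLDocument() {
        return doc;
    }

    public static void main(String[] args) throws Exception {
        XhtmlBuilderSketch builder = new XhtmlBuilderSketch();
        builder.startPage(1);
        builder.text("Hello");
        builder.text("world");
        builder.endPage();
        int spans = builder.getHTMLDocument().getElementsByTagName("span").getLength();
        System.out.println("span elements: " + spans);
    }
}
```

Attaching each page to the body only when it ends keeps partially processed pages out of the document, which is one reasonable design choice among several.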
| Constructor and Description |
|---|
| GoogleHTMLOutputHandler() |
| Modifier and Type | Method and Description |
|---|---|
| void | endPage(Page page) Invoked when PDFTextStream has finished processing a page. |
| org.w3c.dom.Document | getHTMLDocument() Returns the XHTML document that is built up by this OutputHandler. |
| static void | main(java.lang.String[] args) Main method for command-line execution. |
| void | startPage(Page page) Invoked when a page is about to be processed. |
| void | startPDF(java.lang.String pdfName, java.io.File pdfFile) Invoked when a new PDF is about to be processed. |
| void | textUnit(TextUnit tu) Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine

public GoogleHTMLOutputHandler()
                        throws javax.xml.parsers.ParserConfigurationException,
                               javax.xml.parsers.FactoryConfigurationError
Throws: javax.xml.parsers.ParserConfigurationException, javax.xml.parsers.FactoryConfigurationError

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Main method for command-line execution. Usage:
java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]
Throws: java.lang.Exception

public org.w3c.dom.Document getHTMLDocument()
Returns the XHTML document that is built up by this OutputHandler.

public void startPage(Page page)
Description copied from class OutputHandler: Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed

public void endPage(Page page)
Description copied from class OutputHandler: Invoked when PDFTextStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed

public void startPDF(java.lang.String pdfName,
            java.io.File pdfFile)
Description copied from class OutputHandler: Invoked when a new PDF is about to be processed.
Overrides: startPDF in class OutputHandler
Parameters: pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName()
pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.

public void textUnit(TextUnit tu)
Description copied from class OutputHandler: Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler
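
Because getHTMLDocument() returns an org.w3c.dom.Document, a caller that wants a file on disk (as the command-line usage with [output_html_path] suggests) has to serialize the DOM tree. The sketch below shows one way to do that with the JDK's javax.xml.transform API. It is an assumption about how such output could be written, not a description of what main actually does; `DomSerializerSketch`, `toXml`, `writeTo`, and `sampleDocument` are hypothetical names, and the sample document stands in for a real handler result.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; not part of PDFTextStream.
class DomSerializerSketch {

    // Serialize a DOM document through an identity transform into the given writer.
    static void write(Document doc, Writer out) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Return the serialized markup as a string.
    static String toXml(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        write(doc, out);
        return out.toString();
    }

    // Write the serialized markup to a file as UTF-8.
    static void writeTo(Document doc, Path target) throws Exception {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            write(doc, out);
        }
    }

    // Stand-in for the document a handler's getHTMLDocument() would return.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hi");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sampleDocument()));
    }
}
```

In real use, the argument to `writeTo` would be the Document obtained from the handler after the PDF has been fully processed.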