public class GoogleHTMLOutputHandler extends OutputHandler
This example captures PDF text content, and builds an XHTML document to mimic the HTML view that Google offers for indexed PDF documents.
Constructor and Description |
---|
GoogleHTMLOutputHandler() |
Modifier and Type | Method and Description |
---|---|
void | endPage(Page page) Invoked when PDFTextStream has finished processing a page. |
org.w3c.dom.Document | getHTMLDocument() Returns the XHTML document that is built up by this OutputHandler. |
static void | main(java.lang.String[] args) Main method for command-line execution. |
void | startPage(Page page) Invoked when a page is about to be processed. |
void | startPDF(java.lang.String pdfName, java.io.File pdfFile) Invoked when a new PDF is about to be processed. |
void | textUnit(TextUnit tu) Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine
public GoogleHTMLOutputHandler() throws javax.xml.parsers.ParserConfigurationException, javax.xml.parsers.FactoryConfigurationError
Throws:
javax.xml.parsers.ParserConfigurationException
javax.xml.parsers.FactoryConfigurationError
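The two exceptions declared by the constructor are the ones raised by the standard JAXP `DocumentBuilderFactory` API, which suggests (this is an assumption, not something the page states) that the handler creates its empty DOM document through JAXP. A minimal sketch of that step, using only JDK classes, might look like the following. The class name `EmptyXhtmlDocument`, the method `create()`, and the html/head/body skeleton are hypothetical and chosen only for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; not part of PDFTextStream.
final class EmptyXhtmlDocument {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Creates a DOM document holding an empty html/head/body skeleton. */
    static Document create() throws ParserConfigurationException {
        // newInstance() can throw FactoryConfigurationError (an unchecked Error)
        // when no usable JAXP implementation is found.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);

        // newDocumentBuilder() can throw ParserConfigurationException.
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.newDocument();

        Element html = doc.createElementNS(XHTML_NS, "html");
        doc.appendChild(html);
        html.appendChild(doc.createElementNS(XHTML_NS, "head"));
        html.appendChild(doc.createElementNS(XHTML_NS, "body"));
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = create();
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
    }
}
```

Callers that construct the handler need to catch or declare `ParserConfigurationException`; `FactoryConfigurationError` is an `Error` and is normally left to propagate.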
public static void main(java.lang.String[] args) throws java.lang.Exception
Usage: java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]
Throws:
java.lang.Exception
public org.w3c.dom.Document getHTMLDocument()
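Because getHTMLDocument() returns a plain `org.w3c.dom.Document`, the result can be written out with the JDK's own `javax.xml.transform` API. The sketch below shows one way to serialize such a document to a string. The class `DomSerializer` and its methods are hypothetical names, and `sampleDocument()` builds a small stand-in document in place of the one a real handler would return.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; not part of PDFTextStream.
final class DomSerializer {

    /** Serializes a DOM document to XML text without an XML declaration. */
    static String toXml(Document doc) throws TransformerException {
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Stand-in for the document a handler's getHTMLDocument() would return. */
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        body.setTextContent("Hello");
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sampleDocument()));
    }
}
```

To write to a file instead, pass `new StreamResult(new java.io.File(path))` to `transform`, where `path` is whatever output location the caller chooses.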
public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides:
startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed
public void endPage(Page page)
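Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.
Overrides:
endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed
public void startPDF(java.lang.String pdfName, java.io.File pdfFile)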
Description copied from class: OutputHandler
Invoked when a new PDF is about to be processed.
Overrides:
startPDF in class OutputHandler
Parameters:
pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName()
pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.
public void textUnit(TextUnit tu)
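Description copied from class: OutputHandler
Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance.
Overrides:
textUnit in class OutputHandler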
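The callbacks documented above (startPage, textUnit, endPage) together with getHTMLDocument() describe an event-driven way of building a DOM tree. The sketch below illustrates that pattern only. It does not use the PDFTextStream API: `PageDomSketch` is a hypothetical class, an `int` page number stands in for `Page`, a `String` stands in for `TextUnit`, and the one-div-per-page layout is an assumption rather than what GoogleHTMLOutputHandler actually emits.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for an OutputHandler subclass; not the real implementation.
final class PageDomSketch {
    private final Document doc;
    private final Element body;
    private Element currentPage; // div for the page currently being processed

    PageDomSketch() throws ParserConfigurationException {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        body = doc.createElement("body");
        html.appendChild(body);
        doc.appendChild(html);
    }

    /** Stand-in for startPage(Page): opens a container for the page. */
    void startPage(int pageNumber) {
        currentPage = doc.createElement("div");
        currentPage.setAttribute("id", "page" + pageNumber);
    }

    /** Stand-in for textUnit(TextUnit): appends a run of characters to the page. */
    void textUnit(String characters) {
        currentPage.appendChild(doc.createTextNode(characters));
    }

    /** Stand-in for endPage(Page): attaches the finished page to the body. */
    void endPage() {
        body.appendChild(currentPage);
        currentPage = null;
    }

    /** Returns the document built up so far. */
    Document getHTMLDocument() {
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        PageDomSketch handler = new PageDomSketch();
        handler.startPage(1);
        handler.textUnit("Hel");
        handler.textUnit("lo");
        handler.endPage();
        Element firstPage = (Element) handler.getHTMLDocument()
                .getDocumentElement().getFirstChild().getFirstChild();
        System.out.println(firstPage.getAttribute("id") + ": " + firstPage.getTextContent());
    }
}
```

Attaching each page's element only in endPage() keeps a partially processed page out of the document if processing stops midway, which is one possible design choice among several.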