pdfts.examples

Class GoogleHTMLOutputHandler

java.lang.Object
  com.snowtide.pdf.OutputHandler
    pdfts.examples.GoogleHTMLOutputHandler

public class GoogleHTMLOutputHandler
extends OutputHandler

This example captures PDF text content, and builds an XHTML document to mimic the HTML view that Google offers for indexed PDF documents.

| Constructor Summary |
|---|
| GoogleHTMLOutputHandler() |

| Method Summary | |
|---|---|
| void | endPage(Page page): Invoked when PDFTextStream has finished processing a page. |
| org.w3c.dom.Document | getHTMLDocument(): Returns the XHTML document that is built up by this OutputHandler. |
| static void | main(java.lang.String[] args): Main method for command-line execution. |
| void | startPage(Page page): Invoked when a page is about to be processed. |
| void | startPDF(java.lang.String pdfName, java.io.File pdfFile): Invoked when a new PDF is about to be processed. |
| void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |

| Methods inherited from class com.snowtide.pdf.OutputHandler |
|---|
endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|

public GoogleHTMLOutputHandler()
throws javax.xml.parsers.ParserConfigurationException,
javax.xml.parsers.FactoryConfigurationError

Throws:
javax.xml.parsers.ParserConfigurationException
javax.xml.parsers.FactoryConfigurationError

| Method Detail |
|---|
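
public static void main(java.lang.String[] args)
throws java.lang.Exception

Main method for command-line execution. Usage:

java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]

Throws: java.lang.Exception

public org.w3c.dom.Document getHTMLDocument()

Returns the XHTML document that is built up by this OutputHandler.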
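
Because getHTMLDocument() returns a standard org.w3c.dom.Document, the result can be written out with the JDK's own transformer API. The sketch below is illustrative only: the class name SerializeHtmlDocument and the sampleDocument() helper are hypothetical, and the hand-built document stands in for whatever the handler actually produces, whose element structure is not described on this page.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializeHtmlDocument {

    /** Serializes any DOM document to a string using the JDK's identity transformer. */
    public static String serialize(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /**
     * Hypothetical stand-in for the value returned by getHTMLDocument().
     * The real handler's markup is an unknown here; this is just a tiny XHTML-like tree.
     */
    public static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello PDF");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // With the real handler, pass handler.getHTMLDocument() instead of sampleDocument().
        System.out.println(serialize(sampleDocument()));
    }
}
```

To write to a file rather than a string, a StreamResult constructed from a java.io.File can be used in place of the StringWriter-backed one.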
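
public void startPage(Page page)

Description copied from class: OutputHandler
Invoked when a page is about to be processed.

Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed

public void endPage(Page page)

Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.

Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed
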
public void startPDF(java.lang.String pdfName,
java.io.File pdfFile)

Description copied from class: OutputHandler
Invoked when a new PDF is about to be processed.

Overrides: startPDF in class OutputHandler
Parameters:
pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName()
pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.

public void textUnit(TextUnit tu)

Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.

Overrides: textUnit in class OutputHandler
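
The callbacks above fire in a fixed order for each document: startPDF once, then startPage, any number of textUnit calls, and endPage for every page. The following is a minimal sketch of how a handler might turn that sequence into a DOM tree. It is not the actual GoogleHTMLOutputHandler source: the class PageHtmlSketch is hypothetical, and its methods take plain String and int arguments as simplified stand-ins for the real Page and TextUnit types so that the example runs without the PDFTextStream library.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical stand-in that mirrors the callback order, not the real handler. */
public class PageHtmlSketch {
    private final Document doc;
    private final Element body;
    private Element currentPage;      // <div> for the page being processed
    private StringBuilder pageText;   // text collected for the current page

    public PageHtmlSketch() throws ParserConfigurationException {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        body = doc.createElement("body");
        html.appendChild(body);
        doc.appendChild(html);
    }

    /** Called once per document; records the PDF name as a heading. */
    public void startPDF(String pdfName) {
        Element heading = doc.createElement("h1");
        heading.setTextContent(pdfName);
        body.appendChild(heading);
    }

    /** Opens a fresh container for the page (int replaces the real Page argument). */
    public void startPage(int pageNumber) {
        currentPage = doc.createElement("div");
        currentPage.setAttribute("id", "page" + pageNumber);
        pageText = new StringBuilder();
    }

    /** Accumulates one run of characters (String replaces the real TextUnit argument). */
    public void textUnit(String text) {
        pageText.append(text);
    }

    /** Closes the page: attaches the collected text and appends the container to the body. */
    public void endPage(int pageNumber) {
        currentPage.setTextContent(pageText.toString());
        body.appendChild(currentPage);
        currentPage = null;
    }

    public Document getHTMLDocument() {
        return doc;
    }

    public static void main(String[] args) throws Exception {
        PageHtmlSketch handler = new PageHtmlSketch();
        handler.startPDF("sample.pdf");
        handler.startPage(1);
        handler.textUnit("First ");
        handler.textUnit("page");
        handler.endPage(1);
        System.out.println(handler.getHTMLDocument().getElementsByTagName("div").getLength());
    }
}
```

Buffering text per page and attaching it in endPage keeps each page's container self-contained; a real handler would also need to account for the inherited spaces, linebreaks, and block and line callbacks, which this sketch leaves out.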