java.lang.Object
  com.snowtide.pdf.OutputHandler
    pdfts.examples.GoogleHTMLOutputHandler
public class GoogleHTMLOutputHandler extends OutputHandler
This example captures PDF text content and builds an XHTML document that mimics the HTML view Google offers for indexed PDF documents.
Constructor Summary | |
---|---|
GoogleHTMLOutputHandler() | |
Method Summary | | |
---|---|---|
void | endPage(Page page) | Invoked when PDFTextStream has finished processing a page. |
org.w3c.dom.Document | getHTMLDocument() | Returns the XHTML document that is built up by this OutputHandler. |
static void | main(java.lang.String[] args) | Main method for command-line execution. |
void | startPage(Page page) | Invoked when a page is about to be processed. |
void | startPDF(java.lang.String pdfName, java.io.File pdfFile) | Invoked when a new PDF is about to be processed. |
void | textUnit(TextUnit tu) | Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |
Methods inherited from class com.snowtide.pdf.OutputHandler |
---|
endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public GoogleHTMLOutputHandler() throws javax.xml.parsers.ParserConfigurationException, javax.xml.parsers.FactoryConfigurationError
Throws:
javax.xml.parsers.ParserConfigurationException
javax.xml.parsers.FactoryConfigurationError
Method Detail |
---|
public static void main(java.lang.String[] args) throws java.lang.Exception
Main method for command-line execution. Usage:
java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]
Throws:
java.lang.Exception
public org.w3c.dom.Document getHTMLDocument()
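Because getHTMLDocument() returns a plain org.w3c.dom.Document, the result can be written out with the JDK's standard transformer API. The following is a minimal sketch, not the example's own serialization code: the class name DomToXhtml and its serialize method are hypothetical, and the document built in main is only a stand-in for the one a handler would return.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; it is not part of PDFTextStream or its examples.
public class DomToXhtml {

    /** Serializes any DOM Document to a string with the JDK's identity transformer. */
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document; in real use this would be handler.getHTMLDocument().
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element paragraph = doc.createElement("p");
        paragraph.setTextContent("Hello, PDF");
        body.appendChild(paragraph);
        html.appendChild(body);
        doc.appendChild(html);

        System.out.println(serialize(doc));
    }
}
```

To write to a file instead of a string, a StreamResult wrapping a java.io.File can be passed to transform in place of the StringWriter-backed one.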
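public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides:
startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed

public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.
Overrides:
endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed

public void startPDF(java.lang.String pdfName, java.io.File pdfFile)
Description copied from class: OutputHandler
Invoked when a new PDF is about to be processed.
Overrides:
startPDF in class OutputHandler
Parameters:
pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName()
pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.

public void textUnit(TextUnit tu)
Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
Overrides:
textUnit in class OutputHandler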
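The callback sequence documented above (startPage, repeated textUnit calls, endPage) and the constructor's throws clause suggest a handler that creates a DOM document up front and appends one element per page. The following sketch illustrates that general pattern using only JDK classes. It is an assumption about the shape of such a handler, not the source of GoogleHTMLOutputHandler: the class PageDivSketch is hypothetical, and its callbacks take a page number and a String where the real methods take Page and TextUnit.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for an OutputHandler subclass; it does not extend
// com.snowtide.pdf.OutputHandler so that it compiles with the JDK alone.
public class PageDivSketch {
    private final Document doc;
    private final Element body;
    private Element currentPage;
    private StringBuilder pageText;

    // DocumentBuilderFactory.newInstance() may throw FactoryConfigurationError,
    // and newDocumentBuilder() may throw ParserConfigurationException.
    public PageDivSketch() throws ParserConfigurationException {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        body = doc.createElement("body");
        html.appendChild(body);
        doc.appendChild(html);
    }

    // Assumed analogue of startPage(Page): open a new div for the page.
    public void startPage(int pageNumber) {
        currentPage = doc.createElement("div");
        currentPage.setAttribute("id", "page" + pageNumber);
        pageText = new StringBuilder();
    }

    // Assumed analogue of textUnit(TextUnit): accumulate the run of characters.
    public void textUnit(String characters) {
        pageText.append(characters);
    }

    // Assumed analogue of endPage(Page): attach the finished div to the body.
    public void endPage() {
        currentPage.setTextContent(pageText.toString());
        body.appendChild(currentPage);
        currentPage = null;
    }

    public Document getHTMLDocument() {
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        PageDivSketch handler = new PageDivSketch();
        handler.startPage(1);
        handler.textUnit("Hello, ");
        handler.textUnit("PDF");
        handler.endPage();
        int pages = handler.getHTMLDocument().getElementsByTagName("div").getLength();
        System.out.println("pages captured: " + pages);
    }
}
```

Buffering text until endPage keeps each page's element out of the tree until it is complete; a real handler would also need to respond to the inherited line, block, and spacing callbacks to preserve layout.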