pdfts.examples
Class GoogleHTMLOutputHandler

java.lang.Object
  extended by com.snowtide.pdf.OutputHandler
      extended by pdfts.examples.GoogleHTMLOutputHandler

public class GoogleHTMLOutputHandler
extends OutputHandler

This example captures PDF text content and builds an XHTML document that mimics the HTML view Google offers for indexed PDF documents.

Version:
©2004-2012 Snowtide Informatics Systems, Inc.

Constructor Summary
GoogleHTMLOutputHandler()
           
 
Method Summary
 void endPage(Page page)
          Invoked when PDFTextStream has finished processing a page.
 org.w3c.dom.Document getHTMLDocument()
          Returns the XHTML document that is built up by this OutputHandler.
static void main(java.lang.String[] args)
          Main method for command-line execution.
 void startPage(Page page)
          Invoked when a page is about to be processed.
 void startPDF(java.lang.String pdfName, java.io.File pdfFile)
          Invoked when a new PDF is about to be processed.
 void textUnit(TextUnit tu)
          Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
 
Methods inherited from class com.snowtide.pdf.OutputHandler
endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GoogleHTMLOutputHandler

public GoogleHTMLOutputHandler()
                        throws javax.xml.parsers.ParserConfigurationException,
                               javax.xml.parsers.FactoryConfigurationError
Throws:
javax.xml.parsers.ParserConfigurationException
javax.xml.parsers.FactoryConfigurationError
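
Both declared throwables come from the standard JAXP factory API, which suggests (an assumption, since the source of this class is not shown here) that the constructor creates an empty DOM document to build the XHTML into. Note that FactoryConfigurationError is an Error rather than a checked exception, so a plain `catch (Exception e)` will not intercept it. The sketch below uses only JDK classes; `EmptyDocumentSketch` and `newEmptyDocument` are hypothetical names chosen for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical helper, not part of PDFTextStream: shows where the two
// throwables declared by the constructor can originate in plain JAXP code.
final class EmptyDocumentSketch {

    /** Returns a new, empty DOM document, or null if JAXP cannot be set up. */
    static Document newEmptyDocument() {
        try {
            // newInstance() may throw FactoryConfigurationError (an Error).
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // newDocumentBuilder() may throw ParserConfigurationException (checked).
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.newDocument();
        } catch (ParserConfigurationException e) {
            System.err.println("Parser could not be configured: " + e.getMessage());
            return null;
        } catch (FactoryConfigurationError e) {
            System.err.println("No usable JAXP factory: " + e.getMessage());
            return null;
        }
    }
}
```

Code that constructs a GoogleHTMLOutputHandler would typically handle both cases in the same way, or let them propagate as main does.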
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Main method for command-line execution. Usage:

java pdfts.examples.GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]

Throws:
java.lang.Exception

getHTMLDocument

public org.w3c.dom.Document getHTMLDocument()
Returns the XHTML document that is built up by this OutputHandler.
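
Because the return type is a standard org.w3c.dom.Document, the result can be written out with the JDK's javax.xml.transform API. The following is a minimal sketch, not the code used by main: `XhtmlSerializerSketch`, `serialize`, and `sampleDocument` are hypothetical names, and `sampleDocument` builds a small stand-in document in place of a real call to getHTMLDocument().

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of PDFTextStream.
final class XhtmlSerializerSketch {

    /** Serializes any DOM document to a string using an identity transform. */
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Stand-in for the document a handler would return after processing a PDF. */
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello, PDF");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }
}
```

To write to a file instead of a string, a `StreamResult` constructed from a `java.io.File` can be passed to `transform`.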


startPage

public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.

Overrides:
startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed

endPage

public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.

Overrides:
endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed

startPDF

public void startPDF(java.lang.String pdfName,
                     java.io.File pdfFile)
Description copied from class: OutputHandler
Invoked when a new PDF is about to be processed.

Overrides:
startPDF in class OutputHandler
Parameters:
pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName()
pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.

textUnit

public void textUnit(TextUnit tu)
Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.

Overrides:
textUnit in class OutputHandler
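
The callbacks documented above (startPage, textUnit, endPage) follow a common event-driven pattern: open a container when a page starts, accumulate text as runs arrive, and attach the container when the page ends. The sketch below illustrates that pattern only. It is not the source of GoogleHTMLOutputHandler, and it deliberately uses stand-in parameter types (an int page number and a String run of characters) instead of the library's Page and TextUnit classes, so that it compiles with the JDK alone. The class name `CallbackDomSketch` and the `div`-per-page layout are assumptions.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration of the callback pattern; does not extend
// com.snowtide.pdf.OutputHandler and uses stand-in parameter types.
final class CallbackDomSketch {
    private final Document doc;
    private final Element body;
    private final StringBuilder pageText = new StringBuilder();
    private Element currentPage;

    CallbackDomSketch() throws ParserConfigurationException {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        body = doc.createElement("body");
        html.appendChild(body);
        doc.appendChild(html);
    }

    /** Opens a new container element for the page and resets the text buffer. */
    void startPage(int pageNumber) {
        currentPage = doc.createElement("div");
        currentPage.setAttribute("id", "page-" + pageNumber);
        pageText.setLength(0);
    }

    /** Buffers one run of characters belonging to the current page. */
    void textUnit(String chars) {
        pageText.append(chars);
    }

    /** Writes the buffered text into the page container and attaches it to the body. */
    void endPage(int pageNumber) {
        currentPage.setTextContent(pageText.toString());
        body.appendChild(currentPage);
    }

    /** Returns the document built so far. */
    Document getHTMLDocument() {
        return doc;
    }
}
```

Buffering text until endPage keeps each page's content in a single text node; a real handler could instead create elements per block or line using the inherited startBlock, endBlock, startLine, and endLine callbacks.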