GoogleHTMLOutputHandler (PDFTextStream API Reference)

pdfts.examples
Class GoogleHTMLOutputHandler

java.lang.Object
  com.snowtide.pdf.OutputHandler
      pdfts.examples.GoogleHTMLOutputHandler

public class GoogleHTMLOutputHandler
extends OutputHandler

This example captures PDF text content, and builds an XHTML document to mimic the HTML view that Google offers for indexed PDF documents.

Version:: ©2004-2012 Snowtide Informatics Systems, Inc.

Constructor Summary
`GoogleHTMLOutputHandler()`

Method Summary
`void`	`endPage(Page page)` Invoked when PDFTextStream has finished processing a page.
`org.w3c.dom.Document`	`getHTMLDocument()` Returns the XHTML document that is built up by this OutputHandler.
`static void`	`main(java.lang.String[] args)` Main method for command-line execution.
`void`	`startPage(Page page)` Invoked when a page is about to be processed.
`void`	`startPDF(java.lang.String pdfName, java.io.File pdfFile)` Invoked when a new PDF is about to be processed.
`void`	`textUnit(TextUnit tu)` Invoked when a run of characters is to be outputted, as represented by the given `TextUnit` instance.

Methods inherited from class com.snowtide.pdf.OutputHandler
`endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

GoogleHTMLOutputHandler

public GoogleHTMLOutputHandler()
                        throws javax.xml.parsers.ParserConfigurationException,
                               javax.xml.parsers.FactoryConfigurationError

Throws:: javax.xml.parsers.ParserConfigurationException; javax.xml.parsers.FactoryConfigurationError
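The two declared throwables are the ones JAXP raises when a DOM `DocumentBuilder` cannot be obtained, which suggests (as an inference from the signature, not something this page states) that the constructor creates its XHTML document through `DocumentBuilderFactory`. Note that `FactoryConfigurationError` is an `Error`, so a `catch (Exception e)` clause will not intercept it. A minimal sketch of that setup using only JDK classes, where `DomSetupSketch` and `newEmptyDocument` are hypothetical names:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class DomSetupSketch {

    // Returns an empty DOM document, or null if JAXP cannot supply a parser.
    // Assumption: the handler's constructor performs a similar step internally.
    static Document newEmptyDocument() {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.newDocument();
        } catch (ParserConfigurationException e) {
            // The factory exists but cannot build a parser with its current settings.
            System.err.println("Parser configuration problem: " + e.getMessage());
            return null;
        } catch (FactoryConfigurationError e) {
            // No usable JAXP implementation could be located or instantiated.
            System.err.println("No JAXP factory available: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        Document doc = newEmptyDocument();
        System.out.println(doc != null ? "DOM document created" : "DOM setup failed");
    }
}
```

Callers constructing a `GoogleHTMLOutputHandler` would need the same two catch clauses, or a `throws` declaration that covers them.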

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception

Main method for command-line execution. Usage:

java GoogleHTMLOutputHandler [input_pdf_file] [output_html_path]

Throws:: java.lang.Exception

getHTMLDocument

public org.w3c.dom.Document getHTMLDocument()

Returns the XHTML document that is built up by this OutputHandler.
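Because the return type is a standard `org.w3c.dom.Document`, it can be written out with the JDK's JAXP `Transformer`. The sketch below shows one way to do that. It does not depend on PDFTextStream: `DomSerializationSketch`, `serialize`, and `sampleDocument` are hypothetical names, and the hand-built document only stands in for whatever `getHTMLDocument()` actually returns.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSerializationSketch {

    // Serializes any DOM document to a string using an identity transform.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml"); // keep output well-formed XML
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical stand-in for the document returned by getHTMLDocument().
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello from page 1");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```

In real use, the argument to `serialize` would be the result of `getHTMLDocument()` after the handler has processed a PDF. To write to a file instead of a string, pass a `StreamResult` wrapping a `java.io.File`.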

startPage

public void startPage(Page page)

Description copied from class: OutputHandler

Invoked when a page is about to be processed.

Overrides:: startPage in class OutputHandler

Parameters:: page - a reference to the Page that is about to be processed

endPage

public void endPage(Page page)

Description copied from class: OutputHandler

Invoked when PDFTextStream has finished processing a page.

Overrides:: endPage in class OutputHandler

Parameters:: page - a reference to the Page that has been processed

startPDF

public void startPDF(java.lang.String pdfName,
                     java.io.File pdfFile)

Description copied from class: OutputHandler

Invoked when a new PDF is about to be processed.

Overrides:: startPDF in class OutputHandler

Parameters:: pdfName - the 'name' of the PDF document, as provided by PDFTextStream.getName(); pdfFile - the file reference PDFTextStream is about to begin processing. This reference may be null if the PDFTextStream instance was not created using one of the java.io.File- or java.io.InputStream-based constructors.

textUnit

public void textUnit(TextUnit tu)

Description copied from class: OutputHandler

Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance.

Overrides:: textUnit in class OutputHandler
