com.snowtide.pdf
Class RegionOutputTarget

java.lang.Object
  extended by com.snowtide.pdf.OutputHandler
      extended by com.snowtide.pdf.RegionOutputTarget

public class RegionOutputTarget
extends OutputHandler

This OutputHandler implementation is used to selectively extract text from certain regions of each PDF page.

Here is the typical usage pattern:

  1. Create a PDFTextStream instance.
  2. Create a RegionOutputTarget instance, optionally specifying which type of OutputTarget the text layout rendering is delegated to.
  3. Register each region of interest with the RegionOutputTarget, optionally specifying a name for each.
  4. For each Page from the PDFTextStream instance (retrieved using PDFTextStream.getPage(int)), pass the RegionOutputTarget instance to that Page's Page.pipe(OutputHandler) method.
  5. Retrieve the text extracted for each region from the RegionOutputTarget, using either getRegionText(int) or getRegionText(String).

Example:

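 // Open the PDF and create a target that collects text per region.
 PDFTextStream stream = new PDFTextStream(pdfFile);
 RegionOutputTarget tgt = new RegionOutputTarget();
 // Arguments are x, y, width, height in points; x, y is the bottom-left corner.
 tgt.addRegion(40, 600, 120, 16, "name");
 tgt.addRegion(40, 570, 120, 16, "address");
 
 // Pipe one page through the target, then close the stream.
 Page p = stream.getPage(0);
 p.pipe(tgt);
 stream.close();
 
 // Read back the text captured for each named region.
 String name = tgt.getRegionText("name");
 String address = tgt.getRegionText("address");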
 

Important notes:

Since:
v2.0.2
Version:
©2004-2012 Snowtide Informatics Systems, Inc.

Constructor Summary
RegionOutputTarget()
          Creates a new RegionOutputTarget, using a VisualOutputTarget to lay out the text extracted for each region.
RegionOutputTarget(boolean useVisualTarget)
          Creates a new RegionOutputTarget.
 
Method Summary
 void addRegion(float x, float y, float width, float height)
           Registers a new unnamed region.
 void addRegion(float x, float y, float width, float height, java.lang.String name)
           Registers a new named region.
 void endPage(Page page)
          Invoked when PDFTextStream has finished processing a page.
 int getRegionCnt()
          Returns the number of registered regions.
 java.util.Set getRegionNames()
          Returns a set containing each of the names used to register regions on this RegionOutputTarget via addRegion(float, float, float, float, String).
 java.lang.String getRegionText(int i)
          Returns the text extracted from the i-th region that was registered with this RegionOutputTarget.
 java.lang.String getRegionText(java.lang.String regionName)
          Returns the text extracted from the region that was registered with this RegionOutputTarget using the provided name.
 void startPage(Page page)
          Invoked when a page is about to be processed.
 void textUnit(TextUnit tu)
          Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
 
Methods inherited from class com.snowtide.pdf.OutputHandler
endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine, startPDF
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegionOutputTarget

public RegionOutputTarget()
Creates a new RegionOutputTarget, using a VisualOutputTarget to lay out the text extracted for each region.


RegionOutputTarget

public RegionOutputTarget(boolean useVisualTarget)
Creates a new RegionOutputTarget.

Parameters:
useVisualTarget - if true, then the layout of the text for each region will be determined by VisualOutputTarget; otherwise, the standard OutputTarget will be used.
Method Detail

addRegion

public void addRegion(float x,
                      float y,
                      float width,
                      float height)

Registers a new unnamed region. The coordinate pair x, y describes the origin and bottom-left corner of the rectangular region to be extracted; the width and height parameters represent the size of the rectangular region, extending up and to the right from the origin specified by x, y.

All values are denominated in 1/72" (called points). Please note that the origin of each page is its lower left corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page (612 x 792 points) would be registered with the parameters 0, 396, 306, 396.
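
The arithmetic behind that example can be checked with a small helper. This is only a sketch: the class PageRegions and its methods are hypothetical names introduced for illustration and are not part of PDFTextStream. It assumes nothing beyond the 72 points per inch and lower-left origin stated above.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PDFTextStream API.
final class PageRegions {
    static final float POINTS_PER_INCH = 72f;

    // Converts a length in inches to points (1 point = 1/72").
    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    // Returns {x, y, width, height} in points for the top-left quarter of a page,
    // measured from the page's lower left corner.
    static float[] topLeftQuarter(float pageWidthInches, float pageHeightInches) {
        float pageWidth = inchesToPoints(pageWidthInches);
        float pageHeight = inchesToPoints(pageHeightInches);
        // The region starts halfway up the page and spans half of each dimension.
        return new float[] {0f, pageHeight / 2f, pageWidth / 2f, pageHeight / 2f};
    }

    public static void main(String[] args) {
        // prints [0.0, 396.0, 306.0, 396.0]
        System.out.println(Arrays.toString(topLeftQuarter(8.5f, 11f)));
    }
}
```

The four values in the returned array are in the same order as the x, y, width, height parameters of addRegion.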


addRegion

public void addRegion(float x,
                      float y,
                      float width,
                      float height,
                      java.lang.String name)

Registers a new named region. The coordinate pair x, y describes the origin and bottom-left corner of the rectangular region to be extracted; the width and height parameters represent the size of the rectangular region, extending up and to the right from the origin specified by x, y. Text extracted from this region will be available via the getRegionText(String) method.

All values are denominated in 1/72" (called points). Please note that the origin of each page is its lower left corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page (612 x 792 points) would be registered with the parameters 0, 396, 306, 396.
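
Region measurements are often taken from the top of the page (for example, in an image viewer), so the y value may need to be flipped before registering a region. The sketch below shows that conversion under the lower-left-origin convention described above. RegionCoords and bottomLeftY are hypothetical names used for illustration only and are not part of PDFTextStream.

```java
// Hypothetical helper, not part of the PDFTextStream API.
final class RegionCoords {
    // Given the page height, the distance from the top edge of the page to the
    // top edge of the region, and the region height (all in points), returns
    // the y coordinate of the region's bottom-left corner.
    static float bottomLeftY(float pageHeight, float distanceFromTop, float regionHeight) {
        return pageHeight - distanceFromTop - regionHeight;
    }

    public static void main(String[] args) {
        // Top-left quarter of a 612 x 792 point page: starts at the top edge, 396 points tall.
        System.out.println(bottomLeftY(792f, 0f, 396f)); // prints 396.0
    }
}
```

The result is the value to pass as y, with x, width, and height unchanged.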


getRegionText

public java.lang.String getRegionText(int i)
Returns the text extracted from the i-th region that was registered with this RegionOutputTarget.


getRegionText

public java.lang.String getRegionText(java.lang.String regionName)
Returns the text extracted from the region that was registered with this RegionOutputTarget using the provided name.


getRegionNames

public java.util.Set getRegionNames()
Returns a set containing each of the names used to register regions on this RegionOutputTarget via addRegion(float, float, float, float, String).


getRegionCnt

public int getRegionCnt()
Returns the number of registered regions.


startPage

public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.

Overrides:
startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed

textUnit

public void textUnit(TextUnit tu)
Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.

Overrides:
textUnit in class OutputHandler

endPage

public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.

Overrides:
endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed