java.lang.Object
  com.snowtide.pdf.OutputHandler
    com.snowtide.pdf.RegionOutputTarget
public class RegionOutputTarget extends OutputHandler
This OutputHandler implementation is used to selectively extract text from certain regions of each PDF page.
Here is the typical usage pattern:
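1. Create a RegionOutputTarget and register the regions of interest using addRegion.
2. Obtain a Page from the PDFTextStream instance (retrieved using PDFTextStream.getPage(int)).
3. Pass the RegionOutputTarget to that page's Page.pipe(OutputHandler) function.
4. Retrieve the text extracted from each region using getRegionText(int) or getRegionText(String).
Example: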
PDFTextStream stream = new PDFTextStream(pdfFile);
RegionOutputTarget tgt = new RegionOutputTarget();
tgt.addRegion(40, 600, 120, 16, "name");
tgt.addRegion(40, 570, 120, 16, "address");
Page p = stream.getPage(0);
p.pipe(tgt);
stream.close();
String name = tgt.getRegionText("name");
String address = tgt.getRegionText("address");
Important notes:
- The coordinates passed to the addRegion(float, float, float, float) or
addRegion(float, float, float, float, String) functions are denominated in 1/72" (points). Recall that the origin
of each page is its lower-left corner. So, for example, a region that would encompass the top-left quarter of an
8.5" x 11" page (612 x 792 points) would be registered with the parameters 0, 396, 306, 396.
- Passing a RegionOutputTarget to the pipe(OutputHandler) function of anything other than a Page
will have undefined results. RegionOutputTarget depends on a PDF page being the "top-level" object in the PDF event stream.
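The arithmetic behind the quarter-page example can be sketched as follows. This is only an illustration: the class RegionUnits and its methods are hypothetical helpers, not part of PDFTextStream, and the sketch assumes page dimensions are known in inches.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of PDFTextStream) for computing region parameters. */
final class RegionUnits {
    /** PDF coordinates are expressed in points: 72 points per inch. */
    static final float POINTS_PER_INCH = 72f;

    static float toPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    /**
     * Returns {x, y, width, height} in points for the top-left quarter of a page.
     * Because the page origin is the lower-left corner, the top half of the page
     * starts at y = pageHeight / 2.
     */
    static float[] topLeftQuarter(float pageWidthInches, float pageHeightInches) {
        float halfWidth = toPoints(pageWidthInches) / 2f;
        float halfHeight = toPoints(pageHeightInches) / 2f;
        return new float[] {0f, halfHeight, halfWidth, halfHeight};
    }

    public static void main(String[] args) {
        // 8.5" x 11" page: prints [0.0, 396.0, 306.0, 396.0]
        System.out.println(Arrays.toString(topLeftQuarter(8.5f, 11f)));
    }
}
```

The four values in the returned array are in the same order as the x, y, width, and height arguments of addRegion.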
| Constructor Summary | |
|---|---|
| RegionOutputTarget() | Creates a new RegionOutputTarget, using a VisualOutputTarget to lay out the text extracted for each region. |
| RegionOutputTarget(boolean useVisualTarget) | Creates a new RegionOutputTarget. |
| Method Summary | |
|---|---|
| void | addRegion(float x, float y, float width, float height): Registers a new unnamed region. |
| void | addRegion(float x, float y, float width, float height, java.lang.String name): Registers a new named region. |
| void | endPage(Page page): Invoked when PDFTextStream has finished processing a page. |
| int | getRegionCnt(): Returns the number of registered regions. |
| java.util.Set | getRegionNames(): Returns a set containing each of the names used to register regions on this RegionOutputTarget via addRegion(float, float, float, float, String). |
| java.lang.String | getRegionText(int i): Returns the text extracted from the i-th region that was registered with this RegionOutputTarget. |
| java.lang.String | getRegionText(java.lang.String regionName): Returns the text extracted from the region that was registered with this RegionOutputTarget using the provided name. |
| void | startPage(Page page): Invoked when a page is about to be processed. |
| void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |
| Methods inherited from class com.snowtide.pdf.OutputHandler |
|---|
| endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine, startPDF |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public RegionOutputTarget()
Creates a new RegionOutputTarget, using a VisualOutputTarget to lay out the text extracted for each region.
public RegionOutputTarget(boolean useVisualTarget)
Creates a new RegionOutputTarget.
Parameters:
useVisualTarget - if true, then the layout of the text for each region will be determined by VisualOutputTarget;
otherwise, the standard OutputTarget will be used.

| Method Detail |
|---|
public void addRegion(float x,
float y,
float width,
float height)
Registers a new unnamed region. The coordinate pair x, y specifies the bottom-left corner
of the rectangular region to be extracted; the width and height parameters give
the size of the region, which extends up and to the right from x, y.
All values are denominated in 1/72" (called points). Please note that the origin of each page is its lower-left
corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page would be registered
with the parameters 0, 396, 306, 396.
public void addRegion(float x,
float y,
float width,
float height,
java.lang.String name)
Registers a new named region. The coordinate pair x, y specifies the bottom-left corner
of the rectangular region to be extracted; the width and height parameters give
the size of the region, which extends up and to the right from x, y.
Text extracted from this region will be available via the getRegionText(String) function.
All values are denominated in 1/72" (called points). Please note that the origin of each page is its lower-left
corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page would be registered
with the parameters 0, 396, 306, 396.
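Region coordinates are often measured from the top-left corner of a page (as in many image and screen tools), whereas addRegion expects a bottom-left origin. A minimal sketch of the conversion follows. The class RegionFlip and its method are hypothetical and not part of PDFTextStream, and the sketch assumes the page height in points is already known (792 for an 8.5" x 11" page) and that the page's coordinate space starts at 0, 0.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of PDFTextStream) for flipping the y axis. */
final class RegionFlip {
    /**
     * Converts a rectangle measured from the top-left corner of the page
     * (left, top = distance from the left and top edges) into the
     * {x, y, width, height} form expected for a bottom-left origin.
     * All values are in points.
     */
    static float[] fromTopLeft(float left, float top, float width, float height, float pageHeight) {
        // The bottom edge of the rectangle sits (top + height) below the top of the page.
        float y = pageHeight - top - height;
        return new float[] {left, y, width, height};
    }

    public static void main(String[] args) {
        // Top-left quarter of a 612 x 792 point page: prints [0.0, 396.0, 306.0, 396.0]
        System.out.println(Arrays.toString(fromTopLeft(0f, 0f, 306f, 396f, 792f)));
    }
}
```

The resulting values could then be passed, in order, as the x, y, width, and height arguments of addRegion.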
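public java.lang.String getRegionText(int i)
Returns the text extracted from the i-th region that was registered with this RegionOutputTarget.
public java.lang.String getRegionText(java.lang.String regionName)
Returns the text extracted from the region that was registered with this RegionOutputTarget using the provided name.
public java.util.Set getRegionNames()
Returns a set containing each of the names used to register regions on this RegionOutputTarget via
addRegion(float, float, float, float, String).
public int getRegionCnt()
Returns the number of registered regions.
public void startPage(Page page)
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed
public void textUnit(TextUnit tu)
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler
public void endPage(Page page)
Invoked when PDFTextStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed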