java.lang.Object
  com.snowtide.pdf.OutputHandler
    com.snowtide.pdf.RegionOutputTarget
public class RegionOutputTarget
This OutputHandler implementation is used to selectively extract text from certain regions of each PDF page.
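Here is the typical usage pattern:
1. Create a RegionOutputTarget and register the regions of interest using addRegion.
2. Obtain a Page from the PDFTextStream instance (retrieved using PDFTextStream.getPage(int)).
3. Pass the RegionOutputTarget to the Page.pipe(OutputHandler) function.
4. Retrieve the extracted text using getRegionText(int) or getRegionText(String).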
Example:
    PDFTextStream stream = new PDFTextStream(pdfFile);
    RegionOutputTarget tgt = new RegionOutputTarget();
    tgt.addRegion(40, 600, 120, 16, "name");
    tgt.addRegion(40, 570, 120, 16, "address");
    Page p = stream.getPage(0);
    p.pipe(tgt);
    stream.close();
    String name = tgt.getRegionText("name");
    String address = tgt.getRegionText("address");
Important notes:
The coordinates provided to the addRegion(float, float, float, float) or addRegion(float, float, float, float, String) functions are denominated in 1/72" (points). Recall that the origin of each page is its lower left corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page would be registered with the parameters 0, 396, 306, 396.
Using a RegionOutputTarget with the pipe(OutputHandler) function of anything other than a Page will have undefined results. RegionOutputTarget depends on a PDF page being the "top-level" object in the PDF event stream.
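The coordinate arithmetic in the first note can be checked with a small sketch. The class PageUnits and its methods are hypothetical helpers written for illustration; they are not part of the PDFTextStream API. The sketch only assumes that one inch equals 72 points, as stated above.

```java
/** Hypothetical helper for illustration; not part of the PDFTextStream API. */
final class PageUnits {
    /** Region coordinates are denominated in 1/72", so one inch is 72 points. */
    static final float POINTS_PER_INCH = 72f;

    /** Converts a length in inches to points. */
    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    /**
     * Returns {x, y, width, height} in points for the top-left quarter of a
     * page, measured from the page origin at the lower left corner.
     */
    static float[] topLeftQuarter(float pageWidthInches, float pageHeightInches) {
        float pageWidth = inchesToPoints(pageWidthInches);
        float pageHeight = inchesToPoints(pageHeightInches);
        // The top half of the page starts halfway up, at y = pageHeight / 2.
        return new float[] { 0f, pageHeight / 2f, pageWidth / 2f, pageHeight / 2f };
    }

    public static void main(String[] args) {
        float[] r = topLeftQuarter(8.5f, 11f);
        // prints [0.0, 396.0, 306.0, 396.0]
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

The four resulting values are in the same order as the x, y, width, and height parameters of addRegion.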
Constructor Summary | |
---|---|
RegionOutputTarget() | Creates a new RegionOutputTarget, using a VisualOutputTarget to lay out the text extracted for each region. |
RegionOutputTarget(boolean useVisualTarget) | Creates a new RegionOutputTarget. |
Method Summary | | |
---|---|---|
void | addRegion(float x, float y, float width, float height) | Registers a new unnamed region. |
void | addRegion(float x, float y, float width, float height, java.lang.String name) | Registers a new named region. |
void | endPage(Page page) | Invoked when PDFTextStream has finished processing a page. |
int | getRegionCnt() | Returns the number of registered regions. |
java.util.Set | getRegionNames() | Returns a set containing each of the names used to register regions on this RegionOutputTarget via addRegion(float, float, float, float, String). |
java.lang.String | getRegionText(int i) | Returns the text extracted from the i-th region that was registered with this RegionOutputTarget. |
java.lang.String | getRegionText(java.lang.String regionName) | Returns the text extracted from the region that was registered with this RegionOutputTarget using the provided name. |
void | startPage(Page page) | Invoked when a page is about to be processed. |
void | textUnit(TextUnit tu) | Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance. |
Methods inherited from class com.snowtide.pdf.OutputHandler |
---|
endBlock, endLine, endPDF, linebreaks, spaces, startBlock, startLine, startPDF |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RegionOutputTarget()
Creates a new RegionOutputTarget, using a VisualOutputTarget to lay out the text extracted for each region.
public RegionOutputTarget(boolean useVisualTarget)
Creates a new RegionOutputTarget.
Parameters: useVisualTarget - if true, then the layout of the text for each region will be determined by VisualOutputTarget; otherwise, the standard OutputTarget will be used.
Method Detail |
---|
public void addRegion(float x, float y, float width, float height)
Registers a new unnamed region. The coordinate pair x, y describes the origin and bottom-left corner of the rectangular region to be extracted; the width and height parameters represent the size of the rectangular region, extending up and to the right from the origin specified by x, y.
All values are denominated in 1/72" (called points). Please note that the origin of each page is its lower left corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page would be registered with the parameters 0, 396, 306, 396.
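To make the geometry concrete, here is a minimal sketch of which points fall inside a region defined this way. RegionBounds is a hypothetical class written for illustration and is not how RegionOutputTarget is implemented. Treating the region edges as inclusive is an assumption, since the source does not say how text on the boundary is handled.

```java
/** Hypothetical model of a region's bounds; not the library's implementation. */
final class RegionBounds {
    final float x;      // left edge, in points from the page's lower left corner
    final float y;      // bottom edge, in points from the page's lower left corner
    final float width;  // extends to the right of x
    final float height; // extends upward from y

    RegionBounds(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Returns true if the point lies within the rectangle (edges inclusive, by assumption). */
    boolean contains(float px, float py) {
        return px >= x && px <= x + width
            && py >= y && py <= y + height;
    }

    public static void main(String[] args) {
        // Top-left quarter of an 8.5" x 11" page, using the values from the text above.
        RegionBounds quarter = new RegionBounds(0f, 396f, 306f, 396f);
        System.out.println(quarter.contains(100f, 700f)); // near the top left of the page
        System.out.println(quarter.contains(100f, 100f)); // near the bottom left of the page
    }
}
```

Because y grows upward, a point near the top of the page has a large y value, which is why the top-left quarter starts at y = 396 and not y = 0.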
public void addRegion(float x, float y, float width, float height, java.lang.String name)
Registers a new named region. The coordinate pair x, y describes the origin and bottom-left corner of the rectangular region to be extracted; the width and height parameters represent the size of the rectangular region, extending up and to the right from the origin specified by x, y.
Text extracted from this region will be available via the getRegionText(String) function.
All values are denominated in 1/72" (called points). Please note that the origin of each page is its lower left corner. So, for example, a region that would encompass the top-left quarter of an 8.5" x 11" page would be registered with the parameters 0, 396, 306, 396.
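Region measurements are often taken from the top-left corner of a page, for instance from a viewer or an image of the page. The following sketch converts such a rectangle into the lower-left-origin values that addRegion expects. RegionCoords and fromTopLeft are hypothetical names introduced for illustration, and the page height must be supplied by the caller.

```java
/** Hypothetical helper for illustration; not part of the PDFTextStream API. */
final class RegionCoords {
    /**
     * Converts a rectangle measured from the top-left corner of the page into
     * {x, y, width, height} measured from the lower left corner.
     * All values are in points (1/72").
     *
     * @param left       distance from the left page edge to the rectangle's left edge
     * @param top        distance from the top page edge to the rectangle's top edge
     * @param pageHeight total page height, e.g. 792 for an 11" page
     */
    static float[] fromTopLeft(float left, float top, float width, float height, float pageHeight) {
        // The rectangle's bottom edge sits (top + height) below the top of the page.
        float y = pageHeight - top - height;
        return new float[] { left, y, width, height };
    }

    public static void main(String[] args) {
        // Top-left quarter of an 8.5" x 11" page (612 x 792 points).
        float[] r = fromTopLeft(0f, 0f, 306f, 396f, 792f);
        // prints [0.0, 396.0, 306.0, 396.0]
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

The result matches the parameters 0, 396, 306, 396 given in the text above.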
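public java.lang.String getRegionText(int i)
Returns the text extracted from the i-th region that was registered with this RegionOutputTarget.
public java.lang.String getRegionText(java.lang.String regionName)
Returns the text extracted from the region that was registered with this RegionOutputTarget using the provided name.
public java.util.Set getRegionNames()
Returns a set containing each of the names used to register regions on this RegionOutputTarget via addRegion(float, float, float, float, String).
public int getRegionCnt()
Returns the number of registered regions.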
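public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed
public void textUnit(TextUnit tu)
Description copied from class: OutputHandler
Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler
public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed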