public interface Page
See Also: Document.getPage(int)
Modifier and Type | Field and Description |
---|---|
static int | COLUMN_POSITION_HALVES: A constant parameter for use with Page.addColumnPartition(int). |
static int | COLUMN_POSITION_QUARTERS: A constant parameter for use with Page.addColumnPartition(int). |
static int | COLUMN_POSITION_THIRDS: A constant parameter for use with Page.addColumnPartition(int). |
Modifier and Type | Method and Description |
---|---|
void | addColumnPartition(int xcoord): Adds the given coordinate as an acceptable midline between columns, used when this page is segmented. |
Page | crop(Region area): Returns a Page instance that contains only the content held by this Page instance that intersects the given "query" area. |
Collection<TextUnit> | getCharacters(): Returns a collection of TextUnits on this page. |
Configuration | getConfig(): Returns the Configuration instance provided to this page by its parent Document instance. |
Region | getCropBox(): The "crop box" defined by the PDF for this page, expressed in user space units as with Page.getPageHeight() and Page.getPageWidth(). |
Document | getDocument(): Returns the Document from which this Page was sourced. |
Collection<Image> | getImages(): Returns a Collection of Image objects, one for each image on this page. |
int | getPageHeight(): Returns the height of this page in PDF "default user space units" (as specified by the PDF spec). |
int | getPageNumber(): Returns this Page's page number. |
int | getPageWidth(): Returns the width of this page in PDF "default user space units" (as specified by the PDF spec). |
int | getRotationTheta(): Returns the number of degrees by which the page has been rotated clockwise. |
BlockParent | getTextContent(): Returns a BlockParent instance that contains all Block instances held by this Page, which in turn hold all text content for this Page. |
void | pipe(OutputHandler tgt): Extracts all text from this page, sending necessary events to the given OutputHandler implementation. |
static final int COLUMN_POSITION_HALVES
A constant parameter for use with Page.addColumnPartition(int).

static final int COLUMN_POSITION_THIRDS
A constant parameter for use with Page.addColumnPartition(int).

static final int COLUMN_POSITION_QUARTERS
A constant parameter for use with Page.addColumnPartition(int).

void addColumnPartition(int xcoord)
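Adds the given coordinate as an acceptable midline between columns, used when this page is segmented.
Instead of an explicit coordinate, one of the constants Page.COLUMN_POSITION_HALVES,
Page.COLUMN_POSITION_THIRDS, or Page.COLUMN_POSITION_QUARTERS may be provided. Those constants "expand"
into multiple column partitions; e.g. specifying Page.COLUMN_POSITION_THIRDS will result in two column
partitions, one at getPageWidth() / 3 and another at 2 * getPageWidth() / 3.

Column partitions are used when the page is segmented, that is, when Page.getTextContent() or
Page.pipe(OutputHandler) is invoked.

Throws:
UnsupportedOperationException - if this Page's implementation does not support specifying column positions.

int getPageNumber()
Returns this Page's page number.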
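The way the COLUMN_POSITION_* constants "expand" for addColumnPartition(int) can be illustrated with a small, self-contained sketch. This is not the library's implementation: the class name, the sentinel values (-2, -3, -4) and the expand helper are hypothetical, and the handling of halves and quarters is assumed by analogy with the thirds example given above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; not part of the documented API. */
final class ColumnPartitionSketch {
    // Hypothetical sentinel values. The real values of the Page constants
    // are not stated in this documentation.
    static final int HALVES = -2;
    static final int THIRDS = -3;
    static final int QUARTERS = -4;

    /**
     * Expands a sentinel into evenly spaced column midlines, or returns the
     * explicit coordinate unchanged when it is not a sentinel.
     */
    static List<Integer> expand(int xcoord, int pageWidth) {
        int parts;
        switch (xcoord) {
            case HALVES:   parts = 2; break;
            case THIRDS:   parts = 3; break;
            case QUARTERS: parts = 4; break;
            default:       return Collections.singletonList(xcoord);
        }
        List<Integer> midlines = new ArrayList<>();
        for (int i = 1; i < parts; i++) {
            // For thirds this yields width / 3 and 2 * width / 3 (integer division).
            midlines.add(i * pageWidth / parts);
        }
        return midlines;
    }

    public static void main(String[] args) {
        // 612 is the width of a US Letter page in default user space units.
        System.out.println(expand(THIRDS, 612)); // prints [204, 408]
    }
}
```

With a real Page, the equivalent call would simply pass the constant, e.g. page.addColumnPartition(Page.COLUMN_POSITION_THIRDS), and leave the expansion to the implementation.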
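
int getPageWidth()
Returns the width of this page in PDF "default user space units" (as specified by the PDF spec), based on the
/MediaBox attribute of a PDF page object.

int getPageHeight()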
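Returns the height of this page in PDF "default user space units" (as specified by the PDF spec), based on the
/MediaBox attribute of a PDF page object.

Region getCropBox()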
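The "crop box" defined by the PDF for this page, expressed in user space units as with Page.getPageHeight() and
Page.getPageWidth(). This rectangle will default to the page width and height if it is not otherwise specified.

int getRotationTheta()
Returns the number of degrees by which the page has been rotated clockwise.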
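The relationship between /MediaBox, getPageWidth(), getPageHeight() and the crop box default can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the Box class and effectiveCropBox helper are hypothetical, a PDF rectangle is taken to be [llx lly urx ury] as in the PDF specification, and rounding to int is an assumption made because the getters return int.

```java
/** Illustrative sketch only; Box is a hypothetical stand-in, not the library's Region type. */
final class PageBoxSketch {
    /** A PDF-style rectangle: lower-left and upper-right corners in user space units. */
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        /** Width in user space units, rounded to int (rounding is an assumption). */
        int width() {
            return (int) Math.round(Math.abs(urx - llx));
        }

        /** Height in user space units, rounded to int (rounding is an assumption). */
        int height() {
            return (int) Math.round(Math.abs(ury - lly));
        }
    }

    /** Falls back to the media box when the page declares no crop box. */
    static Box effectiveCropBox(Box mediaBox, Box declaredCropBox) {
        return declaredCropBox != null ? declaredCropBox : mediaBox;
    }

    public static void main(String[] args) {
        Box letter = new Box(0, 0, 612, 792); // US Letter /MediaBox
        Box crop = effectiveCropBox(letter, null);
        System.out.println(crop.width() + " x " + crop.height()); // prints 612 x 792
    }
}
```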

void pipe(OutputHandler tgt)
Extracts all text from this page, sending necessary events to the given OutputHandler implementation.
Equivalent to page.getTextContent().pipe(...). OutputTarget is the easiest way to take advantage of this function.

Throws:
InsufficientLicenseException - if a license has been loaded, but that license does not include PDF.Feature.Text.

BlockParent getTextContent()
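Returns a BlockParent instance that contains all Block instances held by this Page, which in turn hold all
text content for this Page.

Throws:
InsufficientLicenseException - if a license has been loaded, but that license does not include PDF.Feature.Text.

Collection<TextUnit> getCharacters()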
Returns a collection of TextUnits on this page. Note that this collection is unordered unless a license has
been loaded that includes PDF.Feature.Text.

If attempting PDF text extraction, using Page.pipe(OutputHandler) with an appropriate OutputHandler, or
accessing the document model produced by PDFTextStream, is strongly recommended.

Collection<Image> getImages()
Returns a Collection of Image objects, one for each image on this page. Note that the same image data might be
displayed multiple times on a page; in such situations, multiple Image instances will still be included in the
returned collection so as to represent each displayed image's dimensions and position.

Throws:
InsufficientLicenseException - if a license has been loaded, but that license does not include PDF.Feature.Images.

Configuration getConfig()
Returns the Configuration instance provided to this page by its parent Document instance.

Page crop(Region area)
Returns a Page instance that contains only the content held by this Page instance that intersects the given
"query" area. If all of the content held by this instance is intersected by the query area, then this instance
may be returned unchanged. If no content in this Page intersects the query area, then an empty Page instance
will be returned.

Throws:
UnsupportedOperationException - if this Page implementation does not support crop.

See Also:
Block.crop(Region)
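The three outcomes described for crop(Region) (a filtered page, the same instance, or an empty page) can be sketched with a toy model. Everything here is hypothetical and only illustrates the documented contract: Rect stands in for Region, SketchPage stands in for a Page implementation, and content is reduced to a list of bounding boxes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; none of these types belong to the documented API. */
final class CropSketch {
    /** Hypothetical axis-aligned rectangle (left, bottom, right, top). */
    static final class Rect {
        final double left, bottom, right, top;

        Rect(double left, double bottom, double right, double top) {
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }

        /** True when the two rectangles overlap with non-zero area. */
        boolean intersects(Rect o) {
            return left < o.right && o.left < right && bottom < o.top && o.bottom < top;
        }
    }

    /** Hypothetical page model holding only the bounding boxes of its content. */
    static final class SketchPage {
        final List<Rect> content;

        SketchPage(List<Rect> content) {
            this.content = Collections.unmodifiableList(new ArrayList<>(content));
        }

        /** Keeps only content that intersects the query area. */
        SketchPage crop(Rect query) {
            List<Rect> kept = new ArrayList<>();
            for (Rect r : content) {
                if (r.intersects(query)) {
                    kept.add(r);
                }
            }
            // Everything intersected: this instance can be returned unchanged.
            if (kept.size() == content.size()) {
                return this;
            }
            // Otherwise a new page; it is empty when nothing intersected.
            return new SketchPage(kept);
        }
    }
}
```

In this sketch, content that only partly overlaps the query area is kept whole; whether a real implementation clips such content is not stated in this documentation.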