Documents and their Pages, as well as various PDFxStream
interfaces and implementations thereof that simplify many PDF data extraction use cases.

| Interface | Description |
|---|---|
| Document | Interface implemented by all objects representing a PDF document. |
| DocumentLocation | Represents a unique location within a Document. |
| Font | Represents a PDF font. |
| Page | Provides access to the text, images, and attributes of a page extracted from a PDF document. |

| Class | Description |
|---|---|
| Bookmark | Instances of this class form a singly-rooted tree available in some PDF documents. |
| Configuration | Various configuration options for PDFxStream may be set using this class. |
| Console | This class provides a command-line interface to PDFxStream and its capabilities. |
| EmbeddedFile | Files in PDF documents may be associated either with the document as a whole, or with annotations that are located on a single page in a particular location. |
| EncryptionInfo | Instances of this class provide information about the parameters used to encrypt a PDF document. |
| OutputHandler | The base class for all PDF text event handlers. |
| OutputTarget | This is a base OutputHandler implementation that directs all text extraction output to an Appendable of your choice, e.g. a Writer, StringBuilder, CharBuffer, and so on. |
| PDFDateParser | This class provides methods for parsing PDF-format date/time strings into Dates. |
| PDFTextStream | Deprecated |
| RegionOutputTarget | This OutputHandler implementation is used to selectively extract text from certain regions of each PDF page. |
| SelectionOutputTarget | An OutputTarget derivative that restricts the content added to the given StringBuffer to that within the starting and ending selection points specified in the constructor. |
| VisualOutputTarget | This OutputHandler implementation aims to preserve as much of a PDF's text layout as possible so that text extracts will retain the visual arrangement of text as present in the original document. |
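
To make the PDFDateParser entry above more concrete, the following is a minimal sketch of what parsing a PDF-format date/time string involves. It is not PDFxStream's implementation and does not use its API: the class name PdfDateSketch and its methods are hypothetical, it returns a java.time.OffsetDateTime rather than a Date, and it assumes the date layout from the PDF specification (D:YYYYMMDDHHmmSS followed by Z or a signed HH'mm' offset, with everything after the year optional). Treating a missing time zone as UTC is also an assumption made for this sketch.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical stand-in, NOT PDFxStream's PDFDateParser.
final class PdfDateSketch {
    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    // 7 literal Z, 8 offset sign, 9 offset hours, 10 offset minutes.
    private static final Pattern PDF_DATE = Pattern.compile(
        "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
        + "(?:(Z)|([+-])(\\d{2})'?(\\d{2})?'?)?");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + raw);
        }
        // Assumption of this sketch: no time zone information means UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = "-".equals(m.group(8)) ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                sign * field(m, 9, 0), sign * field(m, 10, 0));
        }
        // Omitted month and day default to 1; omitted time fields default to 0.
        return OffsetDateTime.of(
            Integer.parseInt(m.group(1)), field(m, 2, 1), field(m, 3, 1),
            field(m, 4, 0), field(m, 5, 0), field(m, 6, 0), 0, offset);
    }

    // Returns the numeric value of an optional group, or the fallback if absent.
    private static int field(Matcher m, int group, int fallback) {
        String s = m.group(group);
        return s == null ? fallback : Integer.parseInt(s);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20230115093000-05'00'"));
    }
}
```

The sketch deliberately rejects anything that does not match the pattern; a production parser would likely need to be more lenient about malformed dates found in real documents.
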

| Enum | Description |
|---|---|
| EncryptedPDFException.ErrorType | An enumeration of the set of possible error types that can be indicated by a thrown EncryptedPDFException. |
| PDFVersion | An enumeration corresponding to the PDF specification version levels that can be returned by Document.getPDFVersion(). |

| Exception | Description |
|---|---|
| EncryptedPDFException | A subclass of IOException that is thrown by PDFxStream constructors if one of the following conditions occurs: (1) a variety of encryption is encountered that PDFxStream does not support; (2) an error occurs while decrypting PDF data; (3) an incorrect password is provided to one of the PDFxStream constructors. |
| FaultyPDFException | Exceptions of this type are thrown by PDFxStream when it encounters such a serious error when attempting to process a PDF file that no extraction can take place. |

See the PDF class to get started.