com.snowtide.pdf (PDFTextStream API Reference)

Package com.snowtide.pdf

Interface Summary
Font	Represents a PDF font.
Page	Instances of this class provide access to the text and attributes of a page extracted from a PDF document.

Class Summary
Bookmark	Instances of this class form a singly-rooted tree available in some PDF documents.
EncryptionInfo	Instances of this class provide information about the parameters used to encrypt a PDF document.
OutputHandler	The base class for all PDF text event handlers.
OutputTarget	This is a base `OutputHandler` implementation that provides a common output interface for `Writer` and `Appendable` instances (such as `StringBuilder`s and `StringBuffer`s), allowing PDFTextStream to easily redirect output to either type of object.
PDFDateParser	This class provides methods for parsing PDF-format date/time strings into `java.util.Date` objects.
PDFTextStream	`PDFTextStream` gives your Java, .NET, and Python applications the ability to: extract text and metadata from PDF documents (including metadata like XMP data, bookmarks, and annotations); extract and update interactive AcroForm data; and merge PDF documents. Instances of this class can either access a PDF file directly, or process equivalent data delivered via a `java.io.InputStream` or `java.nio.ByteBuffer`.
PDFTextStreamConfig	Various configuration options for PDFTextStream may be set using this class.
PDFVersion	A typesafe enumeration class that provides singleton objects corresponding to each possible PDFVersion instance that might be returned by calls to `PDFTextStream.getPDFVersion()`.
RegionOutputTarget	This `OutputHandler` implementation is used to selectively extract text from certain regions of each PDF page.
VisualOutputTarget	This `OutputHandler` implementation aims to preserve as much of a PDF's text layout as possible, so that the extracted text retains the visual arrangement of the original document.

Exception Summary
EncryptedPDFException	A subclass of `IOException` that is thrown by `PDFTextStream` constructors if one of the following conditions occurs: a variety of encryption is encountered that PDFTextStream does not support; an error occurs while decrypting PDF data; or an incorrect password is provided to one of the `PDFTextStream` constructors.
FaultyPDFException	Exceptions of this type are thrown by PDFTextStream when it encounters an error so serious while processing a PDF file that no extraction can take place.

Package com.snowtide.pdf Description

PDFTextStream is a library that provides high-performance, accurate text and metadata extraction, and is easy to integrate with your applications and web services in Java, .NET, and Python environments. This javadoc is the authoritative reference for PDFTextStream on all three platforms; the API is identical regardless of your development environment.

The com.snowtide.pdf package is where the main PDFTextStream class resides.

In addition, PDFTextStream comes with an integration module for use with the Jakarta Lucene indexing and search library.
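
The class summary lists `PDFDateParser` as the class that parses PDF-format date/time strings. For readers unfamiliar with that format, the PDF specification writes dates as `D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional. The following is a minimal, standalone sketch of reading such a string with `java.time`; it is not PDFTextStream's implementation and does not use its API. The class name `PdfDateSketch` and its `parse` method are hypothetical, and treating a missing time zone as UTC is an assumption made here for simplicity.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

/**
 * Hypothetical, standalone illustration of the PDF date string format.
 * This is NOT part of com.snowtide.pdf and does not mirror PDFDateParser.
 */
final class PdfDateSketch {
    private PdfDateSketch() {}

    /** Parses strings such as D:20240131123045+05'30' or D:2024. */
    static OffsetDateTime parse(String pdfDate) {
        String s = pdfDate.trim();
        if (s.startsWith("D:")) {
            s = s.substring(2);
        }
        // Count the leading digits: YYYY, then up to five two-digit fields.
        int n = 0;
        while (n < s.length() && n < 14 && Character.isDigit(s.charAt(n))) {
            n++;
        }
        if (n < 4 || n % 2 != 0) {
            throw new IllegalArgumentException("Malformed PDF date: " + pdfDate);
        }
        int year = Integer.parseInt(s.substring(0, 4));
        int month = field(s, 4, n, 1);   // month and day default to 1
        int day = field(s, 6, n, 1);
        int hour = field(s, 8, n, 0);    // time fields default to 0
        int minute = field(s, 10, n, 0);
        int second = field(s, 12, n, 0);
        ZoneOffset offset = parseOffset(s.substring(n), pdfDate);
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    /** Reads the two-digit field at 'start' if present, else the fallback. */
    private static int field(String s, int start, int digitCount, int fallback) {
        return start + 2 <= digitCount
                ? Integer.parseInt(s.substring(start, start + 2))
                : fallback;
    }

    /** Reads the optional O HH ' mm ' suffix (O is +, - or Z). */
    private static ZoneOffset parseOffset(String tz, String original) {
        // Assumption: a missing zone is treated as UTC in this sketch.
        if (tz.isEmpty() || tz.charAt(0) == 'Z') {
            return ZoneOffset.UTC;
        }
        char sign = tz.charAt(0);
        String digits = tz.substring(1).replace("'", "");
        boolean validLength = digits.length() == 2 || digits.length() == 4;
        if ((sign != '+' && sign != '-') || !validLength
                || !digits.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("Malformed PDF date: " + original);
        }
        int hours = Integer.parseInt(digits.substring(0, 2));
        int minutes = digits.length() == 4 ? Integer.parseInt(digits.substring(2)) : 0;
        int totalSeconds = hours * 3600 + minutes * 60;
        return ZoneOffset.ofTotalSeconds(sign == '-' ? -totalSeconds : totalSeconds);
    }
}
```

For example, `PdfDateSketch.parse("D:20240131123045+05'30'")` yields an `OffsetDateTime` of 2024-01-31T12:30:45+05:30, while `PdfDateSketch.parse("D:2024")` falls back to midnight on 1 January 2024, UTC.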