Package com.snowtide.pdf

PDFTextStream is a library that provides high-performance, accurate text and metadata extraction and is easy to integrate with applications and web services in Java, .NET, and Python environments.

Interface Summary
Font Represents a PDF font.
Page Instances of this class provide access to the text and attributes of a page extracted from a PDF document.
 

Class Summary
Bookmark Instances of this class form a singly-rooted tree available in some PDF documents.
EncryptionInfo Instances of this class provide information about the parameters used to encrypt a PDF document.
OutputHandler The base class for all PDF text event handlers.
OutputTarget This is a base OutputHandler implementation that provides a common output interface for Writer and Appendable instances (such as StringBuilders and StringBuffers), allowing PDFTextStream to easily redirect output to either type of object.
PDFDateParser This class provides methods for parsing PDF-format date/time strings into java.util.Date objects.
PDFTextStream PDFTextStream gives your Java, .NET, and Python applications the ability to: extract text and metadata from PDF documents (including metadata like XMP data, bookmarks, and annotations); extract and update interactive AcroForm data; and merge PDF documents. Instances of this class can either access a PDF file directly or process equivalent data delivered via a java.io.InputStream or java.nio.ByteBuffer.
PDFTextStreamConfig Various configuration options for PDFTextStream may be set using this class.
PDFVersion A typesafe enumeration class that provides singleton objects corresponding to each possible PDFVersion instance that might be returned by calls to PDFTextStream.getPDFVersion().
RegionOutputTarget This OutputHandler implementation is used to selectively extract text from certain regions of each PDF page.
VisualOutputTarget This OutputHandler implementation aims to preserve as much of a PDF's text layout as possible so that text extracts yielded by this OutputHandler will retain the visual arrangement of text as present in the original document.
 

Exception Summary
EncryptedPDFException A subclass of IOException that is thrown by PDFTextStream constructors if one of the following conditions occurs: a variety of encryption is encountered that PDFTextStream does not support; an error occurs while decrypting PDF data; or an incorrect password is provided to one of the PDFTextStream constructors.
FaultyPDFException Exceptions of this type are thrown by PDFTextStream when it encounters an error while processing a PDF file that is serious enough that no extraction can take place.
 

Package com.snowtide.pdf Description

PDFTextStream is a library that provides high-performance, accurate text and metadata extraction and is easy to integrate with applications and web services in Java, .NET, and Python environments. This javadoc is the authoritative reference for PDFTextStream on all three platforms; its API is identical regardless of your development environment.

The com.snowtide.pdf package is where the main PDFTextStream class resides.

In addition, PDFTextStream comes with an integration module for use with the Jakarta Lucene indexing and search library.
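
To illustrate the kind of PDF-format date/time strings that PDFDateParser is described as converting into java.util.Date objects, here is a minimal, self-contained sketch. It does not use or reproduce PDFTextStream's own code: the class name PdfDateSketch and its parse method are hypothetical, and the sketch assumes the date layout defined by the PDF specification, D:YYYYMMDDHHmmSSOHH'mm', where every field after the year is optional.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the PDFDateParser implementation.
class PdfDateSketch {

    // Groups: 1=year 2=month 3=day 4=hour 5=minute 6=second
    //         7="Z" 8=offset sign 9=offset hours 10=offset minutes
    private static final Pattern PDF_DATE = Pattern.compile(
            "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:(Z)|([+-])(\\d{2})'?(\\d{2})?'?)?");

    // Returns the numeric value of a group, or a default when the field is absent.
    private static int field(Matcher m, int group, int defaultValue) {
        String text = m.group(group);
        return text == null ? defaultValue : Integer.parseInt(text);
    }

    // Parses a PDF date string; a missing offset is treated as UTC (an assumption).
    static Date parse(String pdfDate) {
        Matcher m = PDF_DATE.matcher(pdfDate.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + pdfDate);
        }
        LocalDateTime local = LocalDateTime.of(
                Integer.parseInt(m.group(1)),
                field(m, 2, 1),   // month defaults to 01
                field(m, 3, 1),   // day defaults to 01
                field(m, 4, 0),
                field(m, 5, 0),
                field(m, 6, 0));

        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = m.group(8).equals("-") ? -1 : 1;
            // ofHoursMinutes requires both arguments to carry the same sign.
            offset = ZoneOffset.ofHoursMinutes(sign * field(m, 9, 0), sign * field(m, 10, 0));
        }
        return Date.from(local.toInstant(offset));
    }

    public static void main(String[] args) {
        // 10:30 local time at UTC-05:00 is 15:30 UTC.
        System.out.println(parse("D:20040615103000-05'00'").toInstant());
    }
}
```

For real documents, prefer PDFDateParser itself; the sketch only shows the shape of the input and why the timezone offset matters when producing a java.util.Date.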