PDFTextStream is a library that provides high-performance, accurate text and metadata extraction and is easy to integrate with applications and web services in Java, .NET, and Python environments.

Packages
com.snowtide.pdf           Contains the main PDFTextStream class.
com.snowtide.pdf.annot     Interfaces and classes that PDFTextStream uses to represent the various types of annotations present in PDF documents.
com.snowtide.pdf.forms     Classes that support PDFTextStream's form extraction functionality.
com.snowtide.pdf.layout
com.snowtide.pdf.lucene    Integration between PDFTextStream and the Apache Lucene full-text indexing and search engine for Java.
com.snowtide.pdf.util
com.snowtide.util.logging
pdfts.examples

This Javadoc is the authoritative reference for PDFTextStream on all three platforms (Java, .NET, and Python); the API is identical regardless of your development environment.

The com.snowtide.pdf package is where the main PDFTextStream class resides.

In addition, PDFTextStream comes with an integration module, the com.snowtide.pdf.lucene package, for use with the Apache Lucene indexing and search library.