Package com.snowtide

Class PDF


  • public class PDF
    extends Object

    With PDFxStream, your Java/JVM and .NET applications can extract data of all sorts from PDF documents:

    • metadata (including XMP data, bookmarks, and annotations)
    • text for use in search indexing, semantic analysis, and data integration contexts
    • images and image metadata for OCR pipelines and content and asset management contexts
    • tabular data for import into structured data processing
    • interactive form data

    PDFxStream also provides a number of auxiliary utilities, including merging PDF files, updating and saving interactive form data to a new PDF document, and more.

    Learn and Get Help

    To get the most out of PDFxStream, consult this API reference along with our tutorials and technical support materials.

    Quick Start

    1. Open a Document to read from a PDF file on disk or in memory:
       Document pdf = com.snowtide.PDF.open("/path/to/file.pdf");
       // OR
       byte[] pdfFileData = ...;
       Document pdf = com.snowtide.PDF.open(java.nio.ByteBuffer.wrap(pdfFileData));
    2. Extract the data you need; most common tasks require 1-4 lines of code:
      • Extract text:
         java.io.StringWriter text = new java.io.StringWriter();
         pdf.pipe(new com.snowtide.pdf.OutputTarget(text));
         String pdfText = text.toString();
      • Extract images:
         for (Image image : pdf.getPage(0).getImages()) {
             byte[] imageData = image.data();
             Image.Format fmt = image.dataFormat(); // e.g. JPEG or PNG
             BufferedImage img = image.bitmap();
             // save extracted image data elsewhere,
             // or get a "live" BufferedImage/Bitmap for drawing to a graphics context
             // ...
         }
      • Extract form data:
         for (AcroFormField field : (AcroForm)pdf.getFormData()) {
             String fieldName = field.getFullName();
             String fieldValue = (String)field.getValue();
             // process the form data somehow, e.g. push it into a database
             // ...
         }
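
    The in-memory variant of step 1 needs the PDF bytes from somewhere. As a minimal sketch using only JDK classes, the helper below reads a file fully into memory and wraps it in a ByteBuffer. The class name PdfBytes and the method load are hypothetical names chosen for illustration; the PDFxStream call itself is left as a comment so the sketch compiles without the library on the classpath.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfBytes {
    /**
     * Reads the whole file into memory and wraps the resulting array.
     * ByteBuffer.wrap does not copy: the buffer is backed by the array.
     */
    static ByteBuffer load(Path file) throws IOException {
        byte[] pdfFileData = Files.readAllBytes(file);
        return ByteBuffer.wrap(pdfFileData);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfBytes <file.pdf>");
            return;
        }
        ByteBuffer buf = load(Paths.get(args[0]));
        System.out.println("Loaded " + buf.remaining() + " bytes");
        // With PDFxStream available, the buffer would be handed over as in step 1:
        // Document pdf = com.snowtide.PDF.open(buf);
    }
}
```

    Reading the entire file up front suits small and medium documents; for very large files, opening by path as in the first variant of step 1 avoids holding a second copy of the data in application code.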

    Level of Support

    PDFxStream supports the core of the PDF file specification up to and including version 1.7 (corresponding to Acrobat 8 and higher), including 40/128-bit document encryption methods. PDFxStream also supports a variety of PDF variants: formats that deviate from the official PDF document specification significantly, yet still render as expected in Adobe Reader.

    Errors

    • Many PDFxStream methods and constructors propagate IOExceptions thrown by underlying system I/O errors (permissions issues, etc.).
    • A FaultyPDFException may be thrown when a parsing or file structure problem is detected and the PDF file in question is suspected to be corrupt, invalid, or otherwise unreadable.
    • Any errors encountered while decrypting PDF content are signaled by an EncryptedPDFException.
    • Certain parts of the PDFxStream API require a purchased license file to be registered before use in production environments; accessing them otherwise will cause an InsufficientLicenseException to be thrown.
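
    The failure categories above can be told apart with ordinary catch clauses. The sketch below is illustrative only: the three exception classes are local stand-ins declared so that the code compiles without PDFxStream, and their superclasses (IOException for the first two, RuntimeException for the third) are assumptions that should be checked against the real classes in this API reference. OpenErrorTriage, PdfTask, and classify are hypothetical names.

```java
import java.io.IOException;

public class OpenErrorTriage {
    // Stand-ins for the PDFxStream exception types named above.
    // ASSUMPTION: the superclasses chosen here may differ from the real library.
    static class FaultyPDFException extends IOException {
        FaultyPDFException(String msg) { super(msg); }
    }
    static class EncryptedPDFException extends IOException {
        EncryptedPDFException(String msg) { super(msg); }
    }
    static class InsufficientLicenseException extends RuntimeException {
        InsufficientLicenseException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for any call that opens or reads a PDF. */
    interface PdfTask {
        void run() throws IOException;
    }

    /** Runs the task and maps each failure category to a short label. */
    static String classify(PdfTask task) {
        try {
            task.run();
            return "ok";
        } catch (FaultyPDFException e) {
            return "corrupt-or-invalid";   // file is likely unreadable
        } catch (EncryptedPDFException e) {
            return "decryption-failed";    // problem while decrypting content
        } catch (IOException e) {
            return "io-error";             // general I/O problem, listed after the specific types
        } catch (InsufficientLicenseException e) {
            return "license-required";     // feature needs a registered license
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { throw new FaultyPDFException("bad structure"); }));
        System.out.println(classify(() -> { throw new IOException("permission denied"); }));
        System.out.println(classify(() -> { }));
    }
}
```

    Listing the specific types before IOException keeps the sketch valid if they turn out to be IOException subclasses, and it still compiles if they are not.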
    Version:
    ©2004-2024 Snowtide