Class PDF
java.lang.Object
    com.snowtide.PDF
public class PDF extends Object
With PDFxStream, your Java/JVM and .NET applications can extract data of all sorts from PDF documents:
- metadata (including XMP data, bookmarks, and annotations)
- text for use in search indexing, semantic analysis, and data integration contexts
- images and image metadata for OCR pipelines and content and asset management contexts
- tabular data for import into structured data processing
- interactive form data
PDFxStream also provides a number of auxiliary utilities, including merging PDF files, updating and saving interactive form data to a new PDF document, and more.
Learn and Get Help
You can learn how to leverage PDFxStream to the fullest by relying upon this API reference, as well as our comprehensive set of tutorials and technical support materials.
Quick Start
- Open a Document to read from a PDF file on disk or in memory:

      Document pdf = com.snowtide.PDF.open("/path/to/file.pdf");
      // OR, for PDF data already in memory (the name argument is used for logging/debugging):
      byte[] pdfFileData = ...;
      Document pdf = com.snowtide.PDF.open(java.nio.ByteBuffer.wrap(pdfFileData), "file.pdf");
- Extract the data you need; most common tasks require 1-4 lines of code:
  - Extract text:

        java.io.StringWriter text = new java.io.StringWriter();
        pdf.pipe(new com.snowtide.pdf.OutputTarget(text));
        String pdfText = text.toString();

  - Extract images:

        for (Image image : pdf.getPage(0).getImages()) {
            byte[] imageData = image.data();
            Image.Format fmt = image.dataFormat(); // likely JPEG, PNG, etc.
            BufferedImage img = image.bitmap();
            // save the extracted image data elsewhere,
            // or get a "live" BufferedImage/Bitmap for drawing to a graphics context
            // ...
        }

  - Extract form data:

        for (AcroFormField field : (AcroForm) pdf.getFormData()) {
            String fieldName = field.getFullName();
            String fieldValue = (String) field.getValue();
            // process the form data somehow, e.g. push it into a database
            // ...
        }
Level of Support
PDFxStream supports the core of the PDF file specification up to and including version 1.7 (corresponding to Acrobat 8 and higher), including 40- and 128-bit document encryption. PDFxStream also supports a variety of PDF variants: formats that deviate significantly from the official PDF specification, yet still render as expected in Adobe Reader.
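The supported version range refers to the version a PDF file declares. As an optional pre-check, the JDK-only sketch below reads the %PDF-x.y header from a file's leading bytes. It is not part of the PDFxStream API: PdfHeaderVersion and its methods are hypothetical names, and the 1024-byte search window is an assumption based on common reader tolerance for leading bytes. The header is also not authoritative, since from PDF 1.4 on a document's catalog may declare a later version through its /Version entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the version a PDF declares in its header. */
final class PdfHeaderVersion {
    // Header form is "%PDF-" plus a single-digit major.minor version, e.g. "%PDF-1.7".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    // ASSUMPTION: only the first 1024 bytes are searched for the header.
    private static final int SEARCH_LIMIT = 1024;

    private PdfHeaderVersion() {}

    /** Returns the declared version (e.g. "1.7"), or empty if no header is found. */
    static Optional<String> declaredVersion(byte[] leadingBytes) {
        int len = Math.min(leadingBytes.length, SEARCH_LIMIT);
        // ISO-8859-1 maps each byte to exactly one char, so binary bytes decode safely.
        String head = new String(leadingBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1) + "." + m.group(2)) : Optional.empty();
    }

    /** True if a header was found and it declares version 1.7 or earlier. */
    static boolean declaresAtMost17(byte[] leadingBytes) {
        // Both strings have the form "d.d", so lexicographic order equals numeric order.
        return declaredVersion(leadingBytes)
                .map(v -> v.compareTo("1.7") <= 0)
                .orElse(false);
    }
}
```

A caller might read just the first 1024 bytes of a file (for instance with InputStream.readNBytes) and pass them in. A file outside the range is not necessarily unreadable, given the variant support described above.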
Errors
- Many PDFxStream methods and constructors propagate IOExceptions thrown by underlying system I/O errors (permission issues, etc.). A FaultyPDFException may be thrown when a parsing or file-structure problem is detected and the PDF file in question is suspected to be corrupt, invalid, or otherwise unreadable.
- Any errors encountered while decrypting PDF content are signaled by an EncryptedPDFException.
- Certain parts of the PDFxStream API require a purchased license file to be registered before use in production environments; accessing them otherwise causes an InsufficientLicenseException to be thrown.
Version:
    ©2004-2024 Snowtide
Nested Class Summary

static class PDF.Feature
    An enumeration of the discrete features available within PDFxStream.
Method Summary

static boolean hasFeature(PDF.Feature f)
    Returns true if the given PDF.Feature is enabled by the currently-loaded license file.
static boolean isLicensed()
    Returns true if PDFxStream has loaded and verified a non-evaluation license file that has not yet expired.
static boolean loadLicense(String licenseFilePath)
    Loads and attempts to verify a PDFxStream license file at the given path.
static boolean loadLicense(URL licenseLocation)
    Loads and attempts to verify a PDFxStream license file at the given URL.
static Document open(File pdfFile)
static Document open(File pdfFile, byte[] userPasswd)
static Document open(File pdfFile, byte[] userPasswd, Configuration config)
static Document open(InputStream is, String pdfName)
    Returns a new open Document reading from the PDF data provided by the given InputStream.
static Document open(InputStream is, String pdfName, byte[] userPasswd)
    Returns a new open Document reading from the PDF data provided by the given InputStream.
static Document open(InputStream is, String pdfName, byte[] userPasswd, Configuration config)
    Returns a new open Document reading from the PDF data provided by the given InputStream.
static Document open(String pdfFilePath)
    Returns a new open Document reading from the PDF file found at the given filesystem path.
static Document open(String pdfFilePath, byte[] userPasswd)
    Returns a new open Document reading from the PDF file found at the given filesystem path.
static Document open(String pdfFilePath, byte[] userPasswd, Configuration config)
    Returns a new open Document reading from the PDF file found at the given filesystem path.
static Document open(ByteBuffer pdfData, String pdfName)
    Returns a new open Document reading from the PDF data provided by the given ByteBuffer.
static Document open(ByteBuffer pdfData, String pdfName, byte[] userPasswd)
    Returns a new open Document reading from the PDF data provided by the given ByteBuffer.
static Document open(ByteBuffer pdfData, String pdfName, byte[] userPasswd, Configuration config)
    Returns a new open Document reading from the PDF data provided by the given ByteBuffer.
Method Detail

loadLicense
public static boolean loadLicense(String licenseFilePath)
Loads and attempts to verify a PDFxStream license file at the given path.
PDFxStream may also be configured to load a license file from a specific path by setting the system property or environment variable pdfxs_license_path to that path.
Parameters:
    licenseFilePath - an absolute or relative file path
Returns:
    true if a license file was found at the given path and was successfully verified
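If you prefer to call loadLicense(String) explicitly (for example, to log which path was used and whether verification succeeded), the pdfxs_license_path lookup described above can be mirrored in a few lines. This is a minimal JDK-only sketch: LicensePathResolver is a hypothetical name, and checking the system property before the environment variable is an assumption, since this reference does not say which PDFxStream consults first.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper that locates a license path via pdfxs_license_path. */
final class LicensePathResolver {
    static final String KEY = "pdfxs_license_path";

    private LicensePathResolver() {}

    /**
     * Returns the configured license path, if any.
     * ASSUMPTION: the system property takes precedence over the environment variable.
     */
    static Optional<String> resolve(Properties sysProps, Map<String, String> env) {
        String fromProp = sysProps.getProperty(KEY);
        if (fromProp != null && !fromProp.isBlank()) {
            return Optional.of(fromProp);
        }
        String fromEnv = env.get(KEY);
        if (fromEnv != null && !fromEnv.isBlank()) {
            return Optional.of(fromEnv);
        }
        return Optional.empty();
    }

    /** Convenience overload reading the live JVM properties and process environment. */
    static Optional<String> resolve() {
        return resolve(System.getProperties(), System.getenv());
    }
}
```

A caller could then write LicensePathResolver.resolve().ifPresent(p -> System.out.println("license verified: " + com.snowtide.PDF.loadLicense(p))). Taking Properties and Map parameters keeps the lookup testable without touching the real process environment.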

loadLicense
public static boolean loadLicense(URL licenseLocation)
Loads and attempts to verify a PDFxStream license file at the given URL.
Parameters:
    licenseLocation - a URL object
Returns:
    true if a license file was found at the given URL and was successfully verified
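loadLicense(URL) is convenient when the license file is bundled with an application, for instance on the classpath. The JDK-only sketch below shows two common ways of obtaining such a URL; LicenseUrls and the example resource name are hypothetical, and the resulting URL would then be passed to PDF.loadLicense(URL).

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;

/** Hypothetical helpers for building a URL to pass to PDF.loadLicense(URL). */
final class LicenseUrls {
    private LicenseUrls() {}

    /** Converts a filesystem path (absolute or relative) into a file: URL. */
    static URL fromPath(Path path) {
        try {
            // toUri() yields an absolute, correctly escaped file: URI.
            return path.toAbsolutePath().toUri().toURL();
        } catch (MalformedURLException e) {
            // Not expected for file: URIs; rethrown unchecked to keep call sites simple.
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Looks up a license bundled on the classpath, e.g. "/licenses/pdfxstream.lic".
     * Returns null if no such resource exists.
     */
    static URL fromClasspath(String absoluteResourceName) {
        return LicenseUrls.class.getResource(absoluteResourceName);
    }
}
```

Because getResource returns null for a missing resource, a caller should check for null before handing the URL to loadLicense.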

isLicensed
public static boolean isLicensed()
Returns true if PDFxStream has loaded and verified a non-evaluation license file that has not yet expired.

hasFeature
public static boolean hasFeature(PDF.Feature f)
Returns true if the given PDF.Feature is enabled by the currently-loaded license file. If no license file has been loaded, all features are enabled, but usage is limited for evaluation purposes.

open
public static Document open(File pdfFile) throws IOException
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(java.io.File, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(File pdfFile, byte[] userPasswd, Configuration config) throws IOException
Parameters:
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
    config - a Configuration object from which the new Document will obtain various settings
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
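The userPasswd parameter is a byte[] rather than a String, so the caller chooses the encoding. The sketch below prepares that argument from a char[]. PasswordBytes is a hypothetical helper, and the charset is deliberately left to the caller as an assumption to verify: PDF's older RC4 and 128-bit security handlers define passwords in PDFDocEncoding (identical to ASCII for ordinary printable characters, so ISO-8859-1 is a common choice), while the newer AES-256 handler uses UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

/** Hypothetical helper for preparing the userPasswd argument of PDF.open(...). */
final class PasswordBytes {
    private PasswordBytes() {}

    /**
     * Encodes a password with the given charset. Returns null when password is
     * null, matching the documented convention for unencrypted files.
     * Unmappable characters are replaced (e.g. with '?'), not rejected.
     */
    static byte[] encode(char[] password, Charset charset) {
        if (password == null) {
            return null;
        }
        // Encoding from a char[] avoids creating an immutable String copy of the secret.
        ByteBuffer encoded = charset.encode(CharBuffer.wrap(password));
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        if (encoded.hasArray()) {
            Arrays.fill(encoded.array(), (byte) 0); // scrub the encoder's scratch buffer
        }
        return out;
    }

    /** Overwrites a password array once it is no longer needed. */
    static void wipe(byte[] secret) {
        if (secret != null) {
            Arrays.fill(secret, (byte) 0);
        }
    }
}
```

Whether PDFxStream keeps a reference to the array after open returns is not stated here, so the cautious choice is to call wipe only after the Document has been closed.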

open
public static Document open(File pdfFile, byte[] userPasswd) throws IOException
Parameters:
    pdfFile - the PDF file to be read
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(java.io.File, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(InputStream is, String pdfName, byte[] userPasswd, Configuration config) throws IOException
Returns a new open Document reading from the PDF data provided by the given InputStream.
Because reading PDF content requires random access to any and all parts of the PDF data, the InputStream is read in its entirety and written to a temporary file for processing. All temporary files are closed and deleted when the returned Document is closed or (in the worst case) garbage-collected.
Parameters:
    pdfName - the name of the PDF file (used mostly in logging/debugging)
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
    config - a Configuration object from which the new Document will obtain various settings
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
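The temporary-file behavior described above follows a general pattern: a forward-only stream is copied to disk so that arbitrary offsets can be read, and the file is deleted on close. The JDK-only sketch below illustrates that pattern; it is not PDFxStream's implementation, and SpooledInput with its methods are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration: spool a stream to a temp file for random access. */
final class SpooledInput implements Closeable {
    private final Path tempFile;
    private final RandomAccessFile raf;

    SpooledInput(InputStream in) throws IOException {
        tempFile = Files.createTempFile("spooled-", ".pdf");
        try {
            // The whole stream must be consumed before random access is possible.
            // REPLACE_EXISTING is required because createTempFile already made the file.
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            raf = new RandomAccessFile(tempFile.toFile(), "r");
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tempFile); // do not leak the temp file on failure
            throw e;
        }
    }

    /** Reads one byte at an arbitrary offset, which a plain InputStream cannot do. */
    int byteAt(long offset) throws IOException {
        raf.seek(offset);
        return raf.read();
    }

    long length() throws IOException {
        return raf.length();
    }

    Path path() {
        return tempFile;
    }

    /** Closes the file handle, then deletes the temporary file. */
    @Override
    public void close() throws IOException {
        try {
            raf.close();
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}
```

Using try-with-resources around such an object, as with a Document, ensures the temporary file is removed promptly rather than waiting on garbage collection.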

open
public static Document open(InputStream is, String pdfName, byte[] userPasswd) throws IOException
Returns a new open Document reading from the PDF data provided by the given InputStream.
Parameters:
    pdfName - the name of the PDF file (used mostly in logging/debugging)
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(java.io.InputStream, String, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(InputStream is, String pdfName) throws IOException
Returns a new open Document reading from the PDF data provided by the given InputStream.
Parameters:
    pdfName - the name of the PDF file (used mostly in logging/debugging)
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(java.io.InputStream, String, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(ByteBuffer pdfData, String pdfName, byte[] userPasswd, Configuration config) throws IOException
Returns a new open Document reading from the PDF data provided by the given ByteBuffer. If you have a PDF file's data in a byte[], use ByteBuffer.wrap(byte[]) to efficiently deliver it to this method.
Parameters:
    pdfName - the name of the PDF whose data is provided by pdfData, used for logging and debugging purposes
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
    config - a Configuration object from which the new Document will obtain various settings
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
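ByteBuffer.wrap(byte[]) is efficient because it does not copy: the buffer is a view over the caller's array, with position 0 and limit equal to the array length. The JDK-only sketch below loads a file into memory and wraps it for the open(ByteBuffer, String, ...) overloads; InMemoryPdf is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that prepares in-memory PDF data for PDF.open(ByteBuffer, String). */
final class InMemoryPdf {
    private InMemoryPdf() {}

    /** Wraps existing bytes without copying them; the buffer shares the array. */
    static ByteBuffer wrap(byte[] pdfFileData) {
        return ByteBuffer.wrap(pdfFileData);
    }

    /** Reads an entire file into memory, then wraps it. */
    static ByteBuffer load(Path pdfFile) throws IOException {
        return wrap(Files.readAllBytes(pdfFile));
    }
}
```

A call might then look like PDF.open(InMemoryPdf.load(path), path.getFileName().toString()), where the second argument is only the name used for logging. Because the array is shared rather than copied, it is safest not to modify it while a Document reading from it is open.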

open
public static Document open(ByteBuffer pdfData, String pdfName, byte[] userPasswd) throws IOException
Returns a new open Document reading from the PDF data provided by the given ByteBuffer. If you have a PDF file's data in a byte[], use ByteBuffer.wrap(byte[]) to efficiently deliver it to this method.
Parameters:
    pdfName - the name of the PDF whose data is provided by pdfData, used for logging and debugging purposes
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(java.nio.ByteBuffer, String, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(ByteBuffer pdfData, String pdfName) throws IOException
Returns a new open Document reading from the PDF data provided by the given ByteBuffer. If you have a PDF file's data in a byte[], use ByteBuffer.wrap(byte[]) to efficiently deliver it to this method.
Parameters:
    pdfName - the name of the PDF whose data is provided by pdfData, used for logging and debugging purposes
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(java.nio.ByteBuffer, String, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(String pdfFilePath, byte[] userPasswd, Configuration config) throws IOException
Returns a new open Document reading from the PDF file found at the given filesystem path.
Parameters:
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
    config - a Configuration object from which the new Document will obtain various settings
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary

open
public static Document open(String pdfFilePath) throws IOException
Returns a new open Document reading from the PDF file found at the given filesystem path.
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(String, byte[], com.snowtide.pdf.Configuration)

open
public static Document open(String pdfFilePath, byte[] userPasswd) throws IOException
Returns a new open Document reading from the PDF file found at the given filesystem path.
Parameters:
    pdfFilePath - the path to the PDF file to be read
    userPasswd - the password that should be used to decrypt the given PDF file; should be null if the file is not encrypted
Throws:
    IOException - if an error occurs while opening the new Document
    EncryptedPDFException - if an error occurs decrypting PDF data, if necessary
See Also:
    open(String, byte[], com.snowtide.pdf.Configuration)