Interface Document
- All Superinterfaces:
- AutoCloseable, Closeable, OutputSource
- All Known Implementing Classes:
- PDFTextStream
 
public interface Document extends OutputSource, Closeable

Interface implemented by all objects representing a PDF document. Use one of the open methods on the PDF factory class, e.g. PDF.open(java.io.File), to obtain a Document providing access to the contents of a particular PDF file.
- Version:
- ©2004-2025 Snowtide
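
Because Document extends Closeable, it can be managed with try-with-resources. The following is a minimal sketch of that pattern, not PDFxStream code: DocumentLike, openStub, and OpenAndClose are hypothetical stand-ins so the example compiles without the library. In real code the stub would presumably be replaced by a call such as PDF.open(java.io.File).

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Document, reduced to two methods named on this page.
interface DocumentLike extends Closeable {
    String getName();
    int getPageCnt();
}

class OpenAndClose {
    // Records whether close() ran; exists only for this demonstration.
    static boolean closed = false;

    // Assumption: stands in for a real factory call such as PDF.open(File).
    static DocumentLike openStub(String name, int pages) {
        return new DocumentLike() {
            @Override public String getName() { return name; }
            @Override public int getPageCnt() { return pages; }
            @Override public void close() { closed = true; }
        };
    }

    static String describe(String name, int pages) {
        // try-with-resources closes the document even if the body throws.
        try (DocumentLike doc = openStub(name, pages)) {
            return doc.getName() + ": " + doc.getPageCnt() + " pages";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Several methods on this page must not be called after the Document is closed, so any work that needs the document belongs inside the try block.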
 
Field Summary

Fields (modifier and type, field, description):
- static String ATTR_AUTHOR: Document attribute key used to retrieve a String indicating who created a PDF document.
- static String ATTR_CREATION_DATE: Document attribute key used to retrieve a String indicating the date and time that a PDF document was created.
- static String ATTR_CREATOR: Document attribute key used to retrieve a String indicating the name of the application that created the original document from which the PDF was generated.
- static String ATTR_KEYWORDS: Document attribute key used to retrieve a String containing keywords associated with a PDF document.
- static String ATTR_MOD_DATE: Document attribute key used to retrieve a String indicating the date and time that a PDF document was last modified.
- static String ATTR_PRODUCER: Document attribute key used to retrieve a String indicating the name of the application that generated a PDF document.
- static String ATTR_SUBJECT: Document attribute key used to retrieve a String indicating the subject of a PDF document.
- static String ATTR_TITLE: Document attribute key used to retrieve a String indicating the title of a PDF document.
- static String ATTR_TRAPPED: Document attribute key used to retrieve an indicator as to whether a PDF document includes trapping information (trapping is a method for correcting printing errors in high-quality printing environments).
- static String ATTR_USES_GRAPH_FONTS: Some PDF files use fonts that are image-based: instead of their encodings mapping character codes to standard Unicode characters, they map character codes to images of characters.
Method Summary

All Methods, Instance Methods, Abstract Methods (modifier and type, method, description):
- List<Annotation> getAllAnnotations(): Returns a list containing all of the Annotations contained in the current PDF document.
- int getAllAnnotations(List tgt): Adds to the given List all of the Annotations contained in the current PDF document.
- List<EmbeddedFile> getAllEmbeddedFiles(): Returns a list of all of the embedded files available in the source PDF.
- List<Annotation> getAnnotations(int page): Returns a List of all annotations found on the page indicated by the given page number; each object will be an instance of a class that implements the Annotation interface.
- Object getAttribute(String attrName): Returns the value of the specified document-level metadata attribute.
- Set<String> getAttributeKeys(): Returns a Set containing the keys of all available document metadata attributes.
- Map<String,Object> getAttributeMap(): Returns a Map containing a copy of all keys and values of all available document metadata attributes.
- Bookmark getBookmarks(): If the current PDF document contains a bookmark tree, this function will return its root node.
- Configuration getConfig(): Returns the Configuration instance that this Document is using to govern its operation.
- List<EmbeddedFile> getEmbeddedFiles(): Returns a list of the embedded files associated with the source PDF document itself.
- EncryptionInfo getEncryptionInfo(): Returns an EncryptionInfo object, which provides access to some of the parameters used for the current PDF document's encryption.
- Form getFormData(): Loads the form data contained in the current document, and returns a Form object that represents that data.
- Collection<Image> getImages()
- String getName(): Returns the name of the PDF that this Document is reading; this will be either the name of the PDF file that is being read, or the pdfName String that was provided if this Document was opened using one of the com.snowtide.PDF.open() methods that accepts an InputStream or ByteBuffer, e.g.
- Page getPage(int n): Reads and returns a single page.
- int getPageCnt(): Returns the number of pages in the PDF document.
- List<Page> getPages()
- File getPDFFile(): Returns a reference to the file that this Document is processing.
- long getPdfFileSize(): Returns the size of the PDF file being read, in bytes.
- PDFVersion getPDFVersion(): Returns the PDFVersion instance that corresponds with the version of the PDF file specification to which the current PDF file adheres.
- byte[] getXmlMetadata(): Returns the XML metadata available from this Document, or null if no XML metadata is available.
- void setConfig(Configuration config): Sets the Configuration instance that this Document will use in various contexts to govern its operation.
Methods inherited from interface com.snowtide.pdf.OutputSource: pipe, pipe
 
Field Detail

ATTR_TITLE
static final String ATTR_TITLE
Document attribute key used to retrieve a String indicating the title of a PDF document.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_AUTHOR
static final String ATTR_AUTHOR
Document attribute key used to retrieve a String indicating who created a PDF document.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_SUBJECT
static final String ATTR_SUBJECT
Document attribute key used to retrieve a String indicating the subject of a PDF document.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_KEYWORDS
static final String ATTR_KEYWORDS
Document attribute key used to retrieve a String containing keywords associated with a PDF document.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_CREATOR
static final String ATTR_CREATOR
Document attribute key used to retrieve a String indicating the name of the application that created the original document from which the PDF was generated.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_PRODUCER
static final String ATTR_PRODUCER
Document attribute key used to retrieve a String indicating the name of the application that generated a PDF document.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_CREATION_DATE
static final String ATTR_CREATION_DATE
Document attribute key used to retrieve a String indicating the date and time that a PDF document was created. This String may be parsed into a java.util.Date object by passing it to the parseDateString(String) method.
- See Also:
- getAttribute(String), Constant Field Values
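
For readers curious what parsing such a date String involves, here is a minimal sketch based on the date format defined by the PDF specification (D:YYYYMMDDHHmmSSOHH'mm', with everything after the year optional). It is not the library's parseDateString(String) implementation; PdfDateSketch and its method names are hypothetical, and treating a missing time zone as UTC is an assumption made for this example.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PdfDateSketch {
    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    // 7 literal Z, 8 offset sign, 9 offset hours, 10 offset minutes.
    private static final Pattern PDF_DATE = Pattern.compile(
        "^(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
        + "(?:(Z)|([+-])(\\d{2})'?(\\d{2})?'?)?$");

    static OffsetDateTime parse(String s) {
        Matcher m = PDF_DATE.matcher(s.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + s);
        }
        // Assumption: a date with no offset is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = m.group(8).equals("-") ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                sign * Integer.parseInt(m.group(9)), sign * part(m, 10, 0));
        }
        // Missing month and day default to 1; missing time fields default to 0.
        return OffsetDateTime.of(
            Integer.parseInt(m.group(1)), part(m, 2, 1), part(m, 3, 1),
            part(m, 4, 0), part(m, 5, 0), part(m, 6, 0), 0, offset);
    }

    private static int part(Matcher m, int group, int fallback) {
        String g = m.group(group);
        return g == null ? fallback : Integer.parseInt(g);
    }
}
```

If a java.util.Date is needed, java.util.Date.from(result.toInstant()) converts the parsed value.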
 
ATTR_MOD_DATE
static final String ATTR_MOD_DATE
Document attribute key used to retrieve a String indicating the date and time that a PDF document was last modified. This String may be parsed into a java.util.Date object by passing it to the parseDateString(String) method.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_TRAPPED
static final String ATTR_TRAPPED
Document attribute key used to retrieve an indicator as to whether a PDF document includes trapping information (trapping is a method for correcting printing errors in high-quality printing environments). This key maps to a String, the valid values of which are 'True', 'False' and 'Unknown'.
- See Also:
- getAttribute(String), Constant Field Values
 
ATTR_USES_GRAPH_FONTS
static final String ATTR_USES_GRAPH_FONTS
Some PDF files use fonts that are image-based: instead of their encodings mapping character codes to standard Unicode characters, they map character codes to images of characters. This makes it possible for these kinds of fonts (typically referred to as Type3 fonts) to, for example, map the character code 32 to the image of a letter 'g' instead of the standard space character. PDFxStream can derive the Unicode encoding of Type3 fonts in many cases, and will do so automatically if possible. Otherwise, content that uses a Type3 font for which no proper encoding can be derived will be skipped, and a document attribute with this key will be set and mapped to a Boolean object with a value of true.
- See Also:
- getAttribute(String), Constant Field Values
 
 
Method Detail

setConfig
void setConfig(Configuration config)
Sets the Configuration instance that this Document will use in various contexts to govern its operation.

Note that certain configuration options are utilized only when a Document is being opened. For non-default settings of such options to take effect, a customized Configuration object must either be set as the default configuration, or be provided to one of the com.snowtide.PDF.open() static methods that accept a Configuration object, e.g. PDF.open(java.io.File, byte[], Configuration).
getConfig
Configuration getConfig()
Returns the Configuration instance that this Document is using to govern its operation.
getImages
Collection<Image> getImages() throws IOException
- Throws:
- IOException - if an error occurs during the extraction process
 
getPdfFileSize
long getPdfFileSize()
Returns the size of the PDF file being read, in bytes.
- Since:
- v1.3
 
getPageCnt
int getPageCnt()
Returns the number of pages in the PDF document.
getPage
Page getPage(int n) throws IOException
Reads and returns a single page. Page numbers are zero-indexed; they do not necessarily correspond with any reader-visible page number.
- Parameters:
- n - the number of the page to retrieve.
- Throws:
- IOException - if an error occurs while preparing the Page for use
- Since:
- v1.3
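
Since getPage(int) is zero-indexed, a caller working with ordinal page positions ("the 3rd page") has to subtract one and stay within getPageCnt(). A minimal sketch of that conversion follows; PageIndexSketch and toZeroBased are hypothetical names, and the helper deals only with ordinal position, not with printed page labels such as "iv".

```java
class PageIndexSketch {
    // Converts a 1-based ordinal page position to the zero-based index that
    // getPage(int) expects, validating it against the document's page count.
    static int toZeroBased(int ordinalPage, int pageCount) {
        if (ordinalPage < 1 || ordinalPage > pageCount) {
            throw new IndexOutOfBoundsException(
                "Page " + ordinalPage + " is outside 1.." + pageCount);
        }
        return ordinalPage - 1;
    }
}
```

With a real Document, the result would presumably be passed along as doc.getPage(toZeroBased(3, doc.getPageCnt())).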
 
getPages
List<Page> getPages()
Returns a list of pages from this Document, which are loaded lazily when accessed via the returned list.
- Since:
- v3.0
 
getName
String getName()
Returns the name of the PDF that this Document is reading; this will be either the name of the PDF file that is being read, or the pdfName String that was provided if this Document was opened using one of the com.snowtide.PDF.open() methods that accepts an InputStream or ByteBuffer, e.g. PDF.open(java.io.InputStream, String).

Nearly all of the logging messages generated by PDFxStream include the relevant Document's name, making them easier to interpret in a multithreaded production environment.
getPDFFile
File getPDFFile()
Returns a reference to the file that this Document is processing. This reference may be null if the Document instance is not reading from a File or InputStream.
getFormData
Form getFormData() throws IOException
Loads the form data contained in the current document, and returns a Form object that represents that data. If the current PDF contains no forms, this function returns null. The Form instance that is returned by this function is guaranteed to be an AcroForm.

This function MUST NOT be called after this Document is closed.
- Throws:
- IOException - if an error occurs loading the form data
 
getEmbeddedFiles
List<EmbeddedFile> getEmbeddedFiles() throws IOException
Returns a list of the embedded files associated with the source PDF document itself. Use getAllEmbeddedFiles() to include all embedded files associated with annotations as well.
- Throws:
- IOException - if reading the embedded file metadata fails
- Since:
- 3.0.0
- See Also:
- getAllEmbeddedFiles()
 
getAllEmbeddedFiles
List<EmbeddedFile> getAllEmbeddedFiles() throws IOException
Returns a list of all of the embedded files available in the source PDF. This method includes all files associated with annotations as well; if you only want those embedded files that are associated with the source document itself (and not annotations), use getEmbeddedFiles().
- Throws:
- IOException - if reading the embedded file metadata fails
- Since:
- 3.0.0
- See Also:
- getEmbeddedFiles()
 
getBookmarks
Bookmark getBookmarks() throws IOException
If the current PDF document contains a bookmark tree, this function will return its root node. If the document contains no bookmarks, this function will return null.

An exception will be thrown if this function is called after this Document instance is closed.
- Throws:
- IOException - if an error occurs reading the bookmark tree
- Since:
- v1.3.5
- See Also:
- Bookmark
 
getAnnotations
List<Annotation> getAnnotations(int page) throws IOException
Returns a List of all annotations found on the page indicated by the given page number; each object will be an instance of a class that implements the Annotation interface.

This function will never return null; if a page contains no annotations, an empty list will be returned. The returned list is guaranteed to offer efficient random access to its elements.
- Throws:
- IOException - if an error occurs retrieving the annotation data
- Since:
- v1.3.5
- See Also:
- Annotation
 
getAllAnnotations
List<Annotation> getAllAnnotations() throws IOException
Returns a list containing all of the Annotations contained in the current PDF document. The returned list is guaranteed to offer efficient random access to its elements.
- Throws:
- IOException - if an error occurs retrieving the annotation data
- Since:
- v1.3.5
 
getAllAnnotations
int getAllAnnotations(List tgt) throws IOException
Adds to the given List all of the Annotations contained in the current PDF document.
- Returns:
- the number of annotations added to the list
- Throws:
- IOException - if an error occurs retrieving the annotation data
- Since:
- v1.3.5
- See Also:
- Annotation
 
getPDFVersion
PDFVersion getPDFVersion() throws IOException
Returns the PDFVersion instance that corresponds with the version of the PDF file specification to which the current PDF file adheres. PDF specification version numbers correspond directly with particular versions of Adobe Acrobat:
- v1.0 - Acrobat 1
- v1.1 - Acrobat 2
- v1.2 - Acrobat 3
- v1.3 - Acrobat 4
- v1.4 - Acrobat 5
- v1.5 - Acrobat 6
- v1.6 - Acrobat 7
- v1.7 - Acrobat 8+

This method may not be called after the Document is closed.
- Throws:
- IOException - if an error occurs in determining what the PDF file's version is
- Since:
- v1.3
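
The version-to-Acrobat correspondence listed above can be expressed as a small lookup table. This is a sketch only: PdfVersionTable and acrobatFor are hypothetical names, and it assumes the version has already been obtained as a plain string such as "1.4" (how to get that from a PDFVersion instance is not covered on this page).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PdfVersionTable {
    // PDF specification version -> Acrobat release, as listed for getPDFVersion().
    private static final Map<String, String> ACROBAT_BY_PDF_VERSION = new LinkedHashMap<>();
    static {
        ACROBAT_BY_PDF_VERSION.put("1.0", "Acrobat 1");
        ACROBAT_BY_PDF_VERSION.put("1.1", "Acrobat 2");
        ACROBAT_BY_PDF_VERSION.put("1.2", "Acrobat 3");
        ACROBAT_BY_PDF_VERSION.put("1.3", "Acrobat 4");
        ACROBAT_BY_PDF_VERSION.put("1.4", "Acrobat 5");
        ACROBAT_BY_PDF_VERSION.put("1.5", "Acrobat 6");
        ACROBAT_BY_PDF_VERSION.put("1.6", "Acrobat 7");
        ACROBAT_BY_PDF_VERSION.put("1.7", "Acrobat 8+");
    }

    // Returns the matching Acrobat release, or "unknown" for versions not in the table.
    static String acrobatFor(String pdfVersion) {
        return ACROBAT_BY_PDF_VERSION.getOrDefault(pdfVersion, "unknown");
    }
}
```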
 
getEncryptionInfo
EncryptionInfo getEncryptionInfo()
Returns an EncryptionInfo object, which provides access to some of the parameters used for the current PDF document's encryption.

If the current PDF document is not encrypted, this method will return null.
- Since:
- v1.3
 
getXmlMetadata
byte[] getXmlMetadata() throws IOException
Returns the XML metadata available from this Document, or null if no XML metadata is available.

Note: This method must be called before the Document is closed, and it should not be called while text is being actively read out of it. (Supporting such concurrency would require synchronization that would negatively impact performance.) Therefore, the best times to call this method are:
- just after opening the Document but before reading text out of it
- after all text has been read out of the Document, but before it is closed

PDFxStream does not control the content returned by this method; it just provides access to the data that is already stored in a PDF document. The schema of the returned XML data is defined by Adobe, and is called the Extensible Metadata Platform (XMP). More information about XMP can be found on Adobe's website.
- Throws:
- IOException - if this Document has already been closed, or if an error occurs retrieving the XML metadata.
- Since:
- v1.2
 
getAttribute
Object getAttribute(String attrName) throws IOException
Returns the value of the specified document-level metadata attribute.

All of the standard attribute names are defined in constants in this class, and are all prefixed with 'ATTR_'. A few notes should be kept in mind when accessing attribute values:
- It is typical for only a subset of the possible attributes to be defined in a PDF document. Any attributes that are undefined will return a null value when their name is provided to this method.
- Many more attributes are used in the real world than are formally specified by the PDF specification. It is entirely up to the PDF generator which attributes are output for a particular document, so some documents may contain attributes whose names are not canonicalized in the 'ATTR_' constants in this class. You can use the getAttributeKeys() method to get a Set of the names of all available attributes.
- Most attribute values are Strings, but it is possible for attribute values to be Integers, Booleans, etc. The documentation associated with each attribute name constant in this class specifies what type may be expected when retrieving each particular attribute value. Any attributes specified as dates are returned from this method as String instances; these can be passed through parseDateString(String) to get a Date object.

Note: the attributes available through this method are retrieved from the "classic" document /Info entry. The document metadata in an XML format (which typically contains the same set of metadata attributes that are available through this method) may be obtained via the getXmlMetadata() method.
- Parameters:
- attrName - the name of the attribute to be retrieved
- Returns:
- the value of the attribute with the given name defined in the PDF document being read, or null if no attribute is available with the given name. The type of this object depends upon which attribute is being retrieved, and is noted in the documentation of the attribute name constants held by this class.
- Throws:
- IOException - if an error occurs while retrieving the PDF document's metadata
- See Also:
- getXmlMetadata() for access to the XML-formatted document metadata
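
The notes above imply that callers should expect null for undefined attributes and should not assume every value is a String. A minimal sketch of handling that follows, operating on a plain Map<String,Object> of the shape getAttributeMap() returns. AttributeSketch, render, and summarize are hypothetical names, and the keys used in any sample data are illustrative rather than the actual values of the ATTR_ constants.

```java
import java.util.Map;

class AttributeSketch {
    // Renders one attribute value, covering the absent (null) case and
    // the non-String types that the notes above say may occur.
    static String render(Object value) {
        if (value == null) {
            return "(not set)";
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? "yes" : "no";
        }
        return String.valueOf(value); // Integers and any other types
    }

    // Builds one "key=value" line per attribute, in the map's iteration order.
    static String summarize(Map<String, Object> attrs) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> e : attrs.entrySet()) {
            out.append(e.getKey()).append('=').append(render(e.getValue())).append('\n');
        }
        return out.toString();
    }
}
```

With a real Document, the map would presumably come from getAttributeMap(), and render(doc.getAttribute(name)) would cover single lookups, including names discovered through getAttributeKeys().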
 
getAttributeKeys
Set<String> getAttributeKeys() throws IOException
Returns a Set containing the keys of all available document metadata attributes.
- Throws:
- IOException - if an error occurs while retrieving the PDF document's metadata
 
getAttributeMap
Map<String,Object> getAttributeMap() throws IOException
Returns a Map containing a copy of all keys and values of all available document metadata attributes.
- Throws:
- IOException - if an error occurs while retrieving the PDF document's metadata
 
 