Class PDFTextStream
- java.lang.Object
-
- com.snowtide.pdf.PDFTextStream
-
- All Implemented Interfaces:
Document, OutputSource, Closeable, AutoCloseable
public class PDFTextStream extends Object implements Document
Deprecated. This class is deprecated, and provided solely to ensure backwards compatibility for codebases written for the PDFTextStream v2.x API. Please use the open methods on the PDF factory class for opening PDF documents, e.g. PDF.open(java.io.File).
- Version:
- ©2004-2024 Snowtide
-
-
Field Summary
-
Fields inherited from interface com.snowtide.pdf.Document
ATTR_AUTHOR, ATTR_CREATION_DATE, ATTR_CREATOR, ATTR_KEYWORDS, ATTR_MOD_DATE, ATTR_PRODUCER, ATTR_SUBJECT, ATTR_TITLE, ATTR_TRAPPED, ATTR_USES_GRAPH_FONTS
-
-
Constructor Summary
Constructors (constructor: description)
- PDFTextStream(File pdfFile): Deprecated.
- PDFTextStream(File pdfFile, byte[] userPasswd): Deprecated.
- PDFTextStream(File pdfFile, byte[] userPasswd, Configuration config): Deprecated.
- PDFTextStream(InputStream is, String pdfName): Deprecated.
- PDFTextStream(InputStream is, String pdfName, byte[] userPasswd): Deprecated.
- PDFTextStream(InputStream is, String pdfName, byte[] userPasswd, Configuration config): Deprecated.
- PDFTextStream(String pdfFilePath): Deprecated.
- PDFTextStream(String pdfFilePath, byte[] userPasswd): Deprecated.
- PDFTextStream(String pdfFilePath, byte[] userPasswd, Configuration config): Deprecated.
- PDFTextStream(ByteBuffer pdfData, String pdfName): Deprecated.
- PDFTextStream(ByteBuffer pdfData, String pdfName, byte[] userPasswd): Deprecated.
- PDFTextStream(ByteBuffer pdfData, String pdfName, byte[] userPasswd, Configuration config): Deprecated.
-
Method Summary
Methods (modifier and type, method: description)
- void close(): Deprecated.
- List<Annotation> getAllAnnotations(): Deprecated. Returns a list containing all of the Annotations contained in the current PDF document.
- int getAllAnnotations(List tgt): Deprecated. Adds to the given List all of the Annotations contained in the current PDF document.
- List<EmbeddedFile> getAllEmbeddedFiles(): Deprecated. Returns a list of all of the embedded files available in the source PDF.
- List<Annotation> getAnnotations(int page): Deprecated. Returns a List of all annotations found on the page indicated by the given page number; each object will be an instance of a class that implements the Annotation interface.
- Object getAttribute(String attrName): Deprecated. Returns the value of the specified document-level metadata attribute.
- Set<String> getAttributeKeys(): Deprecated. Returns a Set containing the keys of all available document metadata attributes.
- Map<String,Object> getAttributeMap(): Deprecated. Returns a Map containing a copy of all keys and values of all available document metadata attributes.
- Bookmark getBookmarks(): Deprecated. If the current PDF document contains a bookmark tree, this function will return its root node.
- Configuration getConfig(): Deprecated. Returns the Configuration instance that this Document is using to govern its operation.
- List<EmbeddedFile> getEmbeddedFiles(): Deprecated. Returns a list of the embedded files associated with the source PDF document itself.
- EncryptionInfo getEncryptionInfo(): Deprecated. Returns an EncryptionInfo object, which provides access to some of the parameters used for the current PDF document's encryption.
- Form getFormData(): Deprecated. Loads the form data contained in the current document, and returns a Form object that represents that data.
- Collection<Image> getImages(): Deprecated.
- String getName(): Deprecated. Returns the name of the PDF that this Document is reading; this will be either the name of the PDF file that is being read, or the pdfName String that was provided if this Document was opened using one of the com.snowtide.PDF.open() methods that accepts an InputStream or ByteBuffer.
- Page getPage(int n): Deprecated. Reads and returns a single page.
- int getPageCnt(): Deprecated. Returns the number of pages in the PDF document.
- List<Page> getPages(): Deprecated.
- File getPDFFile(): Deprecated. Returns a reference to the file that this Document is processing.
- long getPdfFileSize(): Deprecated. Returns the size of the PDF file being read, in bytes.
- PDFVersion getPDFVersion(): Deprecated. Returns the PDFVersion instance that corresponds with the version of the PDF file specification to which the current PDF file adheres.
- byte[] getXmlMetadata(): Deprecated. Returns the XML metadata available from this Document, or null if no XML metadata is available.
- static boolean isLicensed(): Deprecated. Retained to maintain PDFTextStream v2.x API compatibility.
- static boolean loadLicense(String path): Deprecated. Retained to maintain PDFTextStream v2.x API compatibility.
- static boolean loadLicense(URL licenseLocation): Deprecated. Retained to maintain PDFTextStream v2.x API compatibility.
- void pipe(OutputHandler handler): Deprecated. Equivalent to OutputSource.pipe(OutputHandler, Direction), with a Direction of Configuration#getBaseDirection().
- void pipe(OutputHandler handler, Direction bd): Deprecated. Extracts all available text from this OutputSource, sending all PDF text events to the given OutputHandler, with the specified Direction.
- void setConfig(Configuration config): Deprecated. Sets the Configuration instance that this Document will use in various contexts to govern its operation.
-
-
-
Constructor Detail
-
PDFTextStream
public PDFTextStream(InputStream is, String pdfName) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(File pdfFile) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(String pdfFilePath) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(InputStream is, String pdfName, byte[] userPasswd, Configuration config) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(InputStream is, String pdfName, byte[] userPasswd) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(File pdfFile, byte[] userPasswd, Configuration config) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(String pdfFilePath, byte[] userPasswd, Configuration config) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(File pdfFile, byte[] userPasswd) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(String pdfFilePath, byte[] userPasswd) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(ByteBuffer pdfData, String pdfName, byte[] userPasswd, Configuration config) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(ByteBuffer pdfData, String pdfName, byte[] userPasswd) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
PDFTextStream
public PDFTextStream(ByteBuffer pdfData, String pdfName) throws IOException
Deprecated. Equivalent to the corresponding "open" static method in PDF, provided to ensure backwards compatibility with codebases using the PDFTextStream v2.x API.
- Throws:
IOException
-
-
Method Detail
-
loadLicense
public static boolean loadLicense(String path)
Deprecated.Retained to maintain PDFTextStream v2.x API compatibility. Use(String)
instead.
-
loadLicense
public static boolean loadLicense(URL licenseLocation)
Deprecated. Retained to maintain PDFTextStream v2.x API compatibility. Use PDF.loadLicense(java.net.URL) instead.
-
isLicensed
public static boolean isLicensed()
Deprecated.Retained to maintain PDFTextStream v2.x API compatibility. Use()
instead.
-
setConfig
public void setConfig(Configuration config)
Deprecated. Description copied from interface: Document
Sets the Configuration instance that this Document will use in various contexts to govern its operation.
Note that certain configuration options are utilized only when a Document is being opened. In order for non-default settings for such options to take effect, a customized Configuration object must either be set as the default configuration, or must be provided to any of the com.snowtide.PDF.open() static methods that accept a Configuration object, e.g. PDF.open(java.io.File, byte[], Configuration).
-
getConfig
public Configuration getConfig()
Deprecated. Description copied from interface: Document
Returns the Configuration instance that this Document is using to govern its operation.
-
pipe
public void pipe(OutputHandler handler) throws IOException
Deprecated. Description copied from interface: OutputSource
Equivalent to OutputSource.pipe(OutputHandler, Direction), with a Direction of Configuration#getBaseDirection().
- Specified by:
pipe in interface OutputSource
- Throws:
IOException
-
pipe
public void pipe(OutputHandler handler, Direction bd) throws IOException
Deprecated. Description copied from interface: OutputSource
Extracts all available text from this OutputSource, sending all PDF text events to the given OutputHandler, with the specified Direction.
If no special PDF text event handling is needed (i.e. you just want a straight text extract), then using an OutputTarget is recommended.
- Specified by:
pipe in interface OutputSource
- Parameters:
handler - an OutputHandler instance.
bd - the Direction that should be used to disambiguate the order in which extracted text is emitted
- Throws:
IOException - if an error occurs during the extraction process
- See Also:
OutputHandler, OutputTarget
-
getImages
public Collection<Image> getImages() throws IOException
Deprecated. Description copied from interface: Document
- Specified by:
getImages in interface Document
- Throws:
IOException - if an error occurs during the extraction process
-
getPdfFileSize
public long getPdfFileSize()
Deprecated. Description copied from interface: Document
Returns the size of the PDF file being read, in bytes.
- Specified by:
getPdfFileSize in interface Document
-
getPageCnt
public int getPageCnt()
Deprecated. Description copied from interface: Document
Returns the number of pages in the PDF document.
- Specified by:
getPageCnt in interface Document
-
getPage
public Page getPage(int n) throws IOException
Deprecated. Description copied from interface: Document
Reads and returns a single page. Page numbers are zero-indexed; they do not necessarily correspond with any reader-visible page number.
- Specified by:
getPage in interface Document
- Parameters:
n - the number of the page to retrieve.
- Throws:
IOException - if an error occurs while preparing the Page for use
-
getName
public String getName()
Deprecated. Description copied from interface: Document
Returns the name of the PDF that this Document is reading; this will be either the name of the PDF file that is being read, or the pdfName String that was provided if this Document was opened using one of the com.snowtide.PDF.open() methods that accepts an InputStream or ByteBuffer, e.g. PDF.open(java.io.InputStream, String).
Nearly all of the logging messages generated by PDFxStream include the relevant Document's name, making them easier to interpret in a multithreaded production environment.
-
getPDFFile
public File getPDFFile()
Deprecated. Description copied from interface: Document
Returns a reference to the file that this Document is processing. This reference may be null if the Document instance is not reading from a File or InputStream.
- Specified by:
getPDFFile in interface Document
-
close
public void close() throws IOException
Deprecated.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
getFormData
public Form getFormData() throws IOException
Deprecated. Description copied from interface: Document
Loads the form data contained in the current document, and returns a Form object that represents that data. If the current PDF contains no forms, this function returns null. The Form instance that is returned by this function is guaranteed to be an AcroForm.
This function MUST NOT be called after this Document is closed.
- Specified by:
getFormData in interface Document
- Throws:
IOException - if an error occurs loading the form data
-
getEmbeddedFiles
public List<EmbeddedFile> getEmbeddedFiles() throws IOException
Deprecated. Description copied from interface: Document
Returns a list of the embedded files associated with the source PDF document itself. Use Document.getAllEmbeddedFiles() to include all embedded files associated with annotations as well.
- Specified by:
getEmbeddedFiles in interface Document
- Throws:
IOException - if reading the embedded file metadata fails
- See Also:
Document.getAllEmbeddedFiles()
-
getAllEmbeddedFiles
public List<EmbeddedFile> getAllEmbeddedFiles() throws IOException
Deprecated. Description copied from interface: Document
Returns a list of all of the embedded files available in the source PDF. This method includes all files associated with annotations as well; if you only want those embedded files that are associated with the source document itself (and not annotations), use Document.getEmbeddedFiles().
- Specified by:
getAllEmbeddedFiles in interface Document
- Throws:
IOException - if reading the embedded file metadata fails
- See Also:
Document.getEmbeddedFiles()
-
getBookmarks
public Bookmark getBookmarks() throws IOException
Deprecated. Description copied from interface: Document
If the current PDF document contains a bookmark tree, this function will return its root node. If the document contains no bookmarks, this function will return null.
An exception will be thrown if this function is called after this Document instance is closed.
- Specified by:
getBookmarks in interface Document
- Throws:
IOException - if an error occurs reading the bookmark tree
- See Also:
Bookmark
-
getAnnotations
public List<Annotation> getAnnotations(int page) throws IOException
Deprecated. Description copied from interface: Document
Returns a List of all annotations found on the page indicated by the given page number; each object will be an instance of a class that implements the Annotation interface.
This function will never return null; if a page contains no annotations, an empty list will be returned. The returned list is guaranteed to offer efficient random access to its elements.
- Specified by:
getAnnotations in interface Document
- Throws:
IOException - if an error occurs retrieving the annotation data
- See Also:
Annotation
-
getAllAnnotations
public List<Annotation> getAllAnnotations() throws IOException
Deprecated. Description copied from interface: Document
Returns a list containing all of the Annotations contained in the current PDF document. The returned list is guaranteed to offer efficient random access to its elements.
- Specified by:
getAllAnnotations in interface Document
- Throws:
IOException - if an error occurs retrieving the annotation data
-
getAllAnnotations
public int getAllAnnotations(List tgt) throws IOException
Deprecated. Description copied from interface: Document
Adds to the given List all of the Annotations contained in the current PDF document.
- Specified by:
getAllAnnotations in interface Document
- Returns:
the number of annotations added to the list
- Throws:
IOException - if an error occurs retrieving the annotation data
- See Also:
Annotation
-
getPDFVersion
public PDFVersion getPDFVersion() throws IOException
Deprecated. Description copied from interface: Document
Returns the PDFVersion instance that corresponds with the version of the PDF file specification to which the current PDF file adheres. PDF specification version numbers correspond directly with particular versions of Adobe Acrobat:
- v1.0 - Acrobat 1
- v1.1 - Acrobat 2
- v1.2 - Acrobat 3
- v1.3 - Acrobat 4
- v1.4 - Acrobat 5
- v1.5 - Acrobat 6
- v1.6 - Acrobat 7
- v1.7 - Acrobat 8+
This method may not be called after the Document is closed.
- Specified by:
getPDFVersion in interface Document
- Throws:
IOException - if an error occurs in determining what the PDF file's version is
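The version table above can be captured as a small lookup. This is a minimal sketch that uses only the JDK; the class name PdfVersionTable and the method acrobatFor are hypothetical, and the sketch works on plain version strings such as "1.4" rather than on PDFVersion instances, whose accessors are not described here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFxStream: maps PDF specification
// versions to the Acrobat releases listed in the documentation above.
final class PdfVersionTable {
    private static final Map<String, String> ACROBAT_BY_PDF_VERSION = new HashMap<>();

    static {
        ACROBAT_BY_PDF_VERSION.put("1.0", "Acrobat 1");
        ACROBAT_BY_PDF_VERSION.put("1.1", "Acrobat 2");
        ACROBAT_BY_PDF_VERSION.put("1.2", "Acrobat 3");
        ACROBAT_BY_PDF_VERSION.put("1.3", "Acrobat 4");
        ACROBAT_BY_PDF_VERSION.put("1.4", "Acrobat 5");
        ACROBAT_BY_PDF_VERSION.put("1.5", "Acrobat 6");
        ACROBAT_BY_PDF_VERSION.put("1.6", "Acrobat 7");
        ACROBAT_BY_PDF_VERSION.put("1.7", "Acrobat 8+");
    }

    /** Returns the Acrobat release for a version such as "1.4", or null if it is not in the table. */
    static String acrobatFor(String pdfVersion) {
        return ACROBAT_BY_PDF_VERSION.get(pdfVersion);
    }

    public static void main(String[] args) {
        System.out.println(acrobatFor("1.4")); // prints "Acrobat 5"
    }
}
```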
-
getEncryptionInfo
public EncryptionInfo getEncryptionInfo()
Deprecated. Description copied from interface: Document
Returns an EncryptionInfo object, which provides access to some of the parameters used for the current PDF document's encryption.
If the current PDF document is not encrypted, this method will return null.
- Specified by:
getEncryptionInfo in interface Document
-
getXmlMetadata
public byte[] getXmlMetadata() throws IOException
Deprecated. Description copied from interface: Document
Returns the XML metadata available from this Document, or null if no XML metadata is available.
Note: This method must be called before the Document is closed, and it should not be called while text is being actively read out of it. (Supporting such concurrency would require synchronization that would negatively impact performance.) Therefore, the best times to call this method are:
- just after opening the Document but before reading text out of it
- after all text has been read out of the Document, but before it is closed
PDFxStream does not control the content returned by this method; it just provides access to the data that is already stored in a PDF document. The schema of the returned XML data is defined by Adobe, and is called the Extensible Metadata Platform (XMP). More information about XMP can be found on Adobe's website.
- Specified by:
getXmlMetadata in interface Document
- Throws:
IOException - if this Document has already been closed, or if an error occurs retrieving the XML metadata.
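Because the XMP data is returned as a raw byte array, callers typically hand it to an XML parser. The following is a minimal sketch that uses only the JDK's DOM parser to read the first dc:title element from such a byte array. The class name XmpTitleSketch, the method firstTitle, and the SAMPLE packet are all hypothetical and exist only for illustration; real XMP packets vary in which properties they carry.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFxStream.
final class XmpTitleSketch {
    // Dublin Core namespace, which XMP uses for dc:title.
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hand-written XMP fragment used only for demonstration.
    static final String SAMPLE =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
      + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
      + "<rdf:Description rdf:about=\"\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
      + "<dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">Sample title</rdf:li></rdf:Alt></dc:title>"
      + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    /** Returns the text of the first dc:title element, or null if there is none or xmp is null. */
    static String firstTitle(byte[] xmp) {
        if (xmp == null) {
            return null; // mirrors the documented "no XML metadata" case
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the bytes come from an untrusted file.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Fully qualified to avoid confusion with com.snowtide.pdf.Document.
            org.w3c.dom.Document dom =
                factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmp));
            NodeList titles = dom.getElementsByTagNameNS(DC_NS, "title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP metadata", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTitle(SAMPLE.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In real use, the byte array would come from getXmlMetadata(), called at one of the two points recommended above.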
-
getAttribute
public Object getAttribute(String attrName) throws IOException
Deprecated. Description copied from interface: Document
Returns the value of the specified document-level metadata attribute.
All of the standard attribute names are defined in constants in this class, and are all prefixed with 'ATTR_'. A few notes should be kept in mind when accessing attribute values:
- It is typical for only a subset of the possible attributes to be defined in a PDF document. Any attributes that are undefined will return a null value when their name is provided to this method.
- Many more attributes are used in the real world than are formally specified by the PDF specification. It is entirely up to the PDF generator what attributes are to be output for a particular document, so some documents may contain attributes whose names are not canonicalized in the 'ATTR_' constants in this class. You can use the getAttributeKeys() method to get a Set of the names of all available attributes.
- Most attribute values are Strings, but it is possible for attribute values to be Integers, Booleans, etc. The documentation associated with each attribute name constant in this class specifies what type may be expected when retrieving each particular attribute value. Any attributes specified as dates are returned from this method as String instances; these can be passed through parseDateString(String) to get a Date object.
Note: the attributes available through this method are retrieved from the "classic" document /Info entry. The document metadata in an XML format (which typically contains the same set of metadata attributes that are available through this method) may be obtained via the getXmlMetadata() method.
- Specified by:
getAttribute in interface Document
- Parameters:
attrName - the name of the attribute to be retrieved
- Returns:
the value of the attribute with the given name defined in the PDF document being read, or null if no attribute is available with the given name. The type of this object depends upon which attribute is being retrieved, and is noted in the documentation of the attribute name constants held by this class.
- Throws:
IOException - if an error occurs while retrieving the PDF document's metadata
- See Also:
getXmlMetadata() for access to the XML-formatted document metadata
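To illustrate what a date-valued attribute string may look like, here is a minimal JDK-only sketch that parses the date layout defined by the PDF specification, D:YYYYMMDDHHmmSSOHH'mm', in which every field after the year is optional. It is not the library's parseDateString(String) implementation, which remains the documented route. The class name PdfDateSketch is hypothetical, and treating a missing time zone as UTC is an assumption of this sketch.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFxStream.
final class PdfDateSketch {
    // Year is required; month, day, hour, minute, second, and offset are optional.
    private static final Pattern PDF_DATE = Pattern.compile(
        "D:(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
      + "(?:([+\\-Z])(?:(\\d{2})'?(?:(\\d{2})'?)?)?)?");

    /** Parses a PDF date string; throws IllegalArgumentException if the layout does not match. */
    static OffsetDateTime parse(String value) {
        Matcher m = PDF_DATE.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + value);
        }
        ZoneOffset offset = ZoneOffset.UTC; // assumption: a missing offset is read as UTC
        String sign = m.group(7);
        if (sign != null && !sign.equals("Z")) {
            int seconds = num(m.group(8), 0) * 3600 + num(m.group(9), 0) * 60;
            offset = ZoneOffset.ofTotalSeconds(sign.equals("-") ? -seconds : seconds);
        }
        // Omitted month and day default to 1; omitted time fields default to 0.
        return OffsetDateTime.of(
            Integer.parseInt(m.group(1)), num(m.group(2), 1), num(m.group(3), 1),
            num(m.group(4), 0), num(m.group(5), 0), num(m.group(6), 0), 0, offset);
    }

    private static int num(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20240131154500-05'00'"));
    }
}
```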
-
getAttributeKeys
public Set<String> getAttributeKeys() throws IOException
Deprecated. Description copied from interface: Document
Returns a Set containing the keys of all available document metadata attributes.
- Specified by:
getAttributeKeys in interface Document
- Throws:
IOException - if an error occurs while retrieving the PDF document's metadata
-
getAttributeMap
public Map<String,Object> getAttributeMap() throws IOException
Deprecated. Description copied from interface: Document
Returns a Map containing a copy of all keys and values of all available document metadata attributes.
- Specified by:
getAttributeMap in interface Document
- Throws:
IOException - if an error occurs while retrieving the PDF document's metadata
-
-