public class PDFDocumentFactory
extends java.lang.Object
This class allows the functionality of PDFTextStream to be dropped into a Lucene environment seamlessly. Lucene versions 1.2, 1.3, 1.4, 1.9, 2.0, 2.1, and 2.2 are supported; a corresponding Lucene library jar must be on the classpath of any application that uses this class.
Typical usage is to create a new DocumentFactoryConfig object, configure it as desired, and pass that object into this class along with a File object (pointing to a PDF file), an InputStream (providing a stream of PDF data), or a pre-existing PDFTextStream.
Methods also exist for building a Lucene Document instance without using the DocumentFactoryConfig class, but these produce a direct dump of the content and document properties of a PDF file according to the default settings of DocumentFactoryConfig.
This makes little sense in most environments, where the default names of PDF document properties are unlikely to match the names of the corresponding Lucene Fields for those document properties. See DocumentFactoryConfig for details of the default configuration of instances of that class.
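The mismatch between default PDF property names and an index's field names can be pictured with plain maps. The sketch below is illustrative only: it does not use DocumentFactoryConfig or any other PDFTextStream class, and the class name FieldNameMismatchSketch, the rename helper, and the doc_title and doc_author field names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldNameMismatchSketch {

    /**
     * Returns a copy of props in which every key that has an entry in nameMap
     * is replaced by its mapped name; keys without a mapping pass through unchanged.
     */
    public static Map<String, String> rename(Map<String, String> props,
                                             Map<String, String> nameMap) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.put(nameMap.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Document properties as a PDF might name them.
        Map<String, String> pdfProps = new LinkedHashMap<>();
        pdfProps.put("Title", "Annual Report");
        pdfProps.put("Author", "J. Doe");

        // Hypothetical field names that an existing index expects.
        Map<String, String> nameMap = new LinkedHashMap<>();
        nameMap.put("Title", "doc_title");
        nameMap.put("Author", "doc_author");

        System.out.println(rename(pdfProps, nameMap));
    }
}
```

Without some mapping of this kind, queries written against the index's own field names would not find the PDF properties, which is why the configured overloads are usually the better fit.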
Constructor and Description |
---|
PDFDocumentFactory() |
Modifier and Type | Method and Description |
---|---|
static org.apache.lucene.document.Document | buildPDFDocument(java.nio.ByteBuffer pdfData, java.lang.String pdfName) Creates a new Lucene Document instance based on the PDF document data provided by the given ByteBuffer and a default set of configuration parameters. |
static org.apache.lucene.document.Document | buildPDFDocument(java.nio.ByteBuffer pdfData, java.lang.String pdfName, DocumentFactoryConfig config) Creates a new Lucene Document instance based on the PDF document data provided by the given ByteBuffer and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance. |
static org.apache.lucene.document.Document | buildPDFDocument(java.io.File pdfFile) Creates a new Lucene Document instance based on the contents of the given PDF file reference and a default set of configuration parameters. |
static org.apache.lucene.document.Document | buildPDFDocument(java.io.File pdfFile, DocumentFactoryConfig config) Creates a new Lucene Document instance based on the contents of the given PDF file reference and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance. |
static org.apache.lucene.document.Document | buildPDFDocument(java.io.InputStream pdfData, java.lang.String pdfName) Creates a new Lucene Document instance based on the PDF document data provided by the given InputStream and a default set of configuration parameters. |
static org.apache.lucene.document.Document | buildPDFDocument(java.io.InputStream pdfData, java.lang.String pdfName, DocumentFactoryConfig config) Creates a new Lucene Document instance based on the PDF document data provided by the given InputStream and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance. |
static org.apache.lucene.document.Document | buildPDFDocument(PDFTextStream stream, DocumentFactoryConfig config) Creates a new Lucene Document instance using the output of the already-created PDFTextStream instance provided and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance. |
public static org.apache.lucene.document.Document buildPDFDocument(java.io.File pdfFile) throws java.io.IOException
Throws:
java.io.IOException
public static org.apache.lucene.document.Document buildPDFDocument(java.io.InputStream pdfData, java.lang.String pdfName) throws java.io.IOException
Parameters:
pdfName - The name of the PDF document whose data is provided by the InputStream; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException
public static org.apache.lucene.document.Document buildPDFDocument(java.io.File pdfFile, DocumentFactoryConfig config) throws java.io.IOException
Throws:
java.io.IOException
public static org.apache.lucene.document.Document buildPDFDocument(java.io.InputStream pdfData, java.lang.String pdfName, DocumentFactoryConfig config) throws java.io.IOException
Parameters:
pdfName - The name of the PDF document whose data is provided by the InputStream; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException
public static org.apache.lucene.document.Document buildPDFDocument(java.nio.ByteBuffer pdfData, java.lang.String pdfName) throws java.io.IOException
Parameters:
pdfName - The name of the PDF document whose data is provided by the ByteBuffer; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException
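As a rough illustration of preparing the two arguments for the ByteBuffer overload, the sketch below reads a file into memory and uses its file name as pdfName. To stay compilable without PDFTextStream or Lucene on the classpath, it calls a stand-in functional interface instead of this class; PdfBufferSketch, BufferBuilder, and buildFromPath are hypothetical names, and using the file name as pdfName is only one reasonable choice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfBufferSketch {

    /** Hypothetical stand-in with the same shape as the (ByteBuffer, String) overload. */
    @FunctionalInterface
    public interface BufferBuilder<D> {
        D build(ByteBuffer pdfData, String pdfName) throws IOException;
    }

    /** Loads the whole file into a ByteBuffer and passes it on with the file name as pdfName. */
    public static <D> D buildFromPath(Path pdf, BufferBuilder<D> builder) throws IOException {
        ByteBuffer pdfData = ByteBuffer.wrap(Files.readAllBytes(pdf));
        // pdfName only labels logged events, so any recognizable identifier works.
        String pdfName = pdf.getFileName().toString();
        return builder.build(pdfData, pdfName);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".pdf");
        try {
            Files.write(tmp, new byte[] {'%', 'P', 'D', 'F'});
            // Assumption: with the library available, the lambda below could be
            // replaced by a call to PDFDocumentFactory.buildPDFDocument(data, name).
            String summary = buildFromPath(tmp, (data, name) -> name + " (" + data.remaining() + " bytes)");
            System.out.println(summary);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Reading the whole file into memory is simple but scales with file size; for large PDFs the File or InputStream overloads may be the more natural entry points.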
public static org.apache.lucene.document.Document buildPDFDocument(java.nio.ByteBuffer pdfData, java.lang.String pdfName, DocumentFactoryConfig config) throws java.io.IOException
Parameters:
pdfName - The name of the PDF document whose data is provided by the ByteBuffer; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException
public static org.apache.lucene.document.Document buildPDFDocument(PDFTextStream stream, DocumentFactoryConfig config) throws java.io.IOException
Throws:
java.io.IOException