com.snowtide.pdf.lucene
Class PDFDocumentFactory

java.lang.Object
  extended by com.snowtide.pdf.lucene.PDFDocumentFactory

public class PDFDocumentFactory
extends java.lang.Object

This class allows the functionality of PDFTextStream to be dropped into a Lucene environment seamlessly. Versions 1.2, 1.3, 1.4, 1.9, 2.0, 2.1, and 2.2 of Lucene are supported; a corresponding Lucene library jar must be on the classpath of any application that uses this class.

Typical usage is to create a new DocumentFactoryConfig object, configure it as desired, and pass it to one of this class's buildPDFDocument methods along with a File object (pointing to a PDF file), an InputStream or ByteBuffer (providing PDF data), or a pre-existing PDFTextStream.

Methods also exist for building a Lucene Document instance without a DocumentFactoryConfig, but they produce a direct dump of the content and document properties of a PDF file according to the default settings of DocumentFactoryConfig. This makes little sense in most environments, where the default names of PDF document properties are unlikely to match the names of the corresponding Lucene Fields. See DocumentFactoryConfig for details of its default configuration.

Version:
©2004-2012 Snowtide Informatics Systems, Inc.

Constructor Summary
PDFDocumentFactory()
           
 
Method Summary
static org.apache.lucene.document.Document buildPDFDocument(java.nio.ByteBuffer pdfData, java.lang.String pdfName)
          Creates a new Lucene Document instance based on the PDF document data provided by the given ByteBuffer and a default set of configuration parameters.
static org.apache.lucene.document.Document buildPDFDocument(java.nio.ByteBuffer pdfData, java.lang.String pdfName, DocumentFactoryConfig config)
          Creates a new Lucene Document instance based on the PDF document data provided by the given ByteBuffer and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.
static org.apache.lucene.document.Document buildPDFDocument(java.io.File pdfFile)
          Creates a new Lucene Document instance based on the contents of the given PDF file reference and a default set of configuration parameters.
static org.apache.lucene.document.Document buildPDFDocument(java.io.File pdfFile, DocumentFactoryConfig config)
          Creates a new Lucene Document instance based on the contents of the given PDF file reference and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.
static org.apache.lucene.document.Document buildPDFDocument(java.io.InputStream pdfData, java.lang.String pdfName)
          Creates a new Lucene Document instance based on the PDF document data provided by the given InputStream and a default set of configuration parameters.
static org.apache.lucene.document.Document buildPDFDocument(java.io.InputStream pdfData, java.lang.String pdfName, DocumentFactoryConfig config)
          Creates a new Lucene Document instance based on the PDF document data provided by the given InputStream and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.
static org.apache.lucene.document.Document buildPDFDocument(PDFTextStream stream, DocumentFactoryConfig config)
          Creates a new Lucene Document instance using the output of the already-created PDFTextStream instance provided and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFDocumentFactory

public PDFDocumentFactory()
Method Detail

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(java.io.File pdfFile)
                                                            throws java.io.IOException
Creates a new Lucene Document instance based on the contents of the given PDF file reference and a default set of configuration parameters.

Throws:
java.io.IOException
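
Because every buildPDFDocument overload declares java.io.IOException, a caller that processes many files usually has to decide what happens when one of them fails. The following is a minimal sketch of one option (skip the failing file and record it), not part of this API. The names PdfBatchBuilder, FileDocumentBuilder, and buildAll are hypothetical, and the builder is passed in as a function so that the sketch compiles without the PDFTextStream and Lucene jars.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PdfBatchBuilder {

    /**
     * Hypothetical stand-in for a call such as
     * PDFDocumentFactory.buildPDFDocument(File); D would be
     * org.apache.lucene.document.Document in real use.
     */
    @FunctionalInterface
    public interface FileDocumentBuilder<D> {
        D build(File pdfFile) throws IOException;
    }

    /**
     * Builds one document per file. A file whose build throws IOException
     * is added to 'failed' and skipped; the remaining files are still processed.
     */
    public static <D> List<D> buildAll(List<File> pdfFiles,
                                       FileDocumentBuilder<D> builder,
                                       List<File> failed) {
        List<D> docs = new ArrayList<>();
        for (File pdfFile : pdfFiles) {
            try {
                docs.add(builder.build(pdfFile));
            } catch (IOException e) {
                failed.add(pdfFile);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        for (String arg : args) {
            files.add(new File(arg));
        }
        List<File> failed = new ArrayList<>();

        // Stub builder so the sketch runs on its own: it only checks that
        // the file exists and returns its name instead of a Lucene Document.
        List<String> docs = buildAll(files, f -> {
            if (!f.isFile()) {
                throw new IOException("Not a readable file: " + f);
            }
            return f.getName();
        }, failed);

        System.out.println("Built " + docs.size() + ", failed " + failed.size());
    }
}
```

With PDFTextStream and Lucene on the classpath, the stub lambda could presumably be replaced by the method reference PDFDocumentFactory::buildPDFDocument, which would resolve to the File overload documented above.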

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(java.io.InputStream pdfData,
                                                                   java.lang.String pdfName)
                                                            throws java.io.IOException
Creates a new Lucene Document instance based on the PDF document data provided by the given InputStream and a default set of configuration parameters.

Parameters:
pdfName - The name of the PDF document whose data is provided by the InputStream; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(java.io.File pdfFile,
                                                                   DocumentFactoryConfig config)
                                                            throws java.io.IOException
Creates a new Lucene Document instance based on the contents of the given PDF file reference and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.

Throws:
java.io.IOException

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(java.io.InputStream pdfData,
                                                                   java.lang.String pdfName,
                                                                   DocumentFactoryConfig config)
                                                            throws java.io.IOException
Creates a new Lucene Document instance based on the PDF document data provided by the given InputStream and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.

Parameters:
pdfName - The name of the PDF document whose data is provided by the InputStream; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(java.nio.ByteBuffer pdfData,
                                                                   java.lang.String pdfName)
                                                            throws java.io.IOException
Creates a new Lucene Document instance based on the PDF document data provided by the given ByteBuffer and a default set of configuration parameters.

Parameters:
pdfName - The name of the PDF document whose data is provided by the ByteBuffer; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException
Since:
v2.1
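
The ByteBuffer overloads take PDF data that is already in memory, together with a name used only for log identification. The sketch below shows one way a caller might prepare both arguments from a file on disk, using only the JDK. PdfBufferLoader, load, and nameOf are hypothetical helper names, and it is an assumption that a heap buffer wrapping the complete file contents is acceptable input; this page does not say what kind of ByteBuffer is expected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfBufferLoader {

    /** Reads the whole file into memory and wraps it in a ByteBuffer positioned at 0. */
    public static ByteBuffer load(Path pdfPath) throws IOException {
        return ByteBuffer.wrap(Files.readAllBytes(pdfPath));
    }

    /** Derives a name for log identification from the last element of the path. */
    public static String nameOf(Path pdfPath) {
        Path fileName = pdfPath.getFileName();
        return fileName == null ? pdfPath.toString() : fileName.toString();
    }

    public static void main(String[] args) throws IOException {
        Path pdfPath = Paths.get(args[0]);
        ByteBuffer pdfData = load(pdfPath);
        String pdfName = nameOf(pdfPath);

        // With PDFTextStream and Lucene on the classpath, the next step would be:
        //   Document doc = PDFDocumentFactory.buildPDFDocument(pdfData, pdfName);
        System.out.println(pdfName + ": " + pdfData.remaining() + " bytes loaded");
    }
}
```

Reading the entire file with Files.readAllBytes keeps the sketch short but holds the whole PDF on the heap; for very large files a memory-mapped buffer from FileChannel.map may be a better fit, assuming the library accepts it.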

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(java.nio.ByteBuffer pdfData,
                                                                   java.lang.String pdfName,
                                                                   DocumentFactoryConfig config)
                                                            throws java.io.IOException
Creates a new Lucene Document instance based on the PDF document data provided by the given ByteBuffer and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.

Parameters:
pdfName - The name of the PDF document whose data is provided by the ByteBuffer; used only to identify logged events in connection with parsing the PDF data.
Throws:
java.io.IOException
Since:
v2.1

buildPDFDocument

public static org.apache.lucene.document.Document buildPDFDocument(PDFTextStream stream,
                                                                   DocumentFactoryConfig config)
                                                            throws java.io.IOException
Creates a new Lucene Document instance using the output of the already-created PDFTextStream instance provided and the custom set of configuration parameters specified in the given DocumentFactoryConfig instance.

Throws:
java.io.IOException