com.snowtide.pdf.lucene
Class DocumentFactoryConfig

java.lang.Object
  extended by com.snowtide.pdf.lucene.DocumentFactoryConfig

public class DocumentFactoryConfig
extends java.lang.Object

Instances of this class control how Lucene Documents are created from PDF content through the PDFDocumentFactory class. Typical usage is to create one static instance, configure it as desired, and reuse that instance whenever a new Lucene Document is built through the PDFDocumentFactory class.

Version:
©2004-2012 Snowtide Informatics Systems, Inc.
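
The shared-instance usage described above can be sketched as follows. So that the sketch compiles without the Snowtide library on the classpath, ConfigStandIn is a hypothetical stand-in that mirrors only the documented one-argument constructor and three documented methods; it is not the library class. IndexingDefaults, the field name "contents", and the attribute names "Author" and "Title" are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring a few documented DocumentFactoryConfig
// methods; it only exists so this sketch runs without the library.
class ConfigStandIn {
    private final String mainTextFieldName;
    private final Map<String, String> fieldNames = new HashMap<>();

    ConfigStandIn(String mainTextFieldName) {
        this.mainTextFieldName = mainTextFieldName;
    }

    void setFieldName(String pdfAttrName, String fieldName) {
        fieldNames.put(pdfAttrName, fieldName);
    }

    String getFieldName(String pdfAttrName) {
        return fieldNames.get(pdfAttrName);
    }

    String getMainTextFieldName() {
        return mainTextFieldName;
    }
}

class IndexingDefaults {
    // One shared instance, configured once when this class is loaded.
    static final ConfigStandIn SHARED = new ConfigStandIn("contents");

    static {
        // Attribute and field names here are illustrative only.
        SHARED.setFieldName("Author", "author");
        SHARED.setFieldName("Title", "title");
    }

    public static void main(String[] args) {
        System.out.println(SHARED.getMainTextFieldName());
        System.out.println(SHARED.getFieldName("Author"));
    }
}
```

With the real library, the same shape would presumably apply with DocumentFactoryConfig in place of the stand-in, and the shared instance would be handed to PDFDocumentFactory each time a Document is built.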

Field Summary
static java.lang.String DEFAULT_MAIN_TEXT_FIELD_NAME
          The default name assigned to the Lucene Field containing the main body of text extracted from a PDF file.
 
Constructor Summary
DocumentFactoryConfig()
          Creates a new config object.
DocumentFactoryConfig(java.lang.String mainTextFieldName)
          Creates a new config object.
 
Method Summary
 boolean copyAllPDFAttrs()
          Returns true if this object is configured to ensure that all PDF document attributes will be added to generated Lucene Documents, even if no attribute name / field name mapping has been established in this config object.
 java.lang.String getFieldName(java.lang.String pdfAttrName)
          Returns the name that will be assigned to Fields containing the value of the PDF document attribute identified within the PDF document by the provided attribute name.
 java.lang.String getMainTextFieldName()
          Returns the name that will be assigned to the field that holds the main body text of each PDF document converted into a Lucene Document instance through the PDFDocumentFactory class using this config object.
 boolean indexMainText()
          Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.
 boolean indexPDFAttrs()
          Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.
 java.util.Set pdfAttrNames()
          Returns a Set of the PDF document attribute names that are mapped to Field names in this config object.
 void setCopyAllPDFAttrs(boolean b)
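          Sets whether all PDF document attributes will be added to generated Lucene Documents, even if no attribute name / field name mapping has been established in this config object.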
 void setFieldName(java.lang.String pdfAttrName, java.lang.String fieldName)
          Sets the name that will be assigned to Fields corresponding to the provided PDF document attribute name.
 void setMainTextFieldName(java.lang.String mainTextFieldName)
          Sets the name that will be assigned to Lucene Fields containing the main text content of PDFs converted to Lucene Documents via the PDFDocumentFactory class.
 void setPDFAttrSettings(boolean store, boolean index, boolean token)
          Sets Field attributes that will be used when creating Field objects for the document attributes found in a PDF document.
 void setTextSettings(boolean store, boolean index, boolean token)
          Sets Field attributes that will be used when creating the Field object for the main text content of a PDF document.
 boolean storeMainText()
          Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be stored.
 boolean storePDFAttrs()
          Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be stored.
 boolean tokenizeMainText()
          Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.
 boolean tokenizePDFAttrs()
          Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAIN_TEXT_FIELD_NAME

public static final java.lang.String DEFAULT_MAIN_TEXT_FIELD_NAME
The default name assigned to the Lucene Field containing the main body of text extracted from a PDF file. Currently set to "text".

See Also:
Constant Field Values

Constructor Detail

DocumentFactoryConfig

public DocumentFactoryConfig(java.lang.String mainTextFieldName)
Creates a new config object. The resulting object retains the default configuration except for the name assigned to the Lucene Field that contains the main PDF text content.

Parameters:
mainTextFieldName - the name that should be assigned to Fields containing the main PDF text content.

DocumentFactoryConfig

public DocumentFactoryConfig()
Creates a new config object. Fields containing the main text content of PDFs converted into Lucene Document instances will be assigned a default name. Other configuration defaults are as follows:

Method Detail

setMainTextFieldName

public void setMainTextFieldName(java.lang.String mainTextFieldName)
Sets the name that will be assigned to Lucene Fields containing the main text content of PDFs converted to Lucene Documents via the PDFDocumentFactory class.


getMainTextFieldName

public java.lang.String getMainTextFieldName()
Returns the name that will be assigned to the field that holds the main body text of each PDF document converted into a Lucene Document instance through the PDFDocumentFactory class using this config object.


getFieldName

public java.lang.String getFieldName(java.lang.String pdfAttrName)
Returns the name that will be assigned to Fields containing the value of the PDF document attribute identified within the PDF document by the provided attribute name.


pdfAttrNames

public java.util.Set pdfAttrNames()
Returns a Set of the PDF document attribute names that are mapped to Field names in this config object. All objects in the returned Set are strings.


setFieldName

public void setFieldName(java.lang.String pdfAttrName,
                         java.lang.String fieldName)
Sets the name that will be assigned to Fields corresponding to the provided PDF document attribute name.
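
Because pdfAttrNames() is declared to return a raw java.util.Set whose elements are all strings, callers need a cast when walking the mappings. The following is a minimal sketch of that, using a hypothetical stand-in (AttrMappingStandIn) that mirrors the three documented mapping methods; the stand-in's HashMap storage and the attribute names are assumptions, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in mirroring setFieldName, getFieldName and pdfAttrNames.
class AttrMappingStandIn {
    private final Map<String, String> fieldNames = new HashMap<>();

    void setFieldName(String pdfAttrName, String fieldName) {
        fieldNames.put(pdfAttrName, fieldName);
    }

    String getFieldName(String pdfAttrName) {
        return fieldNames.get(pdfAttrName);
    }

    // Raw Set, matching the documented return type.
    @SuppressWarnings("rawtypes")
    Set pdfAttrNames() {
        return fieldNames.keySet();
    }
}

class AttrMappingSketch {
    // Collects every attribute-to-field mapping into a sorted map.
    static Map<String, String> describe(AttrMappingStandIn config) {
        Map<String, String> mappings = new TreeMap<>();
        for (Object name : config.pdfAttrNames()) {
            String attr = (String) name; // documented: all elements are strings
            mappings.put(attr, config.getFieldName(attr));
        }
        return mappings;
    }

    public static void main(String[] args) {
        AttrMappingStandIn config = new AttrMappingStandIn();
        config.setFieldName("Title", "title");
        config.setFieldName("Author", "author");
        System.out.println(describe(config)); // prints {Author=author, Title=title}
    }
}
```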


copyAllPDFAttrs

public boolean copyAllPDFAttrs()
Returns true if this object is configured to ensure that all PDF document attributes will be added to generated Lucene Documents, even if no attribute name / field name mapping has been established in this config object.


setCopyAllPDFAttrs

public void setCopyAllPDFAttrs(boolean b)
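Sets whether all PDF document attributes will be added to generated Lucene Documents, even if no attribute name / field name mapping has been established in this config object.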

See Also:
copyAllPDFAttrs()
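
One way to picture how the copyAllPDFAttrs flag interacts with explicit mappings is the lookup below. It is a sketch under stated assumptions, not PDFDocumentFactory's actual logic: CopyAllStandIn and resolveFieldName are hypothetical, and the choice to reuse the attribute's own name for an unmapped attribute is an assumption that this documentation does not confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the documented mapping and copy-all methods.
class CopyAllStandIn {
    private final Map<String, String> fieldNames = new HashMap<>();
    private boolean copyAll;

    void setFieldName(String pdfAttrName, String fieldName) {
        fieldNames.put(pdfAttrName, fieldName);
    }

    String getFieldName(String pdfAttrName) {
        return fieldNames.get(pdfAttrName);
    }

    boolean copyAllPDFAttrs() {
        return copyAll;
    }

    void setCopyAllPDFAttrs(boolean b) {
        copyAll = b;
    }
}

class CopyAllSketch {
    // Returns the field name to use for an attribute,
    // or null if the attribute would be left out of the Document.
    static String resolveFieldName(CopyAllStandIn config, String pdfAttrName) {
        String mapped = config.getFieldName(pdfAttrName);
        if (mapped != null) {
            return mapped; // an explicit mapping always applies
        }
        // Assumption: an unmapped attribute keeps its own name when copy-all is on.
        return config.copyAllPDFAttrs() ? pdfAttrName : null;
    }

    public static void main(String[] args) {
        CopyAllStandIn config = new CopyAllStandIn();
        config.setFieldName("Author", "author");
        System.out.println(resolveFieldName(config, "Keywords"));
        config.setCopyAllPDFAttrs(true);
        System.out.println(resolveFieldName(config, "Keywords"));
    }
}
```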

setTextSettings

public void setTextSettings(boolean store,
                            boolean index,
                            boolean token)
Sets Field attributes that will be used when creating the Field object for the main text content of a PDF document. These attributes correspond to the store, index, and token parameters of the Lucene Field constructor.


setPDFAttrSettings

public void setPDFAttrSettings(boolean store,
                               boolean index,
                               boolean token)
Sets Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. These attributes correspond to the store, index, and token parameters of the Lucene Field constructor.
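
The two settings methods take the same three flags, and the flag getters documented below report what was set. A minimal sketch of a common arrangement (body text searchable but not stored, attributes stored and indexed as whole values) follows. FieldFlagsStandIn is a hypothetical stand-in for the documented methods; its initial flag values are not the library's defaults, and the chosen flag combination is only an illustration.

```java
// Hypothetical stand-in mirroring the documented settings methods and getters.
class FieldFlagsStandIn {
    private boolean storeText, indexText, tokenText;
    private boolean storeAttrs, indexAttrs, tokenAttrs;

    void setTextSettings(boolean store, boolean index, boolean token) {
        storeText = store;
        indexText = index;
        tokenText = token;
    }

    void setPDFAttrSettings(boolean store, boolean index, boolean token) {
        storeAttrs = store;
        indexAttrs = index;
        tokenAttrs = token;
    }

    boolean storeMainText() { return storeText; }
    boolean indexMainText() { return indexText; }
    boolean tokenizeMainText() { return tokenText; }
    boolean storePDFAttrs() { return storeAttrs; }
    boolean indexPDFAttrs() { return indexAttrs; }
    boolean tokenizePDFAttrs() { return tokenAttrs; }
}

class FieldFlagsSketch {
    static FieldFlagsStandIn searchOnlyBody() {
        FieldFlagsStandIn config = new FieldFlagsStandIn();
        // Body text: indexed and tokenized for search, but not stored.
        config.setTextSettings(false, true, true);
        // Attributes: stored for display and indexed, but not tokenized.
        config.setPDFAttrSettings(true, true, false);
        return config;
    }

    public static void main(String[] args) {
        FieldFlagsStandIn config = searchOnlyBody();
        System.out.println("store text: " + config.storeMainText());
        System.out.println("tokenize attrs: " + config.tokenizePDFAttrs());
    }
}
```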


indexMainText

public boolean indexMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.


storeMainText

public boolean storeMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be stored.


tokenizeMainText

public boolean tokenizeMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.


indexPDFAttrs

public boolean indexPDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.


storePDFAttrs

public boolean storePDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be stored.


tokenizePDFAttrs

public boolean tokenizePDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.