public class DocumentFactoryConfig
extends java.lang.Object

Holds the settings that control how Lucene Documents are built from PDF files by the PDFDocumentFactory class.
Typically, you create a single static instance of this class, configure it as desired, and then use that same instance whenever a new Lucene Document is built through the PDFDocumentFactory class.

Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_MAIN_TEXT_FIELD_NAME: The default name assigned to the Lucene Field containing the main body of text extracted from a PDF file. |
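
Constructor and Description |
---|
DocumentFactoryConfig(): Creates a new config object that uses DEFAULT_MAIN_TEXT_FIELD_NAME as the name of the main text Field. |
DocumentFactoryConfig(java.lang.String mainTextFieldName): Creates a new config object that uses the given name for the Field containing the main PDF text content. |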
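To make the difference between the two constructors concrete, here is a minimal sketch built on a hypothetical stand-in class, `ConstructorSketch`; it is not the library's code. The value assigned to its `DEFAULT_MAIN_TEXT_FIELD_NAME` is a placeholder, because this page does not state the real default. The sketch only shows that the no-argument form falls back to the default name, while the one-argument form uses the caller's name.

```java
// Hypothetical stand-in modelling only the two documented constructors
// and the main-text-field-name accessors; not the real DocumentFactoryConfig.
class ConstructorSketch {
    // Placeholder value: the real DEFAULT_MAIN_TEXT_FIELD_NAME is not shown on this page.
    static final String DEFAULT_MAIN_TEXT_FIELD_NAME = "text";

    private String mainTextFieldName;

    // No-argument form: falls back to the default main text field name.
    ConstructorSketch() {
        this(DEFAULT_MAIN_TEXT_FIELD_NAME);
    }

    // One-argument form: the caller chooses the main text field name.
    ConstructorSketch(String mainTextFieldName) {
        this.mainTextFieldName = mainTextFieldName;
    }

    String getMainTextFieldName() {
        return mainTextFieldName;
    }

    void setMainTextFieldName(String mainTextFieldName) {
        this.mainTextFieldName = mainTextFieldName;
    }
}
```

Either way, the name can still be changed later through setMainTextFieldName, so the one-argument constructor is a convenience rather than the only way to choose the field name.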

Modifier and Type | Method and Description |
---|---|
boolean | copyAllPDFAttrs(): Returns true if this object is configured to ensure that all PDF document attributes will be added to generated Lucene Documents, even if no attribute-name-to-field-name mapping has been established in this config object. |
java.lang.String | getFieldName(java.lang.String pdfAttrName): Returns the name that will be assigned to Fields containing the value of the PDF document attribute identified within the PDF document by the provided attribute name. |
java.lang.String | getMainTextFieldName(): Returns the name that will be assigned to the Field that holds the main body text of each PDF document converted into a Lucene Document instance through the PDFDocumentFactory class using this config object. |
boolean | indexMainText(): Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be indexed. |
boolean | indexPDFAttrs(): Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be indexed. |
java.util.Set | pdfAttrNames(): Returns a Set of the PDF document attribute names that are mapped to Field names in this config object. |
void | setCopyAllPDFAttrs(boolean b): Sets the copyAllPDFAttrs attribute. |
void | setFieldName(java.lang.String pdfAttrName, java.lang.String fieldName): Sets the name that will be assigned to Fields corresponding to the provided PDF document attribute name. |
void | setMainTextFieldName(java.lang.String mainTextFieldName): Sets the name that will be assigned to Lucene Fields containing the main text content of PDFs converted to Lucene Documents via the PDFDocumentFactory class. |
void | setPDFAttrSettings(boolean store, boolean index, boolean token): Sets the Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. |
void | setTextSettings(boolean store, boolean index, boolean token): Sets the Field attributes that will be used when creating the Field object for the main text content of a PDF document. |
boolean | storeMainText(): Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be stored. |
boolean | storePDFAttrs(): Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be stored. |
boolean | tokenizeMainText(): Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized. |
boolean | tokenizePDFAttrs(): Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized. |
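The usage pattern from the class overview (one shared instance, configured once and reused for every document) can be sketched as follows. This page does not show PDFDocumentFactory's own methods, so the sketch substitutes hypothetical classes: `SharedConfigSketch` stands in for DocumentFactoryConfig, `PdfIndexer.buildDocument` stands in for the factory, and a `Map` stands in for a Lucene Document. It also assumes that attributes without a field-name mapping are dropped, which is what the copyAllPDFAttrs description implies when that option is off.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for DocumentFactoryConfig (illustration only).
class SharedConfigSketch {
    private String mainTextFieldName = "text"; // placeholder default, not the library's value
    private final Map<String, String> fieldNames = new HashMap<>();

    void setMainTextFieldName(String name) { mainTextFieldName = name; }
    String getMainTextFieldName() { return mainTextFieldName; }
    void setFieldName(String pdfAttrName, String fieldName) { fieldNames.put(pdfAttrName, fieldName); }
    String getFieldName(String pdfAttrName) { return fieldNames.get(pdfAttrName); }
}

class PdfIndexer {
    // The single shared config, configured once when PdfIndexer is loaded.
    static final SharedConfigSketch CONFIG = new SharedConfigSketch();
    static {
        CONFIG.setMainTextFieldName("body");
        CONFIG.setFieldName("Author", "author");
    }

    // Hypothetical document builder: every call reads the same shared CONFIG.
    // A Map stands in for a Lucene Document; unmapped attributes are skipped (assumption).
    static Map<String, String> buildDocument(String text, Map<String, String> pdfAttrs) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(CONFIG.getMainTextFieldName(), text);
        for (Map.Entry<String, String> attr : pdfAttrs.entrySet()) {
            String field = CONFIG.getFieldName(attr.getKey());
            if (field != null) {
                doc.put(field, attr.getValue());
            }
        }
        return doc;
    }
}
```

Configuring the instance in a static initializer keeps every document built by the application consistent, since all of them read their field names from the same object.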
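
public static final java.lang.String DEFAULT_MAIN_TEXT_FIELD_NAME
The default name assigned to the Lucene Field containing the main body of text extracted from a PDF file.

public DocumentFactoryConfig(java.lang.String mainTextFieldName)
Creates a new config object.
mainTextFieldName - the name that should be assigned to Fields containing the main PDF text content.

public DocumentFactoryConfig()
Creates a new config object that uses the default name (DEFAULT_MAIN_TEXT_FIELD_NAME) for the main text Field. Other configuration defaults are as follows: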
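
public void setMainTextFieldName(java.lang.String mainTextFieldName)

public java.lang.String getMainTextFieldName()
Returns the name that will be assigned to the Field that holds the main body text of each PDF document converted into a Lucene Document instance through the PDFDocumentFactory class using this config object.

public java.lang.String getFieldName(java.lang.String pdfAttrName)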

public java.util.Set pdfAttrNames()

public void setFieldName(java.lang.String pdfAttrName, java.lang.String fieldName)
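The three mapping methods (setFieldName, getFieldName, and pdfAttrNames) work together as a lookup table from PDF attribute names to Lucene Field names. The following is a minimal sketch of that relationship using a hypothetical class, `FieldMappingSketch`, not the library's implementation. Two behaviors are assumptions: that calling setFieldName again for the same attribute replaces the earlier name, and that getFieldName returns null for an unmapped attribute.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the PDF-attribute-name -> Field-name mapping (not the library's code).
class FieldMappingSketch {
    private final Map<String, String> mapping = new HashMap<>();

    // Maps a PDF attribute name (e.g. "Title") to a Field name.
    // Assumption: remapping the same attribute replaces the previous Field name.
    void setFieldName(String pdfAttrName, String fieldName) {
        mapping.put(pdfAttrName, fieldName);
    }

    // Assumption: returns null when no mapping exists for the attribute.
    String getFieldName(String pdfAttrName) {
        return mapping.get(pdfAttrName);
    }

    // Read-only view of the attribute names that currently have a mapping.
    Set<String> pdfAttrNames() {
        return Collections.unmodifiableSet(mapping.keySet());
    }
}
```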

public boolean copyAllPDFAttrs()

public void setCopyAllPDFAttrs(boolean b)
Sets the copyAllPDFAttrs attribute.
See also: copyAllPDFAttrs()
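The copyAllPDFAttrs setting decides what happens to PDF attributes that have no Field-name mapping. A minimal sketch of that decision follows, using a hypothetical class, `CopyAllSketch`. This page does not say which Field name an unmapped-but-copied attribute receives, so the sketch assumes it reuses the PDF attribute name; it also assumes the option starts out false. Neither assumption is confirmed by the documentation above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how copyAllPDFAttrs could affect attribute handling.
class CopyAllSketch {
    private final Map<String, String> mapping = new HashMap<>();
    private boolean copyAllPDFAttrs = false; // assumed initial value

    void setFieldName(String pdfAttrName, String fieldName) { mapping.put(pdfAttrName, fieldName); }
    void setCopyAllPDFAttrs(boolean b) { copyAllPDFAttrs = b; }
    boolean copyAllPDFAttrs() { return copyAllPDFAttrs; }

    // Returns the Field name an attribute would be written under, or null if it would be dropped.
    // Mapped attributes always use their mapped name; unmapped ones are kept only when
    // copyAllPDFAttrs is true, and then (assumption) reuse the PDF attribute name.
    String targetField(String pdfAttrName) {
        String mapped = mapping.get(pdfAttrName);
        if (mapped != null) {
            return mapped;
        }
        return copyAllPDFAttrs ? pdfAttrName : null;
    }
}
```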
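
public void setTextSettings(boolean store, boolean index, boolean token)
Sets the Field attributes that will be used when creating the Field object for the main text content of a PDF document. These values are used as the store, index, and token parameters of the Lucene Field constructor.

public void setPDFAttrSettings(boolean store, boolean index, boolean token)
Sets the Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. These values are used as the store, index, and token parameters of the Lucene Field constructor.

public boolean indexMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.

public boolean storeMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be stored.

public boolean tokenizeMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.

public boolean indexPDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.

public boolean storePDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be stored.

public boolean tokenizePDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.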
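The two settings methods each record three flags for one category of Field (main text or PDF attributes), and the six boolean accessors read them back. Below is a minimal sketch of that bookkeeping using a hypothetical class, `FieldSettingsSketch`; it does not call Lucene. Instead, `describeMainTextField` renders the arguments that would be passed, in order, to a Field(name, value, store, index, token) constructor. The initial false values are placeholders, not the library's defaults.

```java
// Hypothetical sketch: three flags per Field category, later used as the
// store, index, and token arguments of a Lucene Field constructor.
class FieldSettingsSketch {
    // Placeholder initial values; the library's real defaults are not listed on this page.
    private boolean storeText, indexText, tokenizeText;
    private boolean storeAttrs, indexAttrs, tokenizeAttrs;

    void setTextSettings(boolean store, boolean index, boolean token) {
        storeText = store;
        indexText = index;
        tokenizeText = token;
    }

    void setPDFAttrSettings(boolean store, boolean index, boolean token) {
        storeAttrs = store;
        indexAttrs = index;
        tokenizeAttrs = token;
    }

    boolean storeMainText() { return storeText; }
    boolean indexMainText() { return indexText; }
    boolean tokenizeMainText() { return tokenizeText; }
    boolean storePDFAttrs() { return storeAttrs; }
    boolean indexPDFAttrs() { return indexAttrs; }
    boolean tokenizePDFAttrs() { return tokenizeAttrs; }

    // Stand-in for building the main text Field: shows the constructor arguments that would be used.
    String describeMainTextField(String name) {
        return String.format("Field(%s, <text>, store=%b, index=%b, token=%b)",
                name, storeText, indexText, tokenizeText);
    }
}
```

Keeping the two categories separate allows, for example, the main text to be indexed and tokenized without being stored, while short metadata values are stored so they can be shown in search results.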