java.lang.Object
  com.snowtide.pdf.lucene.DocumentFactoryConfig
public class DocumentFactoryConfig
Instances of this class are used to control the creation of Lucene Documents from PDF content through the PDFDocumentFactory class.
Typical usage is to create a single static instance, configure it as desired, and reuse that instance whenever a new Lucene Document is built through the PDFDocumentFactory class.
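The static-instance pattern described above might look like the following sketch. So that it compiles without the PDFTextStream and Lucene jars, it uses a hypothetical stand-in class, `ConfigStandIn`, that mirrors only the constructors and main-text-name accessors documented on this page. The `Indexing` holder class, the `"contents"` field name, and the stand-in's default name value are illustrative assumptions. With the real library, `com.snowtide.pdf.lucene.DocumentFactoryConfig` would take the stand-in's place.

```java
// Holder for one shared, pre-configured config instance (hypothetical class name).
class Indexing {
    // Created once and reused for every document that gets built.
    static final ConfigStandIn CONFIG = new ConfigStandIn();

    static {
        // Configure once, up front. "contents" is an illustrative field name.
        CONFIG.setMainTextFieldName("contents");
    }

    public static void main(String[] args) {
        System.out.println("main text field: " + CONFIG.getMainTextFieldName());
    }
}

// Stand-in mirroring a small part of the documented DocumentFactoryConfig API.
// It is NOT the library class; the default name value below is an assumption.
class ConfigStandIn {
    static final String DEFAULT_MAIN_TEXT_FIELD_NAME = "text"; // assumed value

    private String mainTextFieldName;

    ConfigStandIn() {
        this(DEFAULT_MAIN_TEXT_FIELD_NAME);
    }

    ConfigStandIn(String mainTextFieldName) {
        this.mainTextFieldName = mainTextFieldName;
    }

    void setMainTextFieldName(String mainTextFieldName) {
        this.mainTextFieldName = mainTextFieldName;
    }

    String getMainTextFieldName() {
        return mainTextFieldName;
    }
}
```

Keeping the configuration in one static holder means every caller that builds a Document sees the same field names and settings.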
Field Summary | |
---|---|
static java.lang.String | DEFAULT_MAIN_TEXT_FIELD_NAME - The default name assigned to the Lucene Field containing the main body of text extracted from a PDF file. |
Constructor Summary | |
---|---|
DocumentFactoryConfig() | Creates a new config object that uses the default main text field name. |
DocumentFactoryConfig(java.lang.String mainTextFieldName) | Creates a new config object that uses the given main text field name. |
Method Summary | |
---|---|
boolean | copyAllPDFAttrs() - Returns true if this object is configured to ensure that all PDF document attributes will be added to generated Lucene Documents, even if no attribute name / field name mapping has been established in this config object. |
java.lang.String | getFieldName(java.lang.String pdfAttrName) - Returns the name that will be assigned to Fields containing the value of the PDF document attribute identified within the PDF document by the provided attribute name. |
java.lang.String | getMainTextFieldName() - Returns the name that will be assigned to the field that holds the main body text of each PDF document converted into a Lucene Document instance through the PDFDocumentFactory class using this config object. |
boolean | indexMainText() - Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be indexed. |
boolean | indexPDFAttrs() - Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be indexed. |
java.util.Set | pdfAttrNames() - Returns a Set of the PDF document attribute names that are mapped to Field names in this config object. |
void | setCopyAllPDFAttrs(boolean b) - Setter corresponding to the copyAllPDFAttrs attribute. |
void | setFieldName(java.lang.String pdfAttrName, java.lang.String fieldName) - Sets the name that will be assigned to Fields corresponding to the provided PDF document attribute name. |
void | setMainTextFieldName(java.lang.String mainTextFieldName) - Sets the name that will be assigned to Lucene Fields containing the main text content of PDFs converted to Lucene Documents via the PDFDocumentFactory class. |
void | setPDFAttrSettings(boolean store, boolean index, boolean token) - Sets Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. |
void | setTextSettings(boolean store, boolean index, boolean token) - Sets Field attributes that will be used when creating the Field object for the main text content of a PDF document. |
boolean | storeMainText() - Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be stored. |
boolean | storePDFAttrs() - Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be stored. |
boolean | tokenizeMainText() - Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized. |
boolean | tokenizePDFAttrs() - Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String DEFAULT_MAIN_TEXT_FIELD_NAME
Constructor Detail |
---|
public DocumentFactoryConfig(java.lang.String mainTextFieldName)
Creates a new config object.
mainTextFieldName - the name that should be assigned to Fields containing the main PDF text content.
public DocumentFactoryConfig()
Creates a new config object that uses the default name (see DEFAULT_MAIN_TEXT_FIELD_NAME) for the main text Field. Other configuration defaults are as follows:
Method Detail |
---|
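public void setMainTextFieldName(java.lang.String mainTextFieldName)
Sets the name that will be assigned to Lucene Fields containing the main text content of PDFs converted to Lucene Documents via the PDFDocumentFactory class.
public java.lang.String getMainTextFieldName()
Returns the name that will be assigned to the field that holds the main body text of each PDF document converted into a Lucene Document instance through the PDFDocumentFactory class using this config object.
public java.lang.String getFieldName(java.lang.String pdfAttrName)
Returns the name that will be assigned to Fields containing the value of the PDF document attribute identified within the PDF document by the provided attribute name.
public java.util.Set pdfAttrNames()
Returns a Set of the PDF document attribute names that are mapped to Field names in this config object.
public void setFieldName(java.lang.String pdfAttrName, java.lang.String fieldName)
Sets the name that will be assigned to Fields corresponding to the provided PDF document attribute name.
public boolean copyAllPDFAttrs()
Returns true if this object is configured to ensure that all PDF document attributes will be added to generated Lucene Documents, even if no attribute name / field name mapping has been established in this config object.
public void setCopyAllPDFAttrs(boolean b)
Setter corresponding to the copyAllPDFAttrs attribute.
See also: copyAllPDFAttrs()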
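The attribute-to-field mapping methods above can be pictured with a small sketch. `AttrMappingSketch` is a hypothetical stand-in that mirrors the documented signatures using a plain map. Returning null for an unmapped attribute, defaulting copyAllPDFAttrs to false, and the attribute names `"Title"` and `"Author"` are assumptions of this sketch, not documented behavior of DocumentFactoryConfig.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class AttrMappingDemo {
    public static void main(String[] args) {
        AttrMappingSketch config = new AttrMappingSketch();

        // Map PDF document attribute names to the Field names they should receive.
        config.setFieldName("Title", "title");
        config.setFieldName("Author", "creator");

        // Ask for attributes without an explicit mapping to be copied as well.
        config.setCopyAllPDFAttrs(true);

        for (String attr : config.pdfAttrNames()) {
            System.out.println(attr + " -> " + config.getFieldName(attr));
        }
        System.out.println("copy unmapped attributes: " + config.copyAllPDFAttrs());
    }
}

// Stand-in for the mapping-related part of the documented API (not the library class).
class AttrMappingSketch {
    private final Map<String, String> fieldNames = new LinkedHashMap<>();
    private boolean copyAll; // assumed initial value: false

    void setFieldName(String pdfAttrName, String fieldName) {
        fieldNames.put(pdfAttrName, fieldName);
    }

    // Assumption: an attribute with no mapping yields null.
    String getFieldName(String pdfAttrName) {
        return fieldNames.get(pdfAttrName);
    }

    Set<String> pdfAttrNames() {
        return Collections.unmodifiableSet(fieldNames.keySet());
    }

    boolean copyAllPDFAttrs() {
        return copyAll;
    }

    void setCopyAllPDFAttrs(boolean b) {
        copyAll = b;
    }
}
```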
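public void setTextSettings(boolean store, boolean index, boolean token)
Sets Field attributes that will be used when creating the Field object for the main text content of a PDF document. The arguments correspond to the store, index, and token parameters of the Lucene Field constructor.
public void setPDFAttrSettings(boolean store, boolean index, boolean token)
Sets Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. The arguments correspond to the store, index, and token parameters of the Lucene Field constructor.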
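public boolean indexMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.
public boolean storeMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be stored.
public boolean tokenizeMainText()
Returns true if the main body text of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.
public boolean indexPDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be indexed.
public boolean storePDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be stored.
public boolean tokenizePDFAttrs()
Returns true if the metadata attributes of Lucene Documents created through PDFDocumentFactory using this config object will be tokenized.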
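The two settings setters and the six boolean getters above pair up as shown in this sketch. `FieldSettingsSketch` is a hypothetical stand-in that mirrors the documented signatures; its initial flag values are assumptions, since this page does not list the library's defaults. The chosen values (index and tokenize the body text without storing it, store and index the attributes without tokenizing them) are one illustrative configuration only.

```java
class FieldSettingsDemo {
    public static void main(String[] args) {
        FieldSettingsSketch config = new FieldSettingsSketch();

        // Argument order follows the documented signatures: store, index, token.
        config.setTextSettings(false, true, true);
        config.setPDFAttrSettings(true, true, false);

        System.out.println("main text:  store=" + config.storeMainText()
                + " index=" + config.indexMainText()
                + " tokenize=" + config.tokenizeMainText());
        System.out.println("attributes: store=" + config.storePDFAttrs()
                + " index=" + config.indexPDFAttrs()
                + " tokenize=" + config.tokenizePDFAttrs());
    }
}

// Stand-in for the settings-related part of the documented API (not the library class).
class FieldSettingsSketch {
    // Assumed initial values: all false. The real defaults are not given on this page.
    private boolean storeText, indexText, tokenizeText;
    private boolean storeAttrs, indexAttrs, tokenizeAttrs;

    void setTextSettings(boolean store, boolean index, boolean token) {
        storeText = store;
        indexText = index;
        tokenizeText = token;
    }

    void setPDFAttrSettings(boolean store, boolean index, boolean token) {
        storeAttrs = store;
        indexAttrs = index;
        tokenizeAttrs = token;
    }

    boolean storeMainText() { return storeText; }
    boolean indexMainText() { return indexText; }
    boolean tokenizeMainText() { return tokenizeText; }
    boolean storePDFAttrs() { return storeAttrs; }
    boolean indexPDFAttrs() { return indexAttrs; }
    boolean tokenizePDFAttrs() { return tokenizeAttrs; }
}
```

Main text and attribute settings are held separately, so changing one group leaves the other untouched.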