public class PDFTextStreamConfig
extends java.lang.Object
Various configuration options for PDFTextStream may be set using this class. A custom configuration may be registered with PDFTextStream in any of three ways:
1. Modify the default PDFTextStreamConfig instance, either by retrieving the current default instance (see getDefaultConfig()) and changing it directly, or by creating a new PDFTextStreamConfig instance, modifying it as desired, and setting it as the new default instance (see setDefaultConfig(PDFTextStreamConfig)).
2. Pass a PDFTextStreamConfig instance to one of the PDFTextStream constructors: PDFTextStream.PDFTextStream(java.io.File, byte[], PDFTextStreamConfig), PDFTextStream.PDFTextStream(java.nio.ByteBuffer, String, byte[], PDFTextStreamConfig), or PDFTextStream.PDFTextStream(java.io.InputStream, String, byte[], PDFTextStreamConfig).
3. Set the configuration of an existing PDFTextStream instance via the PDFTextStream.setConfig(PDFTextStreamConfig) method. Note that certain configuration properties are used only during PDFTextStream initialization (such as isMemoryMappingEnabled()), so using PDFTextStream.setConfig(PDFTextStreamConfig) will not prevent PDFTextStream from using the default settings for such properties at initialization time.

Constructor and Description |
---|
PDFTextStreamConfig() |
PDFTextStreamConfig(PDFTextStreamConfig other): Creates a copy of the given PDFTextStreamConfig instance. |
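
The first registration route in the class overview (changing the default configuration) can be sketched as below. This is only an illustration under stated assumptions: `ConfigStandIn` is a hypothetical stand-in class, written so the snippet compiles without the PDFTextStream library, and it mirrors only method names and default values documented on this page. The helper names `changeDefaultInPlace` and `installModifiedCopy` are also made up for the example. Whether the real `setDefaultConfig` keeps the passed instance or copies it is not stated here, so the stand-in simply keeps the reference.

```java
// Sketch only. ConfigStandIn is a hypothetical stand-in, NOT the library class.
class DefaultConfigSketch {

    // Route A: retrieve the current default instance and change it directly.
    static void changeDefaultInPlace(int minCells) {
        ConfigStandIn.getDefaultConfig().setMinTableCellCount(minCells);
    }

    // Route B: copy the current default, modify the copy, install it as the new default.
    static ConfigStandIn installModifiedCopy(String linebreak) {
        ConfigStandIn copy = new ConfigStandIn(ConfigStandIn.getDefaultConfig());
        copy.setLinebreakString(linebreak);
        ConfigStandIn.setDefaultConfig(copy);
        return copy;
    }

    public static void main(String[] args) {
        changeDefaultInPlace(6);
        ConfigStandIn installed = installModifiedCopy("\n");
        System.out.println("min table cells: " + installed.getMinTableCellCount());
        System.out.println("installed as default: "
                + (installed == ConfigStandIn.getDefaultConfig()));
    }
}

// Hypothetical stand-in mirroring a few members documented on this page.
class ConfigStandIn {
    // Assumption: the default is held as a single shared reference.
    private static ConfigStandIn defaultConfig = new ConfigStandIn();

    private boolean tableDetectionEnabled = true; // documented default: true
    private int minTableCellCount = 4;            // documented default: 4
    // Documented default: the platform's line.separator value.
    private String linebreakString = System.getProperty("line.separator");

    ConfigStandIn() {}

    // Mirrors the documented copy constructor.
    ConfigStandIn(ConfigStandIn other) {
        this.tableDetectionEnabled = other.tableDetectionEnabled;
        this.minTableCellCount = other.minTableCellCount;
        this.linebreakString = other.linebreakString;
    }

    static ConfigStandIn getDefaultConfig() { return defaultConfig; }
    static void setDefaultConfig(ConfigStandIn config) { defaultConfig = config; }

    boolean isTableDetectionEnabled() { return tableDetectionEnabled; }
    void setTableDetectionEnabled(boolean enabled) { tableDetectionEnabled = enabled; }
    int getMinTableCellCount() { return minTableCellCount; }
    void setMinTableCellCount(int count) { minTableCellCount = count; }
    String getLinebreakString() { return linebreakString; }
    void setLinebreakString(String linebreak) { linebreakString = linebreak; }
}
```

With the library on the classpath, the same two routes would presumably use PDFTextStreamConfig.getDefaultConfig(), the copy constructor, and PDFTextStreamConfig.setDefaultConfig(PDFTextStreamConfig) in place of the stand-in.
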
Modifier and Type | Method and Description |
---|---|
static PDFTextStreamConfig | getDefaultConfig(): Returns the configuration that new PDFTextStream instances use by default (which is settable via setDefaultConfig(PDFTextStreamConfig)). |
java.lang.String | getLinebreakString(): Returns the string that OutputTarget (and its subclasses) output for each linebreak identified in extracted PDF content. |
int | getMinTableCellCount(): Returns the minimum number of adjacent cells that must be present in order for PDFTextStream to recognize those cells collectively as a Table. |
static boolean | isCJKSupportEnabled(): Returns true if this configuration will cause PDFTextStream to extract and decode Chinese, Japanese, and Korean content. |
boolean | isDeriveType3Fonts(): Returns true if this configuration will cause PDFTextStream to derive the Unicode encodings of Type3 PDF fonts. |
boolean | isImplicitLineDetectionEnabled() |
boolean | isMemoryMappingEnabled(): Deprecated. Memory-mapping of opened PDF files is disabled by default, and will be removed as an option in future PDFTextStream releases. |
boolean | isStripXFAFormDataEnabled() |
boolean | isTableDetectionEnabled(): Returns true only if Table detection is enabled; defaults to true. |
static void | setCJKSupportEnabled(boolean enableCJK): Changes the setting that controls whether or not PDFTextStream extracts and decodes Chinese, Japanese, and Korean content. |
static void | setDefaultConfig(PDFTextStreamConfig defaultConfig): Sets the configuration that new PDFTextStream instances use by default. |
void | setDeriveType3Fonts(boolean deriveType3Fonts): Changes the setting that controls whether or not PDFTextStream derives the Unicode encodings of Type3 PDF fonts. |
void | setImplicitLineDetectionEnabled(boolean detectImplicitLines) |
void | setLinebreakString(java.lang.String linebreak): Sets the string that OutputTarget (and its subclasses) output for each linebreak identified in extracted PDF content. |
void | setMemoryMappingEnabled(boolean memoryMappingEnabled): Deprecated. Memory-mapping of opened PDF files is disabled by default, and will be removed as an option in future PDFTextStream releases. |
void | setMinTableCellCount(int minTableCellCount): Changes the setting that controls the minimum number of adjacent cells that must be present in order for PDFTextStream to recognize those cells collectively as a Table. |
void | setStripXFAFormDataEnabled(boolean stripXFAFormData) |
void | setTableDetectionEnabled(boolean detectTables): Sets whether or not Table detection is enabled. |
java.lang.String | toString() |
public PDFTextStreamConfig(PDFTextStreamConfig other)
public PDFTextStreamConfig()
public static PDFTextStreamConfig getDefaultConfig()
Returns the configuration that new PDFTextStream instances use by default (which is settable via setDefaultConfig(PDFTextStreamConfig)).
public static void setDefaultConfig(PDFTextStreamConfig defaultConfig)
Sets the configuration that new PDFTextStream instances use by default.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public boolean isTableDetectionEnabled()
Returns true only if Table detection is enabled; defaults to true.
public void setTableDetectionEnabled(boolean detectTables)
Sets whether or not Table detection is enabled.
public boolean isStripXFAFormDataEnabled()
public void setStripXFAFormDataEnabled(boolean stripXFAFormData)
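public int getMinTableCellCount()
Returns the minimum number of adjacent cells that must be present in order for PDFTextStream to recognize those cells collectively as a Table. This setting defaults to 4.
public void setMinTableCellCount(int minTableCellCount)
Changes the setting that controls the minimum number of adjacent cells that must be present in order for PDFTextStream to recognize those cells collectively as a Table. This setting defaults to 4.
public boolean isMemoryMappingEnabled()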
public void setMemoryMappingEnabled(boolean memoryMappingEnabled)
public boolean isImplicitLineDetectionEnabled()
public void setImplicitLineDetectionEnabled(boolean detectImplicitLines)
public static boolean isCJKSupportEnabled()
public static void setCJKSupportEnabled(boolean enableCJK)
public boolean isDeriveType3Fonts()
public void setDeriveType3Fonts(boolean deriveType3Fonts)
public java.lang.String getLinebreakString()
Returns the string that OutputTarget (and its subclasses) output for each linebreak identified in extracted PDF content. This value defaults to the current platform's line break string, as identified by the line.separator system property.
public void setLinebreakString(java.lang.String linebreak)
Sets the string that OutputTarget (and its subclasses) output for each linebreak identified in extracted PDF content. This value defaults to the current platform's line break string, as identified by the line.separator system property.
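
Because the linebreak string defaults to the platform's line.separator value, extracted output may differ between operating systems. The following minimal sketch uses only the Java standard library to inspect that value; the class name `LinebreakDefaultSketch` and its helper methods are hypothetical names chosen for this example and are not part of PDFTextStream.

```java
// Sketch only: inspects the platform value that the documented default is based on.
class LinebreakDefaultSketch {

    // Reads the platform line break from the line.separator system property.
    static String platformLinebreak() {
        return System.getProperty("line.separator");
    }

    // Renders carriage returns and line feeds as visible escapes for logging.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println("platform linebreak: " + visible(platformLinebreak()));
    }
}
```

If the same output is wanted on every platform, a fixed value such as "\n" could be passed to setLinebreakString(java.lang.String) on the configuration in use.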