public class OutputTarget extends OutputHandler
This is a base OutputHandler implementation that provides a common output interface for
Writer and Appendable instances (such as StringBuilders and StringBuffers), allowing
PDFTextStream to easily redirect output to either type of object. Not only does using
an OutputTarget simplify your code, but it also minimizes the internal buffering that
PDFTextStream might otherwise perform when being used as a java.io.Reader implementation.
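
The idea of one output interface over both target types can be illustrated with a minimal, self-contained sketch. This is not PDFTextStream's implementation: the class name CharSink and its members are hypothetical, and the sketch relies only on the fact that java.io.Writer implements java.lang.Appendable in the standard library.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in, NOT the real OutputTarget: one write path for both target types.
final class CharSink {
    private final Appendable out;

    // Accepts StringBuilder, StringBuffer, or any other Appendable.
    CharSink(Appendable out) { this.out = out; }

    // Accepts any Writer; Writer itself implements Appendable.
    CharSink(Writer out) { this.out = out; }

    void write(CharSequence text) throws IOException {
        out.append(text);
    }

    void write(char c) throws IOException {
        out.append(c);
    }

    // Returns the wrapped object, whichever type it is.
    Object getObject() { return out; }
}

class CharSinkDemo {
    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        new CharSink(sb).write("to a StringBuilder");

        StringWriter sw = new StringWriter();
        new CharSink(sw).write("to a Writer");

        System.out.println(sb);
        System.out.println(sw);
    }
}
```

Calling code in this sketch never needs to know which kind of object sits behind the sink, which is the simplification the paragraph above describes.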
Note that since this class provides a baseline OutputHandler implementation that will direct
all text content to the provided Writer or Appendable, it is the ideal place to start when
building a custom OutputHandler implementation. See the XMLOutputTarget class as an example
of how this can be done.
Finally, please note that OutputTarget does not make any attempt to retain the visual layout
or formatting of the text extracted from PDF documents. This OutputHandler implementation is
focused on:
If your application requires PDF text extracts that retain the visual appearance of the text
as it is laid out on each page, then VisualOutputTarget would be more suitable.
Example usage:
    StringBuilder sb = new StringBuilder(1024);
    OutputTarget tgt = new OutputTarget(sb);
    PDFTextStream stream = new PDFTextStream();
    stream.pipe(tgt);
    stream.close();
    // do something with the extracted text...
    processText(sb);
See Also: PDFTextStream.pipe(OutputHandler), Page.pipe(OutputHandler), Block.pipe(OutputHandler), Line.pipe(OutputHandler)
Constructor and Description |
---|
OutputTarget(java.lang.Appendable sb): Creates a new OutputTarget that directs output to the given java.lang.Appendable instance. |
OutputTarget(java.io.Writer w): Creates a new OutputTarget that directs output to the given java.io.Writer instance. |
Modifier and Type | Method and Description |
---|---|
void | endPage(Page page): Invoked when PDFTextStream has finished processing a page. |
static OutputTarget | forBuffer(java.lang.Appendable sb): Deprecated. Use OutputTarget(Appendable) instead. |
static OutputTarget | forWriter(java.io.Writer w): Deprecated. Use OutputTarget(Writer) instead. |
PDFTextStreamConfig | getConfig(): Returns the PDFTextStreamConfig instance that this OutputTarget is currently using. |
java.lang.Object | getObject(): Returns the output object that this instance wraps; will be an instance of either java.io.Writer or java.lang.Appendable. |
void | linebreaks(int linebreakCnt): Default implementation that writes the specified number of line breaks (using the linebreak String provided by the current configuration) to the java.io.Writer or java.lang.Appendable object that this OutputTarget wraps. |
void | setConfig(PDFTextStreamConfig config): Sets the PDFTextStreamConfig instance this OutputTarget should use. |
void | spaces(int spaceCnt): Default implementation that writes the specified number of spaces to the java.io.Writer or java.lang.Appendable object that this OutputTarget wraps. |
void | startPage(Page page): Invoked when a page is about to be processed. |
void | textUnit(TextUnit tu): Default implementation that writes the character run specified by the given TextUnit instance to the java.io.Writer or java.lang.Appendable object that this OutputTarget wraps. |
void | write(char c): Writes the provided character to the wrapped output object. |
void | write(char[] buf): Writes the provided character data to the wrapped output object. |
void | write(char[] buf, int start, int len): Writes the provided character data to the wrapped output object. |
void | write(java.lang.CharSequence sb): Writes the provided CharSequence's character data to the wrapped output object. |
void | write(java.lang.String str): Writes the provided String's character data to the wrapped output object. |
Methods inherited from class OutputHandler: endBlock, endLine, endPDF, startBlock, startLine, startPDF
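
The summary above describes spaces(int) and linebreaks(int) as writing a given number of spaces or line breaks to the wrapped object. A minimal sketch of that behavior follows. It is an illustration only, not the library's code: the class SpacingSketch is hypothetical, its linebreak field stands in for the linebreak String that the real class takes from its configuration, and unlike the real methods it simply propagates IOException.

```java
import java.io.IOException;

// Hypothetical model of the spaces/linebreaks behavior; not the library's code.
final class SpacingSketch {
    private final Appendable out;
    private final String linebreak; // assumption: stands in for the configured linebreak String

    SpacingSketch(Appendable out, String linebreak) {
        this.out = out;
        this.linebreak = linebreak;
    }

    // Appends spaceCnt space characters to the wrapped object.
    void spaces(int spaceCnt) throws IOException {
        for (int i = 0; i < spaceCnt; i++) {
            out.append(' ');
        }
    }

    // Appends the linebreak string linebreakCnt times to the wrapped object.
    void linebreaks(int linebreakCnt) throws IOException {
        for (int i = 0; i < linebreakCnt; i++) {
            out.append(linebreak);
        }
    }
}
```

A subclass that wants different whitespace handling (for example, collapsing runs of spaces) would presumably override these two methods, since the counts passed in are recommendations from PDFTextStream.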
public OutputTarget(java.io.Writer w)
Creates a new OutputTarget that directs output to the given java.io.Writer instance.

public OutputTarget(java.lang.Appendable sb)
Creates a new OutputTarget that directs output to the given java.lang.Appendable instance.

public static OutputTarget forWriter(java.io.Writer w)
Deprecated. Use OutputTarget(Writer) instead.
Creates a new OutputTarget that directs output to the given java.io.Writer instance.

public static OutputTarget forBuffer(java.lang.Appendable sb)
Deprecated. Use OutputTarget(Appendable) instead.
Creates a new OutputTarget that directs output to the given java.lang.Appendable instance.

public void write(java.lang.String str) throws java.io.IOException
Writes the provided String's character data to the wrapped output object.
Throws: java.io.IOException - if an error occurs writing the character data; only possible
in connection with an OutputTarget instance that wraps a java.io.Writer instance.

public void write(java.lang.CharSequence sb) throws java.io.IOException
Writes the provided CharSequence's character data to the wrapped output object.
Throws: java.io.IOException - if an error occurs writing the character data; only possible
in connection with an OutputTarget instance that wraps a java.io.Writer instance.

public void write(char[] buf, int start, int len) throws java.io.IOException
Writes the provided character data to the wrapped output object.
Throws: java.io.IOException - if an error occurs writing the character data; only possible
in connection with an OutputTarget instance that wraps a java.io.Writer instance.

public final void write(char[] buf) throws java.io.IOException
Writes the provided character data to the wrapped output object.
Throws: java.io.IOException - if an error occurs writing the character data; only possible
in connection with an OutputTarget instance that wraps a java.io.Writer instance.

public void write(char c) throws java.io.IOException
Writes the provided character to the wrapped output object.
Throws: java.io.IOException - if an error occurs writing the character; only possible
in connection with an OutputTarget instance that wraps a java.io.Writer instance.

public java.lang.Object getObject()
Returns the output object that this instance wraps; will be an instance of either
java.io.Writer or java.lang.Appendable.

public void textUnit(TextUnit tu)
Default implementation that writes the character run specified by the given TextUnit
instance to the java.io.Writer or java.lang.Appendable object that this OutputTarget wraps.
This implementation is very straightforward; it is provided here for illustrative purposes only:

    if (tu.getCharacterSequence() == null) {
        // no mapped sequence, append direct character code conversion
        int cc = tu.getCharCode();
        if (cc >= 32) write((char)cc);
    } else {
        write(tu.getCharacterSequence());
    }

Overrides: textUnit in class OutputHandler
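
To make the fallback in the snippet above easier to experiment with outside the library, here is a self-contained sketch of the same branching. The interface UnitLike and the class TextUnitSketch are hypothetical stand-ins; in particular, the return type of the real TextUnit accessor is not assumed here, and CharSequence is used only for convenience.

```java
import java.io.IOException;

// Hypothetical stand-in for TextUnit, exposing only the two accessors the snippet uses.
interface UnitLike {
    CharSequence getCharacterSequence(); // null when no mapped sequence exists
    int getCharCode();
}

final class TextUnitSketch {
    // Mirrors the documented logic: prefer the mapped sequence; otherwise fall back
    // to the raw character code, skipping codes below 32 (control characters).
    static void textUnit(UnitLike tu, Appendable out) throws IOException {
        CharSequence seq = tu.getCharacterSequence();
        if (seq == null) {
            int cc = tu.getCharCode();
            if (cc >= 32) {
                out.append((char) cc);
            }
        } else {
            out.append(seq);
        }
    }

    // Small factory for building test units.
    static UnitLike unit(final CharSequence seq, final int code) {
        return new UnitLike() {
            public CharSequence getCharacterSequence() { return seq; }
            public int getCharCode() { return code; }
        };
    }
}
```

In this sketch a unit with a mapped sequence is written as that sequence, a unit with no mapping and code 65 is written as the single character A, and a unit with no mapping and code 9 is dropped.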
public void spaces(int spaceCnt)
Default implementation that writes the specified number of spaces to the java.io.Writer
or java.lang.Appendable object that this OutputTarget wraps.
Overrides: spaces in class OutputHandler
Parameters: spaceCnt - the number of spaces that PDFTextStream recommends should be output

public void linebreaks(int linebreakCnt)
Default implementation that writes the specified number of line breaks (using the
linebreak String provided by the current configuration) to the java.io.Writer
or java.lang.Appendable object that this OutputTarget wraps.
Overrides: linebreaks in class OutputHandler
Parameters: linebreakCnt - the number of line breaks that PDFTextStream recommends should be output

public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed

public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed

public PDFTextStreamConfig getConfig()
Returns the PDFTextStreamConfig instance that this OutputTarget is currently using.
Please note that unless an OutputTarget instance is explicitly provided with a particular
configuration via setConfig(PDFTextStreamConfig), it will synchronize its configuration with
the configuration of a PDFTextStream instance any time an OutputTarget is provided to either
PDFTextStream.pipe(OutputHandler) or Page.pipe(OutputHandler).
If an OutputTarget is to be used to pipe content only from Block contexts, then it will use
the default PDFTextStreamConfig instance until a different configuration is set via
setConfig(PDFTextStreamConfig).

public void setConfig(PDFTextStreamConfig config)
Sets the PDFTextStreamConfig instance this OutputTarget should use. Once this OutputTarget
instance's configuration is set using this function, it will cease to synchronize its
configuration with the configuration provided by PDFTextStream and Page objects from which
it is used to pipe content.
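
The synchronization rule described for getConfig() and setConfig(PDFTextStreamConfig) can be modeled in a few lines. The sketch below is a hypothetical model, not the library's code: ConfigSketch, onPipeFrom, and the use of a plain String in place of PDFTextStreamConfig are all assumptions made to keep the example self-contained.

```java
// Hypothetical model of the documented rule; names and types are illustrative only.
final class ConfigSketch {
    // Stands in for the default PDFTextStreamConfig instance.
    static final String DEFAULT_CONFIG = "default";

    private String config = DEFAULT_CONFIG;
    private boolean explicitlySet = false;

    // Models setConfig: once called, synchronization with piping sources stops.
    void setConfig(String config) {
        this.config = config;
        this.explicitlySet = true;
    }

    // Models getConfig: returns whatever configuration is currently in effect.
    String getConfig() {
        return config;
    }

    // Models being handed to a stream-level or page-level pipe call:
    // adopt the source's configuration unless one was set explicitly.
    void onPipeFrom(String sourceConfig) {
        if (!explicitlySet) {
            config = sourceConfig;
        }
    }
}
```

In this model, a target that has never been piped from a stream or page and never had setConfig called keeps the default, which corresponds to the Block-only case described under getConfig().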