public class OutputTarget extends OutputHandler
This is a base OutputHandler implementation that directs all text extraction output to an Appendable of your choice, e.g. a Writer, StringBuilder, CharBuffer, and so on.

This is the ideal place to start when building a custom OutputHandler implementation. See the XMLOutputTarget class as an example of how this can be done.
Please note that OutputTarget makes no attempt to retain the visual layout or formatting of the text extracted from PDF documents. It is focused on:

If your application requires PDF text extracts that retain the visual appearance of the text as it is laid out on each page, then VisualOutputTarget would be more suitable.
Example usage:
    // pdfFilePath is a placeholder for the location of the PDF to read
    java.io.StringWriter text = new java.io.StringWriter(1024);
    OutputTarget tgt = new OutputTarget(text);
    Document pdf = com.snowtide.PDF.open(pdfFilePath);
    pdf.pipe(tgt);
    pdf.close();
    // do something with the extracted text...
    processText(text.toString());
See Also: Document.pipe(OutputHandler), Page.pipe(OutputHandler), Block.pipe(OutputHandler), Line.pipe(OutputHandler)
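
The class description above notes that the sink may be any Appendable. The following JDK-only sketch illustrates what that means in practice: the same code can feed a StringBuilder, a StringWriter, or a CharBuffer. It uses no PDFxStream classes; the AppendableSinks class and its emit method are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;

// JDK-only illustration; AppendableSinks and emit are hypothetical names.
class AppendableSinks {
    // Stand-in for extracted text being delivered to a sink.
    static void emit(Appendable sink, CharSequence text) {
        try {
            sink.append(text);
        } catch (IOException e) {
            // Appendable.append declares IOException; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder();
        StringWriter writer = new StringWriter();
        CharBuffer buffer = CharBuffer.allocate(64);

        // All three types implement java.lang.Appendable.
        for (Appendable sink : new Appendable[] { builder, writer, buffer }) {
            emit(sink, "extracted text");
        }

        buffer.flip(); // switch the buffer from writing to reading
        System.out.println(builder);
        System.out.println(writer);
        System.out.println(buffer);
    }
}
```

A fixed-size CharBuffer throws BufferOverflowException when it runs out of room, so a growable sink such as StringBuilder or StringWriter is usually the safer choice when the amount of text is not known in advance.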
Modifier and Type | Field and Description |
---|---|
protected java.lang.Appendable | sink |
Constructor and Description |
---|
OutputTarget(java.lang.Appendable sink) Creates a new OutputTarget that directs output to the given Appendable instance. |
Modifier and Type | Method and Description |
---|---|
void | endPage(Page page) Invoked when PDFxStream has finished processing a page. |
static OutputTarget | forBuffer(java.lang.Appendable sb) Deprecated. Use OutputTarget.OutputTarget(Appendable) instead. |
static OutputTarget | forWriter(java.io.Writer w) Deprecated. Use OutputTarget.OutputTarget(Appendable) instead. |
Configuration | getConfig() Returns the Configuration instance that this OutputTarget is currently using. |
java.lang.Appendable | getObject() Returns the Appendable used to create this OutputTarget. |
void | linebreaks(int linebreakCnt) Default implementation that writes the specified number of line breaks (using the linebreak String provided by the current configuration) to the Writer or Appendable object that this OutputTarget wraps. |
void | setConfig(Configuration config) Sets the Configuration instance this OutputTarget should use. |
void | spaces(int spaceCnt) Default implementation that writes the specified number of spaces to the Writer or Appendable object that this OutputTarget wraps. |
void | startPage(Page page) Invoked when a page is about to be processed. |
void | textUnit(TextUnit tu) Default implementation that writes the character run specified by the given TextUnit instance to the Appendable held by this OutputTarget. |
void | write(char c) Writes the provided character to the Appendable used to create this OutputTarget. |
void | write(char[] buf) Writes the provided character data to the Appendable used to create this OutputTarget. |
void | write(char[] buf, int start, int len) Writes the provided character data to the Appendable used to create this OutputTarget. |
void | write(java.lang.CharSequence sb) Writes the provided CharSequence's character data to the Appendable used to create this OutputTarget. |
Methods inherited from class OutputHandler: endBlock, endLine, endPDF, startBlock, startLine, startPDF
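
The method summary describes write, spaces, and linebreaks as thin wrappers that append to the wrapped sink. A minimal sketch of that behavior, under stated assumptions, might look like the following. SketchTarget is a hypothetical stand-in and not the PDFxStream class; in particular, passing the linebreak String to the constructor is an assumption made here in place of a real Configuration object.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in, not the PDFxStream class: it only mirrors the
// documented behavior of write, spaces, and linebreaks.
class SketchTarget {
    private final Appendable sink;
    private final String linebreak; // assumed to come from the configuration

    SketchTarget(Appendable sink, String linebreak) {
        this.sink = sink;
        this.linebreak = linebreak;
    }

    void write(CharSequence text) {
        try {
            sink.append(text);
        } catch (IOException e) {
            // The documented write methods list IOException; wrapped here for brevity.
            throw new UncheckedIOException(e);
        }
    }

    // Appends the requested number of single spaces.
    void spaces(int spaceCnt) {
        for (int i = 0; i < spaceCnt; i++) {
            write(" ");
        }
    }

    // Appends the configured linebreak String the requested number of times.
    void linebreaks(int linebreakCnt) {
        for (int i = 0; i < linebreakCnt; i++) {
            write(linebreak);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        SketchTarget target = new SketchTarget(out, "\n");
        target.write("Hello");
        target.spaces(2);
        target.write("world");
        target.linebreaks(1);
        System.out.print(out); // prints "Hello  world" followed by a newline
    }
}
```

Because every method funnels through write, a subclass in this sketch would only need to override write to redirect or transform all output.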

public OutputTarget(java.lang.Appendable sink)
Creates a new OutputTarget that directs output to the given Appendable instance.

public static OutputTarget forWriter(java.io.Writer w)
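Deprecated. Use OutputTarget.OutputTarget(Appendable) instead.
Creates a new OutputTarget that directs output to the given Writer instance.

public static OutputTarget forBuffer(java.lang.Appendable sb)
Deprecated. Use OutputTarget.OutputTarget(Appendable) instead.
Creates a new OutputTarget that directs output to the given Appendable instance.

public void write(java.lang.CharSequence sb)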
Writes the provided CharSequence's character data to the Appendable used to create this OutputTarget.
Throws: java.io.IOException - if an error occurs writing the character data; only possible in connection with an OutputTarget instance that wraps a Writer instance.

public void write(char[] buf, int start, int len)
Writes the provided character data to the Appendable used to create this OutputTarget.
Throws: java.io.IOException - if an error occurs writing the character data; only possible in connection with an OutputTarget instance that wraps a Writer instance.

public final void write(char[] buf)
Writes the provided character data to the Appendable used to create this OutputTarget.
Throws: java.io.IOException - if an error occurs writing the character data; only possible in connection with an OutputTarget instance that wraps a Writer instance.

public void write(char c)
Writes the provided character to the Appendable used to create this OutputTarget.
Throws: java.io.IOException - if an error occurs writing the character; only possible in connection with an OutputTarget instance that wraps a Writer instance.

public java.lang.Appendable getObject()
Returns the Appendable used to create this OutputTarget.

public void textUnit(TextUnit tu)
Default implementation that writes the character run specified by the given TextUnit instance to the Appendable held by this OutputTarget.

This implementation is very straightforward; it is provided here for illustrative purposes only:

    if (tu.getCharacterSequence() == null) {
        // no mapped sequence, append direct character code conversion
        int cc = tu.getCharCode();
        if (cc >= 32) write((char)cc);
    } else {
        write(tu.getCharacterSequence());
    }

textUnit in class OutputHandler
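
To make the fallback logic of textUnit concrete, here is a small runnable sketch that reproduces the same branching against a stand-in type. FakeTextUnit is hypothetical and exposes only the two accessors used in the snippet above; the char[] return type of getCharacterSequence is an assumption made for this illustration.

```java
// Hypothetical sketch; TextUnitSketch and FakeTextUnit are not PDFxStream classes.
class TextUnitSketch {
    // Stand-in for TextUnit with only the two accessors the logic needs.
    static class FakeTextUnit {
        private final char[] mapped; // null when no mapped sequence exists (assumed type)
        private final int charCode;

        FakeTextUnit(char[] mapped, int charCode) {
            this.mapped = mapped;
            this.charCode = charCode;
        }

        char[] getCharacterSequence() { return mapped; }
        int getCharCode() { return charCode; }
    }

    // Same branching as the illustrative snippet: prefer the mapped sequence,
    // otherwise fall back to the raw character code, skipping codes below 32.
    static void textUnit(FakeTextUnit tu, StringBuilder sink) {
        if (tu.getCharacterSequence() == null) {
            int cc = tu.getCharCode();
            if (cc >= 32) {
                sink.append((char) cc);
            }
        } else {
            sink.append(tu.getCharacterSequence());
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        textUnit(new FakeTextUnit(new char[] { 'f', 'i' }, 0xFB01), out); // mapped sequence wins
        textUnit(new FakeTextUnit(null, 65), out);                        // unmapped, code 65 is 'A'
        textUnit(new FakeTextUnit(null, 9), out);                         // unmapped control code, skipped
        System.out.println(out); // prints "fiA"
    }
}
```

The check against 32 means that unmapped control codes never reach the sink, while a mapped sequence is always written as given.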
public void spaces(int spaceCnt)
Default implementation that writes the specified number of spaces to the Writer or Appendable object that this OutputTarget wraps.
spaces in class OutputHandler
Parameters: spaceCnt - the number of spaces that PDFxStream recommends should be output

public void linebreaks(int linebreakCnt)
Default implementation that writes the specified number of line breaks (using the linebreak String provided by the current configuration) to the Writer or Appendable object that this OutputTarget wraps.
linebreaks in class OutputHandler
Parameters: linebreakCnt - the number of line breaks that PDFxStream recommends should be output

public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed

public void endPage(Page page)
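Description copied from class: OutputHandler
Invoked when PDFxStream has finished processing a page.
endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed

public Configuration getConfig()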
Returns the Configuration instance that this OutputTarget is currently using.

Unless an OutputTarget instance is explicitly provided with a particular configuration via OutputTarget.setConfig(Configuration), it will synchronize its configuration with the configuration of a Document instance any time an OutputTarget is provided to either Document.pipe(OutputHandler) or Page.pipe(OutputHandler).

If an OutputTarget is to be used to pipe content only from Block contexts, then it will use the default PDFxStreamConfig instance until a different configuration is set via OutputTarget.setConfig(Configuration).

public void setConfig(Configuration config)
Sets the Configuration instance this OutputTarget should use. Once this OutputTarget instance's configuration is set using this method, it will cease to synchronize its configuration with the configuration provided by the Document and Page objects from which it is used to pipe content.
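
The interaction between getConfig and setConfig described above can be modeled with a small state machine. The sketch below is only a model of the documented rule, not PDFxStream code: ConfigSyncSketch and pipedFrom are hypothetical names, and plain Strings stand in for Configuration objects.

```java
// Hypothetical model of the documented rule; none of these types are PDFxStream classes.
class ConfigSyncSketch {
    private String config = "default";  // stands in for the default configuration
    private boolean explicitlySet = false;

    String getConfig() {
        return config;
    }

    // Mirrors setConfig: an explicit configuration switches synchronization off.
    void setConfig(String config) {
        this.config = config;
        this.explicitlySet = true;
    }

    // Stands in for the target being handed to Document.pipe or Page.pipe.
    void pipedFrom(String documentConfig) {
        if (!explicitlySet) {
            config = documentConfig;
        }
    }

    public static void main(String[] args) {
        ConfigSyncSketch target = new ConfigSyncSketch();
        System.out.println(target.getConfig()); // prints "default"

        target.pipedFrom("docA");
        System.out.println(target.getConfig()); // prints "docA"

        target.setConfig("custom");
        target.pipedFrom("docB");
        System.out.println(target.getConfig()); // prints "custom"
    }
}
```

In this model, a target that should follow each document's settings is simply never given an explicit configuration, while a target that needs fixed settings has them set once up front.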