Class OutputTarget
- java.lang.Object
- com.snowtide.pdf.OutputHandler
- com.snowtide.pdf.OutputTarget
- Direct Known Subclasses: SelectionOutputTarget
public class OutputTarget extends OutputHandler
This is a base OutputHandler implementation that directs all text extraction output to an Appendable of your choice, e.g. a Writer, StringBuilder, CharBuffer, and so on.
This is the ideal place to start when building a custom OutputHandler implementation. See the XMLOutputTarget class as an example of how this can be done.
Please note that OutputTarget makes no attempt to retain the visual layout or formatting of the text extracted from PDF documents. It is focused on:
- Maximum performance
- Ensuring that extracted PDF text is yielded with the proper segmentation and read-ordering, for the benefit of most users whose applications are sensitive to the semantic ordering of the PDF content. This includes search, text analytics, summarization tools, and similar applications.
If your application requires PDF text extracts that retain the visual appearance of the text as it is laid out on each page, then VisualOutputTarget would be more suitable.
Example usage:
    java.io.StringWriter text = new java.io.StringWriter(1024);
    OutputTarget tgt = new OutputTarget(text);
    Document pdf = com.snowtide.PDF.open( /* PDF file argument not shown */ );
    pdf.pipe(tgt);
    pdf.close();
    // do something with the extracted text...
    processText(text.toString());
- Version:
- ©2004-2024 Snowtide
- See Also:
OutputSource.pipe(OutputHandler)
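Because the constructor accepts any Appendable, the same writing code can target a StringBuilder, a Writer, or a CharBuffer. The following is a minimal sketch of that idea using only JDK types. The class AppendableSinkSketch and its emit method are hypothetical names that stand in for whatever feeds the sink; they are not part of PDFxStream.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.CharBuffer;

// Hypothetical stand-in: illustrates the Appendable contract only, not PDFxStream itself.
class AppendableSinkSketch {
    // Appends the text to whichever Appendable is supplied and returns that same sink.
    static <A extends Appendable> A emit(A sink, CharSequence text) throws IOException {
        sink.append(text);
        return sink;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = emit(new StringBuilder(), "extracted text");
        StringWriter sw = emit(new StringWriter(1024), "extracted text");
        CharBuffer cb = emit(CharBuffer.allocate(32), "extracted text");
        cb.flip(); // switch the buffer from writing to reading before calling toString()
        System.out.println(sb + " | " + sw + " | " + cb);
    }
}
```

A StringBuilder or StringWriter is usually the simplest choice for in-memory extraction. A fixed-capacity CharBuffer only suits cases where the maximum output size is known in advance.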
Field Summary
- protected Appendable sink
Constructor Summary
- OutputTarget(Appendable sink): Creates a new OutputTarget that directs output to the given Appendable instance.
Method Summary
- static OutputTarget forBuffer(Appendable sb): Deprecated.
- static OutputTarget forWriter(Writer w): Deprecated. Use OutputTarget(Appendable) instead.
- Configuration getConfig(): Returns the Configuration instance that this OutputTarget is currently using.
- Appendable getObject(): Returns the Appendable used to create this OutputTarget.
- void linebreaks(int linebreakCnt): Default implementation that writes specified number of line breaks (using the linebreak String provided by the current configuration) to the Writer or Appendable object that this OutputTarget wraps.
- void setConfig(Configuration config): Sets the Configuration instance this OutputTarget should use.
- void spaces(int spaceCnt): Default implementation that writes specified number of spaces to the Writer or Appendable object that this OutputTarget wraps.
- void startPage(Page page): Invoked when a page is about to be processed.
- void startSpan(Span s): Invoked when a Span is about to be processed.
- void textUnit(TextUnit tu): Default implementation that writes the character run specified by the given TextUnit instance to the Appendable held by this OutputTarget.
- void write(char c): Writes the provided character to the Appendable used to create this OutputTarget.
- void write(char[] buf): Writes the provided character data to the Appendable used to create this OutputTarget.
- void write(char[] buf, int start, int len): Writes the provided character data to the Appendable used to create this OutputTarget.
- void write(CharSequence sb): Writes the provided CharSequence's character data to the Appendable used to create this OutputTarget.
Methods inherited from class com.snowtide.pdf.OutputHandler
endBlock, endLine, endPage, endPDF, endSpan, startBlock, startLine, startPDF
Field Detail
sink
protected final Appendable sink
Constructor Detail
OutputTarget
public OutputTarget(Appendable sink)
Creates a new OutputTarget that directs output to the given Appendable instance.
Method Detail
forWriter
public static OutputTarget forWriter(Writer w)
Deprecated. Use OutputTarget(Appendable) instead.
Creates a new OutputTarget that wraps a Writer instance.

forBuffer
public static OutputTarget forBuffer(Appendable sb)
Deprecated.
Creates a new OutputTarget that wraps an Appendable instance.
write
public void write(CharSequence sb) throws IOException
Writes the provided CharSequence's character data to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character data; only possible in connection with an OutputTarget instance that wraps a Writer instance.

write
public void write(char[] buf, int start, int len) throws IOException
Writes the provided character data to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character data; only possible in connection with an OutputTarget instance that wraps a Writer instance.

write
public final void write(char[] buf) throws IOException
Writes the provided character data to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character data; only possible in connection with an OutputTarget instance that wraps a Writer instance.

write
public void write(char c) throws IOException
Writes the provided character to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character; only possible in connection with an OutputTarget instance that wraps a Writer instance.
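The four write overloads can be pictured as thin wrappers over Appendable.append, which is where the declared IOException comes from. The sketch below shows one way such overloads could be layered on an Appendable using only JDK types. WriteSketch is a hypothetical class written for illustration; it is not the PDFxStream implementation.

```java
import java.io.IOException;
import java.nio.CharBuffer;

// Hypothetical stand-in for a class that wraps an Appendable; not PDFxStream source.
class WriteSketch {
    private final Appendable sink;

    WriteSketch(Appendable sink) {
        this.sink = sink;
    }

    void write(CharSequence cs) throws IOException {
        sink.append(cs);
    }

    void write(char c) throws IOException {
        sink.append(c);
    }

    // Wraps the requested slice without copying; buf[start] .. buf[start + len - 1] are appended.
    void write(char[] buf, int start, int len) throws IOException {
        sink.append(CharBuffer.wrap(buf, start, len));
    }

    // Delegates to the slice overload with the whole array.
    void write(char[] buf) throws IOException {
        write(buf, 0, buf.length);
    }
}
```

With a StringBuilder sink the IOException is never thrown in practice, because StringBuilder.append does not perform I/O. With a Writer sink, any failure in the underlying stream surfaces through these methods.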
getObject
public Appendable getObject()
Returns the Appendable used to create this OutputTarget.
textUnit
public void textUnit(TextUnit tu)
Default implementation that writes the character run specified by the given TextUnit instance to the Appendable held by this OutputTarget.
This implementation is very straightforward; it is provided here for illustrative purposes only:
    if (tu.getCharacterSequence() == null) {
        // no mapped sequence, append direct character code conversion
        int cc = tu.getCharCode();
        if (cc >= 32) write((char)cc);
    } else {
        write(tu.getCharacterSequence());
    }
- Overrides:
textUnit in class OutputHandler
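The branch logic of the textUnit snippet can be exercised without PDFxStream by substituting a small stand-in type. In the sketch below, UnitSketch and TextUnitSketch are hypothetical. The two accessor names follow the snippet, while the char[] return type of getCharacterSequence is an assumption made for this example.

```java
// Hypothetical stand-in for a text unit; the accessor names follow the snippet above,
// and the char[] return type is an assumption.
interface UnitSketch {
    char[] getCharacterSequence();
    int getCharCode();
}

class TextUnitSketch {
    // Mirrors the snippet: prefer the mapped sequence, otherwise fall back to the
    // raw character code, skipping control codes below 32.
    static void append(StringBuilder out, UnitSketch tu) {
        char[] seq = tu.getCharacterSequence();
        if (seq == null) {
            int cc = tu.getCharCode();
            if (cc >= 32) out.append((char) cc);
        } else {
            out.append(seq);
        }
    }

    // Convenience factory for building stand-in units in the usage example.
    static UnitSketch unit(char[] seq, int code) {
        return new UnitSketch() {
            public char[] getCharacterSequence() { return seq; }
            public int getCharCode() { return code; }
        };
    }
}
```

For example, a unit whose mapped sequence is {'f', 'i'} contributes "fi", a unit with no mapping and character code 'x' contributes "x", and a unit with no mapping and character code 9 (a tab) contributes nothing.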
spaces
public void spaces(int spaceCnt)
Default implementation that writes the specified number of spaces to the Writer or Appendable object that this OutputTarget wraps.
- Overrides:
spaces in class OutputHandler
- Parameters:
spaceCnt - the number of spaces that PDFxStream recommends should be output
linebreaks
public void linebreaks(int linebreakCnt)
Default implementation that writes the specified number of line breaks (using the linebreak String provided by the current configuration) to the Writer or Appendable object that this OutputTarget wraps.
- Overrides:
linebreaks in class OutputHandler
- Parameters:
linebreakCnt - the number of line breaks that PDFxStream recommends should be output
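The behavior described for spaces and linebreaks amounts to appending a repeated string to the sink. The following sketch models that with JDK types only. WhitespaceSketch is hypothetical, and its linebreak field stands in for the linebreak String that the real class obtains from its configuration. How the real class reacts to a failing sink is not stated here, so rethrowing as UncheckedIOException is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in; models the documented behavior with JDK types only.
class WhitespaceSketch {
    private final Appendable sink;
    private final String linebreak; // stands in for the configuration's linebreak String

    WhitespaceSketch(Appendable sink, String linebreak) {
        this.sink = sink;
        this.linebreak = linebreak;
    }

    // Appends spaceCnt space characters to the sink.
    void spaces(int spaceCnt) {
        repeat(" ", spaceCnt);
    }

    // Appends linebreakCnt copies of the linebreak string to the sink.
    void linebreaks(int linebreakCnt) {
        repeat(linebreak, linebreakCnt);
    }

    private void repeat(CharSequence s, int count) {
        try {
            for (int i = 0; i < count; i++) {
                sink.append(s);
            }
        } catch (IOException e) {
            // Assumption: surface sink failures unchecked, because the documented
            // signatures declare no checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

A subclass that wants different whitespace handling, such as collapsing runs of line breaks into one, would override these two methods and leave the rest of the class untouched.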
startPage
public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
- Overrides:
startPage in class OutputHandler
- Parameters:
page - a reference to the Page that is about to be processed
startSpan
public void startSpan(Span s)
Description copied from class: OutputHandler
Invoked when a Span is about to be processed.
- Overrides:
startSpan in class OutputHandler
getConfig
public Configuration getConfig()
Returns the Configuration instance that this OutputTarget is currently using. Unless an OutputTarget instance is explicitly provided with a particular configuration via setConfig(Configuration), it will synchronize its configuration with the configuration of a Document instance any time the OutputTarget is provided to OutputSource.pipe(OutputHandler).
If an OutputTarget is to be used to pipe content only from Block contexts, then it will use the default PDFxStreamConfig instance until a different configuration is set via setConfig(Configuration).
setConfig
public void setConfig(Configuration config)
Sets the Configuration instance this OutputTarget should use. Once this OutputTarget instance's configuration is set using this function, it will cease to synchronize its configuration with the configuration provided by Document and Page objects from which it is used to pipe content.
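The interaction between getConfig and setConfig can be summarized as a small state model: the target follows the source's configuration until one is set explicitly, after which it stays pinned. The sketch below illustrates that rule only. ConfigSyncSketch and its onPipe method are hypothetical, and a plain String stands in for a Configuration instance.

```java
// Hypothetical model of the documented synchronization rule; not PDFxStream source.
class ConfigSyncSketch {
    private String config;          // stands in for a Configuration instance
    private boolean pinned = false; // true once setConfig has been called

    ConfigSyncSketch(String defaultConfig) {
        this.config = defaultConfig;
    }

    String getConfig() {
        return config;
    }

    // An explicit configuration stops all later synchronization.
    void setConfig(String config) {
        this.config = config;
        this.pinned = true;
    }

    // Hypothetical hook: called when a source starts piping content to this target.
    void onPipe(String sourceConfig) {
        if (!pinned) {
            config = sourceConfig;
        }
    }
}
```

Under this model, a target that is never piped from a source and never given a configuration keeps reporting the default, which matches the Block-only case described for getConfig.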