Class OutputTarget
java.lang.Object
    com.snowtide.pdf.OutputHandler
        com.snowtide.pdf.OutputTarget
- Direct Known Subclasses:
SelectionOutputTarget
public class OutputTarget extends OutputHandler
This is a base OutputHandler implementation that directs all text extraction output to an Appendable of your choice, e.g. a Writer, StringBuilder, CharBuffer, and so on.
This is the ideal place to start when building a custom OutputHandler implementation. See the XMLOutputTarget class as an example of how this can be done.
Please note that OutputTarget makes no attempt to retain the visual layout or formatting of the text extracted from PDF documents. It is focused on:
- Maximum performance
- Ensuring that extracted PDF text is yielded with the proper segmentation and read-ordering, for the benefit of most users, whose applications are sensitive to the semantic ordering of the PDF content. This includes search, text analytics, summarization tools, and similar applications.
If your application requires PDF text extracts that retain the visual appearance of the text as it is laid out on each page, then VisualOutputTarget would be more suitable.
Example usage:
    java.io.StringWriter text = new java.io.StringWriter(1024);
    OutputTarget tgt = new OutputTarget(text);
    Document pdf = com.snowtide.PDF.open( /* PDF file or path */ );
    pdf.pipe(tgt);
    pdf.close();
    // do something with the extracted text...
    processText(text.toString());
- Version:
©2004-2025 Snowtide
- See Also:
OutputSource.pipe(OutputHandler)
Field Summary
Fields
Modifier and Type | Field | Description
protected Appendable | sink |
Constructor Summary
Constructors
Constructor | Description
OutputTarget(Appendable sink) | Creates a new OutputTarget that directs output to the given Appendable instance.
Method Summary
Modifier and Type | Method | Description
static OutputTarget | forBuffer(Appendable sb) | Deprecated.
static OutputTarget | forWriter(Writer w) | Deprecated. Use OutputTarget(Appendable) instead.
Configuration | getConfig() | Returns the Configuration instance that this OutputTarget is currently using.
Appendable | getObject() | Returns the Appendable used to create this OutputTarget.
void | linebreaks(int linebreakCnt) | Default implementation that writes the specified number of line breaks (using the linebreak String provided by the current configuration) to the Writer or Appendable object that this OutputTarget wraps.
void | setConfig(Configuration config) | Sets the Configuration instance this OutputTarget should use.
void | spaces(int spaceCnt) | Default implementation that writes the specified number of spaces to the Writer or Appendable object that this OutputTarget wraps.
void | startPage(Page page) | Invoked when a page is about to be processed.
void | startSpan(Span s) | Invoked when a Span is about to be processed.
void | textUnit(TextUnit tu) | Default implementation that writes the character run specified by the given TextUnit instance to the Appendable held by this OutputTarget.
void | write(char c) | Writes the provided character to the Appendable used to create this OutputTarget.
void | write(char[] buf) | Writes the provided character data to the Appendable used to create this OutputTarget.
void | write(char[] buf, int start, int len) | Writes the provided character data to the Appendable used to create this OutputTarget.
void | write(CharSequence sb) | Writes the provided CharSequence's character data to the Appendable used to create this OutputTarget.
Methods inherited from class com.snowtide.pdf.OutputHandler
endBlock, endLine, endPage, endPDF, endSpan, startBlock, startLine, startPDF
Field Detail
sink
protected final Appendable sink
Constructor Detail
OutputTarget
public OutputTarget(Appendable sink)
Creates a new OutputTarget that directs output to the given Appendable instance.
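Because the constructor accepts any Appendable, the same output code can fill a StringBuilder, a StringWriter, or a CharBuffer. The following is a minimal, self-contained sketch of that contract, not PDFxStream code. AppendableSinkSketch is a hypothetical name, and it uses only JDK types.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;

// Hypothetical stand-in for the OutputTarget(Appendable) contract:
// hold whatever Appendable is supplied and append text to it.
// AppendableSinkSketch is an invented name, not part of PDFxStream.
class AppendableSinkSketch {
    protected final Appendable sink;

    AppendableSinkSketch(Appendable sink) {
        this.sink = sink;
    }

    // Appendable.append declares IOException, so callers must handle it.
    void write(CharSequence text) throws IOException {
        sink.append(text);
    }

    // Writes the same text through three different Appendable types and
    // returns what each one ended up holding.
    static String[] demo() {
        StringBuilder sb = new StringBuilder();
        StringWriter sw = new StringWriter();
        CharBuffer cb = CharBuffer.allocate(64);
        try {
            for (Appendable a : new Appendable[] {sb, sw, cb}) {
                new AppendableSinkSketch(a).write("Hello, PDF");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cb.flip(); // switch the buffer from filling to reading
        return new String[] {sb.toString(), sw.toString(), cb.toString()};
    }
}
```

A fixed-size CharBuffer signals overflow with the unchecked BufferOverflowException rather than IOException. A growable sink such as StringBuilder or StringWriter is therefore the simpler choice when the extract size is unknown.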
Method Detail
forWriter
@Deprecated public static OutputTarget forWriter(Writer w)
Deprecated. Use OutputTarget(Appendable) instead.
Creates a new OutputTarget that wraps a Writer instance.
forBuffer
@Deprecated public static OutputTarget forBuffer(Appendable sb)
Deprecated.
Creates a new OutputTarget that wraps an Appendable instance.
write
public void write(CharSequence sb) throws IOException
Writes the provided CharSequence's character data to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character data; this is only possible when the OutputTarget wraps a Writer instance.
write
public void write(char[] buf, int start, int len) throws IOException
Writes the provided character data to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character data; this is only possible when the OutputTarget wraps a Writer instance.
write
public final void write(char[] buf) throws IOException
Writes the provided character data to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character data; this is only possible when the OutputTarget wraps a Writer instance.
write
public void write(char c) throws IOException
Writes the provided character to the Appendable used to create this OutputTarget.
- Throws:
IOException - if an error occurs writing the character; this is only possible when the OutputTarget wraps a Writer instance.
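The four write overloads differ mainly in how they address character data. The (buf, start, len) form takes a length, whereas Appendable.append(CharSequence, int, int) takes an end index. The following self-contained sketch is not the PDFxStream source. It shows one plausible way to delegate all four overloads to an Appendable and confirms that an IOException raised by a Writer sink reaches the caller. WriteOverloadsSketch and FailingWriter are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.CharBuffer;

// Hypothetical stand-in mirroring the documented write(...) overloads;
// one plausible delegation strategy, not PDFxStream's implementation.
class WriteOverloadsSketch {
    private final Appendable sink;

    WriteOverloadsSketch(Appendable sink) {
        this.sink = sink;
    }

    void write(CharSequence cs) throws IOException {
        sink.append(cs);
    }

    // Appendable.append(CharSequence, start, end) takes an END index,
    // so the (start, len) pair is converted to (start, start + len).
    void write(char[] buf, int start, int len) throws IOException {
        sink.append(CharBuffer.wrap(buf), start, start + len);
    }

    void write(char[] buf) throws IOException {
        write(buf, 0, buf.length);
    }

    void write(char c) throws IOException {
        sink.append(c);
    }

    // Exercises all four overloads against a StringBuilder sink.
    static String demo() {
        StringBuilder sb = new StringBuilder();
        WriteOverloadsSketch t = new WriteOverloadsSketch(sb);
        try {
            t.write("abc");
            t.write(new char[] {'x', 'y', 'z', 'w'}, 1, 2); // appends "yz"
            t.write(new char[] {'!'});
            t.write('?');
        } catch (IOException e) {
            // StringBuilder never throws IOException; this only satisfies the compiler.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Returns true if an IOException raised by a Writer sink reaches the caller.
    static boolean writerErrorPropagates() {
        WriteOverloadsSketch t = new WriteOverloadsSketch(new FailingWriter());
        try {
            t.write('a');
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}

// A Writer that fails on every write, simulating e.g. a broken file stream.
class FailingWriter extends Writer {
    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        throw new IOException("simulated write failure");
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
}
```

Here WriteOverloadsSketch.demo() exercises every overload in turn. Passing len where an end index is expected is an easy off-by-offset mistake, which is why the sketch converts explicitly.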
getObject
public Appendable getObject()
Returns the Appendable used to create this OutputTarget.
textUnit
public void textUnit(TextUnit tu)
Default implementation that writes the character run specified by the given TextUnit instance to the Appendable held by this OutputTarget.
- Overrides:
textUnit in class OutputHandler
- See Also:
TextUnit.getCharacterSequence()
spaces
public void spaces(int spaceCnt)
Default implementation that writes the specified number of spaces to the Writer or Appendable object that this OutputTarget wraps.
- Overrides:
spaces in class OutputHandler
- Parameters:
spaceCnt - the number of spaces that PDFxStream recommends should be output
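The documented spaces(int) signature declares no checked exception, even though Appendable.append may throw IOException. The following is a minimal sketch of a default implementation under that constraint. SpacesSketch is a hypothetical name. Rethrowing as UncheckedIOException is an assumption made here, since this page does not say how OutputTarget itself handles the error.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of a spaces(int) default: append spaceCnt ' ' characters
// to the wrapped Appendable. Not PDFxStream's actual source.
class SpacesSketch {
    private final Appendable sink;

    SpacesSketch(Appendable sink) {
        this.sink = sink;
    }

    void spaces(int spaceCnt) {
        try {
            for (int i = 0; i < spaceCnt; i++) {
                sink.append(' ');
            }
        } catch (IOException e) {
            // Assumed policy: surface sink failures as an unchecked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```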
linebreaks
public void linebreaks(int linebreakCnt)
Default implementation that writes the specified number of line breaks (using the linebreak String provided by the current configuration) to the Writer or Appendable object that this OutputTarget wraps.
- Overrides:
linebreaks in class OutputHandler
- Parameters:
linebreakCnt - the number of line breaks that PDFxStream recommends should be output
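Because the line-break string comes from the current configuration, the same linebreaks(n) call can emit "\n" or "\r\n" depending on settings. The following self-contained sketch illustrates that dependency. LinebreaksSketch and its nested LinebreakConfig are hypothetical stand-ins, and the real Configuration accessor for the line-break string is not shown on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: linebreaks(n) appends n copies of the line-break string
// held by the current configuration. LinebreakConfig is an invented stand-in
// for PDFxStream's Configuration, not its real API.
class LinebreaksSketch {
    // Hypothetical configuration holder carrying only a line-break string.
    static final class LinebreakConfig {
        final String linebreak;

        LinebreakConfig(String linebreak) {
            this.linebreak = linebreak;
        }
    }

    private final Appendable sink;
    private final LinebreakConfig config;

    LinebreaksSketch(Appendable sink, LinebreakConfig config) {
        this.sink = sink;
        this.config = config;
    }

    void linebreaks(int linebreakCnt) {
        try {
            for (int i = 0; i < linebreakCnt; i++) {
                sink.append(config.linebreak);
            }
        } catch (IOException e) {
            // Assumed policy, as in the spaces sketch: rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```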
startPage
public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
- Overrides:
startPage in class OutputHandler
- Parameters:
page - a reference to the Page that is about to be processed
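Hooks such as startPage are where a custom OutputTarget subclass adds behaviour while inheriting the default text output. The following self-contained miniature models that layering with invented classes (MiniHandler, MiniTarget, PageMarkerTarget), using an int page number in place of a PDFxStream Page. Because OutputTarget itself overrides startPage, a real subclass would probably want to call super.startPage(page) first. That is an inference from this page, not documented behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical miniature of OutputHandler -> OutputTarget -> custom subclass.
// All class names are invented; an int page number stands in for a Page.
abstract class MiniHandler {
    void startPage(int pageNumber) {}   // no-op hook by default
    void text(CharSequence run) {}      // no-op hook by default
}

class MiniTarget extends MiniHandler {
    protected final Appendable sink;

    MiniTarget(Appendable sink) {
        this.sink = sink;
    }

    protected void write(CharSequence cs) {
        try {
            sink.append(cs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Default behaviour: pass text runs straight through to the sink.
    @Override
    void text(CharSequence run) {
        write(run);
    }
}

// A custom target that overrides one hook and inherits all text output.
class PageMarkerTarget extends MiniTarget {
    PageMarkerTarget(Appendable sink) {
        super(sink);
    }

    @Override
    void startPage(int pageNumber) {
        super.startPage(pageNumber);    // keep whatever the parent does
        write("[page " + pageNumber + "]\n");
    }
}
```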
startSpan
public void startSpan(Span s)
Description copied from class: OutputHandler
Invoked when a Span is about to be processed.
- Overrides:
startSpan in class OutputHandler
getConfig
public Configuration getConfig()
Returns the Configuration instance that this OutputTarget is currently using. Unless an OutputTarget instance is explicitly given a particular configuration via setConfig(Configuration), it will synchronize its configuration with that of a Document instance any time the OutputTarget is provided to OutputSource.pipe(OutputHandler).
If an OutputTarget is used to pipe content only from Block contexts, it will use the default PDFxStream configuration until a different configuration is set via setConfig(Configuration).
setConfig
public void setConfig(Configuration config)
Sets the Configuration instance this OutputTarget should use. Once this OutputTarget instance's configuration is set using this method, it will cease to synchronize its configuration with the configuration provided by the Document and Page objects from which it is used to pipe content.
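getConfig and setConfig together describe a simple rule. The target uses a default configuration, follows the configuration of whatever it is piped from, and stops following once a configuration is set explicitly. The following self-contained sketch models only that rule. ConfigSyncSketch, its nested Config, and the syncFrom method are hypothetical names, not PDFxStream API.

```java
// Hypothetical sketch of the documented configuration rule:
// follow the source's configuration until setConfig(...) is called.
// Class and method names are invented for illustration.
class ConfigSyncSketch {
    static final class Config {
        final String name;

        Config(String name) {
            this.name = name;
        }
    }

    // Used until something else applies (e.g. when only Blocks are piped).
    static final Config DEFAULT = new Config("default");

    private Config config = DEFAULT;
    private boolean explicitlySet = false;

    Config getConfig() {
        return config;
    }

    void setConfig(Config c) {
        config = c;
        explicitlySet = true; // stop following document/page configurations
    }

    // Stands in for the synchronization that happens when content is piped.
    void syncFrom(Config sourceConfig) {
        if (!explicitlySet) {
            config = sourceConfig;
        }
    }
}
```

The single explicitlySet flag is what makes setConfig "sticky". After it is called, later syncFrom calls leave the chosen configuration in place.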