Class OutputTarget

  • Direct Known Subclasses:
    SelectionOutputTarget

    public class OutputTarget
    extends OutputHandler

    This is a base OutputHandler implementation that directs all text extraction output to an Appendable of your choice, e.g. a Writer, StringBuilder, CharBuffer, and so on.
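
    As a rough illustration of why the choice of Appendable is open-ended, the sketch below uses only the JDK. The emit method is a hypothetical stand-in for a handler that writes extracted text; it is not part of this library. Because it depends only on the Appendable interface, a StringWriter, a StringBuilder, and a CharBuffer all receive the same characters.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.CharBuffer;

public class AppendableSinkDemo {

    // Hypothetical stand-in for an extraction handler (an assumption for
    // illustration only): it knows nothing about its sink beyond Appendable.
    static void emit(Appendable sink) throws IOException {
        sink.append("first line").append('\n');
        sink.append("second line").append('\n');
    }

    public static void main(String[] args) throws IOException {
        StringWriter writer = new StringWriter();
        StringBuilder builder = new StringBuilder();
        CharBuffer buffer = CharBuffer.allocate(64);

        // The same call works for every Appendable implementation.
        emit(writer);
        emit(builder);
        emit(buffer);

        // A CharBuffer must be flipped before its written content is read back.
        buffer.flip();

        System.out.println(writer.toString().equals(builder.toString()));
        System.out.println(buffer.toString().equals(builder.toString()));
    }
}
```

    A fixed-capacity CharBuffer throws BufferOverflowException once it is full, so a growable sink such as StringWriter or StringBuilder is usually the safer choice when the amount of extracted text is not known in advance.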

    This is the ideal place to start when building a custom OutputHandler implementation. See the XMLOutputTarget class as an example of how this can be done.

    Please note that OutputTarget makes no attempt to retain the visual layout or formatting of the text extracted from PDF documents. It is focused on:

    • Maximum performance
    • Ensuring that extracted PDF text is yielded with proper segmentation and in the correct reading order. This benefits the many applications that are sensitive to the semantic ordering of PDF content, including search, text analytics, summarization tools, and similar applications.

    If your application requires PDF text extracts that retain the visual appearance of the text as it is laid out on each page, then VisualOutputTarget would be more suitable.

    Example usage:

     java.io.StringWriter text = new java.io.StringWriter(1024);
     OutputTarget tgt = new OutputTarget(text);
    
     // pdfFilePath is a placeholder for the path of the PDF file to open
     Document pdf = com.snowtide.PDF.open(pdfFilePath);
     pdf.pipe(tgt);
     pdf.close();
    
     // do something with the extracted text...
     processText(text.toString());
     

    Version:
    ©2004-2024 Snowtide
    See Also:
    OutputSource.pipe(OutputHandler)