public class VisualOutputTarget extends OutputHandler
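VisualOutputTarget attempts to normalize extracted text so that it appears as it does on the page. For example, it will output tabular data like this:

             Column 1   Column 2   Column 3
    Row 1    $500       $1,000     14B
    Row 2    $1,000     $5,621     8A
    Row 3    $6,009     $121       N/A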
whereas the default OutputTarget is more likely to output such tabular data in proper reading order, but with no concern for spacing or line breaks between the table's cells and rows:
Column 1 Column 2 Column 3 Row 1 $500 $1,000 14B Row 2 $1,000 $5,621 8A Row 3 $6,009 $121 N/A
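To make the contrast between these two outputs concrete, here is a minimal, hypothetical sketch of column-preserving layout. It is not PDFxStream's algorithm. It assumes that every glyph has the same width (the monospace case) and that the left-edge x coordinate of each word is already known. The names `ColumnLayoutSketch`, `Word`, and `layoutLine` are illustrative only.

```java
import java.util.List;

/**
 * Hypothetical sketch (not PDFxStream's algorithm): place each word at the
 * text column implied by its x position, assuming every character has the
 * same width, as in a monospace font.
 */
public class ColumnLayoutSketch {

    /** A word and the x coordinate (in points) of its left edge on the page. */
    record Word(String text, double x) {}

    /** Lays out one line of words; the words must be sorted by x. */
    static String layoutLine(List<Word> words, double charWidth) {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            int column = (int) Math.round(w.x() / charWidth);
            // Pad up to the target column, but always keep at least one space
            // between words so that adjacent words never run together.
            int pad = Math.max(column - sb.length(), sb.length() == 0 ? 0 : 1);
            sb.append(" ".repeat(pad)).append(w.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double charWidth = 6.0; // assumed width of one monospace glyph, in points
        List<List<Word>> rows = List.of(
            List.of(new Word("Column 1", 54), new Word("Column 2", 120), new Word("Column 3", 186)),
            List.of(new Word("Row 1", 0), new Word("$500", 54), new Word("$1,000", 120), new Word("14B", 186)));
        for (List<Word> row : rows) {
            System.out.println(layoutLine(row, charWidth));
        }
    }
}
```

With a proportional font, dividing x by a single character width no longer maps cleanly onto text columns. This is one way to see why such fonts make spacing harder to normalize.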
Please note the following regarding VisualOutputTarget:

- VisualOutputTarget will yield very poor output when used to format rotated text; in such cases, the results are essentially undefined. You may optionally exclude rotated characters from VisualOutputTarget's output using VisualOutputTarget.setIncludingRotatedChars(boolean).
- VisualOutputTarget is likely to impose a slight performance penalty compared to the default OutputTarget. This penalty should be no more than 5%, and is due to the processing needed to normalize the extracted text so that it appears as it does on the page.
- VisualOutputTarget performs best with text rendered in a monospace font. Proportional fonts and (especially) justified text complicate the process of normalizing the spacing of the text formatted by this class.

Constructor and Description |
---|
VisualOutputTarget(Appendable sb) |
Modifier and Type | Method and Description |
---|---|
void | endLine(Line line): Invoked when PDFxStream has finished processing a Line. |
void | endPage(Page page): Invoked when PDFxStream has finished processing a page. |
float | getSpacingScale(): Returns the spacing scale currently in effect for this VisualOutputTarget. |
boolean | isIncludingRotatedChars(): Returns true if this VisualOutputTarget will include rotated TextUnits in its output (true by default). |
boolean | isMarginTrimmed(): Returns true if this VisualOutputTarget trims whitespace corresponding to the left margin of each page piped to it. |
void | linebreaks(int linebreakCnt): Invoked when PDFxStream determines that a series of line breaks should be output between the previous entity (page, block, line, etc.) and the next entity (page, block, line, etc.). |
void | setIncludingRotatedChars(boolean includingRotatedChars): Sets whether this VisualOutputTarget will include rotated TextUnits in its output (true by default). |
void | setMarginTrimmed(boolean marginTrimmed): Sets whether this VisualOutputTarget trims the whitespace corresponding to the left margin of each page it handles. |
void | setSpacingScale(float scale): Modifies the spacing scale that is used when outputting content laid out using this VisualOutputTarget. |
void | spaces(int spaceCnt): Invoked when PDFxStream determines that a series of spaces should be output between the previous entity (block, line, text unit, etc.) and the next entity (block, line, text unit, etc.). |
void | startBlock(Block b): Invoked when a Block is about to be processed. |
void | startLine(Line line): Invoked when a Line is about to be processed. |
void | startPage(Page page): Invoked when a page is about to be processed. |
void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endPDF, startPDF
public VisualOutputTarget(Appendable sb)
public void setSpacingScale(float scale)
Modifies the spacing scale that is used when outputting content laid out using this VisualOutputTarget.
The default is 1. A value of 2 will (approximately) double the number of spaces output between recognized words, while a value of 0.5 will (approximately) halve that number. This is useful in circumstances where:
public float getSpacingScale()
Returns the spacing scale currently in effect for this VisualOutputTarget.
public boolean isIncludingRotatedChars()
Returns true if this VisualOutputTarget will include rotated TextUnits in its output (true by default).
public void setIncludingRotatedChars(boolean includingRotatedChars)
Sets whether this VisualOutputTarget will include rotated TextUnits in its output (true by default).
public void setMarginTrimmed(boolean marginTrimmed)
Sets whether this VisualOutputTarget trims the whitespace corresponding to the left margin of each page it handles. Defaults to false, meaning that text set off from the left edge of a page will be preceded by a corresponding number of spaces in the resulting text extracts. As a result, text located at the same horizontal position on different pages, using the same font and font size, will appear at the same column position in the extracted text. This simplifies identifying and organizing implicitly tabular data that spans page boundaries.
If set to true, then a minimum number of spaces will be added to the beginning of each line of text.
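The following is a minimal, hypothetical sketch of what trimming a page's left margin involves. It is not VisualOutputTarget's implementation. It assumes that the page's lines have already been laid out with leading spaces measured from the page's left edge. The sketch then removes the indentation shared by every non-blank line, which keeps the relative indentation within the page but gives up alignment with other pages. The names `MarginTrimSketch` and `trimMargin` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of left-margin trimming; not PDFxStream's implementation. */
public class MarginTrimSketch {

    /** Counts the leading space characters on a line. */
    static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') n++;
        return n;
    }

    /**
     * Removes the indentation shared by every non-blank line of one page,
     * i.e. the whitespace that corresponds to that page's left margin.
     */
    static List<String> trimMargin(List<String> pageLines) {
        int margin = Integer.MAX_VALUE;
        for (String line : pageLines) {
            if (!line.isBlank()) margin = Math.min(margin, leadingSpaces(line));
        }
        if (margin == Integer.MAX_VALUE) return new ArrayList<>(pageLines); // page is all blank
        List<String> out = new ArrayList<>();
        for (String line : pageLines) {
            // Blank lines may be shorter than the margin, so they become empty lines.
            out.add(line.isBlank() ? "" : line.substring(margin));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> page = List.of("        Heading", "", "          indented body text");
        trimMargin(page).forEach(System.out::println);
    }
}
```

Because each page is trimmed by its own margin, two pages whose text begins at different x positions no longer line up column for column. That is the trade-off the default (false) avoids.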
public boolean isMarginTrimmed()
Returns true if this VisualOutputTarget trims whitespace corresponding to the left margin of each page piped to it.
public void endPage(Page page)
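Description copied from class: OutputHandler
Invoked when PDFxStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed
public void startBlock(Block b)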
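Description copied from class: OutputHandler
Invoked when a Block is about to be processed.
Overrides: startBlock in class OutputHandler
Parameters: b - a reference to the Block that is about to be processed
public void textUnit(TextUnit tu)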
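Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler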
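Each run of characters reaches a handler through a textUnit-style callback. That makes this callback a natural place to picture the rotated-text filter described in the class notes. The following is a minimal, hypothetical sketch that mirrors the documented default of setIncludingRotatedChars (true). `Run` and its `rotationDegrees` field are stand-ins invented for illustration and are not PDFxStream's TextUnit API. How VisualOutputTarget actually detects rotation is not stated here.

```java
/**
 * Hypothetical sketch: a textUnit-style callback that can drop rotated runs,
 * mirroring the documented setIncludingRotatedChars(boolean) behavior.
 * Run and its rotation field are stand-ins, not PDFxStream's TextUnit API.
 */
public class RotatedFilterSketch {

    /** Stand-in for a run of characters and its rotation on the page, in degrees. */
    record Run(String text, double rotationDegrees) {}

    private final StringBuilder out = new StringBuilder();
    private boolean includingRotatedChars = true; // documented default is true

    void setIncludingRotatedChars(boolean include) {
        this.includingRotatedChars = include;
    }

    /** Appends the run's text unless it is rotated and rotated text is excluded. */
    void textUnit(Run run) {
        boolean rotated = run.rotationDegrees() % 360 != 0;
        if (rotated && !includingRotatedChars) {
            return; // skip text whose visual layout cannot be normalized reliably
        }
        out.append(run.text());
    }

    String text() {
        return out.toString();
    }

    public static void main(String[] args) {
        RotatedFilterSketch handler = new RotatedFilterSketch();
        handler.setIncludingRotatedChars(false);
        handler.textUnit(new Run("Total: ", 0));
        handler.textUnit(new Run("SIDEBAR", 90)); // e.g. a vertical margin label
        handler.textUnit(new Run("42", 0));
        System.out.println(handler.text()); // prints "Total: 42"
    }
}
```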
public void spaces(int spaceCnt)
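Description copied from class: OutputHandler
Invoked when PDFxStream determines that a series of spaces should be output between the previous entity (block, line, text unit, etc.) and the next entity (block, line, text unit, etc.).
Overrides: spaces in class OutputHandler
Parameters: spaceCnt - the number of spaces that PDFxStream recommends should be output
public void linebreaks(int linebreakCnt)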
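Description copied from class: OutputHandler
Invoked when PDFxStream determines that a series of line breaks should be output between the previous entity (page, block, line, etc.) and the next entity (page, block, line, etc.).
Overrides: linebreaks in class OutputHandler
Parameters: linebreakCnt - the number of line breaks that PDFxStream recommends should be output
public void startLine(Line line)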
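Description copied from class: OutputHandler
Invoked when a Line is about to be processed.
Overrides: startLine in class OutputHandler
Parameters: line - a reference to the Line that is about to be processed
public void endLine(Line line)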
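Description copied from class: OutputHandler
Invoked when PDFxStream has finished processing a Line.
Overrides: endLine in class OutputHandler
Parameters: line - a reference to the Line that has been processed
public void startPage(Page page)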
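Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed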
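The spaces and linebreaks callbacks pass along counts that PDFxStream recommends, and the handler decides how to render them into its Appendable. The following self-contained sketch mirrors three of the callbacks (textUnit, spaces, linebreaks) using plain Strings in place of TextUnit. It applies a spacing scale by multiplying each recommended space count and rounding the result. That rounding rule, and the choice to keep at least one space between words, are assumptions made for illustration: they match the documented behavior that a scale of 2 roughly doubles the spacing and 0.5 roughly halves it, but they are not necessarily how VisualOutputTarget computes it. `CallbackSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical, self-contained mirror of three callbacks described above.
 * Method names echo VisualOutputTarget's, but the types (Strings instead of
 * TextUnit) and the rounding rule in spaces() are assumptions.
 */
public class CallbackSketch {

    private final Appendable out;
    private float spacingScale = 1f; // documented default is 1

    CallbackSketch(Appendable out) {
        this.out = out;
    }

    void setSpacingScale(float scale) {
        this.spacingScale = scale;
    }

    private void write(CharSequence s) {
        try {
            out.append(s);
        } catch (IOException e) {
            // Appendable.append may throw IOException (e.g. when backed by a Writer).
            throw new UncheckedIOException(e);
        }
    }

    void textUnit(String chars) {
        write(chars);
    }

    /** Scales the recommended space count, keeping at least one space if any was recommended. */
    void spaces(int spaceCnt) {
        if (spaceCnt <= 0) return;
        int scaled = Math.max(1, Math.round(spaceCnt * spacingScale));
        write(" ".repeat(scaled));
    }

    void linebreaks(int linebreakCnt) {
        if (linebreakCnt > 0) write("\n".repeat(linebreakCnt));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        CallbackSketch handler = new CallbackSketch(sb);
        handler.setSpacingScale(2f);
        // Simulated events for one line: with scale 2, 3 recommended spaces become 6.
        handler.textUnit("Row 1");
        handler.spaces(3);
        handler.textUnit("$500");
        handler.linebreaks(1);
        System.out.print(sb);
    }
}
```

Because the constructor accepts any Appendable, the same handler can write to a StringBuilder in memory or to a Writer backed by a file.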