public class VisualOutputTarget extends OutputHandler
          Column 1    Column 2    Column 3
Row 1     $500        $1,000      14B
Row 2     $1,000      $5,621      8A
Row 3     $6,009      $121        N/A
whereas the default OutputTarget is more likely to output such tabular data with proper read-ordering,
but with no concern for spacing or line breaks between the table's cells and rows:
Column 1 Column 2 Column 3 Row 1 $500 $1,000 14B Row 2 $1,000 $5,621 8A Row 3 $6,009 $121 N/A
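The contrast between the two renderings can be pictured without PDFxStream at all. The following is a minimal sketch in plain Java, not the library's layout logic: `LayoutSketch`, `readOrder`, and `visual` are hypothetical names, and the fixed column width is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not PDFxStream code and does not
// reproduce how VisualOutputTarget or OutputTarget compute their output.
class LayoutSketch {

    // Read-order style: every non-empty cell separated by one space, no line breaks.
    static String readOrder(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (String cell : row) {
                if (cell.isEmpty()) continue;
                if (sb.length() > 0) sb.append(' ');
                sb.append(cell);
            }
        }
        return sb.toString();
    }

    // Visual style: each cell padded to an assumed fixed width, one line per row.
    static String visual(List<List<String>> rows, int width) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            StringBuilder line = new StringBuilder();
            for (String cell : row) {
                line.append(String.format("%-" + width + "s", cell));
            }
            // Drop the padding after the last cell before ending the line.
            sb.append(line.toString().replaceAll("\\s+$", "")).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> table = Arrays.asList(
            Arrays.asList("", "Column 1", "Column 2", "Column 3"),
            Arrays.asList("Row 1", "$500", "$1,000", "14B"),
            Arrays.asList("Row 2", "$1,000", "$5,621", "8A"),
            Arrays.asList("Row 3", "$6,009", "$121", "N/A"));
        System.out.println(readOrder(table));
        System.out.print(visual(table, 12));
    }
}
```

Padding to a fixed width only works cleanly when every character occupies the same horizontal space, which is one way to see why monospace text is the easier case for a layout-preserving target.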
Please note the following regarding VisualOutputTarget:
- VisualOutputTarget will yield very poor output when used to format rotated text; in such a case, the results are essentially undefined. You may optionally suppress the inclusion of rotated characters from VisualOutputTarget's output using VisualOutputTarget.setIncludingRotatedChars(boolean).
- VisualOutputTarget is likely to impose a slight performance penalty compared to using the default OutputTarget. This penalty should be no more than 5%, and is necessary because of the processing needed to normalize the extracted text to appear as it does on the page.
- VisualOutputTarget performs best when working with text rendered using a monospace font. Proportional fonts and (especially) justified text complicate the process of normalizing the spacing of the text formatted by this class.

Constructor and Description |
---|
VisualOutputTarget(Appendable sb) |
Modifier and Type | Method and Description |
---|---|
void | endLine(Line line): Invoked when PDFxStream has finished processing a Line. |
void | endPage(Page page): Invoked when PDFxStream has finished processing a page. |
float | getSpacingScale(): Returns the spacing scale currently in effect for this VisualOutputTarget. |
boolean | isIncludingRotatedChars(): Returns true if this VisualOutputTarget will include rotated TextUnits in its output (true by default). |
void | linebreaks(int linebreakCnt): Invoked when PDFxStream determines that a series of line breaks should be outputted between the previous entity (page, block, line, etc.) and the next entity (page, block, line, etc.). |
void | setIncludingRotatedChars(boolean includingRotatedChars): Sets whether or not this VisualOutputTarget will include rotated TextUnits in its output (true by default). |
void | setSpacingScale(float scale): Modifies the spacing scale that is used when outputting content laid out using this VisualOutputTarget. |
void | spaces(int spaceCnt): Invoked when PDFxStream determines that a series of spaces should be outputted between the previous entity (block, line, text unit, etc.) and the next entity (block, line, text unit, etc.). |
void | startBlock(Block b): Invoked when a Block is about to be processed. |
void | startLine(Line line): Invoked when a Line is about to be processed. |
void | startPage(Page page): Invoked when a page is about to be processed. |
void | textUnit(TextUnit tu): Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endPDF, startPDF
public VisualOutputTarget(Appendable sb)
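Because the constructor accepts a java.lang.Appendable, the destination can be any JDK type that implements that interface, such as StringBuilder, a Writer, or a PrintStream. The sketch below demonstrates only that JDK-level point: `AppendableSketch` and `writeSample` are hypothetical stand-ins for whatever a handler appends, and no PDFxStream class is used.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical illustration of writing to an Appendable; not PDFxStream code.
class AppendableSketch {

    // Stand-in for a handler's output step: appends one line of text to any Appendable.
    static void writeSample(Appendable out) {
        try {
            out.append("Row 1").append("     ").append("$500").append('\n');
        } catch (IOException e) {
            // Appendable.append declares IOException even for in-memory sinks.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder inMemory = new StringBuilder();
        StringWriter writer = new StringWriter();
        writeSample(inMemory);   // collect the text in memory
        writeSample(writer);     // or send it through a Writer
        writeSample(System.out); // PrintStream is also an Appendable
    }
}
```

A StringBuilder is usually the simplest choice when the extracted text is needed as a String afterwards; a Writer suits output that should go straight to a file or stream.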
public void setSpacingScale(float scale)
Modifies the spacing scale that is used when outputting content laid out using this VisualOutputTarget.
The default is 1; using a value of 2 will (approximately) double the number of spaces that are outputted between
recognized words, while a value of 0.5 will (approximately) halve that number. This is useful in circumstances where:
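public float getSpacingScale()
Returns the spacing scale currently in effect for this VisualOutputTarget.

public boolean isIncludingRotatedChars()
Returns true if this VisualOutputTarget will include rotated TextUnits in its output (true by default).

public void setIncludingRotatedChars(boolean includingRotatedChars)
Sets whether or not this VisualOutputTarget will include rotated TextUnits in its output (true by default).

public void endPage(Page page)
Description copied from class: OutputHandler
Invoked when PDFxStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters:
page - a reference to the Page that has been processed

public void startBlock(Block b)
Description copied from class: OutputHandler
Invoked when a Block is about to be processed.
Overrides: startBlock in class OutputHandler
Parameters:
b - a reference to the Block that is about to be processed

public void textUnit(TextUnit tu)
Description copied from class: OutputHandler
Invoked when a run of characters is to be outputted, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler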
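One way to picture how a flag like the one exposed by setIncludingRotatedChars could interact with the textUnit callback is a handler that skips rotated runs when the flag is off. This is a sketch under stated assumptions, not the library's implementation: `RotatedFilterSketch` and `Unit` are hypothetical stand-ins, and representing rotation as a number of degrees is an assumption made for the example.

```java
// Hypothetical stand-ins; these are NOT the PDFxStream TextUnit or OutputHandler types.
class RotatedFilterSketch {

    // Minimal text unit: a run of characters plus an assumed rotation in degrees.
    static final class Unit {
        final String text;
        final float rotationDegrees;

        Unit(String text, float rotationDegrees) {
            this.text = text;
            this.rotationDegrees = rotationDegrees;
        }
    }

    private final StringBuilder out = new StringBuilder();
    private boolean includingRotatedChars = true; // mirrors the documented default

    void setIncludingRotatedChars(boolean including) {
        includingRotatedChars = including;
    }

    boolean isIncludingRotatedChars() {
        return includingRotatedChars;
    }

    // Callback for each run of characters: rotated runs are skipped when the flag is off.
    void textUnit(Unit tu) {
        if (!includingRotatedChars && tu.rotationDegrees != 0f) {
            return;
        }
        out.append(tu.text);
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        RotatedFilterSketch handler = new RotatedFilterSketch();
        handler.setIncludingRotatedChars(false);
        handler.textUnit(new Unit("Total", 0f));
        handler.textUnit(new Unit("DRAFT", 90f)); // rotated, so it is dropped
        System.out.println(handler.result());
    }
}
```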
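public void spaces(int spaceCnt)
Description copied from class: OutputHandler
Invoked when PDFxStream determines that a series of spaces should be outputted between the previous entity (block, line, text unit, etc.) and the next entity (block, line, text unit, etc.).
Overrides: spaces in class OutputHandler
Parameters:
spaceCnt - the number of spaces that PDFxStream recommends should be outputted

public void linebreaks(int linebreakCnt)
Description copied from class: OutputHandler
Invoked when PDFxStream determines that a series of line breaks should be outputted between the previous entity (page, block, line, etc.) and the next entity (page, block, line, etc.).
Overrides: linebreaks in class OutputHandler
Parameters:
linebreakCnt - the number of line breaks that PDFxStream recommends should be outputted

public void startLine(Line line)
Description copied from class: OutputHandler
Invoked when a Line is about to be processed.
Overrides: startLine in class OutputHandler
Parameters:
line - a reference to the Line that is about to be processed

public void endLine(Line line)
Description copied from class: OutputHandler
Invoked when PDFxStream has finished processing a Line.
Overrides: endLine in class OutputHandler
Parameters:
line - a reference to the Line that has been processed

public void startPage(Page page)
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters:
page - a reference to the Page that is about to be processed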
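The callbacks documented on this page can be pictured with a small stand-in handler that appends to a buffer as events arrive. This is a hedged sketch only: `CallbackSketch` is a hypothetical class that borrows the callback names but does not extend OutputHandler, the page markers it writes are purely illustrative, and multiplying the recommended space count by the spacing scale is an assumption about how such a scale might be applied, not a description of VisualOutputTarget's internals.

```java
// Hypothetical stand-in handler; it only mimics the documented callback names
// and does not extend or reproduce PDFxStream's OutputHandler.
class CallbackSketch {
    private final StringBuilder out = new StringBuilder();
    private float spacingScale = 1f; // documented default scale

    void setSpacingScale(float scale) {
        spacingScale = scale;
    }

    void startPage(int pageNumber) {
        out.append("[page ").append(pageNumber).append("]\n");
    }

    void textUnit(String chars) {
        out.append(chars);
    }

    // Assumption: the scale multiplies the recommended count, rounded to a whole number.
    void spaces(int spaceCnt) {
        int n = Math.round(spaceCnt * spacingScale);
        for (int i = 0; i < n; i++) {
            out.append(' ');
        }
    }

    void linebreaks(int linebreakCnt) {
        for (int i = 0; i < linebreakCnt; i++) {
            out.append('\n');
        }
    }

    void endPage(int pageNumber) {
        out.append("[end page ").append(pageNumber).append("]\n");
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        CallbackSketch handler = new CallbackSketch();
        handler.setSpacingScale(2f); // roughly doubles the gaps between words
        handler.startPage(1);
        handler.textUnit("Row 1");
        handler.spaces(2);
        handler.textUnit("$500");
        handler.linebreaks(1);
        handler.endPage(1);
        System.out.print(handler.result());
    }
}
```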