public class VisualOutputTarget extends OutputHandler
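VisualOutputTarget attempts to lay out extracted text as it appears on the page, so it is likely to output tabular data like this:
         Column 1   Column 2   Column 3
Row 1    $500       $1,000     14B
Row 2    $1,000     $5,621     8A
Row 3    $6,009     $121       N/A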
whereas the default OutputTarget is more likely to output such tabular data with proper read-ordering, but with no concern for spacing or line breaks between the table's cells and rows:
Column 1 Column 2 Column 3 Row 1 $500 $1,000 14B Row 2 $1,000 $5,621 8A Row 3 $6,009 $121 N/A
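The contrast between the two output styles can be illustrated with a small standalone sketch. It is only a toy model: it does not use PDFTextStream, and the class and method names (`LayoutContrastSketch`, `sampleTable`, `readOrder`, `visual`) are hypothetical. It assumes the table cells are already known, whereas the real classes work from positioned text on a PDF page.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy model of the two output styles, NOT PDFTextStream's algorithm.
// All names in this class are hypothetical.
class LayoutContrastSketch {

    // The example table from the text; the header row has an empty first cell.
    static List<List<String>> sampleTable() {
        return Arrays.asList(
            Arrays.asList("", "Column 1", "Column 2", "Column 3"),
            Arrays.asList("Row 1", "$500", "$1,000", "14B"),
            Arrays.asList("Row 2", "$1,000", "$5,621", "8A"),
            Arrays.asList("Row 3", "$6,009", "$121", "N/A"));
    }

    // Read-order style: every non-empty cell in sequence, one space apart, no line breaks.
    static String readOrder(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (String cell : row) {
                if (cell.isEmpty()) continue;
                if (sb.length() > 0) sb.append(' ');
                sb.append(cell);
            }
        }
        return sb.toString();
    }

    // Visual style: one line per row, each cell padded to its column's widest entry plus two spaces.
    static String visual(List<List<String>> rows) {
        int cols = 0;
        for (List<String> row : rows) cols = Math.max(cols, row.size());
        int[] widths = new int[cols];
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                widths[c] = Math.max(widths[c], row.get(c).length());
            }
        }
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                String cell = row.get(c);
                sb.append(cell);
                if (c < row.size() - 1) {
                    // Pad all but the last cell so that columns line up.
                    for (int p = cell.length(); p < widths[c] + 2; p++) sb.append(' ');
                }
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readOrder(sampleTable()));
        System.out.print(visual(sampleTable()));
    }
}
```

The read-order method keeps the cells in the right sequence but discards the grid, while the visual method spends extra work on padding to preserve it. That is the same trade-off, in miniature, as the one between the two output targets.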
Please note the following regarding VisualOutputTarget:
- VisualOutputTarget will yield very poor output when used to format rotated text; in such a case, the results are essentially undefined. You may optionally exclude rotated characters from VisualOutputTarget's output using setIncludingRotatedChars(boolean).
- VisualOutputTarget is likely to impose a slight performance penalty compared to using the default OutputTarget. This penalty should be no more than 5%, and is necessary because of the processing needed to normalize the extracted text to appear as it does on the page.
- VisualOutputTarget performs best when working with text rendered using a monospace font. Proportional fonts and (especially) justified text complicate the process of normalizing the spacing of the text formatted by this class. Improvements will likely be made in future PDFTextStream releases to make this class more capable when handling justified text or text rendered using proportional fonts.

Constructor and Description |
---|
VisualOutputTarget(java.lang.Appendable sb) |
VisualOutputTarget(java.io.Writer w) |
Modifier and Type | Method and Description |
---|---|
void | endLine(Line line): Invoked when PDFTextStream has finished processing a Line. |
void | endPage(Page page): Invoked when PDFTextStream has finished processing a page. |
float | getSpacingScale(): Returns the spacing scale currently in effect for this VisualOutputTarget. |
boolean | isIncludingRotatedChars(): Returns true if this VisualOutputTarget will include rotated TextUnits in its output (true by default). |
void | linebreaks(int linebreakCnt): Invoked when PDFTextStream determines that a series of line breaks should be output between the previous entity (page, block, line, etc.) and the next entity (page, block, line, etc.). |
void | setIncludingRotatedChars(boolean includingRotatedChars): Sets whether this VisualOutputTarget will include rotated TextUnits in its output (true by default). |
void | setSpacingScale(float scale): Modifies the spacing scale that is used when outputting content laid out using this VisualOutputTarget. |
void | spaces(int spaceCnt): Invoked when PDFTextStream determines that a series of spaces should be output between the previous entity (block, line, text unit, etc.) and the next entity (block, line, text unit, etc.). |
void | startBlock(Block b): Invoked when a Block is about to be processed. |
void | startLine(Line line): Invoked when a Line is about to be processed. |
void | startPage(Page page): Invoked when a page is about to be processed. |
void | textUnit(TextUnit tu): Invoked when a run of characters is to be output, as represented by the given TextUnit instance. |
Methods inherited from class OutputHandler: endBlock, endPDF, startPDF
public VisualOutputTarget(java.io.Writer w)
public VisualOutputTarget(java.lang.Appendable sb)
public void setSpacingScale(float scale)
Modifies the spacing scale that is used when outputting content laid out using this VisualOutputTarget. The default is 1; using a value of 2 will (approximately) double the number of spaces that are output between recognized words, while a value of 0.5 will (approximately) halve that number. This is useful in circumstances where:
public float getSpacingScale()
Returns the spacing scale currently in effect for this VisualOutputTarget.
See Also: setSpacingScale(float)
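The effect of the spacing scale on the number of spaces between words can be sketched with a few lines of arithmetic. This is a minimal illustration, not the library's implementation: the helper names are hypothetical, and both the rounding rule and the floor of one space are assumptions, since the documentation only says the scaling is approximate.

```java
// Illustrative only: models how a spacing scale might change a recommended space count.
// SpacingScaleSketch, scaledSpaces and join are hypothetical names, not PDFTextStream API.
class SpacingScaleSketch {

    // Multiplies the recommended count by the scale and rounds to the nearest integer.
    // Assumption: at least one space is kept so that adjacent words never merge.
    static int scaledSpaces(int recommended, float scale) {
        if (recommended <= 0) return 0;
        return Math.max(1, Math.round(recommended * scale));
    }

    // Joins two words using the scaled number of spaces.
    static String join(String left, String right, int recommended, float scale) {
        StringBuilder sb = new StringBuilder(left);
        int n = scaledSpaces(recommended, scale);
        for (int i = 0; i < n; i++) sb.append(' ');
        return sb.append(right).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + join("$500", "$1,000", 4, 1f) + "]");
        System.out.println("[" + join("$500", "$1,000", 4, 2f) + "]");
        System.out.println("[" + join("$500", "$1,000", 4, 0.5f) + "]");
    }
}
```

In this model, a scale of 2 turns a four-space gap into eight spaces and a scale of 0.5 turns it into two, matching the doubling and halving described for setSpacingScale(float).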
public boolean isIncludingRotatedChars()
Returns true if this VisualOutputTarget will include rotated TextUnits in its output (true by default).
public void setIncludingRotatedChars(boolean includingRotatedChars)
Sets whether this VisualOutputTarget will include rotated TextUnits in its output (true by default).
public void endPage(Page page)
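Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a page.
Overrides: endPage in class OutputHandler
Parameters: page - a reference to the Page that has been processed
public void startBlock(Block b)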
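Description copied from class: OutputHandler
Invoked when a Block is about to be processed.
Overrides: startBlock in class OutputHandler
Parameters: b - a reference to the Block that is about to be processed
public void textUnit(TextUnit tu)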
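Description copied from class: OutputHandler
Invoked when a run of characters is to be output, as represented by the given TextUnit instance.
Overrides: textUnit in class OutputHandler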
public void spaces(int spaceCnt)
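Description copied from class: OutputHandler
Invoked when PDFTextStream determines that a series of spaces should be output between the previous entity (block, line, text unit, etc.) and the next entity (block, line, text unit, etc.).
Overrides: spaces in class OutputHandler
Parameters: spaceCnt - the number of spaces that PDFTextStream recommends should be output
public void linebreaks(int linebreakCnt)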
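Description copied from class: OutputHandler
Invoked when PDFTextStream determines that a series of line breaks should be output between the previous entity (page, block, line, etc.) and the next entity (page, block, line, etc.).
Overrides: linebreaks in class OutputHandler
Parameters: linebreakCnt - the number of line breaks that PDFTextStream recommends should be output
public void startLine(Line line)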
Description copied from class: OutputHandler
Invoked when a Line is about to be processed.
Overrides: startLine in class OutputHandler
Parameters: line - a reference to the Line that is about to be processed
public void endLine(Line line)
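Description copied from class: OutputHandler
Invoked when PDFTextStream has finished processing a Line.
Overrides: endLine in class OutputHandler
Parameters: line - a reference to the Line that has been processed
public void startPage(Page page)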
Description copied from class: OutputHandler
Invoked when a page is about to be processed.
Overrides: startPage in class OutputHandler
Parameters: page - a reference to the Page that is about to be processed