All Superinterfaces:

Bounded
```
public interface TextUnit
extends Bounded
```
A single character or discrete character grouping positioned within a Line.
Note that space characters are typically not encoded in PDF documents; rather, they are implicit in the spacing between the bounding boxes of adjacent TextUnits.
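
Since spaces are implied by gaps rather than encoded, code that walks TextUnits directly has to infer them. The following is a minimal sketch of that idea, not PDFxStream's own algorithm. `Glyph` is a hypothetical stand-in that carries left and right x-coordinates (in real code these would come from the unit's `Bounded` bounds), the text is assumed to run left to right, and the 25% threshold is an arbitrary assumption.

```java
import java.util.List;

final class ImplicitSpaces {
    /** Hypothetical stand-in for a TextUnit and its horizontal bounds; not a PDFxStream type. */
    static final class Glyph {
        final char[] chars;
        final float left, right, fontSize;

        Glyph(String chars, float left, float right, float fontSize) {
            this.chars = chars.toCharArray();
            this.left = left;
            this.right = right;
            this.fontSize = fontSize;
        }
    }

    /** Joins the glyphs of one line, inserting a space wherever the horizontal gap is wide. */
    static String joinLine(List<Glyph> line) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            // Assumed heuristic: a gap wider than 25% of the font size is a word break.
            if (prev != null && g.left - prev.right > 0.25f * prev.fontSize) {
                sb.append(' ');
            }
            sb.append(g.chars);
            prev = g;
        }
        return sb.toString();
    }
}
```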

Since:

v1.4

Version:

©2004-2025 Snowtide

Nested Class Summary

Nested Classes
Modifier and Type	Interface	Description
`static interface`	`TextUnit.Predicate`	Type to be satisfied when implementing a TextUnit predicate for filtering characters in a Page.

Method Summary

Modifier and Type	Method	Description
`char[]`	`getCharacterSequence()`	Returns the characters that should be rendered for this TextUnit.
`int`	`getCharCode()`	Returns the 'raw' character code used to encode this TextUnit in the source PDF document.
`Font`	`getFont()`	Returns the `Font` that was in force when this `TextUnit` was output.
`float`	`getFontSize()`	Returns the size of the `font` used to render this `TextUnit`.
`char[]`	`getMappedCharSequence()`	Returns the characters that the source PDF mapped to the `"raw" character code`, via the font and encoding information in force when the character code was read from the PDF document.
`float`	`getTheta()`	Returns the angle (in degrees) by which this `TextUnit`'s baseline is rotated.
`boolean`	`isStruckThrough()`	Returns true if this `TextUnit` is struck through (~~like this~~).
`boolean`	`isUnderlined()`	Returns true if this `TextUnit` is underlined (like this).

Methods inherited from interface com.snowtide.pdf.layout.Bounded
bounds

- Method Detail
  - getCharCode
```
int getCharCode()
```
    Returns the 'raw' character code used to encode this TextUnit in the source PDF document.
    In many cases, this character code is the same as the character's Unicode code point. Otherwise, the font and encoding information in force when the character code was read from the PDF document dictates that a particular character sequence be rendered instead of the Unicode character corresponding to the character code returned by this method.
    
    Nearly all use cases should use the getCharacterSequence() method in preference to this one.
    
    See Also:
    
    getCharacterSequence()
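    
    To illustrate the distinction above, the sketch below checks whether a unit's raw code happens to coincide with the single code point it renders as. `Unit` is a local stand-in that declares only the two `TextUnit` methods involved, and `stub` is a hypothetical helper for building test values; neither is part of PDFxStream.

```java
final class CharCodeCheck {
    /** Local stand-in declaring only the TextUnit methods used here. */
    interface Unit {
        int getCharCode();
        char[] getCharacterSequence();
    }

    /** True when the unit renders as exactly one code point equal to its raw code. */
    static boolean codeIsUnicode(Unit u) {
        String s = new String(u.getCharacterSequence());
        // An empty sequence has zero code points, so codePointAt is never reached for it.
        return s.codePointCount(0, s.length()) == 1 && s.codePointAt(0) == u.getCharCode();
    }

    /** Builds a unit with fixed values (hypothetical helper for demonstration). */
    static Unit stub(int code, String rendered) {
        return new Unit() {
            public int getCharCode() { return code; }
            public char[] getCharacterSequence() { return rendered.toCharArray(); }
        };
    }
}
```

    When the check fails, the raw code says nothing reliable about the text, which is why getCharacterSequence() is the safer default.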
  - getMappedCharSequence
```
char[] getMappedCharSequence()
```
    Returns the characters that the source PDF mapped to the "raw" character code, via the font and encoding information in force when the character code was read from the PDF document.
    Note that this character sequence will not reflect normalization that PDFxStream applies in order to produce getCharacterSequence(), including ligature re-folding, Arabic "un-shaping", the un-mirroring of brackets in right-to-left and bidirectional text, etc. Unless you have specific cause to avoid the result of these normalization steps, you should prefer getCharacterSequence() to this method.
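    
    One way to see whether normalization affected a given unit is to compare the two sequences. This is a sketch under stated assumptions: `Unit` is a local stand-in for the relevant `TextUnit` methods, `stub` is a hypothetical helper, and any values passed to it are invented for illustration rather than taken from a real document.

```java
import java.util.Arrays;

final class NormalizationCheck {
    /** Local stand-in declaring only the TextUnit methods used here. */
    interface Unit {
        char[] getMappedCharSequence();
        char[] getCharacterSequence();
    }

    /** True when normalization changed what the font and encoding mapping produced. */
    static boolean differsFromMapping(Unit u) {
        return !Arrays.equals(u.getMappedCharSequence(), u.getCharacterSequence());
    }

    /** Builds a unit with fixed values (hypothetical helper for demonstration). */
    static Unit stub(String mapped, String normalized) {
        return new Unit() {
            public char[] getMappedCharSequence() { return mapped.toCharArray(); }
            public char[] getCharacterSequence() { return normalized.toCharArray(); }
        };
    }
}
```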
  - getCharacterSequence
```
char[] getCharacterSequence()
```
    Returns the characters that should be rendered for this TextUnit. This sequence is the result of applying:
    
    - the font and encoding information in force when the character code was read from the PDF document
    - ligature re-folding
    - Arabic "un-shaping"
    - reversal of right-to-left, multi-character sequences so that characters are in memory/logical order and not presentation order
    - the un-mirroring of brackets in right-to-left and bidirectional text
    - … and other normalization transformations that may be added from time to time
    
    This function will never return null, but may return an empty array if the TextUnit's "raw" character code is explicitly mapped to an empty character sequence.
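    
    Because the result is never null, callers can concatenate sequences without a null check, and an empty array simply contributes nothing. A minimal sketch, assuming a local stand-in interface `Unit` that declares only this one method:

```java
import java.util.List;

final class LineText {
    /** Local stand-in declaring only the TextUnit method used here. */
    interface Unit {
        char[] getCharacterSequence();
    }

    /** Concatenates the rendered characters of each unit, in the order given. */
    static String text(List<? extends Unit> units) {
        StringBuilder sb = new StringBuilder();
        for (Unit u : units) {
            // No null check is needed; an empty array appends zero characters.
            sb.append(u.getCharacterSequence());
        }
        return sb.toString();
    }
}
```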
  - getFont
```
Font getFont()
```
    Returns the Font that was in force when this TextUnit was output.
  - getFontSize
```
float getFontSize()
```
    Returns the size of the font used to render this TextUnit.
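    
    As one possible use of this value, the sketch below finds the most common font size among a set of units, which can serve as a rough guess at the body-text size. `Unit` is a local stand-in for the single `TextUnit` method involved, and the heuristic itself is an assumption, not something PDFxStream provides.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FontSizes {
    /** Local stand-in declaring only the TextUnit method used here. */
    interface Unit {
        float getFontSize();
    }

    /** Returns the most frequent font size, or 0 if the list is empty. */
    static float dominantSize(List<? extends Unit> units) {
        Map<Float, Integer> counts = new HashMap<>();
        float best = 0f;
        int bestCount = 0;
        for (Unit u : units) {
            float size = u.getFontSize();
            // merge() returns the updated count for this size.
            int n = counts.merge(size, 1, Integer::sum);
            if (n > bestCount) {
                bestCount = n;
                best = size;
            }
        }
        return best;
    }
}
```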
  - isUnderlined
```
boolean isUnderlined()
```
    Returns true if this TextUnit is underlined (like this). This reports an appropriate value for text that is rotated by a "regular" angle (90°, -90°, 180°), but always returns false for text that is rotated by any other angle (e.g. 30°, -45°, 16°).
  - isStruckThrough
```
boolean isStruckThrough()
```
    Returns true if this TextUnit is struck through (~~like this~~). This reports an appropriate value for text that is not rotated, and always returns false otherwise.
  - getTheta
```
float getTheta()
```
    Returns the angle (in degrees) by which this TextUnit's baseline is rotated.
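    
    The rotation angle determines when isUnderlined() and isStruckThrough() are informative, so callers may want to separate "not decorated" from "cannot tell". The sketch below does that under several assumptions: `Unit` is a local stand-in for the three `TextUnit` methods involved, `stub` is a hypothetical helper, the 0.01° tolerance is arbitrary, and every multiple of 90° (including 0° and 270°) is treated as a "regular" angle.

```java
final class Decorations {
    /** Local stand-in declaring only the TextUnit methods used here. */
    interface Unit {
        float getTheta();
        boolean isUnderlined();
        boolean isStruckThrough();
    }

    enum Flag { YES, NO, UNKNOWN }

    /** Assumed tolerance, in degrees, for comparing angles. */
    static final double EPS = 0.01;

    /** Underline state; UNKNOWN when theta is not close to a multiple of 90 degrees. */
    static Flag underline(Unit u) {
        boolean regular = Math.abs(Math.IEEEremainder(u.getTheta(), 90.0)) < EPS;
        if (!regular) return Flag.UNKNOWN;
        return u.isUnderlined() ? Flag.YES : Flag.NO;
    }

    /** Strikethrough state; UNKNOWN for any rotated text. */
    static Flag strike(Unit u) {
        if (Math.abs(u.getTheta()) >= EPS) return Flag.UNKNOWN;
        return u.isStruckThrough() ? Flag.YES : Flag.NO;
    }

    /** Builds a unit with fixed values (hypothetical helper for demonstration). */
    static Unit stub(float theta, boolean underlined, boolean struck) {
        return new Unit() {
            public float getTheta() { return theta; }
            public boolean isUnderlined() { return underlined; }
            public boolean isStruckThrough() { return struck; }
        };
    }
}
```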
