Interface Page

    • Method Detail

      • addColumnPartition

        void addColumnPartition(int xcoord)
                         throws UnsupportedOperationException
        Adds the given coordinate as an acceptable midline between columns, used when this page is segmented. By default, no specific coordinate restrictions are applied to column partitioning. Adding any column partition coordinate will restrict acceptable column spacing midlines to only those coordinates specified.

        The exceptions to this are the privileged constants COLUMN_POSITION_HALVES, COLUMN_POSITION_THIRDS, and COLUMN_POSITION_QUARTERS. These constants "expand" into multiple column partitions; for example, specifying COLUMN_POSITION_THIRDS results in two column partitions, one at getPageWidth() / 3 and another at 2 * getPageWidth() / 3.

        To be effective, this method must be called before either getTextContent() or OutputSource.pipe(OutputHandler) is invoked.
        Throws:
        UnsupportedOperationException - if this Page's implementation does not support specifying column positions.
        Since:
        2.5.0
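
        The expansion arithmetic described above can be sketched as follows. This is a standalone illustration only: the class ColumnPartitionSketch and its expand helper are hypothetical names that are not part of the Page API, and the page width of 612 is an assumed example value (US Letter in user space units).

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnPartitionSketch {

    // Hypothetical helper, NOT part of the Page API: returns the midline
    // x-coordinates obtained by splitting a page of the given width into
    // `parts` equal columns, using the same integer arithmetic as the
    // documented getPageWidth() / 3 and 2 * getPageWidth() / 3 example.
    static List<Integer> expand(int pageWidth, int parts) {
        List<Integer> midlines = new ArrayList<>();
        for (int i = 1; i < parts; i++) {
            midlines.add(i * pageWidth / parts);
        }
        return midlines;
    }

    public static void main(String[] args) {
        int pageWidth = 612; // assumed example width (US Letter)
        System.out.println(expand(pageWidth, 2)); // halves:   [306]
        System.out.println(expand(pageWidth, 3)); // thirds:   [204, 408]
        System.out.println(expand(pageWidth, 4)); // quarters: [153, 306, 459]
    }
}
```

        Passing COLUMN_POSITION_THIRDS to addColumnPartition would therefore be expected to behave like adding the two coordinates shown for thirds, per the description above.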
      • getDocument

        Document getDocument()
        Returns the Document from which this Page was sourced.
      • getPageNumber

        int getPageNumber()
        Returns this Page's page number.
      • getPageWidth

        int getPageWidth()
        Returns the width of this page in PDF "default user space units" (as specified by the PDF spec). Typically, each "user space unit" is equivalent to 1/72 of an inch, so dividing the value returned by this method by 72 will yield the page width in inches.

        The value returned from this method corresponds to the width value of the /MediaBox attribute of a PDF page object.
      • getPageHeight

        int getPageHeight()
        Returns the height of this page in PDF "default user space units" (as specified by the PDF spec). Typically, each "user space unit" is equivalent to 1/72 of an inch, so dividing the value returned by this method by 72 will yield the page height in inches.

        The value returned from this method corresponds to the height value of the /MediaBox attribute of a PDF page object.
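
        A minimal sketch of the unit conversion described for getPageWidth() and getPageHeight(), assuming the typical 1/72 inch user space unit. The class UserSpaceUnits and its toInches method are hypothetical names used only for illustration, and 612 and 792 are assumed example values (US Letter).

```java
public class UserSpaceUnits {

    // Typical size of a PDF user space unit: 1/72 of an inch.
    static final double UNITS_PER_INCH = 72.0;

    // Hypothetical helper, NOT part of the Page API. Dividing by a double
    // avoids the truncation that int / int division would cause
    // (for example, 612 / 72 evaluates to 8, not 8.5).
    static double toInches(int userSpaceUnits) {
        return userSpaceUnits / UNITS_PER_INCH;
    }

    public static void main(String[] args) {
        int width = 612;  // assumed value, as if returned by getPageWidth()
        int height = 792; // assumed value, as if returned by getPageHeight()
        System.out.println(toInches(width));  // 8.5
        System.out.println(toInches(height)); // 11.0
    }
}
```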
      • getCropBox

        Region getCropBox()
        Returns the "crop box" defined by the PDF for this page, expressed in user space units as with getPageHeight() and getPageWidth(). This rectangle defaults to the page width and height if it is not otherwise specified.
      • getRotationTheta

        int getRotationTheta()
        Returns the number of degrees by which the page has been rotated clockwise. This value should be a multiple of 90, and can be negative.

        The value returned from this method corresponds to the value of the /Rotate attribute of a PDF page object.
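
        Because the returned value can be negative, callers may want to normalize it before comparing rotations. The following is one possible sketch; RotationSketch, normalize, and isValid are hypothetical names, not part of the Page API.

```java
public class RotationSketch {

    // Hypothetical helper: maps any clockwise rotation in degrees onto the
    // range 0..359. Math.floorMod always returns a non-negative result for a
    // positive modulus, unlike the % operator.
    static int normalize(int theta) {
        return Math.floorMod(theta, 360);
    }

    // Hypothetical helper: checks the "multiple of 90" expectation.
    static boolean isValid(int theta) {
        return theta % 90 == 0;
    }

    public static void main(String[] args) {
        System.out.println(normalize(-90)); // 270
        System.out.println(normalize(450)); // 90
        System.out.println(isValid(-180));  // true
        System.out.println(isValid(45));    // false
    }
}
```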
      • getImages

        Collection<Image> getImages()
        Returns a Collection of Image objects, one for each image on this page. Note that the same image data might be displayed multiple times on a page; in such situations, multiple Image instances will still be included in the returned collection so as to represent each displayed image's dimensions and position.
        Throws:
        InsufficientLicenseException - if a license has been loaded, but that license does not include PDF.Feature.Images.
      • crop

        Page crop(Region area)
        Returns a Page instance that contains only the content held by this Page instance that intersects the given "query" area. If all of the content held by this instance is intersected by the query area, then this instance may be returned unchanged. If no content in this Page intersects the query area, then an empty Page instance will be returned.
        Throws:
        UnsupportedOperationException - if this Page implementation does not support crop.
        See Also:
        Block.crop(Region)
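
        The intersection semantics described for crop can be illustrated with a standalone sketch. Everything here is an assumption made for illustration: Rect is a hypothetical stand-in for Region (whose API is not shown in this section), content is modeled as a list of rectangles, and rectangles that merely share an edge are treated as not intersecting.

```java
import java.util.ArrayList;
import java.util.List;

public class CropSketch {

    // Hypothetical axis-aligned rectangle standing in for Region.
    static final class Rect {
        final int left, bottom, right, top;

        Rect(int left, int bottom, int right, int top) {
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }

        // True when the two rectangles overlap in both axes.
        // Assumption: touching edges do not count as an intersection.
        boolean intersects(Rect other) {
            return left < other.right && other.left < right
                && bottom < other.top && other.bottom < top;
        }
    }

    // Keeps only the content rectangles that intersect the query area,
    // mirroring the documented behavior: everything, some, or nothing
    // may survive the crop.
    static List<Rect> crop(List<Rect> content, Rect query) {
        List<Rect> kept = new ArrayList<>();
        for (Rect r : content) {
            if (r.intersects(query)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rect> content = new ArrayList<>();
        content.add(new Rect(0, 0, 100, 100));
        content.add(new Rect(200, 200, 300, 300));

        System.out.println(crop(content, new Rect(50, 50, 150, 150)).size());   // 1
        System.out.println(crop(content, new Rect(0, 0, 400, 400)).size());     // 2
        System.out.println(crop(content, new Rect(400, 400, 500, 500)).size()); // 0
    }
}
```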