Class XMLFormExport


  • public class XMLFormExport
    extends Object

    This class extracts all interactive form data from a Document (or from a PDF file specified on the command line) and builds an XML DOM Document instance containing the extracted form data. When used from the command line, this class writes the resulting DOM Document either to disk or to standard out (System.out). (No formatting is applied to the XML document when it is written.)

    Note that the schema of the resulting XML document does not conform to any Adobe-specified XML schema for form data (e.g. XFDF or XFA). However, it is useful for applications that do not require compatibility with those schemas, and this class remains an excellent example of how to use PDFxStream's interactive form API.

    The full source code for this class is included in every PDFxStream distribution.

    Below is a DTD representing the structure of the XML document this class produces. This DTD is also available in the source code included in the PDFxStream distribution.

    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT form (field+)>
    <!ATTLIST form
        sourcefile CDATA #REQUIRED
    >
    
    <!ELEMENT field (options?, button-type?, value-richtext?, value*)>
    <!ATTLIST field
        localname CDATA #REQUIRED
        fullname CDATA #REQUIRED
        type CDATA #REQUIRED
        mappingname CDATA #IMPLIED
    >
    
    <!-- button types: 'push', 'check', 'radio' -->
    <!ELEMENT button-type (#PCDATA)>
    
    <!-- used to represent options available from AcroChoiceField.getOptions() -->
    <!ELEMENT option (exp-value, disp-value)>
    <!ELEMENT options (option+)>
    <!ELEMENT disp-value (#PCDATA)>
    <!ELEMENT exp-value (#PCDATA)>
    
    <!ELEMENT value (#PCDATA)>
    
    <!-- Only used by AcroTextField -->
    <!ELEMENT value-richtext (#PCDATA)>
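
    As an illustration of the structure this DTD describes, the following is a minimal sketch that reads such a document with the JDK's standard DOM parser and lists each field's full name with its values. The sample XML, the field names, and the "text" and "button" type values are assumptions made up for this example; FormXmlReader and summarize are hypothetical names, not part of PDFxStream.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FormXmlReader {
    // Hypothetical sample, hand-written to follow the DTD; the field names
    // and the "text"/"button" type values are made up for illustration.
    public static final String SAMPLE =
        "<form sourcefile=\"sample.pdf\">"
      + "<field localname=\"name\" fullname=\"applicant.name\" type=\"text\">"
      + "<value>Jane Doe</value></field>"
      + "<field localname=\"agree\" fullname=\"applicant.agree\" type=\"button\">"
      + "<button-type>check</button-type><value>Yes</value></field>"
      + "</form>";

    /** Returns one "fullname = [values]" line per field element. */
    public static List<String> summarize(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse form XML", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList fields = doc.getDocumentElement().getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            // Matches only elements named exactly "value", so value-richtext,
            // exp-value and disp-value are not picked up here.
            NodeList values = field.getElementsByTagName("value");
            List<String> texts = new ArrayList<>();
            for (int j = 0; j < values.getLength(); j++) {
                texts.add(values.item(j).getTextContent());
            }
            lines.add(field.getAttribute("fullname") + " = " + texts);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

    Because a field may carry zero or more value elements, the sketch collects them into a list rather than assuming a single value.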
     
    Version:
    ©2004-2025 Snowtide
    • Method Detail

      • exportFormAsXML

        public static Document exportFormAsXML​(Document source)
                                        throws IOException
        Extracts all interactive form data from the given source Document and returns a DOM XML Document instance containing the form data. The structure of the returned XML document is given by the DTD in this class's main javadoc, as well as in the XMLFormExport.dtd file included with every PDFxStream distribution.
        Throws:
        IOException - if an error occurs while extracting the form data
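
        A caller will usually want to turn the returned DOM Document into text. The following is a minimal sketch of one way to do that with the JDK's standard javax.xml.transform API, writing the XML without indentation. It does not call PDFxStream: standIn builds a small hypothetical DOM Document in place of the one exportFormAsXML would return, and FormXmlWriter, standIn, and serialize are names invented for this example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FormXmlWriter {
    /**
     * Builds a tiny stand-in DOM Document (assumption: made-up data).
     * In real use the Document would come from
     * XMLFormExport.exportFormAsXML(source) instead.
     */
    public static Document standIn() {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element form = dom.createElement("form");
            form.setAttribute("sourcefile", "sample.pdf");
            dom.appendChild(form);

            Element field = dom.createElement("field");
            field.setAttribute("localname", "name");
            field.setAttribute("fullname", "applicant.name");
            field.setAttribute("type", "text"); // hypothetical type value
            form.appendChild(field);

            Element value = dom.createElement("value");
            value.setTextContent("Jane Doe");
            field.appendChild(value);
            return dom;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a DOM Document to a String with no added indentation. */
    public static String serialize(Document dom) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(standIn()));
    }
}
```

        Note that the source parameter and the return value are both named Document but are different types; the sketch imports only org.w3c.dom.Document, the DOM type.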
      • main

        @Deprecated
        public static void main​(String[] args)
        Deprecated.
        Command-line usage of this class may be moved or removed in future PDFxStream releases.

        Exports the form data held in the PDF file referenced by the first path to a new XML document referenced by the second path. If the second path is omitted, the output XML content is written to System.out.

        Usage: java pdfts.examples.XMLFormExport pdf_file_path [output_xml_path]

        Example (classpath configuration not included here for simplicity's sake):
        java pdfts.examples.XMLFormExport /home/myname/path_to_pdf_file.pdf ../path_to_xml_export.xml
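
        The argument handling described above (first path is the PDF, optional second path is the XML output, otherwise System.out) can be sketched as follows. This is an illustration under assumptions, not the source of XMLFormExport.main: ExportDestination and emit are hypothetical names, and a fixed placeholder string stands in for the exported XML.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExportDestination {
    /**
     * Writes xml to args[1] when a second path is given,
     * otherwise to standard out. args[0] is the PDF path.
     */
    public static void emit(String xml, String[] args) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            if (args.length > 1) {
                Files.write(Paths.get(args[1]), bytes);
            } else {
                System.out.write(bytes);
                System.out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: ExportDestination pdf_file_path [output_xml_path]");
            return;
        }
        // Placeholder only: a real export would produce this content
        // from the PDF named by args[0].
        emit("<form sourcefile=\"placeholder.pdf\"/>", args);
    }
}
```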