public class XMLFormExport
extends java.lang.Object
This class extracts all interactive form data from a PDFTextStream instance (or from a PDF file specified via the command line) and builds an XML DOM Document instance containing the extracted form data. When used from the command line, this class writes the resulting DOM Document either to disk or to standard out (System.out). (No formatting is applied to the XML document when it is written.)
Note that the schema of the resulting XML document does not conform to any Adobe-specified XML schema for form data (e.g. XFDF or XFA). However, it is useful for applications that do not require compatibility with those schemas, and this class remains an excellent example of how to use PDFTextStream's interactive form API.
The full source code for this class is included in every PDFTextStream distribution.
Below is a DTD representing the structure of the XML document this class produces. This DTD is also available in the source code included in the PDFTextStream distribution.
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT form (field+)>
<!ATTLIST form
sourcefile CDATA #REQUIRED
>
<!ELEMENT field (options?, button-type?, value-richtext?, value*)>
<!ATTLIST field
localname CDATA #REQUIRED
fullname CDATA #REQUIRED
type CDATA #REQUIRED
mappingname CDATA #IMPLIED
>
<!-- button types: 'push', 'check', 'radio' -->
<!ELEMENT button-type (#PCDATA)>
<!-- used to represent options available from AcroChoiceField.getOptions() -->
<!ELEMENT option (exp-value, disp-value)>
<!ELEMENT options (option+)>
<!ELEMENT disp-value (#PCDATA)>
<!ELEMENT exp-value (#PCDATA)>
<!ELEMENT value (#PCDATA)>
<!-- Only used by AcroTextField -->
<!ELEMENT value-richtext (#PCDATA)>
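To illustrate how a consumer might read a document shaped like the DTD above, here is a minimal sketch that uses only the JDK's DOM parser. The class name FormXmlReader, the sample XML, and its field names, type values, and field values are all invented for illustration; they are not output captured from XMLFormExport.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FormXmlReader {
    // Hypothetical sample that follows the DTD; names, types, and values are made up.
    static final String SAMPLE =
        "<form sourcefile=\"sample.pdf\">"
      + "<field localname=\"name\" fullname=\"applicant.name\" type=\"text\">"
      + "<value>Jane Doe</value></field>"
      + "<field localname=\"agree\" fullname=\"applicant.agree\" type=\"button\">"
      + "<button-type>check</button-type><value>Yes</value></field>"
      + "</form>";

    // Returns one line per <field>: "fullname [type] = value1, value2".
    static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        NodeList fields = doc.getDocumentElement().getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            // Collect only direct <value> children; <option> children use
            // exp-value / disp-value and are skipped here.
            List<String> values = new ArrayList<>();
            for (Node n = field.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "value".equals(n.getNodeName())) {
                    values.add(n.getTextContent());
                }
            }
            lines.add(field.getAttribute("fullname")
                + " [" + field.getAttribute("type") + "] = "
                + String.join(", ", values));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        summarize(SAMPLE).forEach(System.out::println);
    }
}
```

Because the DTD allows zero or more value elements per field, the sketch joins them into one list rather than assuming a single value.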
Modifier and Type | Method and Description |
---|---|
static org.w3c.dom.Document | exportFormAsXML(PDFTextStream source): Extracts all interactive form data from a PDF file using the given PDFTextStream, and returns a DOM XML Document instance containing the form data. |
static void | main(java.lang.String[] args): Exports the form data held in the PDF file referenced by the first path to a new XML document referenced by the second path. |
static void | serializeXMLDocument(org.w3c.dom.Document doc, java.io.Writer output): Writes the given Document to the given Writer using a no-op XSL transformation. |
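public static org.w3c.dom.Document exportFormAsXML(PDFTextStream source) throws java.io.IOException
Extracts all interactive form data from a PDF file using the given PDFTextStream, and returns a DOM XML Document instance containing the form data.
Throws:
java.io.IOException - if an error occurs while extracting the form data
public static void main(java.lang.String[] args)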
Exports the form data held in the PDF file referenced by the first path to a new XML document referenced by the second path. If the second path is omitted, the XML output is written to System.out.
Usage: java pdfts.examples.XMLFormExport pdf_file_path [output_xml_path]
Example (classpath configuration not included here for simplicity's sake):
java pdfts.examples.XMLFormExport /home/myname/path_to_pdf_file.pdf ../path_to_xml_export.xml
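The optional second argument described above could be handled along the following lines. This is only a sketch of the idea, not the code shipped with PDFTextStream: the class name OutputTarget, the open helper, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class OutputTarget {
    // Hypothetical helper: args[0] is the PDF path, args[1] (optional) the XML path.
    // When no XML path is given, output goes to the fallback stream (e.g. System.out).
    static Writer open(String[] args, PrintStream fallback) throws IOException {
        if (args.length >= 2) {
            return Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8);
        }
        return new OutputStreamWriter(fallback, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Writer out = open(args, System.out);
        out.write("<form sourcefile=\"placeholder.pdf\"/>");
        out.write(System.lineSeparator());
        // Flush rather than close, so System.out stays usable when it is the target.
        out.flush();
    }
}
```

Passing the fallback stream as a parameter keeps the helper easy to exercise without touching the real standard output.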
public static void serializeXMLDocument(org.w3c.dom.Document doc, java.io.Writer output) throws javax.xml.transform.TransformerException
Writes the given Document to the given Writer using a no-op XSL transformation.
Throws:
javax.xml.transform.TransformerException
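A "no-op XSL transformation" is commonly done with the JDK's identity Transformer, obtained by calling TransformerFactory.newTransformer() with no stylesheet. The sketch below shows that general technique under the assumption that serializeXMLDocument works in a similar way; the class name IdentitySerializer and the sample document are hypothetical, and the actual implementation is in the source shipped with PDFTextStream.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IdentitySerializer {
    // Copies the DOM tree to the Writer unchanged; no indentation is requested.
    static void serialize(Document doc, Writer output) throws TransformerException {
        // newTransformer() without a stylesheet yields an identity transformation.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new DOMSource(doc), new StreamResult(output));
    }

    // Builds a tiny document shaped like the export format; all values are invented.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();
        Element form = doc.createElement("form");
        form.setAttribute("sourcefile", "sample.pdf");
        Element field = doc.createElement("field");
        field.setAttribute("fullname", "applicant.name");
        Element value = doc.createElement("value");
        value.setTextContent("Jane Doe");
        field.appendChild(value);
        form.appendChild(field);
        doc.appendChild(form);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        serialize(sampleDocument(), out);
        System.out.println(out);
    }
}
```

Since no output properties are set on the Transformer, the result is written without added line breaks or indentation, which matches the note that no formatting is applied to the exported document.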