pdfts.examples
Class XMLFormExport

java.lang.Object
  extended by pdfts.examples.XMLFormExport

public class XMLFormExport
extends java.lang.Object

This class extracts all interactive form data from a PDFTextStream instance (or from a PDF file specified via the command line) and builds an XML DOM Document instance containing the extracted form data. When used from the command line, this class writes the resulting DOM Document either to disk or to standard out (System.out). (No formatting is applied to the XML document when it is written.)

Note that the schema of the resulting XML document does not conform to any Adobe-specified XML schema for form data (e.g. XFDF or XFA). However, it is useful for applications that do not require compatibility with those schemas, and this class remains a good example of how to use PDFTextStream's interactive form API.

The full source code for this class is included in every PDFTextStream distribution.

Below is a DTD representing the structure of the XML document this class produces. This DTD is also available in the source code included in the PDFTextStream distribution.

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT form (field+)>
<!ATTLIST form
    sourcefile CDATA #REQUIRED
>

<!ELEMENT field (options?, button-type?, value-richtext?, value*)>
<!ATTLIST field
    localname CDATA #REQUIRED
    fullname CDATA #REQUIRED
    type CDATA #REQUIRED
    mappingname CDATA #IMPLIED
>

<!-- button types: 'push', 'check', 'radio' -->
<!ELEMENT button-type (#PCDATA)>

<!-- used to represent options available from AcroChoiceField.getOptions() -->
<!ELEMENT option (exp-value, disp-value)>
<!ELEMENT options (option+)>
<!ELEMENT disp-value (#PCDATA)>
<!ELEMENT exp-value (#PCDATA)>

<!ELEMENT value (#PCDATA)>

<!-- Only used by AcroTextField -->
<!ELEMENT value-richtext (#PCDATA)>
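To make the DTD above more concrete, the following is a minimal sketch that builds a small DOM Document with the same element and attribute names using only the standard JAXP API. It is not the source of XMLFormExport: the class name FormDomSketch, the helper appendText, and all sample values (including the "choice" value of the type attribute) are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class; not part of the PDFTextStream distribution.
final class FormDomSketch {

    // Creates <name>text</name> under the given parent and returns the new element.
    static Element appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
        return child;
    }

    // Builds a one-field document shaped like the DTD: form > field > (options, value).
    static Document buildSample() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }

        Element form = doc.createElement("form");
        form.setAttribute("sourcefile", "sample.pdf"); // required by the DTD
        doc.appendChild(form);

        Element field = doc.createElement("field");
        field.setAttribute("localname", "state");
        field.setAttribute("fullname", "address.state");
        field.setAttribute("type", "choice"); // assumed value; the DTD only says CDATA
        form.appendChild(field);

        // The DTD orders children as: options?, button-type?, value-richtext?, value*
        Element options = doc.createElement("options");
        field.appendChild(options);
        Element option = doc.createElement("option");
        options.appendChild(option);
        appendText(doc, option, "exp-value", "MA");
        appendText(doc, option, "disp-value", "Massachusetts");

        appendText(doc, field, "value", "MA");
        return doc;
    }

    public static void main(String[] args) {
        Element form = buildSample().getDocumentElement();
        System.out.println(form.getTagName() + " has "
                + form.getElementsByTagName("field").getLength() + " field(s)");
    }
}
```

A document of this shape can be walked with ordinary DOM calls such as getElementsByTagName("field"), which is how a consumer of the exported XML would typically read it.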
 

Version:
©2004-2012 Snowtide Informatics Systems, Inc.

Method Summary
static org.w3c.dom.Document exportFormAsXML(PDFTextStream source)
          Extracts all interactive form data from a PDF file using the given PDFTextStream, and returns a DOM XML Document instance containing the form data.
static void main(java.lang.String[] args)
           Exports the form data held in the PDF file referenced by the first path to a new XML document referenced by the second path.
static void serializeXMLDocument(org.w3c.dom.Document doc, java.io.Writer output)
          Writes the given Document to the given Writer using a no-op XSL transformation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

exportFormAsXML

public static org.w3c.dom.Document exportFormAsXML(PDFTextStream source)
                                            throws java.io.IOException
Extracts all interactive form data from a PDF file using the given PDFTextStream, and returns a DOM XML Document instance containing the form data. The structure of the returned XML document is given by the DTD in this class' main javadoc, as well as in the XMLFormExport.dtd file included with every PDFTextStream distribution.

Throws:
java.io.IOException - if an error occurs while extracting the form data

main

public static void main(java.lang.String[] args)

Exports the form data held in the PDF file referenced by the first path to a new XML document referenced by the second path. If the second path is omitted, the output XML content is written to System.out.

Usage: java pdfts.examples.XMLFormExport pdf_file_path [output_xml_path]

Example (classpath configuration not included here for simplicity's sake):
java pdfts.examples.XMLFormExport /home/myname/path_to_pdf_file.pdf ../path_to_xml_export.xml
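The choice between writing to a file and writing to System.out, as described above, might be sketched as follows. This is an assumption about how such argument handling could look, not the actual body of main: the class OutputTargetSketch, its methods, the UTF-8 encoding, and the placeholder XML string are all hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical illustration class; not part of the PDFTextStream distribution.
final class OutputTargetSketch {

    // True when no output path was given, i.e. only pdf_file_path is present.
    static boolean writesToStdout(String[] args) {
        return args.length < 2;
    }

    // Opens a Writer on the second argument, or wraps System.out if it is absent.
    static Writer openOutput(String[] args) throws IOException {
        if (writesToStdout(args)) {
            return new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        }
        return Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: java OutputTargetSketch pdf_file_path [output_xml_path]");
            return;
        }
        Writer out = openOutput(args);
        try {
            // Placeholder content; a real exporter would serialize the DOM Document here.
            out.write("<form/>");
            out.flush();
        } finally {
            // Close files, but leave System.out open for the rest of the JVM.
            if (!writesToStdout(args)) {
                out.close();
            }
        }
    }
}
```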


serializeXMLDocument

public static void serializeXMLDocument(org.w3c.dom.Document doc,
                                        java.io.Writer output)
                                 throws javax.xml.transform.TransformerException
Writes the given Document to the given Writer using a no-op XSL transformation.

Throws:
javax.xml.transform.TransformerException
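
A "no-op XSL transformation" is commonly implemented with the identity Transformer that TransformerFactory.newTransformer() returns when no stylesheet is supplied. The sketch below shows that technique with the standard javax.xml.transform API; it is an assumption about how serializeXMLDocument could be written, not its actual source, and the class IdentitySerializeSketch and its helper methods are hypothetical.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class; not part of the PDFTextStream distribution.
final class IdentitySerializeSketch {

    // Copies the DOM tree to the Writer unchanged: with no stylesheet,
    // newTransformer() yields an identity (no-op) transformation.
    static void serialize(Document doc, Writer output) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new DOMSource(doc), new StreamResult(output));
    }

    // Convenience wrapper that returns the serialized XML as a String.
    static String toXmlString(Document doc) {
        StringWriter out = new StringWriter();
        try {
            serialize(doc, out);
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
        return out.toString();
    }

    // Builds a tiny sample document with a single <form> root element.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element form = doc.createElement("form");
            form.setAttribute("sourcefile", "sample.pdf");
            doc.appendChild(form);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXmlString(sampleDocument()));
    }
}
```

Because the identity transformer applies no indentation by default, the output is unformatted, which is consistent with the note in the class description that no formatting is applied when the document is written.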