XML Serialization Overview

If you plan to have your applications exchange data with other applications over the Internet, you will most likely use Extensible Markup Language (XML), either because of its flexibility or because of its widespread use in Internet applications. XML is a text-based markup language derived from Standard Generalized Markup Language (SGML) and is used mostly to represent structured data. XML is similar to HTML, but has stricter rules regarding the form and validity of documents.

XML Documents

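To be usable, XML documents must be well formed. A well-formed document has exactly one root element, and every element in it either has matching start and end tags that are properly nested or is written as a single empty-element tag. In addition, an XML document should begin with an XML declaration, which provides XML parsers with information needed to process the document, such as the XML version and the character encoding. The declaration is optional in XML 1.0, but when present it must be the first thing in the document and can appear only once. Note that the declaration is not an element.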

Listing 1-1 shows an example of an XML document.

Listing 1-1  Service-request document

<?xml version="1.0" encoding="UTF-8"?>
<service_request>
    <company name="Kilocomp">
        <contact>Melinda Smith</contact>
        <address>
            <street>123 Market Street</street>
            <city>Townsville</city>
            <state>IN</state>
            <zip>65045</zip>
        </address>
        <phone_number>345-555-1234</phone_number>
    </company>
    <service priority="1">
        <description>Fix vending machine in lobby.</description>
    </service>
</service_request>

The XML declaration of the service-request document indicates that the document is written using the XML 1.0 standard and the UTF-8 character encoding. The service_request element is the root element; it encloses the data the document contains. service_request contains two elements: company and service. The company element contains one attribute, name, and three elements: contact, address, and phone_number. The service element has one attribute, priority, and one element, description. The address element contains four elements, street, city, state, and zip; it has no attributes. Figure 1-1 provides a graphical representation of the service-request document.
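The structure described above can be checked programmatically. The following is a minimal sketch, not part of WebObjects, that uses the standard Java DOM parser to parse a compacted copy of Listing 1-1 and list the child elements of an element. The class name ServiceRequestReader and its helper methods are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class ServiceRequestReader {
    // Listing 1-1, written without indentation to keep the sketch short.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<service_request>"
      + "<company name=\"Kilocomp\">"
      + "<contact>Melinda Smith</contact>"
      + "<address><street>123 Market Street</street><city>Townsville</city>"
      + "<state>IN</state><zip>65045</zip></address>"
      + "<phone_number>345-555-1234</phone_number>"
      + "</company>"
      + "<service priority=\"1\">"
      + "<description>Fix vending machine in lobby.</description>"
      + "</service>"
      + "</service_request>";

    // Parses the text into a DOM tree; a document that is not well formed
    // makes the parser throw, which is rethrown here as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Returns the names of the direct child elements, skipping text nodes.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        System.out.println(root.getTagName() + " contains " + childElementNames(root));
        Element company = (Element) root.getElementsByTagName("company").item(0);
        System.out.println("company (name=" + company.getAttribute("name")
                + ") contains " + childElementNames(company));
    }
}
```

Passing text with mismatched tags, such as <a><b></a>, to parse causes an exception, which is how a parser reports a document that is not well formed.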

Figure 1-1  Graphical representation of the service-request document

To be usable in a particular context, an XML document must be both well formed and valid. A valid document is one that follows the structure specified by a schema, which can be either a document type definition (DTD) file or an XML Schema file. The schema determines the layout of an XML document's elements, the attributes and subelements that each can have, and the constraints that the attribute data and element data must adhere to. XML Schema filenames usually have the .xsd extension, while DTD filenames usually have the .dtd extension. You can think of a schema as a Java class and an XML document as an instance of that class. For more information on document schemas, see XML Schema at http://www.w3.org/XML/Schema.

Document type definition files can also be used to validate XML documents. However, because DTD files are not written in XML and are not as powerful as XML Schema files, XML Schema files are increasingly taking their place.
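To make the difference between well formed and valid concrete, the following sketch validates a document against an XML Schema using the standard javax.xml.validation package. The schema shown is an assumption written for this example (the original service-request document does not come with one), and ServiceValidator is a hypothetical class name. The schema requires a service element with a positive-integer priority attribute and a single description subelement.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
class ServiceValidator {
    // Assumed schema for the service element of Listing 1-1.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"service\">"
      + "<xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name=\"description\" type=\"xs:string\"/>"
      + "</xs:sequence>"
      + "<xs:attribute name=\"priority\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Compiles the schema text; a failure here means the schema itself is wrong.
    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema could not be compiled", e);
        }
    }

    // Returns true only if the document is well formed and follows the schema.
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or violates a schema constraint
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<service priority=\"1\">"
                + "<description>Fix vending machine in lobby.</description></service>";
        // Well formed, but the priority attribute is not a positive integer.
        String bad = "<service priority=\"urgent\">"
                + "<description>Fix vending machine in lobby.</description></service>";
        System.out.println("good document valid? " + isValid(good));
        System.out.println("bad document valid?  " + isValid(bad));
    }
}
```

The second document in main is well formed but not valid, because its priority value breaks a constraint that only the schema expresses.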

XML Namespaces

With the interoperability of XML documents comes the problem of differentiating between the element names you use in your documents and the names used in documents from other sources. Take a look at the document in Listing 1-2.

Listing 1-2  Service-response document

<?xml version="1.0" encoding="UTF-8"?>
<service_response>
    <service_request>
        <company name="Kilocomp"> <!-- 1 -->
            <contact>Melinda Smith</contact>
            <address>
                <street>123 Market Street</street>
                <city>Townsville</city>
                <state>IN</state>
                <zip>65045</zip>
            </address>
            <phone_number>345-555-1234</phone_number>
        </company>
        <service priority="1">
            <description>Fix vending machine in lobby.</description>
        </service>
    </service_request>
    <appointment>
        <company>We Fix It</company> <!-- 2 -->
        <contact name="Nancy Garcia" phone="345-555-2334"
                 pager="345-555-1112" />
        <date>2002-05-02</date>
        <time>1500</time>
    </appointment>
</service_response>

Unless you add information about the element hierarchy of the document to your logic, it's difficult to differentiate between the company element of the service_request element (the line numbered 1) and the company element of the appointment element (the line numbered 2). XML namespaces address this problem.
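As an illustration of what adding hierarchy information to your logic can look like, the following sketch uses the standard javax.xml.xpath package to select each company element by its full path in a shortened copy of Listing 1-2. The class name CompanyByPath is hypothetical. This is the approach that works without namespaces: the code has to know where in the tree each element sits.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class CompanyByPath {
    // Shortened copy of Listing 1-2 that keeps both company elements.
    static final String XML =
        "<service_response>"
      + "<service_request>"
      + "<company name=\"Kilocomp\"><contact>Melinda Smith</contact></company>"
      + "</service_request>"
      + "<appointment><company>We Fix It</company></appointment>"
      + "</service_response>";

    // Evaluates an XPath expression against the document and returns
    // the string value of the first matching node.
    static String evaluate(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The element name alone is ambiguous; the path is not.
        System.out.println("client company:   "
                + evaluate("/service_response/service_request/company/@name"));
        System.out.println("provider company: "
                + evaluate("/service_response/appointment/company"));
    }
}
```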

A namespace is like a Java package: It's a way of grouping related elements. Listing 1-3 shows a version of the service-response document that uses namespaces. Observe that the document has two distinct elements that enclose information about a company: client:company and provider:company. Each prefix is bound to a namespace URI by an xmlns declaration, so the prefix tells you which namespace an element belongs to.

To avoid having to put prefixes on all element names and to reduce the size of XML documents, you can define a default namespace for the document. Because the namespace declaration on the line numbered 1 of Listing 1-4 has no prefix, it defines a default namespace for the service_response element and for every unprefixed subelement of service_response, such as the appointment element that starts at the line numbered 2. A default namespace applies only to element names; unprefixed attributes are not placed in it. You can find more information on XML namespaces in Namespaces in XML, located at http://www.w3.org/TR/REC-xml-names.

Listing 1-3  Service-response document using namespaces

<?xml version="1.0" encoding="UTF-8"?>
<provider:service_response xmlns:provider="http://provider.com/b_to_b">
    <client:service_request xmlns:client="http://client.com/svcs">
        <client:company name="Kilocomp"> <!-- 1 -->
            <client:contact>Melinda Smith</client:contact>
            <client:address>
                <client:street>123 Market Street</client:street>
                <client:city>Townsville</client:city>
                <client:state>IN</client:state>
                <client:zip>65045</client:zip>
            </client:address>
            <client:phone_number>345-555-1234</client:phone_number>
        </client:company>
        <client:service priority="1">
            <client:description>Fix vending machine in lobby.</client:description>
        </client:service>
    </client:service_request>
    <provider:appointment>
        <provider:company>We Fix It</provider:company> <!-- 2 -->
        <provider:contact name="Nancy Garcia" phone="345-555-2334" pager="345-555-1112" />
        <provider:date>2002-05-02</provider:date>
        <provider:time>1500</provider:time>
    </provider:appointment>
</provider:service_response>

Listing 1-4  Service-response document using a default namespace for the provider entity

<?xml version="1.0" encoding="UTF-8"?>
<service_response xmlns="http://provider.com/b_to_b"> <!-- 1 -->
    <client:service_request xmlns:client="http://client.com/svcs">
        <client:company name="Kilocomp">
            <client:contact>Melinda Smith</client:contact>
            <client:address>
                <client:street>123 Market Street</client:street>
                <client:city>Townsville</client:city>
                <client:state>IN</client:state>
                <client:zip>65045</client:zip>
            </client:address>
            <client:phone_number>345-555-1234</client:phone_number>
        </client:company>
        <client:service priority="1">
            <client:description>Fix vending machine in lobby.</client:description>
        </client:service>
    </client:service_request>
    <appointment> <!-- 2 -->
        <company>We Fix It</company>
        <contact name="Nancy Garcia" phone="345-555-2334"
                 pager="345-555-1112" />
        <date>2002-05-02</date>
        <time>1500</time>
    </appointment>
</service_response>
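
The following sketch shows how code can tell the two company elements apart by namespace rather than by position. It parses a shortened copy of Listing 1-4 with the standard Java DOM parser, with namespace awareness turned on, and looks up elements by namespace URI and local name. The class name NamespaceLookup is hypothetical. Note that the lookup uses the URI, not the prefix, so it works the same way for Listing 1-3 and Listing 1-4.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class NamespaceLookup {
    static final String CLIENT_NS = "http://client.com/svcs";
    static final String PROVIDER_NS = "http://provider.com/b_to_b";

    // Shortened copy of Listing 1-4: the provider namespace is the default,
    // and the client namespace is bound to the client prefix.
    static final String XML =
        "<service_response xmlns=\"" + PROVIDER_NS + "\">"
      + "<client:service_request xmlns:client=\"" + CLIENT_NS + "\">"
      + "<client:company name=\"Kilocomp\"/>"
      + "</client:service_request>"
      + "<appointment><company>We Fix It</company></appointment>"
      + "</service_response>";

    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace processing is off by default in this API.
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Finds the first company element that belongs to the given namespace.
    static Element firstCompany(Document doc, String namespaceUri) {
        return (Element) doc.getElementsByTagNameNS(namespaceUri, "company").item(0);
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println("client company:   "
                + firstCompany(doc, CLIENT_NS).getAttribute("name"));
        System.out.println("provider company: "
                + firstCompany(doc, PROVIDER_NS).getTextContent());
        // The unprefixed root element is in the default (provider) namespace.
        System.out.println("root namespace:   "
                + doc.getDocumentElement().getNamespaceURI());
    }
}
```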

Benefits of XML Serialization

There are many benefits of using XML to encode data, including the ability to read and modify serialized or archived information easily. Java provides a well-known binary serialization API, and WebObjects XML serialization builds on that API so that you can serialize your objects and data into XML documents with little additional effort. For more information on XML, visit http://www.w3.org/XML.
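
For readers unfamiliar with the binary serialization API mentioned above, the following sketch shows a round trip using only the standard java.io classes. It does not use any WebObjects class; the Service class and the BinaryRoundTrip class are hypothetical, loosely modeled on the service element of Listing 1-1. The output of this API is a compact binary stream that is not meant to be read or edited by people, which is the limitation that serializing to XML addresses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper class for illustration only.
class BinaryRoundTrip {
    // Hypothetical value class; implementing Serializable opts it in
    // to the standard serialization mechanism.
    static class Service implements Serializable {
        private static final long serialVersionUID = 1L;
        final int priority;
        final String description;

        Service(int priority, String description) {
            this.priority = priority;
            this.description = description;
        }
    }

    // Writes the object to an in-memory byte array in Java's binary format.
    static byte[] serialize(Service service) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(service);
            out.flush(); // push buffered data into the byte array
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rebuilds the object from the bytes produced by serialize.
    static Service deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Service) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Service(1, "Fix vending machine in lobby."));
        Service copy = deserialize(data);
        System.out.println(data.length + " bytes, priority=" + copy.priority
                + ", description=" + copy.description);
    }
}
```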

Serializing data into XML documents provides you with several benefits:

Transforming XML Documents

You may need to transform the XML documents generated by NSXMLOutputStream to a format that your customers or service providers are more familiar with. (NSXMLOutputStream is the WebObjects class that serializes objects and data into XML documents, while NSXMLInputStream is the class that deserializes XML documents into objects.) Transforming documents this way can expedite the creation of data-exchange systems, because it makes it easier to transfer information to and from your business partners. Keep in mind, however, that unless the data transfer is one-way, you may have to create two sets of transformation scripts: one that converts your data to the format your partners need and one that converts data from your partners to the format your applications require. In addition, NSXMLInputStream can deserialize data only from untransformed NSXMLOutputStream output.

XSL Transformations, or XSLT, is a specification that allows you to convert an XML document into another XML document or into any other type of document. An XSLT stylesheet or script contains instructions that tell a transformer how to process an input document (the product of XML serialization) to produce an output document. For more information on XSLT, see XSL Transformations (XSLT) Version 1.0, located at http://www.w3.org/TR/xslt.
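
The following sketch shows the mechanics of applying an XSLT stylesheet from Java using the standard javax.xml.transform package. The input is the service element from Listing 1-1, used here as a stand-in; it is not actual NSXMLOutputStream output, whose format is not shown in this chapter. The stylesheet, the task output element, and the ServiceTransform class name are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class for illustration only.
class ServiceTransform {
    // Stand-in input document (not real NSXMLOutputStream output).
    static final String INPUT =
        "<service priority=\"1\">"
      + "<description>Fix vending machine in lobby.</description>"
      + "</service>";

    // Assumed stylesheet: rewrites a service element as a task element,
    // moving the priority attribute to level and the description to the body.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/service\">"
      + "<task level=\"{@priority}\"><xsl:value-of select=\"description\"/></task>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given document text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT));
    }
}
```

For a two-way exchange, a second stylesheet that maps task elements back to service elements would be needed, in line with the caution above about transformation scripts in both directions.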