Validation Tips and Techniques
Validation is a procedure that ensures an XML document conforms to the rules governing its logical structure as specified in a language schema such as DTD (Document Type Definition). An XML document might be well-formed—that is, it obeys the syntactical rules of XML—and at the same time be invalid. For example, an element might include a child element when it is supposed to have only textual content, or a required attribute of an element might be missing.
To perform validation it helps to construct a tree of an XML document’s schema that is parallel to a tree structure representing the document’s actual content (see “Constructing XML Tree Structures”). The schema tree presents a simple abstract view of how the document should be structured. Instead of nodes of objects representing the actual elements and text of the document, the schema tree contains nodes that express the rules by which the parts of the document can be combined. Validation tests the actual elements, attributes, and other parts of the document against the rules of the schema to see if the document conforms. If your application finds any violation of conformance, it can notify the user and perhaps require the user to fix the error. You can validate an XML document when it is first read and processed and later when users attempt to make any changes to it.
Because the programmatic interface of NSXMLParser is designed to report only XML constructs and DTD declarations, this article focuses on that language schema. However, if you use an XML-based language schema, such as RELAX NG, then NSXMLParser can process the schema just as it would any XML file, reporting what it finds to its delegate. You can use the data you thereby acquire for validation.
The sections on constructing rules focus primarily on element and attribute declarations because these are by far the most common and most important type of declaration. “Handling Other Declarations” briefly discusses what to do with other kinds of declarations, such as those for entities and notations.
Using NSXMLParser to Handle DTD Declarations
The NSXMLParser class reports to its delegate DTD declarations it encounters in a document (assuming the delegate implements the necessary methods). If the language schema you use is DTD, NSXMLParser helps you acquire the data you need either for validation or for other purposes, such as enforcing correctness when dynamically constructing objects (for example, a menu template).
The DTD Delegation Methods
The NSXMLParser class defines a half dozen delegation methods that the parser invokes when it encounters a DTD declaration in an internal or external source. These methods are of the form:
parser:found<Type>DeclarationWithName:...
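The third parameter and any subsequent parameters depend on the type of declaration. The following list shows the NSXMLParser delegation methods related to DTD declarations, each followed by an example of the kind of declaration it reports.
- parser:foundElementDeclarationWithName:model:
<!ELEMENT dictionary (documentation?, suite+)>
- parser:foundAttributeDeclarationWithName:forElement:type:defaultValue:
<!ATTLIST dictionary title CDATA #IMPLIED >
- parser:foundInternalEntityDeclarationWithName:value:
<!ENTITY % OSType "CDATA">
- parser:foundExternalEntityDeclarationWithName:publicID:systemID:
<!ENTITY name SYSTEM "name.xml">
- parser:foundNotationDeclarationWithName:publicID:systemID:
<!NOTATION img PUBLIC "urn:mime:image/jpeg">
- parser:foundUnparsedEntityDeclarationWithName:publicID:systemID:notationName:
<!ENTITY corplogo SYSTEM "logo.jpg" NDATA img>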
Resolving External DTD Entities
An XML document, in the DOCTYPE declaration that occurs near its beginning, often identifies an external DTD file whose declarations prescribe its logical structure. For example, the following DOCTYPE declaration says that the DTD related to the root element “addresses” can be located by the system identifier “addresses.dtd”.
<!DOCTYPE addresses SYSTEM "addresses.dtd">
Often the system identifier assumes a standard file-system location for DTDs, such as /System/Library/DTDs. At the start of processing, the NSXMLParser delegate is given an opportunity to resolve this external entity and give the parser a list of DTD declarations to parse.
When you prepare the NSXMLParser instance, send it the setShouldResolveExternalEntities: message with an argument of YES.
Implement the delegation method parser:resolveExternalEntityName:systemID: to return the declarations in the external DTD file as an NSData object.
If the DTD declarations are internal to an XML document, then the delegate will receive the DTD-declaration messages automatically (assuming, of course, that it implements the related methods).
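The lookup step behind resolving a system identifier can be sketched independently of Cocoa. The following Python sketch is illustrative only: the function name load_external_dtd and its search_dirs parameter are hypothetical and are not part of NSXMLParser. It assumes the system identifier is a file name to be looked up in a list of candidate directories, and returns the raw bytes that a delegate could hand back to the parser.

```python
from pathlib import Path
from typing import Iterable, Optional


def load_external_dtd(system_id: str, search_dirs: Iterable[str]) -> Optional[bytes]:
    """Return the bytes of the first file named system_id found in search_dirs.

    Hypothetical helper: returns None when no directory contains the file,
    so the caller can decide whether to skip validation or report an error.
    """
    for directory in search_dirs:
        candidate = Path(directory) / system_id
        if candidate.is_file():
            return candidate.read_bytes()
    return None
```

In an actual delegate, the bytes found this way would be wrapped in an NSData object before being returned from parser:resolveExternalEntityName:systemID:.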
Constructing Rules for Elements
Just as elements are typically the most common kind of construct in an XML document, element declarations are the most common kind of declaration in a DTD. They express rules for the composition of elements from child elements, text, and other constituents.
An element declaration has three parts: the !ELEMENT keyword, the element name, and a content model. The content model is everything after the name up to the terminating angle bracket. Consider the following examples:
<!ELEMENT cocoa EMPTY>
<!ELEMENT keyboard (layouts+, modifierMap+, keyMapSet+, actions*, terminators*)>
<!ELEMENT dict (key, %plistObject;)*>
<!ELEMENT string (#PCDATA)>
The content model can specify no content (EMPTY), any content (ANY, which is rare), textual content (#PCDATA), and child elements. It may identify child elements by name or by an entity reference (such as %plistObject; in the third example above). The model can also specify mixed content; that is, the element can contain text and child elements in any order. Through occurrence modifiers (?, *, and +) and other syntactical conventions, the content model can also specify the order of child elements, whether an element is required or optional, how many times an element may occur, and acceptable choices between elements. Occurrence modifiers can be applied to groups of elements (in parentheses) as well as individual elements.
The job required for validation is to examine the content model of an element declaration and derive rules for the composition of that element. As one approach, you might design classes for each type of rule as well as for the scope of a rule (individual element or group of elements). You could then associate instances of that rule class with an element through the name of the element. During validation the instances are queried with regard to a current or potential member of an element.
Table 1 lists the most important rules derivable from an element declaration’s content model.
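As an alternative to the rule-class design described above, a content model can be translated into a regular expression over child names, because the occurrence modifiers and grouping of a DTD map directly onto regex syntax. The following Python sketch shows the idea; it is not Cocoa code, and the names compile_content_model and children_conform are hypothetical. It assumes that parameter-entity references have already been expanded, that the model string is passed as written in the declaration, and that a run of text between children is represented by the placeholder name "#text".

```python
import re

# A token is a name (or #PCDATA) or a single punctuation mark.
_TOKEN = re.compile(r"#?[\w.:-]+|[(),|?*+]")


def compile_content_model(model):
    """Translate a DTD content model into a regex over child names."""
    model = model.strip()
    if "%" in model:
        raise ValueError("expand parameter-entity references first")
    if model == "EMPTY":
        return re.compile("")
    if model == "ANY":
        return re.compile(r"(?:\S+ )*")
    if re.sub(r"\s+", "", model) == "(#PCDATA)":
        return re.compile("(?:#text )?")  # text only, possibly empty
    parts = []
    for token in _TOKEN.findall(model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)  # same meaning in DTD and regex syntax
        elif token == "#PCDATA":
            parts.append("(?:#text )")
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_conform(pattern, children):
    """Check an ordered list of child names against a compiled model."""
    encoded = "".join(name + " " for name in children)
    return pattern.fullmatch(encoded) is not None


dictionary = compile_content_model("(documentation?, suite+)")
print(children_conform(dictionary, ["documentation", "suite", "suite"]))
print(children_conform(dictionary, ["documentation"]))
```

Each name is encoded with a trailing space so that one name can never match as a prefix of another. With the dictionary model from the earlier example, the first call reports a conforming sequence and the second reports a violation, because at least one suite child is required.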
Constructing Rules for Attributes
Elements frequently have attributes associated with them, and consequently attribute-list declarations are frequently encountered in DTDs. Attribute-list declarations specify the rules for attributes using a syntax that is different from element declarations. They specify, in order, the associated element, the name of the attribute, the type of the attribute, and a default value. For example, the declaration
<!ATTLIST modifierMap defaultIndex NMTOKEN #REQUIRED >
states that the defaultIndex attribute, which is associated with the modifierMap element, is of type NMTOKEN (meaning that it must be a valid XML name token); the #REQUIRED keyword given as the default value means that a value for the attribute must be supplied.
When an NSXMLParser instance encounters an attribute-list declaration, it sends parser:foundAttributeDeclarationWithName:forElement:type:defaultValue: to its delegate. Passed in as parameters are the attribute name, the associated element, the attribute type, and its default value. The rules for attributes derive from combinations of the last two parameters (type and default value). Table 2 lists some of the possible rules that you can construct from attribute-list declarations.
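One way to turn the type and default value into a checkable rule is sketched below in Python. This is an illustration under stated assumptions, not the NSXMLParser API: the AttributeRule class is hypothetical, and it assumes the type and default value arrive as strings written the way they appear in the declaration (for example "NMTOKEN", "(on|off)", "#REQUIRED", or '#FIXED "1.0"'). Only a few attribute types are covered, and the name-token test is an approximation of the XML definition.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Approximation of an XML name token: one or more name characters.
_NMTOKEN = re.compile(r"[\w.:-]+")


@dataclass
class AttributeRule:
    element: str    # element the attribute belongs to
    name: str       # attribute name
    attr_type: str  # e.g. "CDATA", "NMTOKEN", or "(on|off)"
    default: str    # "#REQUIRED", "#IMPLIED", '#FIXED "v"', or a literal

    def check(self, value: Optional[str]) -> Optional[str]:
        """Return an error message, or None if the value is acceptable.

        Pass None as the value when the attribute is absent.
        """
        label = f"{self.element}/@{self.name}"
        if value is None:
            return f"{label} is required" if self.default == "#REQUIRED" else None
        if self.default.startswith("#FIXED"):
            fixed = self.default[len("#FIXED"):].strip().strip("\"'")
            if value != fixed:
                return f"{label} must be {fixed!r}"
        if self.attr_type == "NMTOKEN" and not _NMTOKEN.fullmatch(value):
            return f"{label} is not a name token"
        if self.attr_type.startswith("("):
            allowed = [c.strip() for c in self.attr_type.strip("()").split("|")]
            if value not in allowed:
                return f"{label} must be one of {allowed}"
        return None


rule = AttributeRule("modifierMap", "defaultIndex", "NMTOKEN", "#REQUIRED")
print(rule.check(None))  # a message: the attribute is missing
print(rule.check("0"))   # None: the value is a name token
```

Returning a message rather than raising an exception lets the caller collect every violation in an element before notifying the user.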
Handling Other Declarations
Other DTD declarations such as those for entities and notations are less common than element and attribute-list declarations. You can easily derive rule constructions for these other declarations after reviewing some DTD documentation. However, there are a couple of things to keep in mind:
You need to record entity declarations in case they are used as part of the content model for an element declaration.
Because notations can be made an attribute type, you should also keep track of them.
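The two points above can be sketched as a small bookkeeping table. The Python below is illustrative only: the DeclarationTable class and its method names are hypothetical, and the replacement text used for plistObject in the usage lines is a shortened stand-in rather than the value from any real DTD. The table records entity and notation declarations as they are reported and expands %name; references in a content model before rules are derived from it.

```python
import re

# Matches a parameter-entity reference such as %plistObject;
_PE_REFERENCE = re.compile(r"%([\w.:-]+);")


class DeclarationTable:
    """Remembers entity and notation declarations seen while reading a DTD."""

    def __init__(self):
        self.entities = {}      # entity name -> replacement text
        self.notations = set()  # declared notation names

    def record_entity(self, name, value):
        self.entities[name] = value

    def record_notation(self, name):
        self.notations.add(name)

    def expand(self, text, max_passes=10):
        """Replace %name; references until none remain.

        Raises KeyError for an undeclared entity and ValueError if
        references are still present after max_passes passes.
        """
        for _ in range(max_passes):
            if not _PE_REFERENCE.search(text):
                return text
            text = _PE_REFERENCE.sub(lambda m: self.entities[m.group(1)], text)
        if _PE_REFERENCE.search(text):
            raise ValueError("entity references nested too deeply")
        return text


table = DeclarationTable()
table.record_entity("plistObject", "(array | string)")  # stand-in value
table.record_notation("img")
print(table.expand("(key, %plistObject;)*"))  # (key, (array | string))*
```

The pass limit guards against an entity whose replacement text refers back to itself, which would otherwise expand forever.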