# SAX (Event-Based) XML Parsing

## SAX-like XML Event Processing

Starting with XML Tools 2.6, you can implement event-based XML processing. To do so, pass an AppleScript script object to the new SAX handler parameter of the parse XML command. The script object provides a series of handlers that parse XML calls in response to XML parsing events.

This approach is useful when you want to populate a custom data structure directly from XML data instead of extracting the data from the nested collection of XML element classes normally generated by the parse XML command.

Here is a very simple example illustrating how this works:

~~~~
script EventProcessor
    property elementNames : {}
    
    on XMLStartElement(elementName, elementAttributes)
        -- called when an XML element begins
        set end of elementNames to elementName
    end XMLStartElement
end script

set theXML to "<data>
<test/>
<test>data in second test element</test>
data in root element
</data>"

set xxx to parse XML theXML SAX handler EventProcessor
xxx's elementNames
-- Result:
-- {"data", "test", "test"}
~~~~

In this example, a copy of the EventProcessor script object is passed to the parse XML command. While parsing the XML data, parse XML calls EventProcessor's XMLStartElement handler whenever a new XML element begins. When parsing completes, the EventProcessor object is returned to AppleScript. In this case, the XMLStartElement handler records the name of each XML element it encounters.
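You can also experiment with the same callback pattern outside AppleScript. Below is a minimal sketch using Python's standard-library xml.parsers.expat module. It is an analogy, not XML Tools:

- The class EventProcessor, its start_element method and the parse_with_handler helper are hypothetical names.
- expat's StartElementHandler stands in for XMLStartElement.
- Like parse XML without an XMLParseResult handler, the helper returns the handler object itself.

```python
import xml.parsers.expat


class EventProcessor:
    """Analogue of the AppleScript EventProcessor: records each element name."""

    def __init__(self):
        self.element_names = []

    def start_element(self, name, attributes):
        # Called by expat whenever an element begins, like XMLStartElement
        self.element_names.append(name)


def parse_with_handler(xml_text, handler):
    # Hypothetical helper: wires the handler object into an expat parser
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = handler.start_element
    parser.Parse(xml_text, True)
    # No result handler here, so hand back the handler object itself
    return handler


xml_text = """<data>
<test/>
<test>data in second test element</test>
data in root element
</data>"""

result = parse_with_handler(xml_text, EventProcessor())
print(result.element_names)  # ['data', 'test', 'test']
```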

Here is an example of an event handler object implementing every handler that the parse XML command can call. You need only include handlers for the events you are interested in:

~~~~
script AllEventHandlers
    
    on XMLStartElement(elementName, elementAttributes)
        -- called when a new XML element begins
        display dialog "XMLStartElement: " & elementName & ", Attributes: " & (length of elementAttributes)
    end XMLStartElement
    
    on XMLEndElement(elementName)
        -- called when an XML element ends
        display dialog "XMLEndElement: " & elementName
    end XMLEndElement
    
    on XMLCharacterData(xmlData)
        -- called when there is XML data for an element
        display dialog "XMLCharacterData: " & xmlData
    end XMLCharacterData
    
    on XMLComment(comment)
        -- called when an XML comment is encountered
        -- requires calling parse XML with including comments
        display dialog "XMLComment: " & comment
    end XMLComment
    
    on XMLDefaultContent(xmlData)
        -- called for content outside the root element (e.g. the XML declaration)
        display dialog "XMLDefaultContent: " & xmlData
    end XMLDefaultContent
    
    on XMLStartCData()
        -- called at the beginning of an XML CDATA section
        display dialog "XMLStartCData"
    end XMLStartCData
    
    on XMLEndCData()
        -- called at the end of an XML CDATA section
        display dialog "XMLEndCData"
    end XMLEndCData
    
    on XMLStartNamespace(prefix, uri)
        -- called when a namespace declaration begins
        display dialog "XMLStartNamespace: " & prefix & ", URI: " & uri
    end XMLStartNamespace
    
    on XMLEndNamespace(prefix)
        -- called when a namespace declaration ends
        display dialog "XMLEndNamespace: " & prefix
    end XMLEndNamespace
    
    on XMLProcessingInstruction(target, piData)
        -- called when an XML processing instruction is encountered
        -- requires calling parse XML with including processing instructions
        display dialog "XMLProcessingInstruction: " & target & ", Data: " & piData
    end XMLProcessingInstruction
    
    on XMLNotStandalone()
        -- called when the XML is not standalone and there is no DTD.
        -- Return true to allow processing to continue. If this handler is missing,
        -- parse XML's strict standalone parameter value is used instead.
        display dialog "XMLNotStandalone"
        return true -- allow processing to continue
    end XMLNotStandalone
    
    on XMLStartDocTypeDecl(docTypeName, systemID, publicID, hasInternalSubset)
        -- called at the beginning of a DOCTYPE declaration
        display dialog "XMLStartDocTypeDecl: " & docTypeName & ", systemID: " & systemID & ¬
            ", publicID: " & publicID & ", hasInternalSubset: " & hasInternalSubset
    end XMLStartDocTypeDecl
    
    on XMLEndDocTypeDecl()
        -- called at the end of a DOCTYPE declaration
        display dialog "XMLEndDocTypeDecl"
    end XMLEndDocTypeDecl
    
    on XMLExternalEntityRef(context, base, systemID, publicID)
        -- called after an external entity (DTD) has been loaded
        display dialog "XMLExternalEntityRef: " & context & ", base: " & base & ¬
            ", systemID: " & systemID & ", publicID: " & publicID
    end XMLExternalEntityRef
    
    on XMLUnparsedEntityDecl(entityName, base, systemID, publicID, notationName)
        -- called when an unparsed entity declaration is encountered
        display dialog "XMLUnparsedEntityDecl: " & entityName & ", base: " & base & ¬
            ", systemID: " & systemID & ", publicID: " & publicID & ", notationName: " & notationName
    end XMLUnparsedEntityDecl
    
    on XMLNotationDecl(notationName, base, systemID, publicID)
        -- called when a notation declaration is encountered
        display dialog "XMLNotationDecl: " & notationName & ", base: " & base & ¬
            ", systemID: " & systemID & ", publicID: " & publicID
    end XMLNotationDecl
    
    on XMLParseResult(errNumber, errMessage)
        -- if parsing is aborted due to an AppleScript error, errNumber and errMessage
        -- describe the error; otherwise both parameters contain missing value.
        
        -- return the data you want parse XML to return; if this handler is omitted,
        -- the entire script object is returned
        return "some data"
    end XMLParseResult
end script
~~~~
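The handler names and parameter lists above closely mirror the callbacks of the expat XML parser. As a hedged illustration, the sketch below uses Python's standard-library xml.parsers.expat module, not XML Tools. It registers several expat callbacks and records each event under the name of the XML Tools handler it most resembles. The trace_events helper and the sample document are hypothetical.

```python
import xml.parsers.expat


def trace_events(xml_text):
    """Record parsing events, labelled with the XML Tools handler they resemble."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent character data into one callback
    parser.StartElementHandler = lambda name, attrs: events.append(("XMLStartElement", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("XMLEndElement", name))
    parser.CharacterDataHandler = lambda data: events.append(("XMLCharacterData", data))
    parser.CommentHandler = lambda text: events.append(("XMLComment", text))
    parser.StartCdataSectionHandler = lambda: events.append(("XMLStartCData",))
    parser.EndCdataSectionHandler = lambda: events.append(("XMLEndCData",))
    parser.ProcessingInstructionHandler = lambda target, data: events.append(
        ("XMLProcessingInstruction", target, data))
    # Content with no dedicated callback (here, the XML declaration) goes to the default handler
    parser.DefaultHandler = lambda data: events.append(("XMLDefaultContent", data))
    parser.Parse(xml_text, True)
    return events


sample = '<?xml version="1.0"?><root a="1"><!--note--><?app run?><![CDATA[raw]]>text</root>'
for event in trace_events(sample):
    print(event)
```

Setting buffer_text merges adjacent character data into one callback. Without it, expat may split text across several calls (for example at line breaks), so a character-data handler in this sketch should be prepared to receive text in pieces.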

**NOTE**: Attributes are passed to the XMLStartElement handler as a record whose keys are the attribute names and whose values are the corresponding attribute values.
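Passing attributes as a name/value record makes it easy to build a custom data structure directly from start-element events. Here is a minimal sketch of that idea, using Python's xml.parsers.expat, where attributes arrive as a dict rather than an AppleScript record. The collect_items helper and the catalog/item sample document are hypothetical.

```python
import xml.parsers.expat


def collect_items(xml_text):
    """Build a list of dicts from <item> elements, using only their attributes."""
    items = []

    def start_element(name, attributes):
        # expat passes attributes as a dict of {attribute name: value}
        if name == "item":
            items.append(dict(attributes))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return items


catalog = '<catalog><item id="1" name="apple"/><item id="2" name="pear"/></catalog>'
print(collect_items(catalog))
```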

**NOTE**: If one of the XML event handlers signals an error, parse XML aborts the parse and raises an error. The partial result of that error contains the value returned by the XMLParseResult handler or, if XMLParseResult is not defined, the script object itself. You can extract this information using this syntax:

~~~~
script SAXHandler
    property elementNames : {}
    
    on XMLStartElement(elementName, elementAttributes)
        -- called when an XML element begins
        set end of elementNames to elementName
        error "Error Message from SAXHandler" -- signal an error to abort parsing the rest of the XML stream
    end XMLStartElement
    
    on XMLParseResult(errNumber, errMessage)
        -- return the data you want parse XML to return; if this handler is omitted,
        -- the entire script object is returned
        return elementNames
    end XMLParseResult
end script

try
    set xxx to parse XML "<data>
<test/>
<test>data in second test element</test>
data in root element
</data>" SAX handler SAXHandler with including processing instructions and including comments
on error errMsg partial result pr
    {errMsg, pr} -- partial result is the data returned by XMLParseResult
end try
-- Result:
-- {
--     "xmlstartelement SAX handler error: Error Message from SAXHandler",
--     {
--         "data"
--     }
-- }
~~~~
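The same abort-with-partial-result pattern can be sketched in Python with xml.parsers.expat, which stops parsing and re-raises an exception thrown from a callback. This is an analogy only:

- SAXHandler.parse_result plays the role of XMLParseResult.
- The ParseAborted exception class is hypothetical. It stands in for AppleScript's partial result.
- The error message format merely imitates the one shown above.

```python
import xml.parsers.expat


class SAXHandler:
    """Analogue of the AppleScript SAXHandler: records names, then aborts."""

    def __init__(self):
        self.element_names = []

    def start_element(self, name, attributes):
        self.element_names.append(name)
        # Signal an error to abort parsing the rest of the XML stream
        raise RuntimeError("Error Message from SAXHandler")

    def parse_result(self):
        # Counterpart of XMLParseResult: the data handed back to the caller
        return self.element_names


class ParseAborted(Exception):
    """Hypothetical error type carrying a partial result."""

    def __init__(self, message, partial_result):
        super().__init__(message)
        self.partial_result = partial_result


def parse_with_handler(xml_text, handler):
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = handler.start_element
    try:
        parser.Parse(xml_text, True)
    except RuntimeError as err:
        # pyexpat stops parsing and re-raises the exception from the callback
        raise ParseAborted(f"start_element SAX handler error: {err}",
                           handler.parse_result()) from err
    return handler.parse_result()


try:
    parse_with_handler("<data><test/><test>data in second test element</test></data>", SAXHandler())
    outcome = None
except ParseAborted as aborted:
    outcome = (str(aborted), aborted.partial_result)
print(outcome)
```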

**NOTE**: [Script Debugger's](http://www.latenightsw.com) AppleScript debugger cannot debug XML event handlers while the parse XML command is executing them.

### Parameters

#### SAX handler

(new in v2.6)
script object

When the SAX handler parameter is specified, parse XML switches to a SAX-like, event-based mode of parsing in which handlers in the specified script object are called in response to events as the XML data is parsed.

When this parameter is omitted, parse XML behaves as in earlier versions and returns an XML document class containing a nested data structure that represents the parsed XML data.

#### strict standalone

boolean

Ignored if the event handler object implements the XMLNotStandalone handler.

#### expanding external entities

boolean

By default, external entity references (e.g. DTDs) are ignored since XML Tools is a non-validating XML parser. When expanding external entities is true, XML Tools uses the Mac OS URL Access facilities to access the externally referenced entity.

If the external entity exists on another machine, you must have an active internet connection.

Supported URL formats: file:///…, http://…, and ftp://…

**NOTE**: The XMLExternalEntityRef handler is called after the external entity has been loaded.

#### including comments

boolean

By default, comments in your XML data are ignored. The including comments parameter must be true in order for the event handler’s XMLComment handler to be called.

#### including processing instructions

boolean

By default, XML processing instructions are ignored. The including processing instructions parameter must be true in order for the event handler’s XMLProcessingInstruction handler to be called.
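One way to picture these two opt-in parameters is to install the comment and processing-instruction callbacks only when requested. Here is a minimal sketch using Python's xml.parsers.expat, which drops comments and processing instructions when no callback is installed. The parse_events function and its including_comments / including_processing_instructions keyword arguments are hypothetical stand-ins for the parse XML parameters.

```python
import xml.parsers.expat


def parse_events(xml_text, including_comments=False, including_processing_instructions=False):
    """Hypothetical analogue of parse XML's including comments / including processing instructions."""
    comments, instructions = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Only install the callbacks when asked for; otherwise expat silently skips these constructs
    if including_comments:
        parser.CommentHandler = comments.append
    if including_processing_instructions:
        parser.ProcessingInstructionHandler = lambda target, data: instructions.append((target, data))
    parser.Parse(xml_text, True)
    return comments, instructions


doc = "<root><!-- a comment --><?render fast?></root>"
print(parse_events(doc))
print(parse_events(doc, including_comments=True, including_processing_instructions=True))
```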

#### serializing

boolean

Ignored.

#### base path

string

Provides a base URL for all external entity IDs. For example, the following code uses a DTD loaded from http://www.latenightsw.com/dtds/mydtd.dtd:

~~~~
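parse XML "<?xml version=\"1.0\"?>
<!DOCTYPE data SYSTEM \"mydtd.dtd\">
<data>
</data>" base path "http://www.latenightsw.com/dtds/" with expanding external entities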
~~~~
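The interplay between a base path and the external-entity callback can be sketched with Python's xml.parsers.expat. In the sketch:

- parser.SetBase supplies the base.
- The ExternalEntityRefHandler callback receives the same four arguments as XMLExternalEntityRef (context, base, system ID, public ID).

This is an analogy, not XML Tools. The sketch only records the callback and does not download the DTD. The external_entity_refs helper is hypothetical.

```python
import xml.parsers.expat
from urllib.parse import urljoin


def external_entity_refs(xml_text, base_path):
    """Record expat's callback for a DOCTYPE's external DTD subset."""
    calls = []

    def external_entity_ref(context, base, system_id, public_id):
        # Same four arguments as XMLExternalEntityRef(context, base, systemID, publicID).
        # This sketch only records the call; it does not fetch the DTD.
        calls.append((context, base, system_id, public_id))
        return 1  # non-zero tells expat the reference was handled

    parser = xml.parsers.expat.ParserCreate()
    parser.SetBase(base_path)  # plays the role of parse XML's base path parameter
    # Ask expat to report the external DTD subset, which it skips by default
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(xml_text, True)
    return calls


doc = '<?xml version="1.0"?>\n<!DOCTYPE data SYSTEM "mydtd.dtd">\n<data/>'
calls = external_entity_refs(doc, "http://www.latenightsw.com/dtds/")
for context, base, system_id, public_id in calls:
    # Resolving the system ID against the base gives the URL that would be fetched
    print(urljoin(base, system_id))
```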

#### preserving whitespace

boolean

By default, the parse XML command strips all leading and trailing whitespace characters and normalizes multiple whitespace characters within a string to a single space.

**NOTE**: The xml:space="preserve" attribute is honored when preserving whitespace is false.

**NOTE**: The xml:space="ignore" attribute is not honored when preserving whitespace is true.

**NOTE**: Whitespace characters in CDATA sections are never stripped.

When preserving whitespace is true, parse XML returns all XML data, including whitespace.

The parse XML command will strip whitespace according to these rules before calling the event handler’s XMLCharacterData handler.
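Here is a minimal sketch of the stripping rule described above, using Python's xml.parsers.expat as a stand-in for parse XML. The TextCollector class and its preserving_whitespace flag are hypothetical. The sketch:

- buffers character data until the next element or CDATA boundary;
- unless preserving whitespace, collapses runs of XML whitespace to a single space and trims both ends;
- passes CDATA content through untouched.

It does not implement the xml:space attribute, and XML Tools' actual buffering may differ.

```python
import re
import xml.parsers.expat

# XML whitespace is space, tab, carriage return, and line feed
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text):
    # Collapse each run of XML whitespace to one space, then trim both ends
    return XML_WHITESPACE_RUN.sub(" ", text).strip(" ")


class TextCollector:
    """Hypothetical collector that strips whitespace before delivering text."""

    def __init__(self, preserving_whitespace=False):
        self.preserving_whitespace = preserving_whitespace
        self.delivered = []  # what an XMLCharacterData-style handler would receive
        self._buffer = []

    def _flush(self, raw=False):
        text = "".join(self._buffer)
        self._buffer = []
        if not (raw or self.preserving_whitespace):
            text = normalize_whitespace(text)
        if text:
            self.delivered.append(text)

    def character_data(self, data):
        self._buffer.append(data)

    def parse(self, xml_text):
        parser = xml.parsers.expat.ParserCreate()
        parser.CharacterDataHandler = self.character_data
        # Element boundaries end the current run of text
        parser.StartElementHandler = lambda name, attrs: self._flush()
        parser.EndElementHandler = lambda name: self._flush()
        # Text before a CDATA section is normalized; CDATA content is delivered as-is
        parser.StartCdataSectionHandler = self._flush
        parser.EndCdataSectionHandler = lambda: self._flush(raw=True)
        parser.Parse(xml_text, True)
        return self.delivered


sample = "<r>  hello \n  world  <b>x</b><![CDATA[  keep  me  ]]></r>"
print(TextCollector().parse(sample))
print(TextCollector(preserving_whitespace=True).parse(sample))
```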

#### allowing leading whitespace

boolean

The XML specification states that a well-formed XML document has no whitespace before the XML declaration. However, for historical reasons, XML Tools accepts XML documents that begin with leading whitespace. If allowing leading whitespace is false, XML Tools reports an error when whitespace appears at the beginning of an XML document.

**NOTE**: This only applies to documents that begin with an XML declaration. If your document does not have an XML declaration, this option is ignored.
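The strict behavior is easy to observe in Python's xml.parsers.expat, which rejects whitespace before an XML declaration. The sketch below emulates the lenient mode by trimming leading whitespace before parsing. The parse_document function and its allowing_leading_whitespace argument are hypothetical and only approximate the parse XML option.

```python
import xml.parsers.expat


def parse_document(xml_text, allowing_leading_whitespace=True):
    """Hypothetical analogue of parse XML's allowing leading whitespace option."""
    if allowing_leading_whitespace:
        # Drop whitespace before the document so a leading XML declaration is accepted
        xml_text = xml_text.lstrip(" \t\r\n")
    element_names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: element_names.append(name)
    parser.Parse(xml_text, True)  # raises xml.parsers.expat.ExpatError if not well formed
    return element_names


print(parse_document('  <?xml version="1.0"?><a/>'))
print(parse_document("  <a/>", allowing_leading_whitespace=False))
try:
    parse_document('  <?xml version="1.0"?><a/>', allowing_leading_whitespace=False)
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```

Expat itself accepts whitespace before the root element when there is no XML declaration. This matches the note above: the option only matters for documents that begin with a declaration.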

#### separate namespace URIs

boolean

Ignored.
