It's time to put on your programmer's hat and get acquainted with the Document Object Model (DOM), which provides easy access to XML documents via a tree-like set of objects.

Since there are DOM implementations in quite a few languages, I'll try to keep things as language-neutral as possible while introducing you to the specification, concentrating on its interfaces rather than on any one language's API.

Some things you should know about DOM
DOM is, in reality, nothing more than an abstract specification for accessing and manipulating the content of a document through a tree-like set of objects. The document doesn't have to be an XML document (HTML documents are covered too), so keep that in mind as you read along.

As with all things Web, the DOM specification is managed by the World Wide Web Consortium (W3C). Operating under a mandate to provide a uniform API for use across multiple platforms and languages, the W3C defines DOM as a set of abstract interfaces without an official implementation. It's up to individual vendors to provide implementations of those interfaces that are appropriate for a given platform and language.
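To make the vendor point concrete, here's a minimal sketch using Python's standard-library `xml.dom.minidom` module, which is just one of many vendor implementations and is chosen here purely for illustration. It asks the implementation for its `DOMImplementation` object and then builds a small document with the spec-defined `createDocument`, `createElement`, `createTextNode`, `setAttribute`, and `appendChild` methods. The element names (`catalog`, `book`) and the attribute value are hypothetical.

```python
from xml.dom.minidom import getDOMImplementation

# Ask this particular implementation (minidom) for its DOMImplementation
# object, the spec's entry point for creating new documents.
impl = getDOMImplementation()

# createDocument(namespaceURI, qualifiedName, doctype) comes from DOM Level 2
# Core. "catalog" and "book" are hypothetical names used only for this sketch.
doc = impl.createDocument(None, "catalog", None)
root = doc.documentElement  # the <catalog> element created above

# Build <book id="b1">Learning DOM</book> using spec-defined methods.
book = doc.createElement("book")
book.setAttribute("id", "b1")
book.appendChild(doc.createTextNode("Learning DOM"))
root.appendChild(book)

# toxml() is a minidom convenience for serializing; DOM Level 2 itself
# defines no way to save a document.
print(doc.toxml())
```

Other languages' DOM implementations expose broadly the same creation and tree-building methods; what tends to differ is how you obtain the `DOMImplementation` object (`getDOMImplementation()` here) and how you serialize the result.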

DOM's interface definitions were written in the Object Management Group's (OMG) Interface Definition Language (IDL). Even if you have no formal knowledge of IDL, which is fairly self-explanatory, it can often help to examine these definitions. I've linked to the appropriate IDL definition for each interface I mention in this article so that you can refer to it and the accompanying documentation if necessary.

DOM has three levels of functionality:

  • Level 1 provides the core interfaces for navigating and manipulating a document's structure and content. It doesn't define how a document is parsed or loaded in the first place; that's left to each implementation.
  • Level 2 extends Level 1 with support for XML namespaces and adds optional modules such as Events, Style, and Traversal and Range. This is the currently recommended level of functionality, and I'll be referring you to the Level 2 versions of the DOM interfaces in this article.
  • Level 3, which as of this writing is still at the Working Draft stage (meaning it's subject to change), adds support for XPath queries and for loading and saving documents.
Because the W3C's specification is only a minimum recommendation, vendors can, and often do, provide proprietary extensions. This is why, for example, many of the available DOM implementations already have XPath support built in. Be wary of using these extensions, particularly ones that represent Level 3 functionality: the interfaces of those objects are still very much subject to change, and the final, official versions may be incompatible with code you've written against the working versions.
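Rather than assuming a given extension is present, you can ask an implementation what it claims to support through `DOMImplementation.hasFeature(feature, version)`, which is part of the Core specification. Here's a minimal sketch, again using Python's `xml.dom.minidom` for illustration only. The feature strings an implementation recognizes vary from library to library, so treat the results as minidom's answers rather than a general rule.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# hasFeature(feature, version) belongs to the Core DOMImplementation
# interface. "Core"/"XML" with version "2.0" indicate DOM Level 2 support.
for feature, version in [("Core", "1.0"), ("Core", "2.0"), ("XML", "2.0")]:
    print(feature, version, impl.hasFeature(feature, version))

# Check before relying on anything beyond the baseline, such as the
# Level 3 XPath module, instead of assuming it is there.
if impl.hasFeature("XPath", "3.0"):
    print("XPath module reported as available")
else:
    print("No XPath support reported; don't rely on it")
```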

DOM's object model (Is that redundant?)

DOM expresses a document as a tree of Node objects. If you recall, a tree is a set of interconnected objects, or nodes, with one node serving as the root. Nodes are described by their position relative to another node in the tree. For example, a node's parent is the node one level up (closer to the root) in the tree's hierarchy, while its children are the nodes one level down. Siblings are nodes that share the same parent; a node's immediate siblings sit directly to its left and right. Figure A gives a more graphical explanation of these terms, which you can refer to if you find any of this family business confusing.
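Here's the same family vocabulary expressed through DOM's Node attributes (`parentNode`, `childNodes`, `firstChild`, `previousSibling`, `nextSibling`), sketched with Python's `xml.dom.minidom` as an illustrative implementation. The sample document is hypothetical and deliberately has no whitespace between tags, so that every child is an element rather than a stray whitespace text node.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document: <a>, <b> and <c> are children of <root>,
# while <x> and <y> are grandchildren with different parents.
doc = parseString("<root><a><x/></a><b><y/></b><c/></root>")
root = doc.documentElement
a, b, c = root.childNodes  # childNodes is a list-like NodeList

# Parent: one level up, toward the root.
print(b.parentNode.tagName)
# Children: one level down.
print([n.tagName for n in root.childNodes])
# Siblings share a parent; previousSibling/nextSibling step left and right.
print(b.previousSibling.tagName, b.nextSibling.tagName)

# <x> and <y> sit at the same depth but have different parents,
# so they are not siblings of each other.
x = a.firstChild
y = b.firstChild
print(x.nextSibling)                 # no sibling to the right of <x>
print(x.parentNode is y.parentNode)  # different parents
```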

Figure A: Nodes (a graphical illustration of node relationships)

Node objects don't only represent XML elements; they represent everything else found in a document, too, from the Document node at the very top of the tree (which sits above the root element) to individual pieces of content such as attributes, comments, and text. Each kind of node has a specialised interface that corresponds to the XML content it represents, but these are all still nodes at heart. Object-oriented folks would say that every node type inherits from the Node interface. Node is the primary interface you'll use to navigate a document's tree and to modify its structure, for example by adding new nodes.
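The sketch below, again using Python's `xml.dom.minidom` purely for illustration, walks several kinds of node from one hypothetical document and prints each one's `nodeType`. The numbers are the constants the DOM specification defines on the Node interface (1 for elements, 2 for attributes, 3 for text, 8 for comments, 9 for the document), and every object passes an `isinstance` check against Python's `xml.dom.Node` base class. It then uses the same Node methods to add a new element to the tree.

```python
import xml.dom
from xml.dom.minidom import parseString

# Hypothetical sample document mixing several kinds of content.
doc = parseString('<note lang="en"><!-- draft -->Hello</note>')
note = doc.documentElement

# Document, element, attribute, comment and text: different interfaces,
# but all of them are nodes.
nodes = [doc, note, note.getAttributeNode("lang"), note.firstChild, note.lastChild]
for n in nodes:
    # nodeType is a Node attribute; its values are the spec's constants.
    print(type(n).__name__, n.nodeType, isinstance(n, xml.dom.Node))

# The same Node interface modifies the tree: create a node, then insert it.
signature = doc.createElement("signature")
signature.appendChild(doc.createTextNode("Me"))
note.appendChild(signature)
print(note.toxml())  # toxml() is a minidom serialization helper
```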
