
Friday, July 8, 2011

Understanding DOM with DOMC source code browsing

Understanding DOM with DOMC (a C implementation of DOM - minus a few features)

Use the Source, Luke!

Note: DOMC is freely available under the MIT License
  1. Download the DOMC source code (200 KB)
  2. Download the libmba helper code (200 KB)
  3. cd domc/src
  4. ctags *
  5. vim domc.h
  6. Navigate through the structs DOM_Document, DOM_Node, DOM_NodeList, etc. (with the tags file in place, Ctrl-] in vim jumps to the definition under the cursor) to see how simple the DOMC implementation is
domc.h and node.c are the two most important files.

Core implementation

a) DOM_Node is THE struct to understand, and it is very simple.
It is recursively defined: a tree is a node that points to other nodes and node lists.
1) Fixed part - common to all nodes from document to attribute
struct DOM_Node {
    DOM_String *nodeName;
    DOM_String *nodeValue;
    unsigned short nodeType;    // Used as the selector for the union of variable parts below

    // Pointers to other nodes
    DOM_Node *parentNode;
    DOM_NodeList *childNodes;

    DOM_Node *firstChild;
    DOM_Node *lastChild;
    DOM_Node *previousSibling;
    DOM_Node *nextSibling;

    DOM_NamedNodeMap *attributes;
    DOM_Document *ownerDocument;
[...]
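To make the recursive layout concrete, here is a minimal stand-alone sketch. It does not use DOMC itself: SketchNode, sketch_append_child and sketch_count_nodes are hypothetical names invented for this illustration, and only the link fields from the listing above are kept. It shows how the five pointers are maintained on append and how firstChild/nextSibling are enough to walk a whole subtree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down node: only the link fields from the listing.
 * This is NOT the real DOM_Node, just an illustration of the same shape. */
typedef struct SketchNode SketchNode;
struct SketchNode {
    const char *nodeName;
    SketchNode *parentNode;
    SketchNode *firstChild;
    SketchNode *lastChild;
    SketchNode *previousSibling;
    SketchNode *nextSibling;
};

/* Append child as the last child of parent and fix up every link. */
void sketch_append_child(SketchNode *parent, SketchNode *child)
{
    assert(parent != NULL && child != NULL);
    child->parentNode = parent;
    child->previousSibling = parent->lastChild;
    child->nextSibling = NULL;
    if (parent->lastChild != NULL)
        parent->lastChild->nextSibling = child;
    else
        parent->firstChild = child;   /* first child of an empty parent */
    parent->lastChild = child;
}

/* Count a node plus all of its descendants, recursing through
 * firstChild and stepping sideways through nextSibling. */
size_t sketch_count_nodes(const SketchNode *node)
{
    size_t total = 1;
    const SketchNode *c;

    assert(node != NULL);
    for (c = node->firstChild; c != NULL; c = c->nextSibling)
        total += sketch_count_nodes(c);
    return total;
}
```

Appending a root element to a document node and two children to that root, then calling sketch_count_nodes on the document node, visits four nodes. The real library does more bookkeeping per node (childNodes, ownerDocument and so on), which this sketch leaves out.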
2) Variable part - DOM_Node.nodeType above selects which member of the union is valid: document vs element vs attribute vs processing-instruction, and so on
    union {
        struct {
            DOM_DocumentType *doctype;
            DOM_Element *documentElement;
            DOM_DocumentView *document;
            DOM_AbstractView *defaultView;
            DOM_Node *commonParent;
            DOM_String *version;
            DOM_String *encoding;
            int standalone;
        } Document;
        struct {
            DOM_String *tagName;
        } Element;
        struct {
            DOM_String *name;
            int specified;
            DOM_String *value;
            DOM_Element *ownerElement;
        } Attr;
        struct { /* small code stripped out for clarity */ } DocumentType;
        struct { /* small code stripped out for clarity */ } CharacterData;
        struct { /* small code stripped out for clarity */ } Notation;
        struct { /* small code stripped out for clarity */ } Entity;
        struct { /* small code stripped out for clarity */ } ProcessingInstruction;
    } u;
};
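The "nodeType selects the union member" idea is a classic tagged union. The sketch below is stand-alone and makes several assumptions: SketchTypedNode, sketch_label and the SKETCH_* constants are hypothetical names, plain C strings stand in for DOM_String, and only three of the variants are modelled. The numeric values 1, 2 and 9 are the node type codes from the DOM specification.

```c
#include <assert.h>
#include <stddef.h>

/* Node type codes as defined by the DOM specification.
 * The SKETCH_ prefix is made up for this example. */
enum {
    SKETCH_ELEMENT_NODE   = 1,
    SKETCH_ATTRIBUTE_NODE = 2,
    SKETCH_DOCUMENT_NODE  = 9
};

/* Hypothetical cut-down node: a fixed part plus a union whose valid
 * member is chosen by nodeType, mirroring the layout shown above. */
typedef struct SketchTypedNode {
    const char *nodeName;
    unsigned short nodeType;
    union {
        struct { const char *version; const char *encoding; int standalone; } Document;
        struct { const char *tagName; } Element;
        struct { const char *name; int specified; const char *value; } Attr;
    } u;
} SketchTypedNode;

/* Return a type-specific label; only the union member that matches
 * nodeType is ever read. Unknown types fall back to nodeName. */
const char *sketch_label(const SketchTypedNode *node)
{
    assert(node != NULL);
    switch (node->nodeType) {
    case SKETCH_ELEMENT_NODE:
        return node->u.Element.tagName;
    case SKETCH_ATTRIBUTE_NODE:
        return node->u.Attr.name;
    case SKETCH_DOCUMENT_NODE:
        return node->u.Document.encoding;
    default:
        return node->nodeName;
    }
}
```

The design pays for one switch at each point of use and in return keeps every node the same size and type, so the tree code never has to care which kind of node it is linking.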
b) DOM_NodeList itself is a doubly-linked list with head/tail pointers
typedef struct NodeEntry NodeEntry;
struct NodeEntry { DOM_Node *data; NodeEntry *before, *after; };
struct DOM_NodeList { NodeEntry *first, *last; DOM_NodeList *list; };
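As a rough illustration of a doubly-linked list with head/tail pointers, here is a stand-alone sketch. It is not DOMC's code: SketchEntry, SketchList and the sketch_list_* functions are hypothetical, the payload is a plain void pointer instead of a node, and the length field is an assumption added so the example can report a count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry and list types shaped like the listing above. */
typedef struct SketchEntry SketchEntry;
struct SketchEntry {
    void *data;
    SketchEntry *before, *after;
};

typedef struct SketchList {
    SketchEntry *first, *last;
    size_t length;              /* assumption: added for this example */
} SketchList;

/* Append in O(1) using the tail pointer. Returns 0 on success, -1 if
 * memory could not be allocated. */
int sketch_list_append(SketchList *list, void *data)
{
    SketchEntry *e;

    assert(list != NULL);
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->data = data;
    e->before = list->last;
    e->after = NULL;
    if (list->last != NULL)
        list->last->after = e;
    else
        list->first = e;        /* list was empty */
    list->last = e;
    list->length++;
    return 0;
}

/* Return the payload at index, or NULL when index is out of range. */
void *sketch_list_item(const SketchList *list, size_t index)
{
    const SketchEntry *e;

    assert(list != NULL);
    for (e = list->first; e != NULL && index > 0; e = e->after)
        index--;
    return e != NULL ? e->data : NULL;
}

/* Free every entry (not the payloads) and reset the list to empty. */
void sketch_list_clear(SketchList *list)
{
    SketchEntry *e, *next;

    assert(list != NULL);
    for (e = list->first; e != NULL; e = next) {
        next = e->after;
        free(e);
    }
    list->first = list->last = NULL;
    list->length = 0;
}
```

Keeping both first and last makes appending cheap, which matters for a structure that is filled in document order while parsing; indexed access stays linear, as in any linked list.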
You don't really need the documentation, but here it is anyway:
DOMC - A C Implementation of DOM docs