RFC 2803            Digest Values for DOM (DOMHASH)           April 2000


1. Introduction

   The purpose of this document is to give a clear and unambiguous
   definition of digest (hash) values of XML objects [XML]. Two
   subtrees are considered identical if their hash values are the same,
   and different if their hash values are different.

   There are at least two usage scenarios for DOMHASH. One is as a
   basis for digital signatures for XML. Digital signature algorithms
   normally require hashing the content to be signed before signing.
   DOMHASH provides a concrete definition of the hash value calculation.

   The other is to use DOMHASH when synchronizing two DOM structures
   [DOM]. Suppose that a server program generates a DOM structure which
   is to be rendered by clients. If the server makes frequent small
   changes to a large DOM tree, it is desirable that only the modified
   parts are sent to the client. A client can initiate a request by
   sending the root hash value of the structure in its cache. If it
   matches the root hash value of the current server structure, nothing
   needs to be sent. If not, the server compares the client's hash with
   those of older versions in the server's cache. If it finds one that
   matches the client's version of the structure, it then locates the
   differences from the current version by recursively comparing the
   hash values of each node. This way, the client receives only the
   updated portions of a large structure without requesting the whole
   thing.
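   The recursive comparison described above can be sketched in Python
   as follows. Everything here is a hypothetical illustration: the
   `Node` class, `digest`, `diff`, and `changes_for_client` are not
   defined by this document, and `digest` is a simplified Merkle-style
   hash (SHA-1 over a length-prefixed label, the child count, and the
   children's digests), not the DOMHASH node hash specified later. The
   sketch only shows how a matching digest lets an entire subtree be
   skipped.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a label plus an ordered list of children."""
    label: str
    children: list[Node] = field(default_factory=list)


def digest(node: Node) -> bytes:
    """Simplified Merkle-style digest; NOT the DOMHASH node hash."""
    h = hashlib.sha1()
    label = node.label.encode("utf-8")
    # Length prefixes keep field boundaries unambiguous.
    h.update(len(label).to_bytes(4, "big") + label)
    h.update(len(node.children).to_bytes(4, "big"))
    for child in node.children:
        h.update(digest(child))  # each child digest is a fixed 20 bytes
    return h.digest()


def diff(old: Node, new: Node, path: tuple[int, ...] = ()) -> list[tuple[int, ...]]:
    """Return child-index paths of the highest subtrees that must be resent."""
    if digest(old) == digest(new):
        return []  # identical subtree: skip it entirely
    if old.label != new.label or len(old.children) != len(new.children):
        return [path]  # this node itself changed: resend its whole subtree
    changed: list[tuple[int, ...]] = []
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        changed.extend(diff(o, n, path + (i,)))
    return changed


def changes_for_client(client_root: bytes, history: list[Node],
                       current: Node) -> list[tuple[int, ...]]:
    """Hypothetical server step: find the client's version, then diff it."""
    if client_root == digest(current):
        return []  # client is already up to date
    for old in history:
        if digest(old) == client_root:
            return diff(old, current)  # send only the changed subtrees
    return [()]  # unknown version: resend the whole tree from the root


old = Node("doc", [Node("a", [Node("x")]), Node("b", [Node("y")])])
new = Node("doc", [Node("a", [Node("x")]), Node("b", [Node("z")])])
print(changes_for_client(digest(old), [old], new))  # [(1, 0)]
```

   Only the changed leaf at path (1, 0) is reported; the unchanged
   sibling subtree "a" is skipped after a single digest comparison. A
   real server would presumably cache each node's digest rather than
   recompute it on every call, as this sketch does for brevity.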

   One way of defining digest values is to take a surface string as the
   input for a digest algorithm. However, this approach has several
   drawbacks. The same internal DOM structure may be represented in many
   different ways as surface strings, even if each of them strictly
   conforms to the XML specification. The treatment of white space, the
   choice of character encoding, entity references (i.e., use of
   ampersands), and so on all affect the generation of a surface string.
   If implementations of surface string generation differ, the hash
   values will differ, resulting in digital signatures that cannot be
   validated and in failures to detect identical DOM structures.
   Therefore, it is desirable that the digest of a DOM structure be
   defined in DOM terms, that is, as an unambiguous algorithm operating
   on a DOM tree. This is the approach we take in this specification.
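   The drawback can be illustrated with Python's standard
   xml.dom.minidom module. The two strings below differ only in
   attribute order, quote style, white space inside the start tag, and
   how the ampersand is escaped, so a parser builds the same DOM content
   from both. The functions `tree_digest` and `dom_digest` are
   hypothetical: `tree_digest` is a simplified stand-in (SHA-1 over a
   length-prefixed walk of element names, sorted attributes, and text),
   not the DOMHASH computation this document specifies, and it handles
   only element and text nodes.

```python
import hashlib
from xml.dom import minidom


def _field(h, text: str) -> None:
    # Length-prefix each string so field boundaries are unambiguous.
    data = text.encode("utf-8")
    h.update(len(data).to_bytes(4, "big") + data)


def tree_digest(node) -> bytes:
    """Hypothetical digest of a DOM subtree; NOT the DOMHASH algorithm."""
    h = hashlib.sha1()
    if node.nodeType == node.TEXT_NODE:
        _field(h, "#text")
        _field(h, node.data)
    elif node.nodeType == node.ELEMENT_NODE:
        _field(h, "#element")
        _field(h, node.tagName)
        attrs = sorted(node.attributes.items())  # DOM attributes are unordered
        h.update(len(attrs).to_bytes(4, "big"))
        for name, value in attrs:
            _field(h, name)
            _field(h, value)
        h.update(len(node.childNodes).to_bytes(4, "big"))
        for child in node.childNodes:
            h.update(tree_digest(child))
    else:
        raise NotImplementedError("sketch handles only element and text nodes")
    return h.digest()


def dom_digest(xml_text: str) -> str:
    doc = minidom.parseString(xml_text)
    doc.normalize()  # merge any adjacent text nodes
    return tree_digest(doc.documentElement).hex()


# Same element, different surface strings: attribute order, quote style,
# white space inside the start tag, and ampersand escaping all differ.
s1 = '<item id="1" lang="en">Fish &amp; Chips</item>'
s2 = "<item lang='en'  id='1' >Fish &#38; Chips</item>"

surface_equal = hashlib.sha1(s1.encode()).digest() == hashlib.sha1(s2.encode()).digest()
print(surface_equal)                     # False
print(dom_digest(s1) == dom_digest(s2))  # True
```

   Hashing the surface strings distinguishes the two inputs, while
   hashing the parsed tree does not, which is the property a DOM-level
   digest is meant to provide.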

   The introduction of namespaces is another source of variation in
   surface strings, because different namespace prefixes can be used to
   represent the same namespace URI [URI]. In the following example,
   the namespace prefix "edi" is bound to the URI
   "http://ecommerce.org/schema", but this prefix can be arbitrarily
   chosen without changing the logical contents, as shown in the second
   example.




Maruyama, et al.             Informational