Digest Values for DOM (DOMHASH)
RFC 2803 Digest Values for DOM (DOMHASH) April 2000
1. Introduction
The purpose of this document is to give a clear and unambiguous
definition of digest (hash) values of XML objects [XML]. Two
subtrees are considered identical if their hash values are the same,
and different if their hash values are different.
There are at least two usage scenarios of DOMHASH. One is as a basis
for digital signatures for XML. Digital signature algorithms normally
require the content to be hashed before it is signed. DOMHASH provides
a concrete definition of the hash value calculation.
The other is to use DOMHASH when synchronizing two DOM structures
[DOM]. Suppose that a server program generates a DOM structure which
is to be rendered by clients. If the server makes frequent small
changes to a large DOM tree, it is desirable that only the modified
parts be sent to the client. A client can initiate a request by
sending the root hash value of the structure held in its cache. If
it matches the root hash value of the current server structure,
nothing needs to be sent. If not, then the server compares the client
hash with the older versions in the server's cache. If it finds one
that matches the client's version of the structure, then it locates
differences with the current version by recursively comparing the
hash values of each node. This way, the client can receive only an
updated portion of a large structure without requesting the whole
thing.
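The recursive comparison described above can be sketched as follows.
This is a minimal illustration only: the Node class, node_hash(), and
changed_paths() are hypothetical names, and the hash shown here is a
simplified stand-in, not the DOMHASH calculation defined by this
specification.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical simplified tree node (not a real DOM node)."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def node_hash(node):
    """Illustrative subtree hash. Assumption: NOT the DOMHASH definition."""
    h = hashlib.sha1()
    h.update(node.name.encode("utf-8"))
    h.update(b"\x00")
    h.update(node.text.encode("utf-8"))
    h.update(b"\x00")
    for child in node.children:
        # A parent's hash covers the hashes of all of its children.
        h.update(node_hash(child))
    return h.digest()


def changed_paths(old, new, path="/"):
    """Return the paths of the smallest subtrees of `new` that differ."""
    if node_hash(old) == node_hash(new):
        return []  # identical subtrees: nothing to send
    if (old.name != new.name or old.text != new.text
            or len(old.children) != len(new.children)):
        return [path]  # the node itself changed: resend this subtree
    changed = []
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        changed.extend(changed_paths(o, n, f"{path}{i}/"))
    return changed


cached = Node("doc", children=[
    Node("a", "1"),
    Node("b", "2", [Node("c", "3")]),
])
current = Node("doc", children=[
    Node("a", "1"),
    Node("b", "2", [Node("c", "4")]),
])
print(changed_paths(cached, current))
```

For brevity, the sketch recomputes hashes on every comparison. A
server that handles frequent updates would more likely store the hash
value with each node and recompute it only along the path from a
modified node to the root.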
One way of defining digest values is to take a surface string as the
input for a digest algorithm. However, this approach has several
drawbacks. The same internal DOM structure may be represented in many
different ways as surface strings even if they strictly conform to
the XML specification. Treatment of white space, selection of
character encodings, entity references (i.e., use of ampersands), and
so on all affect the generation of a surface string. If the
implementations of surface string generation differ, the hash
values will differ, resulting in digital signatures that cannot be
validated and in failure to detect identical DOM structures.
Therefore, it is desirable that the digest of a DOM tree be defined
in DOM terms -- that is, as an unambiguous algorithm operating on a
DOM tree. This is the approach we take in this specification.
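The surface string problem can be demonstrated with a short sketch.
The two documents below are illustrative inputs chosen for this
example, and surface_digest(), structure(), and parsed_structure() are
hypothetical helper names. The documents differ in attribute quoting,
white space inside the start tag, and the way the ampersand is
escaped, yet they parse to the same tree.

```python
import hashlib
from xml.dom import minidom

# Two surface strings for the same logical document (illustrative inputs).
SURFACE_A = '<doc attr="x">A&amp;B</doc>'
SURFACE_B = "<doc  attr='x' >A&#38;B</doc>"


def surface_digest(s):
    """Digest of the raw surface string, as in the approach criticized above."""
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


def structure(node):
    """Reduce a parsed node to a comparable tuple.

    Assumption: only element and text nodes occur in the input.
    """
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.data)
    attrs = sorted(node.attributes.items())
    children = [structure(child) for child in node.childNodes]
    return ("element", node.tagName, attrs, children)


def parsed_structure(s):
    doc = minidom.parseString(s)
    doc.normalize()  # merge adjacent text nodes before comparing
    return structure(doc.documentElement)


print(surface_digest(SURFACE_A) == surface_digest(SURFACE_B))
print(parsed_structure(SURFACE_A) == parsed_structure(SURFACE_B))
```

The first comparison fails and the second succeeds: the byte-level
digests disagree even though the parsed structures are equal, which is
why a digest defined over the tree itself is preferable.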
The introduction of namespaces is another source of variation in
surface strings, because different namespace prefixes can be used to
represent the same namespace URI [URI]. In the following example,
the namespace prefix "edi" is bound to the URI
"http://ecommerce.org/schema", but this prefix can be chosen
arbitrarily without changing the logical content, as shown in the
second example.
Maruyama, et al. Informational