A Tagged Index Object for use in the Common Indexing Protocol



RFC 2654           Tagged Index Object for use in CIP        August 1999


   5. Examples  . . . . . . . . . . . . . . . . . . . . . . . . . . .13
   5.1 The original database  . . . . . . . . . . . . . . . . . . . .13
   5.1.1 "complete" consistency based full update . . . . . . . . . .14
   5.1.2 "tag" consistency based full update  . . . . . . . . . . . .14
   5.1.3 "unique" consistency based full update . . . . . . . . . . .15
   5.2 First update . . . . . . . . . . . . . . . . . . . . . . . . .16
   5.2.1 "complete" consistency based incremental update  . . . . . .16
   5.2.2 "tag" consistency based incremental update   . . . . . . . .17
   5.2.3 "unique" consistency based incremental update  . . . . . . .17
   5.3 Second update  . . . . . . . . . . . . . . . . . . . . . . . .18
   5.3.1 "complete" consistency based incremental update  . . . . . .18
   5.3.2 "tag" consistency based incremental update . . . . . . . . .19
   5.3.3 "unique" consistency based incremental update  . . . . . . .20
   6. Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . .21
   6.1 Aggregation of Tagged Index Objects  . . . . . . . . . . . . .21
   7. Security Considerations . . . . . . . . . . . . . . . . . . . .21
   8. References  . . . . . . . . . . . . . . . . . . . . . . . . . .22
   9. Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . .23
   Full Copyright Statement . . . . . . . . . . . . . . . . . . . . .24

1. Introduction

   The Common Indexing Protocol (CIP) as defined in [1] proposes a
   mechanism for distributing searches across several instances of a
   single type of search engine to create a global directory.  CIP
   provides a scalable, flexible scheme to tie individual databases into
   distributed data warehouses that can scale gracefully with the growth
   of the Internet.  CIP meets these goals independently of the access
   method used to reach the data underlying the indices.  The Index
   Object, which carries the information exchanged among Index Servers,
   is defined separately from CIP.  One Index Object already defined is
   the Centroid, which is derived from the Whois++ protocol [2].

   The Centroid does not meet all the requirements for the exchange of
   index information among information servers.  For example, it does
   not natively support incremental updates.  For information servers
   whose databases contain millions of records, constantly exchanging
   complete dredges of the database is bandwidth-intensive.  The Tagged
   Index Object is specifically designed to support the exchange of
   index update information.  This design comes at the cost of a larger
   index object being exchanged.  The Centroid is also not tailored to
   always give boolean answers to queries.  In the Centroid Model,
   "an index server will take a query in standard Whois++ format, search
   its collections of centroids and other forward information, determine
   which servers hold records which may fill that query, and then



Hedberg, et al.               Experimental