The VCDIFF Generic Differencing and Compression Data Format
RFC 3284 VCDIFF June 2002
Table of Contents
1. Executive Summary ........................................... 2
2. Conventions ................................................. 4
3. Delta Instructions .......................................... 5
4. Delta File Organization ..................................... 6
5. Delta Instruction Encoding .................................. 12
6. Decoding a Target Window .................................... 20
7. Application-Defined Code Tables ............................. 21
8. Performance ................................................. 22
9. Further Issues .............................................. 24
10. Summary ..................................................... 25
11. Acknowledgements ............................................ 25
12. Security Considerations ..................................... 25
13. Source Code Availability .................................... 25
14. Intellectual Property Rights ................................ 26
15. IANA Considerations ......................................... 26
16. References .................................................. 26
17. Authors' Addresses .......................................... 28
18. Full Copyright Statement .................................... 29
1. Executive Summary
Compression and differencing techniques can greatly reduce the space
and bandwidth needed to store and transmit files and file versions. Since files are often
transported across machines with distinct architectures and
performance characteristics, such data should be encoded in a form
that is portable and can be decoded with little or no knowledge of
the encoders. This document describes Vcdiff, a compact portable
encoding format designed for these purposes.
Data differencing is the process of computing a compact and
invertible encoding of a "target file" given a "source file". Data
compression is similar, but without the use of source data. The UNIX
utilities diff, compress, and gzip are well-known examples of data
differencing and compression tools. For data differencing, the
computed encoding is called a "delta file", and for data compression,
it is called a "compressed file". Delta and compressed files are
well suited to storage and transmission because they are often
smaller than the originals.
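To make "invertible encoding of a target file given a source file"
concrete, here is a minimal sketch of the decoding side. It assumes a
hypothetical in-memory list of instruction tuples named after the ADD,
COPY, and RUN instructions described in Section 3, not the byte-level
encoding this document defines later; the function name apply_delta is
also hypothetical. COPY addresses index the source followed by the
target decoded so far, so an empty source reduces the delta to
LZ'77-style compression.

```python
from typing import List, Tuple


def apply_delta(source: bytes, instructions: List[Tuple]) -> bytes:
    """Rebuild a target from a source and a list of delta instructions.

    Hypothetical instruction forms (not the VCDIFF wire format):
      ("ADD", data)        append the literal bytes `data`
      ("COPY", addr, size) copy `size` bytes starting at `addr`, where
                           addresses cover the source followed by the
                           target decoded so far
      ("RUN", byte, size)  append the single byte value `byte`, `size` times
    """
    buf = bytearray(source)   # address space: source, then target so far
    start = len(source)       # the target begins right after the source
    for inst in instructions:
        op = inst[0]
        if op == "ADD":
            buf += inst[1]
        elif op == "COPY":
            addr, size = inst[1], inst[2]
            # Copy one byte at a time so a COPY may overlap the bytes it
            # is itself producing (this is how repeats are expressed).
            for i in range(size):
                buf.append(buf[addr + i])
        elif op == "RUN":
            byte, size = inst[1], inst[2]
            buf += bytes([byte]) * size
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return bytes(buf[start:])


if __name__ == "__main__":
    # Differencing: reuse "hello " from the source, then add new text.
    print(apply_delta(b"hello world", [("COPY", 0, 6), ("ADD", b"there")]))
    # Compression: empty source, the COPY overlaps its own output.
    print(apply_delta(b"", [("ADD", b"ab"), ("COPY", 0, 6)]))
```

The first call yields b'hello there' and the second yields
b'abababab'. The byte-at-a-time COPY loop is deliberately simple; it
is what lets a short instruction list describe a long repetitive
target without any source data at all.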
Data differencing and data compression are traditionally treated as
distinct types of data processing. However, as shown in the Vdelta
technique by Korn and Vo [1], compression can be thought of as a
special case of differencing in which the source data is empty. The
basic idea is to unify the string parsing scheme used in the
Lempel-Ziv'77 (LZ'77) style compressors [2] and the block-move technique of
Tichy [3]. Loosely speaking, this works as follows:
Korn, et. al. Standards Track