RFC 468                   FTP Data Compression                March 1973


   The two main arguments for data compression are economics and
   convenience (usability).  Consider first economics, which is
   essentially a trade-off between CPU time and transmission costs.  Of
   course, as long as Network use is a free commodity, the economics of
   data compression are all bad.  That happy state won't last forever.
   What does data compression cost?

   Let us consider only simple linear compression schemes, such as the
   one proposed here.  By linear, I mean that the CPU time to examine a
   source record is proportional to the number of bytes in the record.
   A simple linear scheme could detect repeated single characters, for
   example.  One could imagine quadratic schemes, which detected
   repeated substrings; but except for possible special circumstances
   where the source strings have some structure known to the
   compression algorithm, the CPU economics don't favor quadratic
   compression.
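
   As a purely illustrative sketch (and not the compression format
   defined later in this memo), the following C fragment shows the kind
   of simple linear scheme meant here: one pass over the record,
   detecting runs of a repeated single character.  The flag, count, and
   literal layout is invented for the example.

      /*
       * Illustration only: a minimal linear (single-pass) compressor
       * that detects runs of a repeated single character.  This is
       * NOT the HASP/NETRJS encoding; the byte layout is invented.
       * Each output item is either (0x00, literal byte) or
       * (0x01, run length, repeated byte).
       */
      #include <stdio.h>
      #include <string.h>

      /* One pass over the input, so CPU time is proportional to the
         number of source bytes -- "linear" in the sense used above. */
      static size_t rle_compress(const unsigned char *src, size_t len,
                                 unsigned char *dst)
      {
          size_t i = 0, out = 0;
          while (i < len) {
              size_t run = 1;
              while (i + run < len && src[i + run] == src[i] && run < 255)
                  run++;
              if (run >= 3) {                /* encode the run         */
                  dst[out++] = 0x01;
                  dst[out++] = (unsigned char)run;
                  dst[out++] = src[i];
              } else {                       /* emit literal bytes     */
                  for (size_t k = 0; k < run; k++) {
                      dst[out++] = 0x00;
                      dst[out++] = src[i];
                  }
              }
              i += run;
          }
          return out;
      }

      int main(void)
      {
          const char *record = "PAY          TO THE ORDER OF          ";
          unsigned char packed[2 * 64];
          size_t n = rle_compress((const unsigned char *)record,
                                  strlen(record), packed);
          printf("source %zu bytes, compressed %zu bytes\n",
                 strlen(record), n);
          return 0;
      }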

   Assuming a reasonable figure for large-scale CPU costs on machines
   of the generation of CCN's 360/91, we concluded that an upper bound
   on the CPU cost of total compression and decompression would be 5
   cents per megabit; this is based on very loose coding of a simple
   linear algorithm.  This may be compared with the projected Network
   transmission costs of over 30 cents per megabit (possibly a lot
   over).

   Thus, the CPU time to conserve bandwidth costs significantly less
   than the bandwidth saved.  Both CPU costs and bandwidth costs are
   trending downward, but it seems exceedingly unlikely that the ratio
   of CPU cost to bandwidth cost for linear compression will reverse in
   the next few years.  On the other hand, this calculation clearly
   discourages one from using quadratic compression.
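
   To make the comparison concrete, the following sketch works out the
   break-even point implied by the two figures quoted above; the
   program and its variable names are illustrative only.

      /*
       * Rough break-even check using only the figures quoted above:
       * about 5 cents per megabit of CPU for compression plus
       * decompression, versus 30-odd cents per megabit to transmit.
       * Compressing pays whenever
       *     cpu + transmit * ratio  <  transmit
       * i.e. whenever the compressed data is smaller than about
       * (1 - cpu/transmit) of the original.
       */
      #include <stdio.h>

      int main(void)
      {
          double cpu_cents_per_mbit      = 5.0;   /* upper bound above */
          double transmit_cents_per_mbit = 30.0;  /* projected, above  */

          double breakeven =
              1.0 - cpu_cents_per_mbit / transmit_cents_per_mbit;

          printf("compression pays if output is under %.0f%% of input\n",
                 breakeven * 100.0);
          return 0;
      }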

WHY HASP

   CCN's batch remote job entry protocol NETRJS (see RFC #189, July 15,
   1971) was designed to include two data transfer modes, truncated and
   compressed.  The NETRJS truncated mode is essentially identical to
   the current FTP block mode record structure (except for minor bit
   format differences).  The compressed mode of NETRJS uses an
   adaptation of the particular compression scheme which is
   incorporated in the "Multileaving protocol" of the binary
   synchronous RJE support in IBM's HASP system.

   Although it isn't really necessary for the purpose of defining a
   compression scheme in FTP, I have included an appendix summarizing
   very briefly the nature of HASP and its RJE package.  That appendix
   may be considered cultural enrichment for those in the Network
   Community who have been denied the privilege of being an IBM
   customer.  After all, I know a lot of HASP experts who never heard of


