RFC 468 FTP Data Compression March 1973
The two main arguments for data compression are economics and
convenience (usability). Consider first economics, which is
essentially a trade-off between CPU time and transmission costs. Of
course, as long as Network use is a free commodity, the economics of
data compression are all bad. That happy state won't last forever.
What does data compression cost?
Let us consider only simple linear compression schemes, such as the
one proposed here. By linear, I mean that the CPU time to examine a
source record is proportional to the number of bytes in the record. A
simple linear scheme could detect repeated single characters, for
example. One could imagine quadratic schemes, which detect
repeated substrings; but except for possible special circumstances
where the source strings have some structure known to the compression
algorithm, the CPU economics don't favor quadratic compression.
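
To make "linear" concrete, here is a minimal sketch of a scheme that
detects repeated single characters in one left-to-right pass. It is an
illustration only and is not the HASP-derived encoding this RFC
proposes; the names compress_runs, expand_runs and min_run, and the
token layout, are assumptions made for the example.

```python
from itertools import groupby


def compress_runs(record: bytes, min_run: int = 3):
    """Collapse runs of one repeated byte in a single pass.

    Each byte of the record is examined once, so the work is
    proportional to len(record). The token layout is hypothetical:
    ("run", count, value) for a repeated byte, ("lit", data) otherwise.
    """
    tokens = []
    literal = bytearray()
    for value, group in groupby(record):
        count = sum(1 for _ in group)
        if count >= min_run:
            # Flush any pending literal bytes before emitting the run.
            if literal:
                tokens.append(("lit", bytes(literal)))
                literal.clear()
            tokens.append(("run", count, value))
        else:
            # Short runs are cheaper to carry as literal bytes.
            literal.extend([value] * count)
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def expand_runs(tokens) -> bytes:
    """Inverse of compress_runs: rebuild the original record."""
    out = bytearray()
    for token in tokens:
        if token[0] == "run":
            _, count, value = token
            out.extend(bytes([value]) * count)
        else:
            out.extend(token[1])
    return bytes(out)
```

A record such as b"AB" followed by ten blanks and b"CD" becomes a
literal, one run token, and a literal. A scheme that searched for
repeated substrings would have to compare positions against earlier
positions, which is where the quadratic cost comes from.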
Assuming a reasonable figure for large-scale CPU costs in the
generation of CCN's 360/91, we concluded that an upper bound on CPU
costs for total compression and decompression would be 5 cents per
megabit; this is based on very loose coding of a simple linear
algorithm. This may be compared with the projected Network
transmission costs of over 30 cents per megabit (possibly a lot
over).
Thus, the CPU time spent conserving bandwidth costs significantly less
than the bandwidth it saves. Both CPU costs and bandwidth costs are
trending downward, but it seems exceedingly unlikely that the ratio
of CPU cost to bandwidth cost for linear compression will reverse in
the next few years. On the other hand, this calculation clearly
discourages one from using quadratic compression.
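
The comparison above can be worked through with the two figures quoted
in the text. The sketch below is a rough illustration, assuming that
the 5 cent CPU figure is charged per megabit of uncompressed source
data and that transmission is charged per megabit actually sent; the
function names and the fraction_removed parameter are hypothetical.

```python
CPU_CENTS_PER_MEGABIT = 5.0   # upper bound quoted in the text
NET_CENTS_PER_MEGABIT = 30.0  # lower bound quoted in the text


def net_saving_cents(megabits: float, fraction_removed: float) -> float:
    """Transmission cost avoided minus CPU cost of compressing.

    megabits is the size of the uncompressed data; fraction_removed is
    the share of it that compression eliminates (0.0 to 1.0).
    Assumption: CPU cost scales with the uncompressed size.
    """
    cpu_cost = CPU_CENTS_PER_MEGABIT * megabits
    transmission_saved = NET_CENTS_PER_MEGABIT * megabits * fraction_removed
    return transmission_saved - cpu_cost


def break_even_fraction() -> float:
    """Smallest fraction_removed at which compression pays for itself."""
    return CPU_CENTS_PER_MEGABIT / NET_CENTS_PER_MEGABIT
```

Under these assumptions compression pays for itself once it removes
about one sixth of the data, and halving a one-megabit file nets 10
cents. Since 5 cents is an upper bound and 30 cents a lower bound, the
real break-even point would be lower still.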
WHY HASP
CCN's batch remote job entry protocol NETRJS (see RFC #189, July 15,
1971) was designed to include two data transfer modes, truncated and
compressed. The NETRJS truncated mode is essentially identical to
current FTP block mode record structure (except for minor bit format
differences). The compressed mode of NETRJS uses an adaptation of
the particular compression scheme which is incorporated in the
"Multileaving protocol" of the binary synchronous rje support in
IBM's HASP system.
Although it isn't really necessary for the purpose of defining a
compression scheme in FTP, I have included an appendix summarizing
very briefly the nature of HASP and its rje package. That appendix
may be considered cultural enrichment for those in the Network
Community who have been denied the privilege of being an IBM
customer. After all, I know a lot of HASP experts who never heard of
Braden