Data Compression

Here are some of my data compression research models, compared against the popular compressors gzip and 7-zip. Click on the names to download them as command-line tools. All of these compressors are experimental, and none of them is compatible with any of the others. RH5 and its variants are designed for high speed and modest memory usage, and are meant to be practical solid multi-file archivers.

These benchmarks use enwik8, a 100 MB (100,000,000-byte) text file taken from the English-language Wikipedia. More information, along with benchmarks of some of my compressors (these and others), can be found on the Large Text Compression Benchmark.

Benchmarks (Intel Core i7-2600)

Program           Algorithm  Compressed size (bytes)  Compression time (s)  Decompression time (s)  Compression memory (MB)  Decompression memory (MB)
Original                                 100,000,000
BTCM (max)        BWT + CM                20,955,165                 21.20                   22.60                      822                        657
BTCM 8            BWT + CM                23,786,763                 17.26                   16.80                       52                         42
CM5 x64           CM                      25,042,264                 16.80                   16.94                       35                         35
7-zip (normal)    LZMA                    25,899,684                 72.00                    1.40                      186                         18
RH5ba_x64 (max)   LZMA                    27,510,180                 17.00                    4.00                      130                         47
RH5_x64 (max)     LZ + ctx                29,878,256                 13.20                    0.53                       19                         12
ctxn (32-bit)     LZMA                    30,211,251                  9.00                    5.00                       67                         67
RH4_x64           ROLZ                    31,309,689                  3.10                    0.58                       29                         25
RH5_x64           LZ + ctx                31,798,141                  2.10                    0.61                       19                         12
RH5m_x64          LZ + ctx                33,638,243                  3.40                    0.69                      2.9                        1.8
gzip -9           LZ77                    35,194,719                 14.00                    0.94                        4                          3
gzip              LZ77                    37,907,623                  4.10                    0.97                        4                          3
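
The raw sizes and times above are easier to compare once they are turned into a compression ratio, bits per input byte, and throughput. The following is a minimal Python sketch of that arithmetic, not part of any of the tools listed here. The function name `derived_metrics` is hypothetical, only a few rows are copied from the table, and 1 MB is assumed to mean 1,000,000 bytes.

```python
# Derived figures from the benchmark table (illustrative sketch only).
ORIGINAL_SIZE = 100_000_000  # size of enwik8 in bytes, from the "Original" row

# (program, compressed bytes, compression seconds, decompression seconds),
# copied from a few rows of the table above.
ROWS = [
    ("BTCM (max)", 20_955_165, 21.20, 22.60),
    ("7-zip (normal)", 25_899_684, 72.00, 1.40),
    ("RH5_x64", 31_798_141, 2.10, 0.61),
    ("gzip -9", 35_194_719, 14.00, 0.94),
]


def derived_metrics(compressed, comp_s, decomp_s, original=ORIGINAL_SIZE):
    """Return (ratio, bits per input byte, compression MB/s, decompression MB/s).

    Assumes 1 MB = 1,000,000 bytes; throughput is measured on the
    uncompressed size, which is one common convention among several.
    """
    ratio = compressed / original            # smaller is better
    bits_per_byte = 8 * compressed / original
    comp_mb_s = original / 1e6 / comp_s
    decomp_mb_s = original / 1e6 / decomp_s
    return ratio, bits_per_byte, comp_mb_s, decomp_mb_s


if __name__ == "__main__":
    for name, size, comp_s, decomp_s in ROWS:
        ratio, bpb, c_speed, d_speed = derived_metrics(size, comp_s, decomp_s)
        print(f"{name:<16} ratio={ratio:.3f} bits/byte={bpb:.3f} "
              f"comp={c_speed:.1f} MB/s decomp={d_speed:.1f} MB/s")
```

Viewed this way, the trade-off in the table is easier to see: the BWT and CM entries reach the smallest sizes but decompress about as slowly as they compress, while the LZ-based entries give up some ratio for much faster decompression.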