Jeffrey P. Chillingsworth, the Compression Butler.

I remember when I was in high school I wrote an algorithm to compress raw image data (1 byte per pixel) for one of the adventure games I had made. The images were pretty simple, so it compressed them to about half their size. Then I tried it on other data, and it performed quite poorly compared to PKzip. I wondered: how was it possible to compress data that much? Recently, I decided to take another brief look at data compression to satisfy my lingering curiosity, and this is the result:

Download the .exe file, or the source at the bottom of the page.

Updates

Version .85 ( Aug. 19, 2006 ) I finally did some more work on Jeffrey. I was a bit embarrassed to have the old code on my website, so I spent a day deciphering it and rewriting it. Then the next day I tried a few optimizations, and on my computer compression can be up to 50% faster for some files.

Version .81 ( Sep. 10, 2003 ) Source and .exe are updated to fix a bug that could cause data mangling.

Version .80 ( Early 2002 ) Jeffrey P. first put online

Algorithms

When I started on Jeffrey P., I used Lempel-Ziv-Welch compression. The program ran about twice as fast as it does now, but the compression sizes were disappointing (always noticeably larger than Winzip's, which I believe uses a Lempel-Ziv algorithm plus Huffman encoding). Plus, I had read that the LZW algorithm was patented, so I decided to avoid it, even though no one would care if a little freeware program used it.

The compression algorithm currently used is the Burrows-Wheeler Transform (also called Block Sorting) with Arithmetic Coding (and Move to Front and Run-Length Encoding between the two). The BWT doesn't actually compress the data; it rearranges it so that the later stages can compress it much more easily. On certain data sets, especially text files, this can give much better compression than Winzip.
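To get a feel for what the transform does, here's a toy C++ version of the forward BWT. This is only an illustration, not the code in bwt.cpp (which works on large blocks and uses a much faster sort); the function name bwtEncode and the test string are made up for this example.

#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Returns the last column of the sorted rotations, plus the index of the
// original rotation, which the decoder needs to invert the transform.
std::pair<std::string, size_t> bwtEncode(const std::string& s)
{
    size_t n = s.size();
    std::vector<size_t> rot(n);
    for (size_t i = 0; i < n; ++i) rot[i] = i;

    // Sort every rotation of s lexicographically (naive O(n^2 log n) sort).
    std::sort(rot.begin(), rot.end(), [&](size_t a, size_t b) {
        for (size_t k = 0; k < n; ++k) {
            char ca = s[(a + k) % n], cb = s[(b + k) % n];
            if (ca != cb) return ca < cb;
        }
        return false;
    });

    std::string out(n, ' ');
    size_t primary = 0;
    for (size_t i = 0; i < n; ++i) {
        if (rot[i] == 0) primary = i;        // row holding the original string
        out[i] = s[(rot[i] + n - 1) % n];    // last character of this rotation
    }
    return {out, primary};
}

int main()
{
    auto [last, idx] = bwtEncode("banana bandana banana");
    std::printf("BWT: \"%s\"  primary index: %zu\n", last.c_str(), idx);
}

The output is exactly the same size as the input, but similar characters end up clustered together, which is what Move to Front, Run-Length Encoding, and the arithmetic coder are good at squeezing down.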

Compression Size Comparisons:

Jeffrey usually compresses larger files the best, especially large text files.
On some other file types, or on small files, the compression ratios are similar to Winzip's or even very slightly worse.

"bible.txt"
Original                   : 4,047,392 bytes
Jeffrey P. Chillingsworth  :   910,099 
Winzip                     : 1,176,617 

"koran.txt"
Original                   : 1,133,653 bytes
Jeffrey P. Chillingsworth  :   319,923 
Winzip                     :   423,303

"civ2\*.wav"
Original                   : 8,887,222 bytes
Jeffrey P. Chillingsworth  : 4,048,995 
Winzip                     : 5,085,351


There is a downside to using these techniques to achieve better compression, though: it takes much longer to compress a file than Winzip, maybe 4 times longer. Compression is faster for small files that require less sorting, but certain types of data can make the sort extremely slow. Decompression is faster than compression, because undoing the BWT requires no sorting, but the arithmetic coder is still a bit slow to decode.

Jeffrey uses block sizes of up to 900K (files larger than this are split into multiple blocks and compressed one block at a time). Since sorting takes the most time, a bucket sort is performed first to divide the data into smaller sets in linear time, then a quicksort is performed on each of the smaller sets, as sketched below.
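Roughly, the sort looks like this. It's a simplified sketch, not the actual bwt.cpp code (the real comparisons and tie handling are more involved, and std::sort stands in for the quicksort), but it shows the idea: one linear pass puts each rotation into a bucket by its first byte, then a comparison sort finishes each small bucket.

#include <algorithm>
#include <array>
#include <cstdio>
#include <vector>

// Returns rotation start positions in lexicographic order of their rotations.
std::vector<size_t> sortRotations(const std::vector<unsigned char>& block)
{
    size_t n = block.size();

    // Pass 1: bucket rotation indices by their first byte (linear time).
    std::array<std::vector<size_t>, 256> buckets;
    for (size_t i = 0; i < n; ++i)
        buckets[block[i]].push_back(i);

    // Pass 2: sort each (much smaller) bucket by comparing full rotations.
    std::vector<size_t> sorted;
    sorted.reserve(n);
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end(), [&](size_t a, size_t c) {
            for (size_t k = 1; k < n; ++k) {   // first bytes are already equal
                unsigned char ca = block[(a + k) % n];
                unsigned char cc = block[(c + k) % n];
                if (ca != cc) return ca < cc;
            }
            return false;
        });
        sorted.insert(sorted.end(), b.begin(), b.end());
    }
    return sorted;
}

int main()
{
    std::vector<unsigned char> data = {'a','b','r','a','c','a','d','a','b','r','a'};
    for (size_t i : sortRotations(data)) std::printf("%zu ", i);
    std::printf("\n");
}

Because every bucket only holds rotations that start with the same byte, the expensive comparison sort runs on many small sets instead of one 900K block.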

Using Jeffrey P. Chillingsworth:

Sample Command Lines: Let's say the file book1.txt is in the current directory, and so is jeff.exe (or jeff.exe is in a directory such as c:\windows that is always on the path).

jeff book1.txt book1
will compress the file book1.txt to the archive file book1.jef. If the archive name already exists, you will be asked whether you want to add to it or overwrite it.
The percentage displayed is the size of the compressed file as a percentage of the original, so the lower the percentage, the greater the compression.

jeff l book1
will list the files in book1.jef.

jeff u book1
will uncompress the archive book1.jef. Since the archive stores filenames, book1.txt will be recreated.

You can use wildcards for the names of the files to be compressed. To compress all the .txt files in the current directory to the archive text, type jeff *.txt text

If you just run jeff.exe without any command line options, Jeffrey will ask you what you want him to do. 

Download:

Windows console EXE: JeffExe85.zip (46 K)

Source Code:
JeffSrc85.zip (11 K)
jeff.cpp - The code for the command line and text interfaces, and also writing to files.
bwt.cpp - The Burrows-Wheeler Transform (Encoding and Decoding)
rle.cpp - Run Length Encoding and Decoding
mtf.cpp - Move to Front Encoding and Decoding
ari.cpp - Arithmetic Encoding and Decoding
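
If you want to poke through the source, mtf.cpp is probably the simplest place to start. Here's a minimal sketch of what Move-to-Front encoding does (not the actual mtf.cpp code; it just assumes byte data and a 256-entry list):

#include <cstdio>
#include <numeric>
#include <vector>

// Each byte is replaced by its current position in a 256-entry list, then
// moved to the front of that list. After the BWT, runs of identical bytes
// turn into runs of zeros, which RLE and the arithmetic coder shrink well.
std::vector<unsigned char> mtfEncode(const std::vector<unsigned char>& in)
{
    std::vector<unsigned char> table(256);
    std::iota(table.begin(), table.end(), 0);   // table[i] = i to start

    std::vector<unsigned char> out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        int pos = 0;
        while (table[pos] != c) ++pos;          // where c currently sits
        out.push_back(static_cast<unsigned char>(pos));

        for (int i = pos; i > 0; --i)           // move c to the front
            table[i] = table[i - 1];
        table[0] = c;
    }
    return out;
}

int main()
{
    // A run of 'b's then 'a's comes out as one big index followed by zeros.
    std::vector<unsigned char> in = {'b','b','b','b','a','a','a'};
    for (unsigned char v : mtfEncode(in)) std::printf("%d ", v);
    std::printf("\n");
}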

Be warned, though, that the source code isn't especially clear, and there are few explanatory comments.
Update: v.85 code is now partially clear =)

NOTE: I make no guarantees about this program. I'm not responsible for any data that won't uncompress properly, or misuse of the program. (There were a couple data mangling bugs when I was making the program, but they're fixed and it's worked fine for me. I've even used the code in other programming projects.)

This program was based on code from Mark Nelson's Data Compression page.
Everything was written or completely rewritten by me, except for the arithmetic compression code, for which I give credit completely to the original author(s).

Back to Main Page

Page by: jkreuzer