Let’s Program A Compression Algorithm Part 5: In Which Compressed Files Are Decompressed And The Project Completed

A program that can only compress data is like a packing company that insists on hot gluing your boxes shut: Everything may look nice and neat and tidy but you’re going to be in a real pickle when you want to actually start using your stuff again.

Fortunately writing a decompression function will be pretty easy; after all it’s mostly just the function we already wrote but backwards.

What we want to accomplish boils down to:

1) Read our compressed file and turn it into a list of bits

2) See if the file starts with a 0 or a 1

3) If it starts with a 0 look up a byte value in our short-list table. Add this to output.

4) If it starts with a 1 discard the 1 and turn the next 8 bits into a byte. Add this to output.

5) Discard the bits we just used and then repeat our steps until we see our termination sequence

Step one needs almost no work at all. Our previously written file-to-compressed-bitlist can already read a file bit by bit and then output a bitlist. The only problem is that it creates that bitlist by translating raw bytes into compressing bit sequences. Since we are now reading compressed data we just want the raw bits and bytes which we can get by replacing the call to compress-byte with a call to byte-to-8-bit-list.

I suppose we’ll also have to chop off the final bit of code where we append our termination sequence to the file too. Instead we’ll just return the list we make.

This leads to our new file-to-bitlist function:

(defun file-to-bitlist (filename)
    (let ((bitlist '())
            (in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (setf bitlist (append bitlist (byte-to-8-bit-list testbyte))))
        (close in))
        bitlist))

Now that we have our file in an easy to read and manipulate bit format we can assemble our function for decompressing it.

Our decompression function needs to scan through the bit list and identify our special compressed codes. There are three types of codes we need to look for: Our 4 -bit short codes (that always start with 0), our 9-bit expanded codes (that always start with 1) and our termination code (which starts with a 1 and then has eight 0s).

We can take care of that easily enough with two nested if statements. Have some pseudo-code

if (bitlist starts with 0)
{
   do 4-bit shortcode logic
}
else{ //If bit list does not start with 0 it must start with 1
   if(bitlist starts with termination code)
   {
      finish up and close file
   }
   else //If it starts with 1 and is not the termination code it’s an expanded ASCII letter
   {
      do 9-bit expanded code logic.
   }
}

Simple enough. First we check if the next bit is a zero, which means we have a 4-bit shortcode we need to decompress. If the next bit isn’t zero it has to be a 1, which means it’s either a long code or our termination signal. We test for the termination signal first (otherwise we could never stop) and if we don’t find it we know we have a 9-bit expanded code that needs to be made normal.

To do this in Lisp we have to remember that every if statement has a sort of built in else.

(if (true/false)
   (do when true)
   (do when false))

So we can translate our pseudo-code into pseudo-lisp to get this:

(if (= 0 (first bitlist))
    (do short code logic here)
    (if (equals (subsequence bitlist 0 9) ‘(1 0 0 0 0 0 0 0 0))
        (finish things up here)
        (do expanded code logic here)))

Now we just need to wrap that up in a loop and fill in the logic bits and we’re home free.

Let’s look at the logic bits first. Our short code logic requires a few steps:

1) Look up the normal byte value of our 4-bit short code in our handy *list-to-byte-compression-hash*

2) Write that byte to output

3) Remove the 4-bits we just read from the bitlist so we don’t accidentally process them again

(progn (write-byte 
           (gethash (subseq bitlist 0 4) 
           *list-to-byte-compression-hash*) 
            out)
       (setf bitlist (subseq bitlist 4)))

The logic for our expanded code is very similar

1) Ignore the leading 1 and turn the next 8-bits into a byte

2) Write that byte to output

3) Remove the 9-bits we just read from the bitlist so we don’t accidentally process them again

(progn (write-byte (8-bit-list-to-byte (subseq bitlist 1 9)) out)
       (setf bitlist (subseq bitlist 9)))

For the termination sequence logic all we have to do is break out of our loop so we can close the file. To do that first we’re going to need to design our loop. Easiest approach is probably to have a loop that just runs as long as some variable, let’s name it “decompressing” is true. We can then just set that variable to false (nil in Lisp) when we see our termination sequence.

(loop while decompressing do stuff)
(setf decompressing nil)

Toss it all together along with some basic file input and output and we get this handy decompression function:

(defun white-rabbit-decompress-file (input-filename output-filename)
    (let ((bitlist (file-to-bitlist input-filename))
            (decompressing 1)
            (out (open output-filename :direction :output :element-type '(unsigned-byte 8))))
            (when out
                (loop while decompressing do                                             
                        (if (= 0 (first bitlist))
                            (progn (write-byte (gethash (subseq bitlist 0 4) *list-to-byte-compression-hash*) out)
                                    (setf bitlist (subseq bitlist 4)))
                            (if (equal (subseq bitlist 0 9) '(1 0 0 0 0 0 0 0 0))
                                    (setf decompressing nil)                                    
                                    (progn (write-byte (8-bit-list-to-byte (subseq bitlist 1 9)) out)
                                    (setf bitlist (subseq bitlist 9)))))))
            (close out)))

Like most Lips code this is extremely information dense and seems to have way too many nested parenthesis but since we built the individual parts separately it shouldn’t be too hard to follow along with what’s happening here.

Now we can load up our code and run things like:

[87]> (white-rabbit-decompress-file “compresseddrinkme” “decompresseddrinkme.txt”)

T

[88]> (white-rabbit-decompress-file “compressed1stparagraph” “decompressedfirstparagraph.txt”)

T

You should get decompressed files that match the original file you compressed. Nifty!

Whelp… that’s that. An ASCII based compression, decompression program in only a little more than a hundred lines of lisp. Of course, for a professional tool you’d want to add another few hundred lines of error checking and handling and UI and whatnot but the core here works.

So where do we go from here?

Well, first I’d like to spend at least one post talking about how “real” compression algorithms work. After that we can spend a week or two geeking out about how to make to improve our Lisp and make this application faster. I mean, it would be nice if we could handle at least handle the entirety of Alice in Wonderland within a few minutes rather the small eternity it would take with our prototype code as it exists.