Let’s Program A Compression Algorithm Part 7: Considerations On Speeding Up Slow Code

Welcome fellow Lisp enthusiasts*! Now that our proof of concept code works it’s time to talk about the elephant in the room: Our code is really slow, and considering that modern Lisp is supposed to be FAST, that means we’re probably doing something dumb.

But exactly how dumb? Well let’s take a scientific approach and run a test:

[7]> (time (white-rabbit-compress-file "chapter1.txt" "timedoutput"))

Real time: 251.23251 sec.

Run time: 250.732 sec.

Space: 10816791720 Bytes

GC: 9324, GC time: 159.104 sec.

T

Looks like compressing a small text file on my (admittedly old) Linux laptop took over four minutes and, more surprisingly, consumed something like 10 gigabytes of memory. That’s a freakishly huge amount of memory for working with a file that’s only 11.4kB long. And using up so much memory was a real strain on the garbage collector, which spent over two minutes just cleaning up all the data our program threw away.

How Are We Going To Fix This?

Code optimization is the reason why it’s important to not only understand WHAT your programming language can do but also HOW it does it.

Sure, most programmers can easily spot the warning signs when they personally write inefficient code: Too many nested loops, recursive functions on complex data and so on.

But what about when you load up someone else’s library and call the doSomethingCool function they wrote? Is it fast? Slow? Does it loop? Without some research you have no way of telling.

Moral of the story: Do your research!

Doing Some Research!

For example, let’s take a look at Lisp’s handy append function. Its job is simple: take two lists, glue them together and then return the combined list. Our prototype uses this function everywhere. To build bit lists. To build compressed output files. To build decompressed output files. It is no exaggeration to say most of what our program does is appending lists to other lists.

So…. How does append append lists anyways?

For that matter, what is a Lisp list?

A list in Lisp is a pretty simple thing. Each individual item in the list contains two pieces of information: The data stored there and directions on where to find the next item in the list. The final item in the list has a blank in the “next” spot. That’s how you know it’s the end of the list.
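
You can see this structure for yourself at the REPL. A three item list is really just three linked pairs, and the last “next” slot is NIL:

(cons 1 (cons 2 (cons 3 nil)))   ; => (1 2 3)
(cdr '(1 2 3))                   ; => (2 3), the "next" directions
(cdr '(3))                       ; => NIL, the blank at the end of the list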

Now the easiest and fastest way to connect two lists together is to take the empty “next” from the end of the first list and point it at the start of the second list.

But append comes with a bonus guarantee that complicates things: It is guaranteed to NOT change either of its inputs. That means changing the last item in the first list is a no go.

Instead append creates a complete copy of the first list and then links that new list to the second list. (This doesn’t change the second list because Lisp lists only move forward and don’t care if anyone links to them, just who they link to).
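
A quick experiment confirms this: the combined list never shares structure with the first input, but its tail is literally the second input.

(let* ((first-list  (list 1 2 3))
       (second-list (list 4 5 6))
       (combined    (append first-list second-list)))
  (list (eq combined first-list)                 ; NIL: the first list was copied
        (eq (nthcdr 3 combined) second-list)))   ; T: the second list is reused as-is
;; => (NIL T)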

Did we find the problem?

So append makes a copy of its first argument. That sounds like it could be a bit slow. Maybe we should double check how we use it. Let’s start with a look at our file-to-compressed-bitlist function. You might notice this little gem:

(append bit-list (compress-byte testbyte))

That’s the line of our code where we say “Take our current bit list and add the next set of compressed bits to the end”. This happens inside of a loop that runs once for every byte in the input file. So our eleven kilobyte test file is going to trigger this line some 11,000** times. And every single time is going to involve making a complete copy of our compressed bit list so far.
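
Stripped down to a self-contained toy (with some made-up bit patterns standing in for real compressed bytes), the pattern looks like this:

(let ((result '()))
  (dolist (chunk '((1 0) (0 1) (1 1) (0 0)))
    (setf result (append result chunk)))   ; copies everything in RESULT every pass
  result)
;; => (1 0 0 1 1 1 0 0)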

That will add up fast. How fast? Hmmm….

Let’s assume that between our 4 bit short codes and our 9 bit long codes the average compressed-byte comes out at 6 bits (that matches the 25% compression rate we were aiming for). Working with that average our loop probably looks sort of like this.

Step one: Append first 6 bits to empty list. No copying.

Step two: Copy existing 6 bits and link to next 6 bits. Total of 6 bits copied.

Step three: Copy existing 12 bits and link to next 6 bits. Total of 18 bits copied.

Step four: Copy existing 18 bits and link to next 6 bits. Total of 36 bits copied.

Step five: Copy existing 24 bits and link to next 6 bits. Total of 60 bits copied.

Step six: Copy existing 30 bits and link to next 6 bits. Total of 90 bits copied.

Step seven: Copy existing 36 bits and link to next 6 bits. Total of 126 bits copied.

So we’re only seven bytes into our 11,000+ byte file and we’ve already had to make copies of 126 list items. And while I’ve been calling them bits, remember that we’re actually using full 32 bit (4 byte) integers to keep track of our 0s and 1s. So that means we’ve had to copy 126 * 4 = 504 bytes just to compress seven letters of input.

And it only gets worse. By the time we make it to the end of our eleven kilobyte file we will have made copies equal to several thousand times the size of our original input! The bigger the input gets the worse that multiplier becomes, and suddenly it’s not so mysterious why our code takes multiple minutes to run and consumes gigabytes of memory.
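
If you want to put a rough number on that, the copying grows with the square of the input size. Sticking with our 6-bits-per-compressed-byte estimate and roughly 11,000 input bytes, a quick back-of-the-envelope calculation at the REPL:

;; Items copied is roughly 6 + 12 + 18 + ... for every byte after the
;; first, which works out to 6 * n * (n - 1) / 2 for an n-byte input.
(let ((n 11000))
  (* 6 (/ (* n (1- n)) 2)))
;; => 362967000 copied list items, several thousand times the
;;    roughly 88,000 bits in the original file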

Functional Programming: What that be?

Before we start talking about how to “fix” append I want to take a minute and talk about why it’s not actually “broken”. Sure, using append the way we do is horribly inefficient but that’s not because append was poorly designed. In fact, append was very carefully designed to be very safe. Because it copies its inputs instead of changing them you can use it anywhere you want without having to worry about accidentally mutilating some bit of data you might need to use again later.

This is the core of what’s known as “functional programming”.

Basically programmers noticed that a lot of their worst and hardest to solve bugs were caused when different bits of code accidentally changed each other’s variables. Example: If three functions all depend on the same global variable and something else changes that variable, suddenly all three functions might break down. Or if you pass the same variable as input to multiple functions and it gets changed halfway through, your later functions might not work like you expected.

The traditional solution to this problem is to just work really really hard to remember which functions share which global variables and how every function changes its inputs. The problem here is that the bigger your program gets the harder it is to keep track of all that stuff.

This led some people to come up with a clever idea: What if we avoid writing code with shared variables and just don’t let functions change their inputs? That should get rid of all those weird bugs and accidental data corruption issues.

When you write your code according to these rules it’s “functional”, named after math functions. After all, the quadratic equation won’t ever break down just because you used the Pythagorean Theorem wrong, and four is always four no matter how many equations you pass it through.

The obvious cost here is that functional code tends to “waste” a lot of time and memory copying inputs so it can safely work with the copies and leave the original data alone. So it’s up to you as the programmer to decide what each part of your program needs most: functional code that’s easy to maintain and experiment with, or efficient code that’s harder to work with but runs much faster.
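
Lisp actually ships both flavors of many operations, which makes the tradeoff easy to see side by side. reverse is the safe functional version that copies its input; nreverse is the fast destructive version that recycles it:

;; Safe and functional: REVERSE builds a new list and leaves the input alone.
(let ((original (list 1 2 3)))
  (list (reverse original) original))
;; => ((3 2 1) (1 2 3))

;; Fast and destructive: NREVERSE is allowed to recycle its input's cells,
;; so ORIGINAL is left in an unspecified, mangled state afterwards.
(let ((original (list 1 2 3)))
  (list (nreverse original) original))
;; => ((3 2 1) ...) where the second value can no longer be trusted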

In our case since we’re dealing with the interior of a massive loop it’s probably time to say goodbye to our safe functional prototype code and focus on speed.

Appending Faster Than Append

So we need a list building loop that doesn’t waste time or memory space. Turns out one of the more popular ways to do this in Lisp is to actually build your list backwards and then reverse it. This takes advantage of the fact that adding something to the front of a Lisp list is both fast and simple.

Interesting trivia: We actually already did this in our current byte-to-8-bit-list function.
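
The general shape of the trick, with some throwaway data standing in for real bits, looks like this:

(let ((result '()))
  (dolist (item '(1 0 1 1 0))
    (push item result))   ; each PUSH just bolts a new "first item" onto the front
  (nreverse result))      ; flip the backwards list into the order we actually want
;; => (1 0 1 1 0)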

Now let’s rewrite our file-to-compressed-bitlist to use the same technique.

Basically in our new version when we compress a byte we won’t append the compressed bit pattern directly to our output. Instead we will load the compressed bits into a temporary variable and then push them one by one onto the front of our bit list.

After all the bytes are read we’ll then load our termination sequence into a variable and push that one bit at a time too. This will give us a complete mirror image of the bit list we actually want, so we finish off with a call to nreverse, which reverses the list in place. The “n” indicates this is an “unsafe” function that works very fast but will destroy the original data. Since we won’t ever need that backwards list that’s fine.

(defun file-to-compressed-bitlist (filename)
  (let ((bit-list '())
        (in (open filename :element-type '(unsigned-byte 8))))
    (when in
      ;; Push each byte's compressed bits onto the FRONT of bit-list.
      (loop for testbyte = (read-byte in nil)
            while testbyte
            do (let ((compressed-bits (compress-byte testbyte)))
                 (loop for i in compressed-bits do (push i bit-list))))
      (close in))
    ;; Push the termination sequence the same way.
    (let ((termination-symbol '(1 0 0 0 0 0 0 0 0)))
      (loop for i in termination-symbol do (push i bit-list)))
    ;; The list is a mirror image of what we want, so flip it in place.
    (nreverse bit-list)))

Interestingly enough this function is actually fairly functional since it avoids global variables. The only data it changes or destroys is private data so as far as the rest of the world is concerned nothing important has been changed.

Now that we’ve tuned up one of our major functions let’s give things another whirl and see what happens:

[41]> (time (white-rabbit-compress-file "chapter1.txt" "bettertimedoutput"))

Real time: 120.46666 sec.

Run time: 120.376 sec.

Space: 4636093800 Bytes

GC: 2925, GC time: 47.0 sec.

T

Not bad. Twice as fast, less than half the memory, and not nearly as hard on the garbage collector. But even after that sort of major improvement it’s still pretty slow and memory hungry, so our work’s not done quite yet. We’ll see what remaining inefficiencies we can trim out next time.


* Reminder: I’m not particularly good at Lisp. I just like the language.

** Yes, yes, I know. A kilobyte is actually 1024 bytes not an even 1000 but multiplying by powers of ten in your head is so much easier than powers of two and is close enough for general discussion.