Let’s Program A Compression Algorithm Part 6: In Which We Consider How Professional Compression Algorithms Function

Now that we’ve got a working little ASCII compressor it’s time to talk about how “real” compression algorithms work.

Now honestly I should probably just drop a couple wiki links to Huffman_Coding and DEFLATE but presumably you’re here on my blog because you specifically want to hear me talk about computer science so here we go!

Let’s start by refreshing our memories on how we approached the problem of compression.

We started by discovering that normal ASCII text assigns a full 8-bits to every possible letter. We then did some research and found that some ASCII letters are much more common than others. That let us invent a new standard that squished common letters into 4-bits while expanding less common letters into 9-bits. This overall resulted in files that were 20-30% smaller than their plan ASCII equivalents.

So how would a professional approach the same problem?

Believe it or not they would do more or less the same thing. Professional tools achieve their impressive 80%+ compression standards using the same basic approach we did: Find unusually common data patterns and then replace them with much smaller compressed place holders. The professional tools are just much much smarter about how they find and compress these patterns than we were.

How much smarter? Let’s find out by comparing and contrasting what we did against what better compression tools do.


Our OK Idea: We designed our compression patterns based around our knowledge of average letter frequencies in the English language. While this worked OK there was the problem that not every text file follows the average English pattern. A report about zigzagging zebras is going to have a ton of z’s and g’s while a C++ code file is going to have lots of brackets and semi-colons. Our average English compression strategy won’t work very well on files like that.

Their Better Idea: Professional tools don’t use a single universal encoding scheme but instead create a unique encoding for every file based on their individual symbol frequency. So if a particular text file has a ton of “z”s or parenthesis or brackets it will get its own custom code that compresses those letters. This lack of a universal standard does mean that each compressed file has to start with a dictionary explaining the specific compression pattern used for that file but the space saved by using an optimized compression pattern for each file more than makes up for the space taken up by the dictionary.


Our OK Idea: We not only used the same compression codes for every single file, we used the same compression codes throughout every file. This was easy to program but could be sub-optimal on longer files. If a book starts out talking about zigzagging zebras but later focuses on electric eels then one half of the book might compress better than the other.

Their Better Idea: Many compressional algorithms split files into multiple segments and then compress them individually before gluing them together into an output file. This way each segment can have its own optimized compression pattern. So if one half of a file had a lot of “z”s but no “l”s it could get a compression pattern focused on “z”s. If the second half of the file switched things around and had lots of “l”s but almost no “z”s the algorithm could then switch to a new compression pattern that shrunk down the “l”s and did nothing with “z”s. Of course this means each compressed file has to have multiple dictionaries explaining the specific compression pattern and length of each file segment but once again the space you save in better compression outweighs the space you lose from the extra dictionaries.


Our OK Idea: Our compression code focused only on ASCII text files. This made it easy to design and test our project but also severely limits it’s utility.

Their Better Idea: Admittedly some professional tools also only focus on one file type. For example, PNG only compresses image files. But many other professional compression tools work by looking for repeated bit patterns of any sort and length. This lets them compress any file made of bits which is, well, all of them. Sure, not every file compresses particularly well but at least there are no files you plain can’t run the algorithm on.


Our OK Idea: Our compression code worked by replacing common 8-bit letter patterns with smaller 4-bit patterns. This means that even in a best case scenario (a file containing only our eight compressible common letters) we could only shrink a file down to 50%.

Their Better Idea: Professional tools don’t restrict themselves to only looking for common 8-bit patterns but can instead find repeated patterns of any length. This can lead to massive compression by finding things like commonly used 256-bit patterns that can be replaced with two or three bit placeholders for 99% compression. Of course not every file is going to have conveniently repeated large bit sequences but being able to recognize these opportunities when they do happen opens up a lot of possibilities.


So there you have it. The main difference between professional compression algorithms and our toy is that professional programs spend a lot more time upfront analyzing their target files in order to figure out an ideal compression strategy for that specific file. This obviously is going to lead to better results than our naive “one size fits all” approach.

But figuring out the ideal compression strategy for a specific file or type of file is often easier said than done! To go much deeper into that we’d have to start seriously digging into information theory and various bits of complex math and that’s way beyond the scope of this blog. But if this little project piqued your interest I’d definitely encourage you to take some time and study up on it a bit yourself.



Hopefully this article has given you a few ideas on how our little toy compressor could be improved. And while you probably don’t want to try anything as drastic as implementing a full Huffman style algorithm I think there are a lot of relatively simple improvements you could experiment with.

For example, what if you got rid the hard coded set of “English’s eight most common symbols” and instead had your algorithm begin compression by doing a simple ASCII letter count to figure out the eight most common symbols for your specific target file? You could then have your algorithm assign compression short codes to those eight symbols which should probably lead to results at least a few percentage points smaller than our one size fits all prototypes

Of course, in order to decompress these files you’ll have to start your file with a list of which short codes map to which letters. You could do this with some sort of pair syntax (ex: {0000:z,0001:l,0010:p}) or maybe just have the first eight bytes in a compressed file always be the eight symbols from your compression map.

And with that we’re pretty much done talking about compression, but speaking of improvements reminds me that my Lisp code is still slow as cold tar. So if you have any interest in Lisp at all I’d like to encourage you to tune in again next week as we analyze and speed up our code and maybe even talk a little about that “functional programming”.

Let’s Program A Compression Algorithm Part 5: In Which Compressed Files Are Decompressed And The Project Completed

A program that can only compress data is like a packing company that insists on hot gluing your boxes shut: Everything may look nice and neat and tidy but you’re going to be in a real pickle when you want to actually start using your stuff again.

Fortunately writing a decompression function will be pretty easy; after all it’s mostly just the function we already wrote but backwards.

What we want to accomplish boils down to:

1) Read our compressed file and turn it into a list of bits

2) See if the file starts with a 0 or a 1

3) If it starts with a 0 look up a byte value in our short-list table. Add this to output.

4) If it starts with a 1 discard the 1 and turn the next 8 bits into a byte. Add this to output.

5) Discard the bits we just used and then repeat our steps until we see our termination sequence

Step one needs almost no work at all. Our previously written file-to-compressed-bitlist can already read a file bit by bit and then output a bitlist. The only problem is that it creates that bitlist by translating raw bytes into compressing bit sequences. Since we are now reading compressed data we just want the raw bits and bytes which we can get by replacing the call to compress-byte with a call to byte-to-8-bit-list.

I suppose we’ll also have to chop off the final bit of code where we append our termination sequence to the file too. Instead we’ll just return the list we make.

This leads to our new file-to-bitlist function:

(defun file-to-bitlist (filename)
    (let ((bitlist '())
            (in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (setf bitlist (append bitlist (byte-to-8-bit-list testbyte))))
        (close in))

Now that we have our file in an easy to read and manipulate bit format we can assemble our function for decompressing it.

Our decompression function needs to scan through the bit list and identify our special compressed codes. There are three types of codes we need to look for: Our 4 -bit short codes (that always start with 0), our 9-bit expanded codes (that always start with 1) and our termination code (which starts with a 1 and then has eight 0s).

We can take care of that easily enough with two nested if statements. Have some pseudo-code

if (bitlist starts with 0)
   do 4-bit shortcode logic
else{ //If bit list does not start with 0 it must start with 1
   if(bitlist starts with termination code)
      finish up and close file
   else //If it starts with 1 and is not the termination code it’s an expanded ASCII letter
      do 9-bit expanded code logic.

Simple enough. First we check if the next bit is a zero, which means we have a 4-bit shortcode we need to decompress. If the next bit isn’t zero it has to be a 1, which means it’s either a long code or our termination signal. We test for the termination signal first (otherwise we could never stop) and if we don’t find it we know we have a 9-bit expanded code that needs to be made normal.

To do this in Lisp we have to remember that every if statement has a sort of built in else.

(if (true/false)
   (do when true)
   (do when false))

So we can translate our pseudo-code into pseudo-lisp to get this:

(if (= 0 (first bitlist))
    (do short code logic here)
    (if (equals (subsequence bitlist 0 9) ‘(1 0 0 0 0 0 0 0 0))
        (finish things up here)
        (do expanded code logic here)))

Now we just need to wrap that up in a loop and fill in the logic bits and we’re home free.

Let’s look at the logic bits first. Our short code logic requires a few steps:

1) Look up the normal byte value of our 4-bit short code in our handy *list-to-byte-compression-hash*

2) Write that byte to output

3) Remove the 4-bits we just read from the bitlist so we don’t accidentally process them again

(progn (write-byte 
           (gethash (subseq bitlist 0 4) 
       (setf bitlist (subseq bitlist 4)))

The logic for our expanded code is very similar

1) Ignore the leading 1 and turn the next 8-bits into a byte

2) Write that byte to output

3) Remove the 9-bits we just read from the bitlist so we don’t accidentally process them again

(progn (write-byte (8-bit-list-to-byte (subseq bitlist 1 9)) out)
       (setf bitlist (subseq bitlist 9)))

For the termination sequence logic all we have to do is break out of our loop so we can close the file. To do that first we’re going to need to design our loop. Easiest approach is probably to have a loop that just runs as long as some variable, let’s name it “decompressing” is true. We can then just set that variable to false (nil in Lisp) when we see our termination sequence.

(loop while decompressing do stuff)
(setf decompressing nil)

Toss it all together along with some basic file input and output and we get this handy decompression function:

(defun white-rabbit-decompress-file (input-filename output-filename)
    (let ((bitlist (file-to-bitlist input-filename))
            (decompressing 1)
            (out (open output-filename :direction :output :element-type '(unsigned-byte 8))))
            (when out
                (loop while decompressing do                                             
                        (if (= 0 (first bitlist))
                            (progn (write-byte (gethash (subseq bitlist 0 4) *list-to-byte-compression-hash*) out)
                                    (setf bitlist (subseq bitlist 4)))
                            (if (equal (subseq bitlist 0 9) '(1 0 0 0 0 0 0 0 0))
                                    (setf decompressing nil)                                    
                                    (progn (write-byte (8-bit-list-to-byte (subseq bitlist 1 9)) out)
                                    (setf bitlist (subseq bitlist 9)))))))
            (close out)))

Like most Lips code this is extremely information dense and seems to have way too many nested parenthesis but since we built the individual parts separately it shouldn’t be too hard to follow along with what’s happening here.

Now we can load up our code and run things like:

[87]> (white-rabbit-decompress-file “compresseddrinkme” “decompresseddrinkme.txt”)


[88]> (white-rabbit-decompress-file “compressed1stparagraph” “decompressedfirstparagraph.txt”)


You should get decompressed files that match the original file you compressed. Nifty!

Whelp… that’s that. An ASCII based compression, decompression program in only a little more than a hundred lines of lisp. Of course, for a professional tool you’d want to add another few hundred lines of error checking and handling and UI and whatnot but the core here works.

So where do we go from here?

Well, first I’d like to spend at least one post talking about how “real” compression algorithms work. After that we can spend a week or two geeking out about how to make to improve our Lisp and make this application faster. I mean, it would be nice if we could handle at least handle the entirety of Alice in Wonderland within a few minutes rather the small eternity it would take with our prototype code as it exists.

Let’s Program A Compression Algorithm Part 4: In Which Byte Compression Code Is Applied To An Entire File

Now that we have a working compression function all that’s left is to grab a file, read it byte by byte, compress the bytes and then output them into a new file. Easy peasy!

Let’s start with reading a file byte by byte. This is one particular topic that depends heavily on your language, so if you’re not using Lisp you’re on your own for this step. What you’re looking for is a way to read a file one byte at a time, which might be tricky since by default most languages try to read text files one line at a time. In general you need to use some specific function or argument to force things into byte mode.

For example, compare these two Lisp functions, both run on a file called “drinkme.txt” which contains nothing but the words “drink me”:

(defun normal-read-test (filename)
    (let ((in (open filename)))
        (when in
            (loop for line = (read-line in nil)
                while line do (print line))
        (close in))))

(defun byte-read-test (filename)
    (let ((in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (print testbyte))
        (close in))))


[2]> (normal-read-test “drinkme.txt”)

“drink me”


[3]> (byte-read-test “drinkme.txt”)











Notice that in order to get actual bytes we not only had to use “read-byte” instead of “read-line”, we also had to open the file with the :element type ‘(unsinged-byte) modifier. Your language probably has similar options.

Hopefully, one way or another, you now know how to grab all the bytes in the file you want to compress. With that in place we can use the compress-byte function we wrote last time to change a file into a series of compressed bit lists.

(defun file-to-compressed-bit-lists (filename)
    (let ((bit-lists '())
            (in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (setf bit-lists (append bit-lists (list (compress-byte testbyte)))))
        (close in))

[35]> (file-to-compressed-bit-lists “drinkme.txt”)

((1 0 1 1 0 0 1 0 0) (1 0 1 1 1 0 0 1 0) (0 1 0 1) (0 1 1 0)

(1 0 1 1 0 1 0 1 1) (0 0 0 0) (1 0 1 1 0 1 1 0 1) (0 0 0 1)

(1 0 0 0 0 1 0 1 0) )

Well whaddaya know, that’s the exact same compression we calculated by hand in part 2. Good sign our code probably did things right!

Lisp trivia: append adds the items of one list to another list. So to add a single list to a list of lists you actually have to wrap that list inside another list, making it a one item list whose item happens to be a list. Otherwise we’d get something like ((0 0 1 1) (1 0 1 0) 1 0 0 1) instead of ((0 0 1 1) (1 0 1 0) (1 0 0 1)).

Trivia aside, now all that’s left is to write our new compressed data back to file.

Unfortunately our compressed data is the wrong “shape” for normal file output code.

Modern computer software is more or less designed around the idea that the smallest unit of data any sane person would ever want to work with is an entire 8-bit byte. This is almost always the correct assumption for something like 99% of computer programs and is the reason programming language come shipped with libraries for reading and writing bytes.

But sometimes you need to work outside of byte land. We certainly do. Our short letters are 4-bits long and our expanded letters are 9-bits long. Neither of those will work with standard byte-based file IO.

Still makes more sense than the tax code

So our data is the wrong shape. Whatever shall we do?

The obvious solution is to make it the right shape and the easiest way to do this is to take our compressed sequence of 4-bit and 9-bit letters, glue them together and then split them apart at new 8-bit boundaries. As long as the end result has the same string of 0s and 1s it doesn’t really matter how they’re grouped.

So a compressed letter list like this

0001 0011 111110100 0000 101101101 0001

Becomes a bit list like this


Which we can then turn into

00010011 11111010 00000101 10110100 01

Now we have four easy to read and write bytes plus two dangling bits. Let’s fix that by adding in some extra 0s for padding.

00010011 11111010 00000101 10110100 01000000

Looking good. Well, mostly. Those padded zeros might be a problem when we try to decompress the file. The program is going to look at those and think there’s an extra 4-bit letter there at the end.

A quick fix for this would be to have every compressed file end with a specific bit sequence. Then when the decompresser sees that sequence it knows it can stop because everything after that point is just padding.

But what symbol should we use? It can’t be one of our 4-bit codes because we need those for making the compression work. How about a really uncommon ASCII character instead? For instance, 00000000 is “null” and shouldn’t ever appear in a normal text file. That means we can use the 9-bit 100000000 code to mean “end of the compressed text” without worrying about accidentally conflicting with a real 100000000 from the text.

So our new terminated compressed message looks like:

0001 0011 111110100 0000 101101101 0001 100000000

And when cut into bytes and padded out it looks like this (termination sequence underlined and padding comes after the |)

00010011 11111010 00000101 10110100 01100000 000|00000

Some of you probably noticed that between the termination sequence and the padding our supposedly compressed message is now just as long as it would have been had we just saved it as six normal byte characters. This is because our example was so short. Adding an extra byte to the end of a five byte message is obviously a huge change, but when we start working with actual multi-kilobyte text files the impact of padding and termination will be negligible.

Negligible is kind of a funny word when your write it down. I think it’s the two “g”s. gligib. egligibl. Weird, right?

Back to the subject at hand, let’s write some code for doing an actual compression.

Now my first idea was to take the bit lists from our function, strip them out of their lists and then dump them all into a new list. But as I mentioned in the trivia section I actually had to go to quite a bit of work to get the function to create a list of lists instead of one big list of bits. So I can actually just remove a few list calls from our existing function and create a new one that naturally returns one big list of bits. A final append to add our (1 0 0 0 0 0 0 0 0) termination symbol and it’s done!

(defun file-to-compressed-bitlist (filename)
    (let ((bit-lists '())
            (in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (setf bit-lists (append bit-lists (compress-byte testbyte))))
        (close in))
        (append bit-lists '(1 0 0 0 0 0 0 0 0))))

[39]> (file-to-bitlist “drinkme.txt”)

(1 0 1 1 0 0 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 0 0 0

1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0)

Now to pad it out and make sure it’s evenly divisible into 8-bit bytes.

We can figure out how many dangling bits there are like this:

[49]> (mod (length (file-to-bitlist “drinkme.txt”)) 8)


If there are 6 extra bits then we need 8 – 6 = 2 bits of padding to fill it out to an even byte length. We can generate this padding like this:

[58]> (loop repeat 2 collect 0)

(0 0)

So if we mix that all into one big function:

(defun compress-terminate-and-pad-file (filename)
    (let* ((bitlist (file-to-compressed-bitlist filename))
             (padding (loop repeat (- 8 (mod (length bitlist) 8)) collect 0)))
                (append bitlist padding)))

[70]> (compress-terminate-and-pad-file “drinkme.txt”)

(1 0 1 1 0 0 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 0 0 0

1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0)

[71]> (length (compress-terminate-and-pad-file “drinkme.txt”))


For the finishing touch we pop open a byte output file, read through our fully prepared list 8-bits at a time and VOILA we have file compression.

(defun white-rabbit-compress-file (input-filename output-filename)
    (let ((bitlist (compress-terminate-and-pad-file input-filename))
            (out (open output-filename :direction :output :element-type '(unsigned-byte 8))))
            (when out
                (loop while (> (length bitlist) 0) do                     
                    (write-byte (8-bit-list-to-byte (subseq bitlist 0 8)) out)
                    (setf bitlist (subseq bitlist 8))))
            (close out)))

Nothing strange here. As long as there are bits left in the bitlist we take the first eight bits, convert them into a byte, write that byte to file and then drop those bits from the list (by setting the list equal to a subsequence starting at place 8 and continuing to the end of the list). Loop and repeat until the entire file is done.

Now let’s give it a whirl. Run a command like (white-rabbit-compress-file “drinkme.txt” “compresseddrinkme.txt”) and then look at the compressed file with some sort of hex editor. You will find, once again, the exact same sequence of ones and zeros we calculated by hand.

But turning a 9 byte file into a different 9 byte file isn’t much of a compression, so instead let’s grab the entire first paragraph from “Alice in Wonderland” and save it to a file called “1stparagraph.txt”.

Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice ‘without pictures or conversations?’

Run that through our compression and then look at the file sizes of the original versus the compressed. For me the original text file was 303 bytes, but the compressed version was only 216.

Compression achieved.

One word of warning though: This is a horribly slow and inefficient Lisp program that will crawl to a halt on any file more than a few kilobytes in length. There are several easy ways to speed it up but I’m not going to talk about those until the end of this LP because 1) Speed doesn’t matter during proof of concept 2) Optimizing Lisp code isn’t the main focus of this project so let’s leave it until last for the handful of people who care.

So now that we have all happily agreed to not mention how horrible this particular implementation is we can move on to part two of the project: Decompressing the files we just compressed. After all, if you can’t decompress your files you might as well just delete them and achieve 100% compression.

Let’s Program A Compression Algorithm Part 3: In Which A General Compression Algorithm Becomes Working Lisp Code

Last time we invented and tested a super-simple compression algorithm that revolves around replacing the eight most common symbols in the English language with tiny 4-bit codes instead of their normal 8-bit representations (at the cost of replacing everything else with bigger 9-bit versions). We even did some examples by hand.

But this is a coding blog, so it’s time to write some actual code. When it comes to language choice our only real requirement is that the language be capable or working directly with bits and bytes and files and since pretty much every language ever has no problem doing that we’re open to choose whatever we want.

I’m personally going to choose Lisp and be keeping all my code in a file named “wrc.lisp” for “White Rabbit Compression”. You, of course, can follow along in whatever language you want. In fact, that would probably be the most educational approach to this series.

Anyways, like we talked about in the design phase, this project will mostly focus on processing lists of bits in batches of 8, 4 and 9. That means we’re going to need a convenient data structure for holding these lists.

Fortunately Lisp is built entirely around list processing and has some very powerful built in tools for creating and managing lists of numbers, so I will be representing our binary fragments as a plain old list of integers which will just so happen to always be either a 0 or a 1.

For those of you not in Lisp I bet your language has it’s own list structures. Could be a vector, a queue, a linked list, whatever. Anything that lets you add new stuff to the end and pull old stuff off the front will be fine.

Now the efficiency buffs in the audience might have noticed that we’re wasting space by using entire integers to keep track of mere bits. Shouldn’t we be using something else, like a bit vector?

Probably! But this is just the proof of concept first pass so doing things the easy way is more important than doing things the best way. If it turns out the program is too slow or memory hungry then we’ll revisit this decision.

With that in mind our first attempt at letter bit lists is going to look something like this:

; An 8-bit ASCII letter
(list 0 1 0 0 0 0 0 1)

;One of our 4 bit compressed letters
(list 0 0 1 1)

;One of our 9 bit labeled ASCII letters
(list 1 0 1 0 0 0 0 0 1)

Next up let’s write some helper functions that can turn real 8-bit bytes into our custom bit lists or turn our bit lists back into bytes.

(defun 8-bit-list-to-byte (bitlist)
    (+     (* 128 (first bitlist))
        (* 64 (second bitlist))
        (* 32 (third bitlist))
        (* 16 (fourth bitlist))
        (* 8 (fifth bitlist))
        (* 4 (sixth bitlist))
        (* 2 (seventh bitlist))
        (* 1 (eighth bitlist))))
(defun byte-to-8-bit-list (byte)
    (let ((bitlist nil))
        (if (>= byte 128)
                (push 1 bitlist)
                (setf byte (- byte 128)))
            (push 0 bitlist))
        (if (>= byte 64)
                (push 1 bitlist)
                (setf byte (- byte 64)))
            (push 0 bitlist))
        (if (>= byte 32)
                (push 1 bitlist)
                (setf byte (- byte 32)))
            (push 0 bitlist))
        (if (>= byte 16)
                (push 1 bitlist)
                (setf byte (- byte 16)))
            (push 0 bitlist))
        (if (>= byte 8)
                (push 1 bitlist)
                (setf byte (- byte 8)))
            (push 0 bitlist))
        (if (>= byte 4)
                (push 1 bitlist)
                (setf byte (- byte 4)))
            (push 0 bitlist))
        (if (>= byte 2)
                (push 1 bitlist)
                (setf byte (- byte 2)))
            (push 0 bitlist))
        (if (>= byte 1)
                (push 1 bitlist)
                (setf byte (- byte 1)))
            (push 0 bitlist))
        (nreverse bitlist)))


The logic here is pretty simple although the Lisp syntax can be a bit weird. I think 8-bit-list-to-byte is self-explanatory but byte-to-8-bit-list might need some explaining. The basic idea is that every bit in a byte has a specific value: 128, 64, 32, 16, 8, 4, 2 or 1. Because of how binary works any number greater than 128 must have the first bit set, any number smaller than 128 but larger than 64 must have the second bit set and so on.

So we can turn a byte into a list by first checking if the number is bigger than 128. If not we put a 0 into our list and move on. But if it is we put a 1 into our list and then subtract 128 before working with the remainder of the number. Then the next if statement checks if the numbe ris bigger than 64 and so on. The only Lispy trick here is that Lisp if statements by default can only contain two lines of logic: the first for when the if is true and the second when the if is false. Since we want to run two lines of logic when things are true we wrap them in a progn, which just lumps multiple lines of code into one unit.

Clear as mud? Let’s move on then.

With those generic helpers out of the way let’s move on to writing the core of our compressor: A function that can accept a byte and then transform it into either a 4-bit short code or a 9-bit extended code.

The first thing we’ll want is a single place in the code where we can keep the “official” list of which letters translate to which short codes. It’s important we have only one copy of this list because later on we might find out we need to change it and updating one list is a lot easier than hunting through our code for half a dozen different lists.

(defparameter *compression-map*
   '((160 (0 0 0 0)) 
     (101 (0 0 0 1))
     (116 (0 0 1 0))
     (97  (0 0 1 1))
     (111 (0 1 0 0))
     (105 (0 1 0 1))
     (110 (0 1 1 0))
     (115 (0 1 1 1))))

Here I’m using the ‘ shortcut to create a list of lists since (list (list 160 (list 0 0 0 0))…) would get really tiring to type really fast. Syntax aside the first item in each of the eight items in our list is the ASCII code for the letter we want to compress and the second item is the bit list we want it to compress to.

Now that we’ve gor our compression map in whatever format you prefer all we need is an easy way to get the data we need out of that map. During compression we want to be able to take a byte and find out it’s short code and during decompression we want to take a short-code and find out which byte it used to be.

Off the top of my head there are a few ways to do this. One would be to just loop through the entire list every time we want to do a lookup, stopping when we find an item that starts with the byte we want or ends with the list we want (or returning false if we don’t find anything). The whole list is only eight items long so this isn’t as wasteful as it might sound.

An easier solution (depending on your language) might be to use our master list to create a pair of hash tables or dictionaries. As you probably know these are one way lookup structures that link “keys” to “values”. Give the hash a key and it will very efficiently tell you whether or not it has stored a value for it and what that value is; perfect for our needs. The only trick is that since they are only one way and we want to both compress and decompress our data we’ll actually need two hashes that mirror each other. One would use the bytes as keys and reference the bit-lists. The other would use the bit-lists as keys and reference the bytes.

I think I’ll go with the hash approach since they are a more universal language feature than Lisp’s particular approach to list parsing.

Hash tables seem pretty weird until you understand how they work internally.

To translate my universal compression mapping into a pair of usable hash tables I’m going to first create the hash tables as global variables and then I’m going to use a simple Lisp loop to step through every pair in the compression map. During each step of the loop we will insert the data from one pair into each hash.

(defparameter *byte-to-list-compression-hash* (make-hash-table))
(defparameter *list-to-byte-compression-hash* (make-hash-table :test 'equal))

    for compression-pair in *compression-map*
    do    (setf 
            (gethash (car compression-pair) *byte-to-list-compression-hash*) 
            (cadr compression-pair))
            (gethash (cadr compression-pair) *list-to-byte-compression-hash*) 
            (car compression-pair)))


A little Lisp trivia here for anybody following along in my language of choice:

1) The :test keyword lets you tell a hash how to compare keys. The default value works great with bytes but not so great with lists so I use :test ‘equal to give the *list-to-byte-compression-hash* a more list friendly compartor.

2) To pull data out of or put data into a hash you use the gethash function. The fist argument is a key value, the second argument is the hash you want to use. I tend to get this backwards a lot :-(

3) “car” and “cdr” and combinitions like “cadar” are all old fashioned keywords for grabbing different parts of lists. I could have just as easily used chains of “first” and “second” but what’s the fun in that?

Lisp aside, we now have an official compression map and two easy to search hashes for doing compression lookups and decompression reverse lookups. Let’s put them to good use by actually writing a compression function!

The compression function should take an ASCII byte and check if it’s in the compression map. If it is it will return the proper 4-bit short code. If it isn’t in the map it should transform it to an 8-bit list, glue on a leading 1 and then return the new 9-bit list.

(defun compress-byte (byte-to-compress)
    (let ((short-code (gethash byte-to-compress *byte-to-list-compression-hash*)))
        (if short-code 
            (append '(1) (byte-to-8-bit-list byte-to-compress)))))

Nothing all that clever here. We take the byte-to-compress and lookup whatever value it has in the *byte-to-list-compression-hash*. We then use let to store that result in a local short-code variable. We then us a simple if statment to see whether short-code has an actual value (meaning we found a compression) or if it is empty (meaning that byte doesn’t compress). If it has a compressed value we just return it. Otherwise we take the original byte-to-compress, turn it into a bit list, glue a 1 to the front and return it.

Let’s see it all in action:

[1]> (load “wrc.lisp”)

;; Loading file wrc.lisp …

;; Loaded file wrc.lisp


[2]> (compress-byte 111)

(0 1 0 0)

[3]> (compress-byte 82)

(1 0 1 0 1 0 0 1 0)

Cool. It successfully found 111 (“o”) in our compression map and shrunk it from eight bytes down to four. It also noticed that 82 (“R”) was not in our map and so returned the full 8-bits with a ninth “1” marker glued to the front.

Next time we’ll start looking into how to use this function to compress and save an actual file.

Let’s Program A Compression Algorithm Part 2: In Which A General Idea Becomes A Specific Algorithm

Last time we talked about Alice in Wonderland and the deep cruelty of stores that play the same handful of movie trailers on endless loop.

More importantly we also talked about how computers use compression to shrink files down for easier storage and faster transfer. Without good compression algorithms the Internet would crawl to a halt and half the electronics you use on a daily basis wouldn’t exist. For instance, without compression a 2 hour HD movie would be over 300 GB; good luck fitting that all on one disc!

To better understand this vital computer science breakthrough we’re going to be writing our own ASCII text compression program.

Disclaimer: It’s not going to be a very good compression program. Like all of my let’s programs the goal here is education and not the creation of usable code. We’re going to be skipping out on all the security and error checking issues a professional compressor would include and our end goal is a modest 20% reduction in file size compared to the 75%+ reduction well known tools like Zip can pull off.

Now that your expectations have been properly lowered let’s talk about the structure of text files. After all, we can’t compress them if we don’t know what they’re supposed to look like or how they work.

The ASCII text standard is a set of 256 symbols that includes all 26 letters of the English language in both upper and lower case as well as all standard punctuation, some useful computer symbols (like newline) and a bunch of random symbols and proto-emoticons that seem like they were included mostly to fill up space.

The fact that there are 256 symbols is very important because 256 is exactly how high you can count using a single 8-bit computer byte. That means you can store one letter in one byte by simply using binary counting to mark down where the letter is in the ASCII chart.

For example, the capital letter “A” is in spot 65 of the ASCII chart. 65 in binary is 01000001. So to store the letter “A” in our computer we would grab a byte worth of space on our hard drive and fill it with the electronic pattern 01000001.

An ASCII text file is just a bunch of these 8-bit character bytes all in a row. If you want to save a 120 character long text you need a 120-byte long ASCII file. If you want to save a 5,000 character short story you need a roughly 5 kilobyte ASCII file.

Ok, cool. Now we know what an ASCII file’s guts look like. Time to start looking for patterns we can use for compression.

Patterns… patterns… here’s one!

The ASCII files we’ll be working with are all based on the English alphabet, and in English not all letters are used evenly. Things like “s”, “e”, and “a” get used all the time while poor little letters like “x” and “z” hardly ever see the light of day. And don’t forget “ ”! You might not think of the blank space as a letter but just imagine tryingtowritewithoutit.

So some letters are much more common than others but ASCII stores them all in identical 8-bit bytes anyways. What if we were to change that? What if we stored the most common letters in smaller spaces, like 4-bit nibbles*?

The biggest challenge here is figuring out a way to let the computer know when it should expect a nibble instead of a byte. In normal ASCII every letter is eight bits long which makes it easy for the computer to figure out where one letter ends and the next begins.

But since we’re going to have letters of different lengths we need some way to point out to the computer what to expect. A sort of virtual name-tag to say “I’m a 4-bit letter” or “I’m a normal 8-bit letter”.

Here’s a simple idea: Our short 4-bit letters will always start with a 0, on the other hand our 8-bit ASCII letters will always start with a 1.

This solution does have some drawbacks. If the first bit of our 4-bit letters is always 0 that means we only have three bits left for encoding the actual letter. Three bits is only enough to count up to eight so that means we will only be able to compress eight letters.

A bigger problem comes from the 8-bit ASCII letters. They need their full 8-bits to work properly so the only way to mark them with a leading 1 is by gluing it to the front and turning them into 9-bit letters. So while our common letters had their size cut in half our uncommon letters are actually getting bigger. Hopefully we’ll still come out ahead but it might be close.

It’s amazing how many problems are caused by people trying to apply averages to things that ought not be averaged

Anyways, it sounds like we’re going to have eight different shortcut codes to work with. What letters should we use them for? Well, according to Wikipedia the eight most common letters in the English language are, in order: E, T, A, O, I, N, S, H. So that’s probably a good bet if we want as much compression as possible.

But Wikipedia doesn’t count the blank space as a letter. However because it’s so common in text it’s definitely something we want to compress. Let’s add it to the front of the list and drop “H”. That means the letters we will be compressing are “ ”, E, T, A, O, I, N, S.

Or more accurately “ ”, e, t, a, o , i, n, s. Remember that in ASCII upper and lower case letters are coded differently so we have to choose which we want. Since lowercase letters are more common than uppercase it makes sense to focus on them.

Now that we have our eight compression targets all we have to do is assign them one of our short codes, all of which are just the number 0 followed by some binary. Let’s go with this:

“ ” = 0000

“e” = 0001

“t” = 0010

“a” = 0011

“o” = 0100

“i” = 0101

“n” = 0110

“s” = 0111

Also remember that any letter not on this list will actually be expanded by putting a “1” in front of it’s binary representation.

One Makes You Smaller

Whew! That was a lot of abstract thinking but believe it or not we now have a complete compression algorithm. And just to prove it we’re going to do a compression by hand.

But what should we practice on? Well, our theme is “Wonderland” and I seem to recall that Alice was able to shrink herself by fooling around with a bottle labeled “drink me”. In ASCII that looks like this:

d r i n k m e
01100100 01110010 01101001 01101110 01101011 00100000 01101101 01100101

Eight bits times eight letter means 64 bits total. But if we replace the space, ‘i’, ‘e’, and ‘n’ with our 4-bit shortcuts while adding a 1 flag in front of the remaining 8-bit (soon to be 9-bit) letters we get

d r i n k m e
101100100 101110010 0101 0110 101101011 0000 101101101 0001

Which is 9*4 + 4*4 bits long for a total of only 52 bits. So we saved ourselves 12 bits, which is almost 20% less space than the original. Not bad.

One Makes You Grow Taller

Of course, taking text and compressing it is pretty useless unless we also know how to take compressed text and expand it back into normal readable ASCII. So please take a look at the following bit sequence and see if you can figure out what it used to say:


I don’t want anybody accidentally looking ahead so let’s push the answer down a page or so with some another random comic.

Information theory jokes are funny, right?

So the first thing to do here is to take that big messy data stream and split it into individual letters. Remember that according to our rules the length of each letter is determined by whether it starts with a 0 or a 1. The 0s are 4-bit letters and the 1s are 9-bit letters.

So here we go. 0001001111111010000001011011010001 starts with 0 so the first letter must be four bits long: 0001

After removing those four bits we’re left with 001111111010000001011011010001, which also starts with a 0 meaning our next letter is also four bits long: 0011

Removing those four letters leaves us with 11111010000001011011010001. Since that starts with a 1 that means our next letter is 9 bits long: 111110100

By doing this again and again we finally get these six letters:

0001 0011 111110100 0000 101101101 0001

Now that we have our individual letters it’s time to turn them into, well, letters. But the kind of letters people can read.

For the four bit letters we just reference the list of short codes we made up. Scroll up in the likely event that you neglected to commit them to long term memory.

0001 0011 111110100 0000 101101101 0001
e a ? ? e

For the nine bit letters we need to remove the leading 1 and then look up the remaining eight bit code in the official ASCII chart. For instance 111110100 becomes 11110100 which is the code for “t”.

0001 0011 111110100 0000 101101101 0001
e a t m e

And there we go, the compressed binary has successfully been turned back into human readable data.

I Don’t Like Pretending To Be A Compression Algorithm

Doing these examples by hand was a great way to prove our proposed compression algorithm actually works but I don’t think any of us want to to try and compress an entire book or even an entire email by hand. It would be much better to teach the computer how to do this all for us. Which is exactly what we’re going to start working on next time.

After all, this is a Let’s Program, not a Let’s Spend A Small Eternity Doing Mathematical Busywork.

* Yes, nibble is the actual official term for half a byte. Programmers are weird like that.

Let’s Program A Compression Algorithm Part 1: How To Fit A Byte Into A Bit And Other Curious Tricks

So a while ago there was a new Alice in Wonderland movie coming out and my local geek store apparently decided it would be fun to replay the trailer again and again every few minutes. So it should be no surprise I’ve got the lyrics to “White Rabbit” stuck in my head.

One pill makes you larger
And one pill makes you small
And the ones that mother gives you
Don’t do anything at all
Go ask Alice
When she’s ten feet tall

Speaking of growing and shrinking, have you ever really thought about data compression? It’s that amazing thing that let’s you take an entire folder full of vacation photos and shrink it down into a single file that’s small enough to email to grandma. Data compression is also the star player behind streaming Youtube videos, fast loading image files, compact MP3 songs, reasonable computer backups, home DVD players and anywhere else you need to take a very big file and make it very small for either storage or transport.

But how exactly does that work? You would think that making a file smaller would result in lost data, kind of like how shrinking a picture in paint and then blowing it back up makes everything fuzzy. A 10MB file has 10MB of data, right? You’d think that the only way to shrink it is by throwing some of that data away.

After years of playing really old games on newer, bigger monitors I’ve grown fond of the fuzzy pixelated look

And yet somehow magic compression algorithms like zip manage to shrink things down and then return them to normal size without ever losing any of the original data.

To explain how that is possible we’re going to have to talk information theory, but that sounds boring so instead let’s talk fast food.

Your average fast food restaurant offers a dozen or so different kinds of sandwiches along with another dozen or so sides such as soda, onion rings and soft serve ice cream. Customers can then order whatever mix of items they want.

But not all combination of items are equally popular. Every day hundreds and hundreds of people ordered sandwiches and sodas for lunch but only one or two customers a week ask for a six pack of onion rings plus an ice cream cone.

This leads to a brilliant idea: What if we make the most popular combinations of food items easier to order by giving them simple numbers? A lot of people order hamburgers, a drink and some fries so let’s call that the “#1 Combo”. Other people order chicken strips and a soda so let’s call that the “#2 Combo”. We get a lot of visitors from the gym next door who just want a salad and some water so let’s call that the “#3 Combo”.

Fast food combos are basically a way to “compress” common orders into simple numbers that chef’s can “expand” by just looking at a chart. Customer wants a #3? Let’s check the menu and see which items go with that combo.

Customer service is thus much faster because now most people can just shout out a number instead of having to spend half a minute or more carefully listing out exactly what they want.

Information theory in a nutshell

Now information science has a lot of fancy words and equations for talking about this stuff but all you really need to know for now is that most types of data have patterns and those patterns can be used to create shortcuts. Restaurants use these patterns to replace complex orders with simple combo numbers and compression algorithms use these patterns to replace big files with simplified smaller files.

As an extreme example, imagine a file that’s just the word “Jabberwoky” repeated a million times. That’s roughly 8 MB of disk space, but do you really need all that? Not really. You could replace it with a file that says “JabberwokyX1000000” and as long as your code knew to interpret that as a million item list everything would work the exact same.

And that’s the secret to compression: A 10MB file contains 10MB worth of 0s and 1s but not necessarily 10MB worth of unique information. By finding a more efficient way to express that same information you can shrink your files, sometimes quite dramatically.

For a more realistic example: imagine an image file made up of a few million pixels. In a raw image file each pixel is a 32 bit number telling the computer which of several million colors it should draw on the screen. But what if your image doesn’t have millions of different colors? What if it’s a plain black and white drawing? You don’t need a full 32 bits just to mark whether a pixel is black or white so you could save a ton of space by inventing a new compressed image format where each pixel is just a single bit, 0 for black and 1 for white.

Or maybe you have a cartoon video file with a lot of identical still shots. Instead of storing a complete image for all those identical frames maybe you could create a compressed video format with the ability to say “This frame should look just like the last one”.

And maybe you have a text file that’s just a little too big for an email attachment. You could just email it in parts but I bet with a little clever thought you could come up with a way to compress it.

That’s going to be the topic of this let’s program.

Well, that and Alice in Wonderland.

Let’s Program A Prisoner’s Dilemma: Index

The Prisoner’s Dilemma is a famous Game Theory thought experiment that tries to answer questions about why sometimes people choose to cooperate with each other and sometimes they choose to trick and betray each other.

While it can be analyzed using nothing but pure math we’re going to take an AI approach to this classic dilemma by building a bunch of simple simulations and then putting them in a big virtual room where they can either work together or prey on each other. Watching their behavior and seeing who comes out on top should hopefully give us some insight into how real humans approach similar situations.

Let’s Program A Prisoner’s Dilemma Part 1: Game Theory Is Less Fun Than It Sounds

Let’s Program A Prisoner’s Dilemma Part 2: Crime And Punishment

Let’s Program A Prisoner’s Dilemma Part 3: Fight In Cell Block D

Let’s Program A Prisoners Dilemma Part 4: Heaven or Hell, Let’s Rock!

Let’s Program A Prisoners Dilemma 5: What Is A Decision Making Process?

Let’s Program A Prisoner’s Dilemma Part 6: Eye For An Eye

Let’s Program A Prisoner’s Dilemma Part 7: Survival of the Fittest

Let’s Program A Prisoner’s Dilemma Part 8: And The Moral Of The Story Is…

Let’s Program A Prisoner’s Dilemma Part 8: And The Moral Of The Story Is…

Last time we finished up our Prisoner’s Dilemma experiments by tossing all of our prisoners into a big arena and then having them compete in a multi-generational survival-of-the-fittest pit fight in order to see which strategies have the best long term survivability.

Now it’s time to see the final outcome, but nobody wants a blog post filled with nothing but thousands of lines of pure numbers. Instead I’ve dumped my test results into a graphing program program and built this nice little animation for you.


So what did we see happen here?

Things started out about like you’d expect. The first few generations saw the devils taking advantage of the naïve saints and quickly becoming the majority of our population. They then steamrolled their way through the madmen who just aren’t smart enough to purposely avoid being taken advantage of by the devils.

We’re now left with a population that’s 75% devil with a mere handful of judges. But judges ARE smart enough to notice when people keep cheating them so we see a sudden reversal. The judges stop feeding free points to the devils while simultaneously working together to keep their score high. A few more generations is all it takes for the devils to be completely wiped out by the superior teamwork of the judges.

Wasn’t This Supposed To Have Real World Applications?

Waaaaay back in part one I mentioned that the main goal of game theory was to use simple math games to analyze real world decisions making.

So what real world lessons can we learn from all our experiences with the prisoner’s dilemma?

First, that a group full of trustworthy people is a good thing for everybody.

Second, that a cheater can easily exploit a group full of trustworthy people for massive gain.

Third, that having too many cheaters isn’t just bad for the group, it eventually leaves the cheaters worse off too.

Fourth, that a reliable way to prevent cheaters from messing up a group is by identifying and then punishing them until they either reform or leave the group.

These four observations together explain the rise and fall of a lot of real world organizations. You start off with a group of trustworthy people who work together to accomplish something. Eventually a cheater infiltrates the group and realizes that he can exploit the good will of the group for personal gain. Other cheaters eventually notice the scam and decide they also want to join in on exploiting the organizations.

At this point one of two things can happen: Either the group notices the cheaters and manages to remove them before irreparable harm can be done or the group fails to stop the cheaters and collapses under their corrupting weight (possibly after limping along for years or decades).

But why do so many groups fail to stop the cheaters?

Sometimes it’s because the group is just too nice for their own good. They feel bad about having to punish or expel anyone, so instead they just put up with the cheaters until it’s too late.

But more often than not the problem comes from the fact that identifying cheaters is really hard.

This is, in fact, one of the big limitations of the Prisoner’s Dilemma: It assumes you can tell when people are cooperating with or betraying you. But relatively few real world situations fit that pattern. Usually it’s really really hard to tell between a good natured cooperator and a really sly betrayer. A competent cheater can get away with dozens of mini-betrayals before anybody figures out what’s really going on.

Consider a politician who helps pass a popular law, but only after adding in a few sneaky sections that will help slowly funnel money and power to his buddies.

Consider a CEO who generates record profits in the short term by knowingly sabotaging the long term health of the company that hired him.

Consider a forum member who claims to be interested in giving people constructive criticism but in reality is a troll who just likes frustrating people by picking apart everything they say.

Knowing that the tit-for-tat Judge strategy works doesn’t actually do us much good unless we happen to have a talent for judging the behavior of others. And that is sadly a feat simple game theory can’t help us with.

In Summary

Don’t be a cheater. In the short run it might seem like a good idea but in the long run the odds are pretty high you’ll get stuck in a downward spiral with other cheaters and wind up worse off than if you had played it straight all along.

Don’t be naive. If there are no consequences to taking advantage of you cheaters will eventually drain you dry.

Use your judgment. Helping helpful people and avoiding cheaters seems to be the most reliable path to both individual and group success.

Acknowledge that figuring out who is helpful and who is a cheater is much harder than our simple simulations made it seem. Real life is messy.

Homework Assignment

Crack open a history book and find your favorite failed civilization, company, club or whatever. Do some research into the specifics of it’s rise and fall. Did it start out as a bunch of people cooperating towards a common goal? Can you spot a point at which cheaters started infiltrating? Was there a final tipping point where too many people were scamming the group and it could no longer hold up?

Now look up a group that seems to have survived the test of time. What did it do differently? Did it have some sort of rule or tradition that helped prevent cheaters from taking advantage of the group?

Remember, real life is messy so it’s entirely possible that whatever real life examples you find won’t quite fit into the standard pattern of an iterated prisoner dilemma. In which case you now have a great excuse to dig deeper into game theory and see if any of the more advanced scenarios do fit your historical example.

Let’s Program A Prisoner’s Dilemma Part 7: Survival of the Fittest

So we’ve built a bunch of different prisoner models and thrown them together in all sorts of different combinations to see what happens. And we could keep doing that.

But isn’t running all these experiments by hand kind of boring? I mean, who really has the time to sit around all day adjusting prisoner ratios and recording results?

Fortunately, as programmers we know that the answer to boredom is automation, so let’s build us a system that can test different prisoner populations for us.

What we’re going to be doing specifically is a simple survival of the fittest algorithm. We’ll still give our program a starting mix of prisoners, but after their first game is done the worst 10% of the prisoners will be removed while the best 10% will be duplicated. This new group will then play the game again. We’ll repeat this pattern a few dozen or hundred times producing all sorts of interesting numbers on what sorts of groups seem to work best.

It Runs In The Family

So what should our new multi-generational code look like?

For starters the main game playing algorithm should stay the same. We’ll still have a group of prisoners that need to be randomly paired up for a certain number of rounds and we’ll still want to keep track of everyone’s score.

The only difference is that at the end of the round instead of just reporting those scores and exiting we’re going to want to generate a new group of prisoners and then run the entire experiment again. We’ll do this for however many generations we think the experiment should run.

So there’s our first hint on what this should look like. Instead of just asking for a group of prisoners and how many rounds they should play we are going to ask for a group of prisoners, how many rounds they should play and how many times (or generations) we should repeat the experiment.

def playMultiGenerationalPrisonersDilemma(prisoners, generations, roundsPerGeneration)

Now inside of that we’re going to need a loop that runs once for every generation we asked for. And inside of that loop we’re going to want to copy our existing prisoner dilemma code. Standard Disclaimer: Copying code is sloppy and should be avoided on serious projects. If you were building a professional prisoner dilemma product you would want to figure out some way that playPrisonersDilemma and playMultiGenerationalPrisonersDilemma could both reference the same piece of code instead of copying each other.


generations.times{ |generation|
   #Copy paste our code from the normal playPrisonersDilemma function here

That’s enough to make our code play the prisoners dilemma again and again as many times as we want, but it will use the same group every time. What we want instead is to create a new group based off of the last group but with the bottom 10% of prisoners removed and the top 10% of prisoners doubled. We can do this by adding a little bit of code after the copy pasted dilemma logic, right after the part where it prints out the score.

#Generate a new group of prisoners based off of the last group.
#Prisoners that scored well get duplicated. Prisoners that scored poorly get dropped

newPrisoners = Array.new
newPrisonerCount = 0
newSaintCount = 0
newDevilCount = 0
newMadmenCount = 0
newJudgeCount = 0

prisoners.sort{ |x, y| y.score <=> x.score}.each{ |prisoner|
   #Top ten percent get an extra prisoner in the next generation
   if newPrisonerCount < (prisoners.length / 10).floor
      case prisoner.strategy
      when "Saint"
         newSaintCount = newSaintCount + 1
      when "Devil"
         newDevilCount = newDevilCount + 1
      when "MadMan"
         newMadmenCount = newMadmenCount + 1
      when "Judge"
         newJudgeCount = newJudgeCount + 1

   #Bottom ten percent don't get added to the next generation at all
   if newPrisonerCount >= prisoners.length - (prisoners.length / 10).floor

   #If it's not the bottom ten percent add one to the next generation
   case prisoner.strategy
   when "Saint"
      newSaintCount = newSaintCount + 1
   when "Devil"
      newDevilCount = newDevilCount + 1
   when "MadMan"
      newMadmenCount = newMadmenCount + 1
   when "Judge"
      newJudgeCount = newJudgeCount + 1

   newPrisonerCount = newPrisonerCount + 1

#Create new generation and load it into the prisoners variable for the next game
prisoners = createPrisoners(newSaintCount, newDevilCount, newMadmenCount, newJudgeCount)

Nothing fancy here. We just sort the prisoners and then iterate through them while keeping track of how many prisoners of each type we’ve seen. For the first ten percent of prisoners we count double and we skip the last ten percent entirely.

We then use our good old createPrisoners function to build a new generation of prisoners. Starting with fresh prisoners like this is actually better than trying to modify our old prisoner group because it ensures no lingering data gets left behind to mess things up. For instance, it would be a bad thing if Judge style prisoners kept their memory between matches (especially since IDs might wind up changing between games).

With the new prisoners created the generation loop then repeats and our new group of prisoners play the game again. All that’s left to do is add a nice little message to our score output letting us know which generation we’re on and we have a complete little program.

def playMultiGenerationalPrisonersDilemma(prisoners, generations, roundsPerGeneration)
        if !prisoners.length.even?
            throw "Prisoners Dilemma requires an even number of participants"
        generations.times{ |generation|
            # Make sure each prisoner starts out with a clean slate
            prisoners.each{ |prisoner| prisoner.score = 0}
                pairs = prisoners.shuffle.each_slice(2)
                pairs.each{ |pair|
                    firstPlayerCooperates = pair[0].cooperate?(pair[1].id)
                    secondPlayerCooperates = pair[1].cooperate?(pair[0].id)
                        pair[0].score -= 1
                        pair[1].score -= 2
                        pair[1].score -= 1
                        pair[0].score -= 2
                    pair[0].learnResults(pair[1].id, secondPlayerCooperates)
                    pair[1].learnResults(pair[0].id, firstPlayerCooperates)
            #Show the stats for the group as a whole as well as for each individual prisoner
            groupScore = 0
            prisoners.each{ |prisoner| groupScore += prisoner.score }
            puts "In generation #{generation+1} the group's overall score was #{groupScore} in #{roundsPerGeneration} rounds with #{prisoners.length} prisoners"
            prisoners.sort{ |x, y| y.score <=> x.score}.each{ |prisoner| prisoner.report }
            #Generate a new group of prisoners based off of the last group.
            #Prisoners that scored well get duplicated. Prisoners that scored poorly get dropped
            newPrisoners = Array.new
            newPrisonerCount = 0
            newSaintCount = 0
            newDevilCount = 0
            newMadmenCount = 0
            newJudgeCount = 0
            prisoners.sort{ |x, y| y.score <=> x.score}.each{ |prisoner|
                #Top ten percent get an extra prisoner in the next generation
                if newPrisonerCount < (prisoners.length / 10).floor
                    case prisoner.strategy
                    when "Saint"
                        newSaintCount = newSaintCount + 1
                    when "Devil"
                        newDevilCount = newDevilCount + 1
                    when "MadMan"
                        newMadmenCount = newMadmenCount + 1
                    when "Judge"
                        newJudgeCount = newJudgeCount + 1
                #Bottom ten percent don't get added to the next generation at all
                if newPrisonerCount >= prisoners.length - (prisoners.length / 10).floor
                #If it's not the bottom ten percent add one to the next generation
                case prisoner.strategy
                when "Saint"
                    newSaintCount = newSaintCount + 1
                when "Devil"
                    newDevilCount = newDevilCount + 1
                when "MadMan"
                    newMadmenCount = newMadmenCount + 1
                when "Judge"
                    newJudgeCount = newJudgeCount + 1
                newPrisonerCount = newPrisonerCount + 1
            #Create new generation and load it into the prisoners variable for the next game
            prisoners = createPrisoners(newSaintCount, newDevilCount, newMadmenCount, newJudgeCount)

Me, Papa and Granpappy

Now let’s give our code a whirl. For testing purposes I’m going to create a small 10 prisoner group and have it run for a mere three generations.

prisoners = createPrisoners(2, 3, 3, 2)
#playPrisonersDilemma(prisoners, 1000)
playMultiGenerationalPrisonersDilemma(prisoners, 3, 1000)

In generation 1 the group’s overall score was -15444 in 1000 rounds with 10 prisoners

ID: 5 Score: -1176 Strategy: Devil

ID: 3 Score: -1190 Strategy: Devil

ID: 4 Score: -1192 Strategy: Devil

ID: 9 Score: -1495 Strategy: Judge

ID: 10 Score: -1505 Strategy: Judge

ID: 6 Score: -1589 Strategy: MadMan

ID: 8 Score: -1616 Strategy: MadMan

ID: 7 Score: -1639 Strategy: MadMan

ID: 1 Score: -1994 Strategy: Saint

ID: 2 Score: -2048 Strategy: Saint

In generation 2 the group’s overall score was -16716 in 1000 rounds with 10 prisoners

ID: 4 Score: -1440 Strategy: Devil

ID: 3 Score: -1444 Strategy: Devil

ID: 5 Score: -1448 Strategy: Devil

ID: 2 Score: -1498 Strategy: Devil

ID: 10 Score: -1602 Strategy: Judge

ID: 9 Score: -1629 Strategy: Judge

ID: 7 Score: -1793 Strategy: MadMan

ID: 6 Score: -1815 Strategy: MadMan

ID: 8 Score: -1839 Strategy: MadMan

ID: 1 Score: -2208 Strategy: Saint

In generation 3 the group’s overall score was -17991 in 1000 rounds with 10 prisoners

ID: 2 Score: -1642 Strategy: Devil

ID: 5 Score: -1656 Strategy: Devil

ID: 3 Score: -1680 Strategy: Devil

ID: 1 Score: -1688 Strategy: Devil

ID: 4 Score: -1690 Strategy: Devil

ID: 10 Score: -1738 Strategy: Judge

ID: 9 Score: -1749 Strategy: Judge

ID: 6 Score: -2039 Strategy: MadMan

ID: 7 Score: -2045 Strategy: MadMan

ID: 8 Score: -2064 Strategy: MadMan

Looking at the first generation you’ll notice that the devils picked on the saints (like usual) which put the devils in the lead and sent the saints to last place. That means that the top 10% of our group was a single devil and the bottom 10% was a single saint. Because of this the next generation has four devils (up from the starting three) and only one saint (down from the starting two).

Another generation then passes with results very much like the first, causing us to lose another saint and gain another devil. Which is honestly a little ominous…

Next Time: Turning Boring Data Overflow Into Interesting Visuals

So that was just ten prisoners playing three generations of fairly short games. For a real experiment we’re going to want hundreds of prisoners playing significantly longer games over at least a dozen generations. Things like:

prisoners = createPrisoners(250, 250, 250, 250)
playMultiGenerationalPrisonersDilemma(prisoners, 25, 10000)

Only problem is that a big experiment like that is going to rapidly fill up your terminal with junk. Sending the results to a text file helps but even that produces a lot more numbers than the human brain can conveniently work with.

But you know what the human brain can conveniently work with? Animated bar graphs!

Let’s Program A Prisoner’s Dilemma Part 6: Eye For An Eye

It’s finally time to write a prisoner with some actual brain power. Let’s start by looking at our current prisoners and their unique issues:

Saints always cooperate, making them an easy target for defectors.

Devils always defect, which leads to bad scores for everyone when multiple devils get together.

Madmen act randomly, which means whether or not they make good choices is up to chance.

What we really want is a prisoner that cooperates when cooperating is a good idea and defects in self defense when it suspects its about to be betrayed. A prisoner that can judge whether or not its partner deserves to be trusted.

Trust No One?

But how will our new “judge” type prisoner decide who to trust? The only thing it will know about its partner is their ID. There’s no way to force the other prisoner to reveal their strategy*, or at least there shouldn’t be. After all, the prisoner’s dilemma is meant to model real life and most real humans aren’t mind readers.

So how will our judge decide how to make decisions if he can’t read minds or see the future? The same way we humans do. We trust people who act trustworthy and we are suspicious of people who betray us.

This leads to a strategy called “tit-for-tat”, where you treat your partner the same way they treat you. If they betray you then the next time you meet them you betray them. If they cooperate then the next time you see them you pay them back by cooperating.

To pull this off we’re going to have to give our judges a memory.

Remember that reportResults function that we programed into the original prisoner class? We’ve been ignoring that so far because none of our prisoners needed it. Saints, devils and madmen always act the same way so they don’t really care how their partners act.

But our judge can use the info from reportResults to build a list of people he can and can’t trust. All we have to do is initialize a blank hash when the judge is created. Then every time reportResults is called we update that hash. We then use this hash in cooperate? by taking the provided partner ID, looking up in our hash whether or not they like to cooperate and then matching their behavior.

But wait! What do we do the first time we meet a new player? We can’t copy their behavior if we’ve never seen how they behave!

The answer is to remember the Golden Rule and “Treat others like you want to be treated”. That means the first time a judge meets a new player he will cooperate. Not only is this good manners, it helps make sure that when judges meet each other they will get locked into a loop of infinite cooperation instead of a loop of infinite betrayal.

Designing The Judicial Branch

The two things that make a judge a judge are:

1) It keeps track of how other prisoners treat it

2) If it meets a prisoner more than once it copies their last behavior

So let’s start with feature 1 by giving our prisoner a memory. To do that we’re going to have to create a hash during prisoner initialization and then use the learnResults function to fill it with useful data.

def initialize(id)
   @strategy = "Judge"
   @opponentBehavior = Hash.new()

def learnResults(opponentID, opponentCooperated)
   @opponentBehavior[opponentID] = opponentCooperated

Now that our opponentBehavior hash is filling up all that’s left is to use it as part of our decision making process. This is a pretty simple two step process. When our judge is asked whether or not it wants to cooperate with a given prisoner it starts by checking whether or not it has an opponentBehvaior entry for that ID. If it does it just mimics back whatever behavior it recorded for that ID, but if it’s empty it instead defaults to cooperation.

def cooperate?(opponentID)
   if( @opponentBehavior.key?(opponentID))
      return @opponentBehavior[opponentID]
      return true

Put it all together and what do you get?

class Judge < Prisoner

   def initialize(id)
      @strategy = "Judge"
      @opponentBehavior = Hash.new()

   def cooperate?(opponentID)
      if( @opponentBehavior.key?(opponentID))
         return @opponentBehavior[opponentID]
         return true

   def learnResults(opponentID, opponentCooperated)
      @opponentBehavior[opponentID] = opponentCooperated

Featuring Four Different Character Classes!

Now let’s give our code a test by setting up a group with four judges, four saints, four devils and four madmen.

The Group’s Overall Score was -23546 in 1000 rounds with 16 prisoners

ID: 8 Score: -1126 Strategy: Devil

ID: 7 Score: -1170 Strategy: Devil

ID: 6 Score: -1198 Strategy: Devil

ID: 5 Score: -1254 Strategy: Devil

ID: 14 Score: -1391 Strategy: Judge

ID: 16 Score: -1394 Strategy: Judge

ID: 15 Score: -1398 Strategy: Judge

ID: 13 Score: -1425 Strategy: Judge

ID: 12 Score: -1479 Strategy: MadMan

ID: 9 Score: -1483 Strategy: MadMan

ID: 11 Score: -1498 Strategy: MadMan

ID: 10 Score: -1510 Strategy: MadMan

ID: 3 Score: -1766 Strategy: Saint

ID: 2 Score: -1790 Strategy: Saint

ID: 4 Score: -1802 Strategy: Saint

ID: 1 Score: -1862 Strategy: Saint

In our mixed group you can see that the judges outperform both the madmen and the saints and come pretty close to matching the devils.

But that’s not good enough. We didn’t want to just match the devils, we wanted to beat them! It’s absolutely infuriating that our genuinely intelligent judges are somehow being outscored by a bunch of dumb always-defectors.

Grudge Match

OK, time to get to the bottom of this. Why can’t our judges beat the devils? Let’s get rid of all the saints and madmen and isolate our problem.

The Group’s Overall Score was -17859 in 1000 rounds with 10 prisoners

ID: 10 Score: -1558 Strategy: Judge

ID: 7 Score: -1581 Strategy: Judge

ID: 6 Score: -1581 Strategy: Judge

ID: 9 Score: -1592 Strategy: Judge

ID: 8 Score: -1597 Strategy: Judge

ID: 2 Score: -1990 Strategy: Devil

ID: 3 Score: -1990 Strategy: Devil

ID: 4 Score: -1990 Strategy: Devil

ID: 5 Score: -1990 Strategy: Devil

ID: 1 Score: -1990 Strategy: Devil

Well would you look at that. When all you have are devils and judges the judges scream ahead into first place no problem.

It’s not that our judges weren’t smart enough to out-think the devils in the mixed game, it’s that the devils had a bunch of saints sitting around they could exploit for easy points. As soon as we took away the ever-forgiving all cooperators the devils sank like a rock while the judges pulled off a brilliant group victory by cooperating with each other and paying back the deceitful devils with defection after defection.

Impossible To Trust

Let’s finish up with a few more what-if scenarios, like “What if you stuck a bunch of judges with a bunch of saints?”

The Group’s Overall Score was -10000 in 1000 rounds with 10 prisoners

ID: 1 Score: -1000 Strategy: Saint

ID: 2 Score: -1000 Strategy: Saint

ID: 3 Score: -1000 Strategy: Saint

ID: 4 Score: -1000 Strategy: Saint

ID: 5 Score: -1000 Strategy: Saint

ID: 6 Score: -1000 Strategy: Judge

ID: 7 Score: -1000 Strategy: Judge

ID: 8 Score: -1000 Strategy: Judge

ID: 9 Score: -1000 Strategy: Judge

ID: 10 Score: -1000 Strategy: Judge

A ten way tie with a perfect group score, just like when we had a group of pure saints. As long as you’re nice to the judges they’ll be perfectly nice to you.

What if we get rid of the saints and just have judges?

The Group’s Overall Score was -10000 in 1000 rounds with 10 prisoners

ID: 1 Score: -1000 Strategy: Judge

ID: 2 Score: -1000 Strategy: Judge

ID: 3 Score: -1000 Strategy: Judge

ID: 4 Score: -1000 Strategy: Judge

ID: 5 Score: -1000 Strategy: Judge

ID: 6 Score: -1000 Strategy: Judge

ID: 7 Score: -1000 Strategy: Judge

ID: 8 Score: -1000 Strategy: Judge

ID: 9 Score: -1000 Strategy: Judge

ID: 10 Score: -1000 Strategy: Judge

Yet another perfect score! Since judges start off by cooperating they inevitably end up trusting each other and create just as nice a world as a group of pure saints would.

That’s enough happy thoughts for now. Why not try something more sinister? Like a group full of devils with only one judge?

The Group’s Overall Score was -19991 in 1000 rounds with 10 prisoners

ID: 1 Score: -1998 Strategy: Devil

ID: 2 Score: -1998 Strategy: Devil

ID: 3 Score: -1998 Strategy: Devil

ID: 4 Score: -1998 Strategy: Devil

ID: 5 Score: -1998 Strategy: Devil

ID: 9 Score: -1998 Strategy: Devil

ID: 7 Score: -1998 Strategy: Devil

ID: 8 Score: -1998 Strategy: Devil

ID: 6 Score: -1998 Strategy: Devil

ID: 10 Score: -2009 Strategy: Judge

As we’ve seen time and time again there is literally no way to win in a world full of devils. They will always betray you and so you can never get ahead.

But unlike the saint, which was absolutely destroyed by a gang of devils, the judge manages to at least keep his loses pretty low. As you can see from the numbers the judge cooperated exactly once with every single devil and then never trusted them again, thereby preventing them from leeching more points out of him.

* There is actually a variation of the prisoner dilemma where each prisoner is given a complete copy of their partner’s code as an input. We’re not doing that.