Let’s Program A 3D Browser Game Part 6: Forward To Tomorrow!

Last time we improved our engine to let us make nice clean 90 degree turns, a definite requirement of any proper dungeon crawler. The next logical step is to let us also move forward in whatever direction we happen to be facing so that we can actually explore our 3D maze instead of just admiring it from one spot.

The logic for moving forward is very similar for the logic of making a 90 degree turn.

  1. Wait for the user to press the up key
  2. Enter a “moving” state that blocks new player input
  3. For every frame spent moving move the player a small distance in the direction they are facing
  4. Once the user has moved an entire dungeon square distance reset their state and wait for the next key press

The only tricky bit here is keeping track of which direction the player needs to move in. With turning we could always just add (or subtract) 90 degrees from our current position and trust things would work out fine. But with movement sometimes we will need to add to the player’s x position and sometimes to their z position.

Obviously we need some way to keep track of which axis the player should move along when they press the move key. While we could technically calculate this information based off of the player’s current degree of rotation I think it makes for easier and cleaner code to just explicitly keep track of whether the player is facing north, east, south or west. And of course that means we’re going to need new constants and a new player variable.

const NORTH = 100;
const EAST = 101;
const WEST = 102;
const SOUTH = 103;

var direction = NORTH;

And then to keep track of our actual movement we’re going to need a few more:

const MOVING_FORWARD = 4;

var walkDistance = 0;
var startX = 0;
var startZ = 0;

Now let’s put those to work, first by updating our turning code to also properly update the direction the player is facing. Because the player is locked into a full turn as soon as they press the key we might as well do things the easy way and update their direction as soon as they press left or right; no need to wait until the end of the turn. So find the if statements inside of the waiting state and add to them like so:

if(playerInput.left){
   state = TURNING_LEFT;
   switch(direction){
      case NORTH:
         direction = WEST;
         break;
      case EAST:
         direction = NORTH;
         break;
      case SOUTH:
         direction = EAST;
         break;
      case WEST:
         direction = SOUTH;
         break;
   }
}
else if(playerInput.right){
   state = TURNING_RIGHT;
   switch(direction){
      case NORTH:
         direction = EAST;
         break;
      case EAST:
         direction = SOUTH;
         break;
      case SOUTH:
         direction = WEST;
         break;
      case WEST:
         direction = NORTH;
         break;
   }
 }

Now that we know what direction the player is facing we can safely move them forward. To start let’s add some detection code for the up key inside of our waiting state block. This else if should go immediately after the playerInput.right else if we just updated:

else if(playerInput.up){
   walkingDistance = 0;
   startX = camera.position.x;
   startZ = camera.position.z;
   state = MOVING_FORWARD;
}

Pretty simple. If the player presses the up button we put them in the MOVING_FORWARD state and initialize the data we need to make the move succeed. In particular we need to mark down where the player is currently standing in startX and startZ. We also reset the walkingDistance variable to 0 since we’re going to need to use that data to figure out when we should stop moving.

Of course, being in the MOVING_FORWARD state doesn’t do us any good unless there is code for it. So make some space at the end of your render function, right before the recursive call to render, and add this:

if(state == MOVING_FORWARD)
{
   walkingDistance += 1 /30;

   if(walkingDistance >= 1){
      walkingDistance = 1;
      state = WAITING;
   }

   switch(direction){
      case NORTH:
         camera.position.z = startZ - walkingDistance;
         break;
      case EAST:
         camera.position.x = startX + walkingDistance;
         break;
      case SOUTH:
         camera.position.z = startZ + walkingDistance;
         break;
      case WEST:
         camera.position.x = startX - walkingDistance;
         break;
   }
}

This logic should look familiar since it’s just our 90 degree turn code straightened into a line.

Basically for every frame we spend MOVING_FORWARD we extend our walking distance by a fraction of a square length (which for us is 1). We then look at our current direction to figure out which way we need to move and update the camera to a new position based off our starting location and the distance we have moved so far. As walkingDistance increases the camera gets moved farther and farther from where it started.

When walking distance exceeds 1 whole unit we force it back to a perfect 1 and reset our state to waiting for input. By not including a return statement we also get to have one final round of camera updates based off of the perfect 1 which should properly place the camera at the desired end point, ready for any future turns or moves. This is important because missing our target by even a tiny fraction can add up over the course of several hundred moves and leave the camera in the completely wrong place.

If everything went well you can now load your code and walk around our tiny three square maze. Of course, the lack of any sort of collision detection also means you can ghost through the walls and walk right out of the maze but solving that is a problem for another day.

Still, we have a little bit of time left over so let’s make a quick improvement to the code.

At the moment our turn and move code both move the player a set amount per frame. This is an OK approach, especially since the requestAnimationFrame function we’re using does its best to maintain a constant 60 or so frames per second.

But that frame rate is not guaranteed! If your game is really complicated or the user has too many tabs open that frame rate can start to drop and a quick turn that was supposed to take a single second can drag on and on and on.

Alternatively changes in computer technology might lead to a much higher frame rate and suddenly that carefully planned one second turn flashes by in half that time.

Even pro games have issues like this. For example, Dark Souls 2 originally ran at 30 frames per second and calculated weapon durability based on that fact. When it got ported to a new generation of faster machines they boosted the frame rate to 60 and suddenly all the weapons were breaking half twice as fast. Woops.

The solution to these problems is to, whenever possible, base game calculations not off of frames but off of actual time passed. That way you can program game events to take exactly one second regardless of whether that second lasts for thirty frames or three frames or three hundred frames.

First off we’re going to need a variable to keep track of what time it is. Let’s toss it up by the rest of our contsants and global variables:

var last_update = Date.now();

It’s default value is whatever time it is when the program gets initialized.

Now near the top of our render function we can add:

var now = Date.now();

var deltaTime = now - last_update;
last_update = now;

These three lines get the current time and compare it to our last stored time in order to figure out how long our last frame lasted. We then reset our update time so we can start counting our next frame.

Now if we want to turn once per second we can:

turningArc += Math.PI/2 * deltaTime/1000;

And if we want to walk forward one square in one second:

walkingDistance += 1 * deltaTime/1000;

We are now frame-rate independent!

Let’s Program A 3D Browser Game Part 5: Before You Can Learn To Walk You Must Learn To Dungeon Crawl

If we want to actually control our dungeon experience of just spinning around in automated circles the first thing we’re going to need is a way to get user input. Fortunately we already know how to do that in javascript.

For those of you who don’t feel like reading the linked article we basically create a global javascript object that stores what buttons are currently being pushed. We then create a pair of functions, one designed to keep track of keys being pushed and one to keep track of keys being released. Finally we link those two functions to the webpages natural keyDown and keyUp events.

In other words, put this somewhere convenient, like at the top of your script before the runMaze function:

var playerInput = new Object();

function doKeyDown(event){
   var keynum;

   if(window.event){ //Browser is IE
      keynum = event.keyCode;
   }
   else{
      keynum = event.which;
   }

   if(keynum == 37){
      playerInput.left = 1;
   }
   else if(keynum == 38){
      playerInput.up = 1;
   }
   else if(keynum == 39){
      playerInput.right = 1;
   }
   else if(keynum == 40){
      playerInput.down = 1;
   }
}

function doKeyUp(event){
   var keynum;
   
   if(window.event){ //Browser is IE
      keynum = event.keyCode;
   }
   else{
      keynum = event.which;
   }

   if(keynum == 37){
      playerInput.left = 0;
   }
   else if(keynum == 38){
      playerInput.up = 0;
   }
   else if(keynum == 39){
      playerInput.right = 0;
   }
   else if(keynum == 40){
      playerInput.down = 0;
   }
}

Then be sure to update your body definition like so:

<body onload="runMaze();" onkeydown="doKeyDown(event);" onkeyup="doKeyUp(event);">

And voila! We can now track the arrow keys. Let’s see if we can do anything cool with that.

For instance, what if we were to wrap our automatic camera rotation code inside of an if statement?

var render = function () {
   requestAnimationFrame( render );

   if(playerInput.left){
      camera.rotation.y += 0.01;
   }
   else if(playerInput.right){
      camera.rotation.y -= 0.01;
   }

   renderer.render(scene, camera);
};

And now we can spin the camera IN BOTH DIRECTIONS by holding down the left and right arrows.

Wrong Genre?

Of course, if you’ve played a dungeon crawler before you might notice that something here doesn’t feel quite right. Specifically, we shouldn’t be able to turn just a few degrees at a time. It’s much more traditional to lock the player to the four cardinal directions and force them to slowly rotate through a full 90 degrees on each keypress.

Ironically limiting the player’s control like this is actually going to be harder than just letting them do whatever they want. No more directly reacting to button presses on each frame. Instead we need a multi-step workflow like this:

1) Put the player into a TURNING state that blocks new input

2) While TURNING rotate the player a few degrees per frame

3) When the player has turned a full 90 degrees remove the TURNING state and wait for more input

That means we’re going to need some constants (which Javascript finally sort-of supports) to list out our possible states and a variable for keeping track of which state the player is currently in.

const WAITING = 1;
const TURNING_RIGHT = 2;
const TURNING_LEFT = 3;

var state = WAITING;

We’re also going to want to keep track of which way the camera is currently pointing as well as how far the player has already turned.

var currentDirection = 0;
var turningArc = 0;

With all that we can now setup our loop to start managing state for us by putting this in our render function.

if(state == WAITING){
   if(playerInput.left){
      state = TURNING_LEFT;
   }
   else if(playerInput.right){
      state = TURNING_RIGHT;
   }
}

And now we can use that state to slowly (over the course of 60 frames) move the player through a 90 degree (half PI radian) arc by getting rid of our old turning code and replacing it with something like this:

if(state == TURNING_LEFT){
   turningArc += Math.PI/2 / 60;
   if(turningArc >= Math.PI/2){
      turningArc = Math.PI/2;
      currentDirection = currentDirection + turningArc;
      turningArc = 0;
      state = WAITING;
   }
   
   camera.rotation.y = currentDirection + turningArc;
}

if(state == TURNING_RIGHT){
   turningArc += Math.PI/2 / 60;
   if(turningArc >= Math.PI/2){
      turningArc = Math.PI/2;
      currentDirection = currentDirection - turningArc;
      turningArc = 0;
      state = WAITING;
   }

   camera.rotation.y = currentDirection - turningArc;
}

Now when we are TURNING_LEFT (or RIGHT) we start each frame by adding one 60th of a turn to our turningArc. We then adjust the camera by setting its y rotation to the combination of our turningArc to our currentDirection. So if the player is facing north (0 radians) and has turned 30 degrees the camera will point North-North East.

We also check whether or not we’ve turned a full 90 degrees. When we have that means the turn is finished and several things happen:

1) We set our turningArc to exactly 90 degrees to avoid overshooting out goal. We don’t want the player accidentally turning 91 degrees or eventually they’ll be pointing in the completely wrong direction.

2) We update our currentDirection to reflect the completed turn.

3) We reset the turningArc so it’s ready for our next turn

4) We change the state to waiting so the game will accept input again

Really the only strange thing here is the fact that we have to ADD turning angles when turning left and SUBTRACT them when turning right, which seems backwards. This is just because of the orientation of the default y axis of our engine. If you really cared I’m sure you can swap it around by turning the game’s main camera completely upside down before moving the player or building any walls but I’m just going to live with it.

Time To Move On

We have a nice basic framework for turning around now, but it’s hard to call this a dungeon crawler while the player is stuck in just one place. The next obvious step is to add some actual movement through some code pretty similar to what we just wrote. Sadly there’s no quite enough room to include that all right here so I guess I’ll see you next time!

Let’s Program A 3D Browser Game Part 4: A Different Kind Of Software Architect

We’ve seen how to use three.js to manually create and position walls within our labyrinth. But building a full size maze by hand would be be boring, easy to mess up and hard to maintain.

Let’s not do that.

Instead let’s write a bunch of helper code to do the heavy lifting for us.

For starters I’d like a function to help put walls in the right place, a function that would let us forget about doing math and instead just say something like “I want a north wall at maze location <3, 5>”.

Not a particularly hard thing to write either. First we calculate the center point of the maze location we want to put the wall at. Since all of our walls are a convenient 1 unit wide that’s as simple as directly assigning our maze coordinates to our wall coordinates. The only trick here is that by default in 3D space the y axis points straight up so what we call the “y coordinate” in 2D space is actually the  “z coordinate” in 3D space. (Well… unless you want to rotate the camera around a bunch but let’s not bother with that).

Once we have the x and z coordinates of the center of our maze location we next have to check whether the wall is supposed to be to the north, east, south or west. We then rotate the wall and push it half a unit away from the center. Be sure to double check this step; it’s pretty easy to accidentally mix things up so that east walls wind up getting pushed to the west or get rotated backwards and basically disappear (since walls are one-sided).

function placeWall(x,y,direction){
   var wall = new THREE.Mesh( wallGeometry, wallMaterial );
   wall.position.z = y;
   wall.position.x = x;

   if(direction == 'n'){
      wall.position.z -= 0.5;
   }
   else if(direction == 'e'){
      wall.position.x += 0.5;
      wall.rotation.y = -Math.PI/2;
   }
   else if(direction == 's'){
      wall.position.z += 0.5;
      wall.rotation.y = Math.PI;
   }
   else if(direction == 'w'){
      wall.position.x -= 0.5;
      wall.rotation.y = Math.PI/2;
   }
   else{
      return false;
   }

   scene.add(wall);
}

Now we can replace all of our wall positioning code with:

placeWall(0,0,'n');
placeWall(0,0,'e');
placeWall(0,0,'w');
placeWall(0,0,'s');

And everything looks the same as before.

Which is boring. Replace that with this:

placeWall(0,0,'n');
placeWall(0,0,'w');
placeWall(0,0,'s');
placeWall(1,0,'n');
placeWall(1,0,'e');
placeWall(1,0,'s');

And now we get a hallway that’s twice as wide as our original square room.

If only remodeling in real life was so easy…

I Thought You Had The Map…

Our new helper function makes it almost impossible to mess up when placing a wall. But there’s still a couple problems:

1) It’s kind of hard to remember where we have and haven’t placed walls

2) For an interactive maze it’s not enough to just place the walls, we need to keep track of where they are for later navigation.

So what we really need is a way to describe our entire maze all at once and all in one place. We can then use that maze data to build the 3D graphics and then also use it later on for keeping track of where the player is in the maze and how they can and can’t move.

So what is the most convenient way to define our high level maze data?

Well… a maze is fundamentally just a grid of connected cells that may or may not have walls between them. So we can think of a maze as a two dimensional array of cell objects and we can define a cell object as a set of four wall values that are either “true” (there is a wall) or “false” (there is no wall).

Now since a cell has 4 walls it basically contains 4 bits of information. This means we could define each cell as a single 4-bit super-short integer and then use bitwise logic to calculate…

JUST KIDDING!

Modern computers have billions of bytes of memory. Since our program is unlikely to ever have more than a few hundred cells there is absolutely no need to obsess about jamming as much information into as few bits as possible. Instead we can write our code in a way that’s easy for us humans to understand and work with.

Something like this:

function MazeCell(northWall, eastWall, southWall, westWall){
   this.northWall = northWall;
   this.eastWall = eastWall;
   this.southWall = southWall;
   this.westWall = westWall;
}

Then we can define new cells in our labyrinth as easily as:

var cell = new MazeCell(true, true, false, true);

Of course, we don’t want our cells just hanging out in random variables. We want them to be part of a nice big grid array, something more like this:

var mazeGrid = [Array(2), Array(2)];
mazeGrid[0][0] = new MazeCell(true, false, false, true);
mazeGrid[0][1] = new MazeCell(true, true, true, false);
mazeGrid[1][0] = new MazeCell(false, true, true, true);
mazeGrid[1][1] = new MazeCell(false,false,false,false);

Which works out to be a simple ‘L” shaped room. (That’s why the last room has no walls, it’s completely blocked off from the rest of the maze and doesn’t really matter because it can’t be seen.)

Bringing It All Together

Now we have some maze data AND a function that can accurately drop walls down wherever we want. Mix them together and we can build an entire maze!

All we need is a nested foreach loop. The first loop will grab the rows from the maze. Then for each row we’ll have a second loop that grabs each individual cell and builds whatever walls it needs.

mazeGrid.forEach(function(mazeRow, rowCount){
   mazeRow.forEach(function(mazeCell, colCount){
      if(mazeCell.northWall)
         placeWall(colCount, rowCount, 'n');
      if(mazeCell.eastWall)
         placeWall(colCount, rowCount, 'e');
      if(mazeCell.southWall)
         placeWall(colCount, rowCount, 's');
      if(mazeCell.westWall)
         placeWall(colCount, rowCount, 'w');
   });
});

And now we can use our automatically rotating camera to see the shape of our little “L” shaped room.

OK… but where are we going to find a carpet that shape?

Of course there’s a limit to how much use we can get out of an automatically rotating camera. If the maze gets even a little bit bigger we wont be able to see the whole thing from our starting point at <0, 0>. If only we could control the camera, maybe walk around the maze like an actual dungeon crawler.

Hey! I’ve got a great idea for the next update!

Let’s Program A 3D Browser Game Part 3: Feeling Boxed In

Last time we started our adventure into building a 3D labyrinth by putting up a single wall, and while that was pretty neat the truth is it barely qualifies as a road block, much less a maze.

So let’s pop up another three walls and at least make ourselves a room.

Since all of our walls are the exact same shape and color (for now, at least) we can save on both code and memory by reusing the same geometry and material again and again. (Note: pro games do this too. If there are 100 identical orcs on the screen odds are good they’re all being drawn based off a single in-memory model).

While all of our walls will be sharing the same 3D data they will still have their own unique position data, which should make sense since we want all all the walls to be in difference places.

For example, let’s put a wall to the left of our camera (which lives at position <0,0,0>). To put a wall to the left of of the camera’s position we need to leave the z and y coordinates alone but set the x position to -0.5. We will then also need to twist the wall 90 degrees so that it sits at a right angle with our other wall. Well, I say 90 degrees but three.js uses radians so make that half a Pi instead. Translated into code:

var wall2 = new THREE.Mesh( wallGeometry, wallMaterial)
 wall2.position.x = -0.5;
 wall2.rotation.y = Math.PI/2;
 scene.add( wall2 );

The only trick here is making sure you twist the wall in the right direction. Since we are using a one sided material twisting the wall in the other direction would leave us looking at the wrong direction and make it invisible.

Load the page now and you should notice that the tiny bit of black you used to be able to see to the empty left of our wall has disappeared. That’s because there is now a left wall attached to it.

Now let’s add another wall to the right and one right behind the camera.

var wall3 = new THREE.Mesh( wallGeometry, wallMaterial)
 wall3.position.x = 0.5;
 wall3.rotation.y = -Math.PI/2;
 scene.add( wall3 );
 
 var wall4 = new THREE.Mesh( wallGeometry, wallMaterial)
 wall4.position.z = 0.5;
 wall4.rotation.y = Math.PI;
 scene.add( wall4 );

Now the white rectangle should go all the way from the left side of the screen to the right. But how do we make sure that the wall behind the camera was properly placed?

Easiest way is to add a little code to the render loop to make the camera spin so we can see all four walls of the room.

var render = function () {
   requestAnimationFrame( render );

   camera.rotation.y += 0.01;

   renderer.render(scene, camera);
};

Run the code now and you should look like you’re lazily spinning around in an office chair stuck in the middle of a room with pure white walls.

Although to be honest since the walls are pure white you might find that the whole thing looks more like one big white rectangle that keeps bulging and shrinking. There’s no strong sense of depth so it’s pretty easy to see the scene as a 2D animation instead of a 3D one.

Let’s fix this by adding some lighting. Lights and shadows will give the scene a strong sense of 3D.

But first we need to upgrade our material. The BasicMaterial we were using ignores lights and always looks the same no matter how bright or dark the scene is. So go find the line where we defined our material and change it to this:

var wallMaterial = new THREE.MeshStandardMaterial( );

This new wall material actually has a lighting system, which you can check this by reloading the code. Notice how the whole screen is suddenly pitch black? That’s because we don’t have any lights yet and standard materials can’t be seen without them.

So let’s give ourselves a light. Specifically we’ll be setting up a point light in the center of the room. A point light is basically a glowing point the shines equally in all directions, making it a great way to simulate light bulbs, torches and lanterns.

We can create a basic point light up by adding this code after our walls but before our rendering code:

var playerPointLight = new THREE.PointLight();
playerPointLight.position.set( 0, 0, 0 );
scene.add( playerPointLight );

Now run your code again. The point lighting should make the distant corners of the room much darker than the nearby centers of each wall, helping to improve the illusion that viewer is looking at a 3D space.

Monochromatic games are artsy and profound, right?

Cool! Technically we now know enough to build an entire maze. But let’s be honest, manually creating and positioning dozens or hundreds of walls sounds uper boring. So next time we’ll be building some utility code that will simplify maze building.

Let’s Program A 3D Browser Game Part 2: Three.js To The Rescue

Sorry For The Delay

My original plan was to show how to program a 3D web application from scratch, but after several rewrites I’ve come to the conclusion I can’t really do that without spending weeks worth of posts talking about general 3D programming. If that’s what you really want I suggest hitting this tutorial. Please note that none of their sample code will work unless you also download the helper libraries they provide in the online code repository. Just copy pasting their examples will fail without them.

As for me, I’m going to use a third party library to help us jump right into web based 3D without having to mess around too much with linear algebra and view frustums, which are all quite interesting but a bit too deep for the casual and fun game programming series I wanted to write here.

The particular library we’ll be using is called “three.js”, which can be downloaded at threejs.org. They also have plenty of documentation and examples there so if you find you enjoy the little application we’re writing here you’ll know where to go next to learn some more advanced tricks.

Let’s Get To Work

Go ahead and download three.js and grab the relatively small “three.min.js” file it comes with. Unless you’re planning on editing the library itself (and we aren’t) that’s all you really need so toss a copy of it into a new folder called something like “3dwebmaze” and create a blank “index.html” file right next to it. Now pop open your index in your favorite text editor and get ready for some coding.

First things first: we need to set up the basic webpage where all of our cool 3D stuff is going to happen. Our main requirements are to set up a canvas, make sure we load the three.js library and then tell the body of the webpage to trigger a script after it finishes loading. The script we want it to trigger will, of course, be the 3d maze program we’re going to write.

<html>
<head>
   <title>Maze With three.js</title>
   <script src="three.min.js"></script>
   <script>
      function runMaze()
      {
      }
   </script>
</head>
<body>
<body onload="runMaze();">
   <canvas id="mazeCanvas" width=600 height=450 />
</body>
</html>

With the boring stuff out of the way we can start writing some real code and filling in our runMaze function. Please note that every line of code in this particular post needs to go inside of runMaze.

The first thing runMaze needs to do is grab a reference to our canvas. Not only do we need to tell three.js where to find its drawing target, we also need to be able to grab the canvas’s width and height in order to fine tune various bits of our rendering process.

var mazeCanvas = document.getElementById("mazeCanvas");

After that we need to set up three js itself. First up is a “scene” whose job is to keep track of all the objects we want to draw. Just creating a 3D shape isn’t enough, you have to add it to the scene for anything to happen.

var scene = new THREE.Scene();

Next we need to set up the renderer, which does all the work of actually drawing things. By default the renderer will try to create a brand new canvas but if you already have one you can pass a reference to it. Notice that we’re passing WebGLRenderer an anonymous object with a “canvas” property. Just passing the canvas reference alone won’t work; this function, like a lot of three.js, expects an object full of setup details instead of direct arguments.

var renderer = new THREE.WebGLRenderer({ canvas: mazeCanvas });

The last thing we need is a camera, which helps the renderer figure out exactly how things should be drawn to the screen. The camera needs four pieces of information to work properly, so buckle yourself in for a wall o’ text.

The first camera argument is “field of view”, which is kind of a weird idea. Think of it like this: humans and cameras see the world through a cone shape. Up close you can only see things right in front of you (like a single tree) but the farther away things are the wider an area you can see all at once (like miles of distant forest). Well field of view is just a measurement of what kind of vision cone you want your camera to have. Will it be a big wide panoramic camera or will it be more focused? The “best” cone shape depends heavily on what kind of scene you want to draw. For now we’ll choose a cone with a 75 degree angle and then tweak it later if we think that’s too wide / not wide enough.

The second piece of camera information is aspect ratio, or the shape of the final picture. This needs to match whatever you canvas size is or else things will get stretched and squished just like when you accidentally try to watch an old movie on a widescreen TV. The easiest way to make sure it’s right is to just calculate the aspect ratio on the spot based on the same canvas reference we passed to the renderer. That way if we later decide to change the size of the canvas the code here will automatically update the next time we reload the page.

The last two pieces of information are the minimum and maximum distances you want objects to appear at, commonly called the near and far plane of the view frustum. Anything closer to you than the near limit gets ignored (useful for making sure you don’t draw things behind or overlapping with your camera). Anything beyond the far limit also gets ignored (useful for making sure you don’t waste processing power drawing objects so far away they would barely be a blip on the screen anyways). Once again the exact values for these depend on what kind of scene you’re programming so for now we’ll just use the default values from the three.js online tutorial and then change them later if that doesn’t work.

Put those four pieces of information all together and we get this:

var camera = new THREE.PerspectiveCamera( 75, mazeCanvas.width/mazeCanvas.height, 0.1, 1000 );

With that three.js is all set up and ready for us to start building mazes.

I think it’s safe to say that the most important part of a maze is the walls. You can get away with leaving off the ceiling and people usually don’t look too closely at the floor but without walls it’s not a maze at all. So let’s draw some walls.

In general the easiest way to draw a wall is to just make a 2D rectangle and stick it the way of the player, and after a little experimentation I found that a wall twice as wide as it is tall seems to look the best. But exactly how tall should it be?

If this was a real life maze we’d probably base the height of our maze walls off of the average height of the people we want to explore it. Example: Almost all humans (at the time of this writing) are less than seven feet tall so an eight foot high wall should be enough to prevent anybody from bumping their head.

But this isn’t a physical maze. It’s a 3D graphics maze and the imaginary cameraman running through the maze is exactly 0 units tall and has no head to bump. That means that exact size of our walls doesn’t actually matter. We could make our walls HUGE and still get them to fit on screen by just putting them far away. We could also make them super tiny but still have them feel taller than the player by just putting them close together and right in the front of the camera. What matters isn’t the size of the wall, but the ratio of its size and camera distance.

This is nice because it means we can set the scale of our maze to whatever is most convenient to work with and then adjust the camera until everything looks appropriately player sized.

Now it seems to me that the most convenient scale for a grid based maze is to let every grid square be one unit wide and one unit long. That way we can easily convert between maze locations (column 5, row 3) and 3D positions (x:5, z:3). And of course if the grid squares are one unit wide every wall will also be 1 unit wide and to get the proper 2-to-1 width to height ratio that means every wall will be 0.5 units tall.

So now that we know what size our walls are going to be let’s finish up this post by building one and drawing it to screen. It will be one unit wide, half a unit tall and we will put it half a unit away from the camera. Since the camera, by default, sits at (0,0,0) and faces in the negative Z direction that means we want to put the wall at (0,0,-0.5).

Digital Carpentry

To build this wall we need two things:

1) Some geometry describing the shape of the wall

2) A “material” describing how the wall should be colored and textured.

var wallGeometry = new THREE.PlaneGeometry( 1, 0.5 );
 var wallMaterial = new THREE.MeshBasicMaterial( );
 var wall = new THREE.Mesh( wallGeometry, wallMaterial );
 wall.position.z = -0.5;
 scene.add( wall );

For our geometry we used a simple plane, basically a flat rectangle with whatever width and height we want.

For our material we use the default MeshBasicMaterial. Without any special arguments this represents a simple flat coat of white digital paint. The only interesting thing about is that it’s a one sided texture; you can only see it from the front. Look at the wall from behind and it’s invisible. This is actually very useful (and efficient) because most of the time you only want to draw one side of a shape anyways. We will only ever look at our maze from the inside and a character like Mario should only ever be seen from the outside.

Now that we have our rectangle and our boring white material we can mix them together into a an actual useable 3D mesh. We then push its z position away from the camera by half a unit and add the final objection to our scene. Remember, only objects that are part of the scene get rendered!

But how do we make that rendering happen? Simple enough, we just have to set up a function with a few three.js hooks and then call it.

var render = function () {
   requestAnimationFrame( render );

   renderer.render(scene, camera);
};

render();

And with that we have a wall. A single, bland wall that frankly doesn’t look very 3D at all. We could have done the same thing with a single 2D draw rectangle call. But it’s a start.

Yes, it’s a boring screenshot but just think how easy it would be to compress!

Bonus Project!

Earlier I said that the exact size of our wall didn’t matter since we could make small walls look big by getting up close or big walls look small by backing away. Test this out by making the wall two or five or ten times bigger (or smaller) while also changing its z position to be two or five or ten times further away (or closer). You should notice that as long as the ratio of size to distance is constant you’ll get the exact seem fills-most-of-the-screen rectangle no matter what sizes you choose.

Let’s Program A 3D Browser Game Part 1: You Can Do That?!

A while back we saw how recent improvements in HTML5 standards and modern browser technology make it possible to program classic 2D games using nothing but Javascript. But that’s not all that modern browsers can do!

They can do full 3D.

This is exciting because it used to be that the only way to share 3D content with your users was to ask them to download and run a a suspicious standalone exe file. But now you can share your 3D ideas directly through the browser they already trust.

Of course, there are limits. The browser may already know how to render 3D graphics but you still have to send the user a copy of all your models and texture and users aren’t going to wait around for hours while you send them ten gigs of data. So you’re unlikely to be hosting full triple A games on your blog.

But there’s still a lot of cool stuff you can do with a mere dozen megabytes of 3D data. Games, simulations, interactive presentations and so on. It’s not going to be replacing the fast and reliable text based Internet we know and love anytime soon (or ever) but it can certainly enhance an existing website.

Always Turn Right

So what shall we do to practice our browser based 3D skills?

Well, lately I’ve been playing an unreasonable amount of dungeon crawlers. There’s just something about exploring a labyrinth in first person that’s more exciting than doing it from a third person eagle eye view. Probably the suspense of not being able to see what’s around each corner combined with the sense of scale you get from actually being in the dungeon.

So let’s build ourselves a first person, grid based maze. Just a maze, mind you. No combat system or treasure chests or anything. Just the maze exploration system.

Now before we start talking about our code let’s go over what our project needs to actually do:

  • Draw textured squares to represent walls, floors and ceilings
  • Accept user movement input
  • Force the user to move in a grid pattern
  • Keep track of the shape of the maze
  • Prevent the user from walking through walls

Pretty easy and straightforward as long as you have a reliable way to draw 3D graphics. And thanks to the smart people in charge of browser design we do.

So next time we can start experimenting with actual graphics code.

Let’s Program A Compression Algorithm: Index And Code

Data compression is the science of taking big computer files, shrinking them down for easy transport or storage and then expanding them back into their original size and shape later on when you need to use them again. It gets used everywhere from computer backups to streaming video to software downloads and more.

But how does it work? How do you take 100 MB of data and somehow fit it into an only 10 MB file?

Let’s find out by writing our own simple compression algorithm!

Let’s Program A Compression Algorithm Part 1: How To Fit A Byte Into A Bit And Other Curious Tricks

Let’s Program A Compression Algorithm Part 2: In Which A General Idea Becomes A Specific Algorithm

Let’s Program A Compression Algorithm Part 3: In Which A General Compression Algorithm Becomes Working Lisp Code

Let’s Program A Compression Algorithm Part 4: In Which Byte Compression Code Is Applied To An Entire File

Let’s Program A Compression Algorithm Part 5: In Which Compressed Files Are Decompressed And The Project Completed

Let’s Program A Compression Algorithm Part 6: In Which We Consider How Professional Compression Algorithms Function

Let’s Program A Compression Algorithm Part 7: Considerations On Speeding Up Slow Code

Let’s Program A Compression Algorithm Part 8: Further Musings On Speeding Up Slow Code

Code

wrcslow.lisp – Our super slow but functional prototype from parts 1-5

wrcfast.lisp – The much faster optimized code from parts 7 and 8

Let’s Program A Compression Algorithm Part 8: Further Musings On Speeding Up Slow Code

Last time we drastically sped up our program by changing how we put lists together. This time we’re going to speed things up again by changing how we take them apart.

Speed Reading A List

The part of our code that creates compressed bit lists now runs super fast but our program as a whole is still very slow. That suggests we’ve got a problem in the part of the compression function that actually writes to output. If you look at the code you’ll notice that we write to output eight bits at a time by using a series of subsequence copies and assignments. This was nice because it perfectly matched the idea in our heads (read 8 bits from the front of the list and then throw them away).

The problem here is that subseq can get pretty expensive, especially when used to grab really big subsequences from really big lists. In particular the way that we delete used bits from our bitlist is pretty horrible since we aren’t actually deleting them; instead we basically ask subseq to make a brand new copy of everything in the list except the first eight items. We then replace our original list with this slightly shorter list. This obviously does work but all that copying is reaaaalllly slow. Is there any way we can get the same effect but avoid all that work?

Well, when you think about it as long as we only read each bit once it doesn’t really matter whether we delete them after we use them or if we just leave them alone but ignore them. Kind of like a book. You don’t rip out pages after you finish reading them, you just use a bookmark to keep track of which pages you have and haven’t read.

Is it possible that we can do the same thing with our data and write some code that reads through our list eight bits at a time without deleting any of the old stuff?

Of course we can! It’s not even terribly hard. The mighty Lisp loop macro can actually be configured to read multiple values at a time and then skip forward multiple values to get its next input.

(defun white-rabbit-compress-file (input-filename output-filename)
    (let ((bitlist (compress-terminate-and-pad-file input-filename))
            (out (open output-filename :direction :output :element-type '(unsigned-byte 8))))
            (when out
                (loop for (b1 b2 b3 b4 b5 b6 b7 b8) on bitlist 
                    by (lambda (x) (cddddr (cddddr x))) 
                    do (write-byte (8-bit-list-to-byte (list b1 b2 b3 b4 b5 b6 b7 b8)) out)))
            (close out)))

As you can see we’re asking the loop for eight variables at once instead of just one and we’re telling it to skip forward multiple spaces at once by using a local lambda function that uses some weird syntax to basically look for the fourth neighbor of the fourth neighbor of our current list item. So now we read eight bits from our list, jump forward eight spaces and then repeat till we’re done.

What does that do for our runtime?

[100]> (time (white-rabbit-compress-file “chapter1.txt” “bettertimedoutput3”))

Real time: 0.44549 sec.

Run time: 0.436 sec.

Space: 3931184 Bytes

GC: 3, GC time: 0.036 sec.

T

Look at that! Half a second processing time and only 4 megabytes of memory usage now that we aren’t wasting all our time and RAM making copies of copies of copies of subsequences of copies of our copied data copies.

In fact, at this speed we can finally achieve our goal of compressing the entirety of Alice in Wonderland!

[102]> (time (white-rabbit-compress-file “aliceASCII.txt” “tinyalice”))

Real time: 5.921212 sec.

Run time: 5.872 sec.

Space: 52589912 Bytes

GC: 16, GC time: 0.86 sec.

T

For anyone who cares we managed to shrink it from 147.8 kilobytes to 113.3, which is roughly 25% smaller just like we hoped for. Go us!

Making More Functions Fast

As long as we’re in our groove it might be nice to also speed up our decompression function, especially since I’m getting pretty tired of having to wait five minutes every time I want to test whether or not a change to compression logic actually worked.

Like compression our decompression is a two step process. The white-rabbit-decompress-file function starts by calling the file-to-bitlist function to get a compressed file full of bits and then it goes on to decompress those bits and write them to our output file.

These two functions have the same efficiency flaws that compression functions did. file-to-bitlist relies too much on append and white-rabbit-decompress-file abuses subsequences.

Cutting the appends out of file-to-bitlist isn’t any different than it was for our compression function. Replace the appends with pushes to efficiently create a backwards list and then flip it around.

(defun file-to-bitlist (filename)
    (let ((bitlist '())
            (in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (let ((bits (byte-to-8-bit-list testbyte)))
                                                    (loop for i in bits do (push i bitlist))))
        (close in))
        (nreverse bitlist)))

Fixing up the way we write to output is going to be a little bit harder. Since we’re working with a compressed file we can no longer safely say that all of our letters are eight bits long. Some letters will actually only be four bits long and others will be nine bits long. So we can’t just write a loop that always jumps n items forward between writes. Instead we’re going to have to write our own looping logic that’s smart enough to decide when to jump forward four spaces and when to jump forward nine.

The basic idea is that we will start with a variable pointing at the start of our list. We will then use subseq to check whether the list starts with a 0 or 1 (a short subsequence at the start of a list is pretty cheap). Like usual we will use this information to decide how to decompress our data. We will then manually change our variable to point either four neighbors further or nine neighbors further as needed. This will make that new spot look like the start of the list and we can just loop until we hit the termination sequence.

(defun white-rabbit-decompress-file (input-filename output-filename)
    (let ((bitlist (file-to-bitlist input-filename))
            (decompressing 1)
            (out (open output-filename :direction :output :element-type '(unsigned-byte 8))))
            (when out
                (loop while decompressing do                                             
                        (if (= 0 (first bitlist))
                            (progn (write-byte (gethash (subseq bitlist 0 4) *list-to-byte-compression-hash*) out)
                                    (setf bitlist (cddddr bitlist)))
                            (if (equal (subseq bitlist 0 9) '(1 0 0 0 0 0 0 0 0))
                                    (setf decompressing nil)                                    
                                    (progn (write-byte (8-bit-list-to-byte (subseq bitlist 1 9)) out)
                                    (setf bitlist (cdr (cddddr (cddddr bitlist)))))))))
            (close out)))

 

Once again we use a chain of cdr and it’s subtypes to jump through our list. Since this is the second time we’ve used them we might as well explain them.

Remember how I said every item in a Lisp list is made of two halves? The first usually* holds a piece of data and the second usually holds a link to the next item in the list. Together these two halves make up a “cons cell” and you can individually access each half using the historically named car function to grab the first half and cdr to grab the second half. Since the second is usually a link to the next item in the list cdr

You can walk through lisp lists by chaining these together. If you want the data at the third spot of the list you need to cdr to get to the second item then cdr again to the third item and the car to get the data, or in other words (car (cdr (crd list-with-cool-data))).

On the other hand if you want a new sublist that starts at the third part of the list you would just use two cdrs to get the link to the third item without using car to then specifically grab only the data.

Nesting these calls can get annoying though so Lisp has several build in functions that condense several chains into single function such as caddr or cddr. Unfortunately these only go up to four calls in a row which is why in order to jump the start of our list forward nine spaces we still have to chain multiple calls together.

Now that you understand how the code works let’s see how well it works:

[2]> (time (white-rabbit-decompress-file “tinyalice” “fastexpandtest.txt”))

Real time: 5.585598 sec.

Run time: 5.268 sec.

Space: 51634168 Bytes

GC: 20, GC time: 0.608 sec.

T

We can now decompress files just as fast as we compress them. It’s not exactly a great program but for an educational toy I think it turned our pretty well. And hopefully you learned at least a little about Lisp’s inner guts and what to look for when things seem slow.

 

* Sometimes the first and the second halves of a cons cell will both hold links to other cons cells, allowing for nested lists and interesting tree data structures.

Let’s Program A Compression Algorithm Part 7: Considerations On Speeding Up Slow Code

Welcome fellow Lisp enthusiasts*! Now that our proof of concept code works it’s time to talk about the elephant in the room: Our code is really slow and considering that modern Lisp is supposed to be FAST that means we’re probably doing something dumb.

But exactly how dumb? Well let’s take a scientific approach and run a test:

[7]> (time (white-rabbit-compress-file “chapter1.txt” “timedoutput”))

Real time: 251.23251 sec.

Run time: 250.732 sec.

Space: 10816791720 Bytes

GC: 9324, GC time: 159.104 sec.

T

Looks like compressing a small text file on my (admittedly old) Linux laptop took over four minutes and, more surprisingly, consumed something like 10 gigabytes of memory. That’s a freakishly huge amount of memory for working with a file that’s only 11.4kB long. And using up so much memory was a real strain on the garbage collector, which spent over two minutes just cleaning up all the data our program threw away.

How Are We Going To Fix This?

Code optimization is the reason why it’s important to not only understand WHAT your programming language can do but also HOW it does it.

Sure, most programmers can easily spot the warning signs when they personally write inefficient code: Too many nested loops, recursive functions on complex data and so on.

But what about when you load up someone else’s library and call the doSomethingCool function they wrote? Is it fast? Slow? Does it loop? Without some research you have no way of telling.

Moral of the story: Do your research!

Doing Some Research!

For example, let’s take a look at Lisp’s handy append function. Its job is simple: take two lists, glue them together and then return the combined list. Our prototype uses this function everywhere. To build bit lists. To build compressed output files. To build decompressed output files. It is no exaggeration to say most of what our program does is appending lists to other lists.

So…. How does append append lists anyways?

For that matter, what is a Lisp list?

A list in Lisp is a pretty simple thing. Each individual item in the list contains two pieces of information: The data stored there and directions on where to find the next item in the list. The final item in the list has a blank in the “next” spot. That’s how you know it’s the end of the list.

Now the easiest and fastest way to connect two lists together is to take the empty “next” from the end of the first list and point it at the start of the second list.

But append comes with a bonus guarantee that complicates things: It is guaranteed to NOT change either of its inputs. That means changing the last item in the first list is a no go.

Instead append creates a complete copy of the first list and then links that new list to the second list. (This doesn’t change the second list because Lisp lists only move forward and don’t care if anyone links to them, just who they link to).

Did we find the problem?

So append makes a copy of its first argument. That sounds like it could be a bit slow. Maybe we should double check how we us it. Let’s start with a look at our file-to-compressed-bitlist function. You might notice this little gem:

(append bit-list (compress-byte testbyte))

That’s the line of our code where we say “Take our current bit list and add the next set of compressed bits to the end”. This happens inside of a loop that runs once for every byte in the input file. So our eleven kilobyte test file is going to trigger this line some 11,000** times. And every single time is going to involve making a complete copy of our compressed bit list so far.

That will add up fast. How fast? Hmmm….

Let’s assume that between our 4 bit short codes and our 9 bit long codes the average compressed-byte comes out at 6 bits (that matches the 25% compression rate we were aiming for). Working with that average our loop probably looks sort of like this.

Step one: Append first 6 bits to empty list. No copying.

Step two: Copy existing 6 bits and link to next 6 bits. Total of 6 bits copied.

Step three: Copy existing 12 bits and link to next 6 bits. Total of 18 bits copied.

Step three: Copy existing 18 bits and link to next 6 bits. Total of 36 bits copied.

Step four: Copy existing 24 bits and link to next 6 bits. Total of 60 bits copied.

Step five: Copy existing 30 bits and link to next 6 bits. Total of 90 bits copied.

Step six: Copy existing 36 bits and link to next 6 bits. Total of 126 bits copied.

So we’re only six bytes into our 11,000+ byte file and we’ve already had to make copies of 126 list items. And while I’ve been calling them bits remember that they’re we’re actually using full 32 bit (4 byte) integers to keep track of our 0s and 1s. So that means we’ve had to copy 126 * 4 = 504 bytes just to compress six letters of input.

And it only gets worse. By the time we make it to the end of our elven kilobyte file we will have made copies equal to several thousand times the size of our original input! The bigger the input gets the worse that multiplier becomes and suddenly it’s not so mysterious why our code takes multiple minutes to run and consumes gigabytes of memory.

Functional Programming: What that be?

Before we start talking about how to “fix” append I want to take a minute and talk about why it’s not actually “broken”. Sure, using append the way we do is horribly inefficient but that’s not because append was poorly designed. In fact, append was very carefully designed to be very safe. Because it copies its inputs instead of changing them you can use it anywhere you want without having to worry about accidentally mutilating some bit of data you might need to use again later.

This is the core of what’s known as “functional programming”.

Basically programmers noticed that a lot of their worst and hardest to solve bugs were caused when different bits of code accidentally changed each other’s variables. Example: If three functions all depend on the same global variable and something else changes that variable suddenly all three functions might break down. Or if you pass the same variable as input to multiple functions and it gets changed halfway through your later functions might not work like you expected.

The traditional solution to this problem is to just work really really hard to remember which functions share which global variables and how every function changes their input. The problem here is that the bigger your program gets the harder it is to keep track of all that stuff.

This lead some people to come up with a clever idea: What if we avoid writing code with shared variables and just don’t let functions change their inputs? That should get rid of all those weird bugs and accidental data corruption issues.

When you write your code according to these rules it’s “functional”, named after math functions. After all, the quadratic equation won’t ever break down just because you used the Pythagorean Theorem wrong and and four is always four no matter how many equations you pass it through.

The obvious cost here is that functional code tends to “waste” a lot of time and memory copying inputs so they can safely work with the copies and leave the original data alone. So it’s up to you as the programmer to decide what each part of your program needs most: Functional code that’s easy to maintain and experiment with or efficient code that’s harder to work with but runs much faster.

In our case since we’re dealing with the interior of a massive loop it’s probably time to say goodbye to our safe functional prototype code and focus on speed.

Appending Faster Than Append

So we need a list building loop that doesn’t waste time or memory space. Turns out one of the more popular ways to do this in Lisp is to actually build your list backwards and then reverse it. This takes advantage of the fact that adding something to the front of Lisp list is both fast and simple.

Interesting trivia: We actually already did this in our current byte-to-8-bit-list function.

Now let’s rewrite our file-to-compressed-bitlist to use the same technique.

Basically in our new version when we compress a byte we won’t append the compressed bit pattern directly to our output. Instead we will load the compressed bits into a temporary variable and then push them one by one onto the front of our bit list.

After all the bytes are read we’ll then load our termination sequence into a variable and push that one bit at time too. This will give us a complete mirror image of the bit list we actually want, so we finish off with a call to nreverse which reverses the list. The “n” indicates this is a “unsafe” function that works very fast but will destroy the original data. Since we won’t ever need that backwards list that’s fine.

(defun file-to-compressed-bitlist (filename)
    (let ((bit-list '())
            (in (open filename :element-type '(unsigned-byte 8))))
        (when in
            (loop for testbyte = (read-byte in nil)
                while testbyte do (let ((compressed-bits (compress-byte testbyte)))
                                                    (loop for i in compressed-bits do (push i bit-list))))
        (close in))
        (let ((termination-symbol '(1 0 0 0 0 0 0 0 0)))
            (loop for i in termination-symbol do (push i bit-list)))
        (nreverse bit-lists)))

Interestingly enough this function is actually fairly functional since it avoids global variables. The only data it changes or destroys is private data so as far as the rest of the world is concerned nothing important has been changed.

Now that we’ve tuned up on of our major functions let’s give things another whirl and see what happens:

[41]> (time (white-rabbit-compress-file “chapter1.txt” “bettertimedoutput”))

Real time: 120.46666 sec.

Run time: 120.376 sec.

Space: 4636093800 Bytes

GC: 2925, GC time: 47.0 sec.

T

Not bad. Twice as fast, used up only half as much memory and not nearly as hard on the garbage collector. But even after that sort of major improvement it’s still pretty slow and memory hungry so our works not done quite yet. We’ll see what remaining inefficiencies we can trim out next time.

 

 

* Reminder: I’m not particularly good at Lisp. I just like the language.

** Yes, yes, I know. A kilobyte is actually 1024 bytes not an even 1000 but multiplying by powers of ten in your head is so much easier than powers of two and is close enough for general discussion.

Let’s Program A Compression Algorithm Part 6: In Which We Consider How Professional Compression Algorithms Function

Now that we’ve got a working little ASCII compressor it’s time to talk about how “real” compression algorithms work.

Now honestly I should probably just drop a couple wiki links to Huffman_Coding and DEFLATE but presumably you’re here on my blog because you specifically want to hear me talk about computer science so here we go!

Let’s start by refreshing our memories on how we approached the problem of compression.

We started by discovering that normal ASCII text assigns a full 8-bits to every possible letter. We then did some research and found that some ASCII letters are much more common than others. That let us invent a new standard that squished common letters into 4-bits while expanding less common letters into 9-bits. This overall resulted in files that were 20-30% smaller than their plan ASCII equivalents.

So how would a professional approach the same problem?

Believe it or not they would do more or less the same thing. Professional tools achieve their impressive 80%+ compression standards using the same basic approach we did: Find unusually common data patterns and then replace them with much smaller compressed place holders. The professional tools are just much much smarter about how they find and compress these patterns than we were.

How much smarter? Let’s find out by comparing and contrasting what we did against what better compression tools do.

 

Our OK Idea: We designed our compression patterns based around our knowledge of average letter frequencies in the English language. While this worked OK there was the problem that not every text file follows the average English pattern. A report about zigzagging zebras is going to have a ton of z’s and g’s while a C++ code file is going to have lots of brackets and semi-colons. Our average English compression strategy won’t work very well on files like that.

Their Better Idea: Professional tools don’t use a single universal encoding scheme but instead create a unique encoding for every file based on their individual symbol frequency. So if a particular text file has a ton of “z”s or parenthesis or brackets it will get its own custom code that compresses those letters. This lack of a universal standard does mean that each compressed file has to start with a dictionary explaining the specific compression pattern used for that file but the space saved by using an optimized compression pattern for each file more than makes up for the space taken up by the dictionary.

 

Our OK Idea: We not only used the same compression codes for every single file, we used the same compression codes throughout every file. This was easy to program but could be sub-optimal on longer files. If a book starts out talking about zigzagging zebras but later focuses on electric eels then one half of the book might compress better than the other.

Their Better Idea: Many compressional algorithms split files into multiple segments and then compress them individually before gluing them together into an output file. This way each segment can have its own optimized compression pattern. So if one half of a file had a lot of “z”s but no “l”s it could get a compression pattern focused on “z”s. If the second half of the file switched things around and had lots of “l”s but almost no “z”s the algorithm could then switch to a new compression pattern that shrunk down the “l”s and did nothing with “z”s. Of course this means each compressed file has to have multiple dictionaries explaining the specific compression pattern and length of each file segment but once again the space you save in better compression outweighs the space you lose from the extra dictionaries.

 

Our OK Idea: Our compression code focused only on ASCII text files. This made it easy to design and test our project but also severely limits it’s utility.

Their Better Idea: Admittedly some professional tools also only focus on one file type. For example, PNG only compresses image files. But many other professional compression tools work by looking for repeated bit patterns of any sort and length. This lets them compress any file made of bits which is, well, all of them. Sure, not every file compresses particularly well but at least there are no files you plain can’t run the algorithm on.

 

Our OK Idea: Our compression code worked by replacing common 8-bit letter patterns with smaller 4-bit patterns. This means that even in a best case scenario (a file containing only our eight compressible common letters) we could only shrink a file down to 50%.

Their Better Idea: Professional tools don’t restrict themselves to only looking for common 8-bit patterns but can instead find repeated patterns of any length. This can lead to massive compression by finding things like commonly used 256-bit patterns that can be replaced with two or three bit placeholders for 99% compression. Of course not every file is going to have conveniently repeated large bit sequences but being able to recognize these opportunities when they do happen opens up a lot of possibilities.

 

So there you have it. The main difference between professional compression algorithms and our toy is that professional programs spend a lot more time upfront analyzing their target files in order to figure out an ideal compression strategy for that specific file. This obviously is going to lead to better results than our naive “one size fits all” approach.

But figuring out the ideal compression strategy for a specific file or type of file is often easier said than done! To go much deeper into that we’d have to start seriously digging into information theory and various bits of complex math and that’s way beyond the scope of this blog. But if this little project piqued your interest I’d definitely encourage you to take some time and study up on it a bit yourself.

 

BONUS PROJECT FOR ENTERPRISING PROGRAMMERS

Hopefully this article has given you a few ideas on how our little toy compressor could be improved. And while you probably don’t want to try anything as drastic as implementing a full Huffman style algorithm I think there are a lot of relatively simple improvements you could experiment with.

For example, what if you got rid the hard coded set of “English’s eight most common symbols” and instead had your algorithm begin compression by doing a simple ASCII letter count to figure out the eight most common symbols for your specific target file? You could then have your algorithm assign compression short codes to those eight symbols which should probably lead to results at least a few percentage points smaller than our one size fits all prototypes

Of course, in order to decompress these files you’ll have to start your file with a list of which short codes map to which letters. You could do this with some sort of pair syntax (ex: {0000:z,0001:l,0010:p}) or maybe just have the first eight bytes in a compressed file always be the eight symbols from your compression map.

And with that we’re pretty much done talking about compression, but speaking of improvements reminds me that my Lisp code is still slow as cold tar. So if you have any interest in Lisp at all I’d like to encourage you to tune in again next week as we analyze and speed up our code and maybe even talk a little about that “functional programming”.