Markov Chains are, in my personal opinion, one of the niftiest things to be found in the computer science bag of tricks. They are easy to understand and simple to program but still powerful enough to do some really cool stuff.
So what is a Markov Chain?
A Markov Chain is a way to model or simulate any sort of list of things or events where the next item in the list is influenced by the item (or items) that came just before it.
One of the best examples is human language. Here, try to fill in the blank: “He saw a bright red _____”.
You probably came up with something like “a bright red apple” or “a bright red car”.
You might have come up with something like “a bright red star” or “a bright red dragon”.
You probably didn’t think of anything obviously wrong like “a bright red bluebird” or “a bright red emerald”.
And you definitely didn’t think of anything grammatically nonsensical like “a bright red sleeping” or “a bright red quickly”.
See? All I had to do was tell you the start of a sentence and your years of experience with human language let you instinctively guess what sort of words probably would or wouldn’t come next.
A Markov Chain is just a way to formally define this idea of “what will probably come next”.
For instance, a Markov Chain describing a family cookbook might include mathematical facts like:
- The word “red” is followed by the word “sauce” 50% of the time, the word “apple” 25% of the time and the word “pepper” 25% of the time.
- The phrase “preheat the oven to” is followed by “350 degrees” 80% of the time.
- The phrase “2 cups of” is followed by the word “water” 75% of the time, the word “milk” 20% of the time and the word “butter” 5% of the time.
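If it helps to see that as code, those “facts” are really just a lookup table. Here’s a minimal Python sketch of our imaginary cookbook chain as a dictionary mapping each word or phrase to the probabilities of what follows it. The numbers are the made-up statistics from the list above, and the “400 degrees” entry is pure filler I invented so the oven row’s probabilities add up to 1:

```python
# A Markov Chain boils down to a table mapping each state (a word or
# phrase) to the odds of what comes next. These numbers are the made-up
# cookbook statistics from the list above; "400 degrees" is invented
# filler so each row's probabilities sum to 1.
cookbook_chain = {
    "red": {"sauce": 0.50, "apple": 0.25, "pepper": 0.25},
    "preheat the oven to": {"350 degrees": 0.80, "400 degrees": 0.20},
    "2 cups of": {"water": 0.75, "milk": 0.20, "butter": 0.05},
}
```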
Great. We now know what words tend to follow other words. That’s interesting trivia, but what can we do with it?
What we can do is simulate new cookbook instructions by chaining our rules together. That’s why they’re called Markov Chains and not Markov Factoids.
Example: if you know the word “the” is sometimes followed by “red” and that “red” is sometimes followed by “apple” and that “apple” is sometimes followed by “tastes” and that “tastes” is sometimes followed by “good”, then you can logically chain these rules together and create the full sentence “The red apple tastes good”.
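Here’s a rough sketch of that chaining logic in Python, assuming a table shaped like the cookbook_chain dictionary above. It’s just one way to do it, not the only way:

```python
import random

def generate(chain, start, max_words=20):
    """Walk the chain from a starting word, picking each next word
    at random in proportion to its probability."""
    words = [start]
    current = start
    for _ in range(max_words - 1):
        followers = chain.get(current)
        if not followers:  # dead end: we've never seen this word lead anywhere
            break
        current = random.choices(
            list(followers.keys()),
            weights=list(followers.values()),
        )[0]
        words.append(current)
    return " ".join(words)

# generate(cookbook_chain, "red") might return "red sauce" or "red apple"
```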
That Seems Too Convenient. What’s the catch?
Sadly, not every chain makes sense.
Consider the sentence “Sally ate red apple pie into pieces of oil”.
Every individual step in this sentence makes sense. “Sally ate” makes sense. So does “ate red” and “red apple” and “apple pie”. And there’s nothing wrong with “pie into” or “into pieces”. And “of oil” is something you’d expect to see a lot of in a cookbook.
It’s only when you link all these individually valid links together that you wind up with nonsense. So while our cookbook Markov Chain has the potential to put together consistent-sounding cooking instructions, it also has the potential to spout complete gibberish.
Oh No! Can We Make The Markov Chains Smarter?
The reason our imaginary Markov Chain generated nonsense is that it is only smart enough to look one word ahead or behind at a time, but in English you often need two or three or more words of context for things to make sense.
That suggests we could improve the quality of our Markov Chain by getting rid of “word A leads to either word B or word C” and instead upgrade to “phrase A B C leads to either word X or word Y”.
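Here’s a sketch of what that upgrade might look like in Python, building the chain from raw text with an adjustable context size. Storing a plain list of followers, duplicates and all, is a lazy but effective way to encode the probabilities: a common follower shows up in the list more often, so picking a random entry automatically favors it.

```python
import random
from collections import defaultdict

def build_chain(words, context_size=2):
    """Map every context_size-word phrase in the input to the list of
    words that were seen following it."""
    chain = defaultdict(list)
    for i in range(len(words) - context_size):
        context = tuple(words[i:i + context_size])
        chain[context].append(words[i + context_size])
    return chain

sample = "preheat the oven to 350 degrees and preheat the skillet".split()
chain = build_chain(sample, context_size=2)
next_word = random.choice(chain[("preheat", "the")])  # "oven" or "skillet"
```

Crank context_size up to ten, though, and you’ll run head first into the problem described next: almost every ten-word phrase in a real book appears exactly once, so the chain can only replay its source word for word.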
But watch out! The more context you force on your Markov Chain the less flexible it becomes, until eventually it becomes entirely useless.
How could too much context be a problem? Well, suppose you are trying to build a Markov Chain based off of Lord of the Rings with the goal of creating unique and original sentences that sound like something Tolkien would actually write. You want to make sure you get high quality sentences, so you design your Markov Chain to base itself on the past 10 words of context.
Unfortunately when you run your program you find that you aren’t getting original sentences at all! Your program does nothing but randomly print entire sentences taken word for word from the original books.
The problem was you got too specific. Tolkien might have used the individual word “elves” in hundreds of different ways throughout his books, but the specific phrase “For the Elves the world moves, and it moves both…” only shows up once, leading to a boring Markov Model that does nothing but tell you with 100% certainty how specific sentences actually ended.
So building useful Markov Chains requires a certain amount of experimentation in order to find a context length that provides enough information for intelligent results but is short enough that it doesn’t just directly copy whatever it is you’re modeling.
What Are They Good For?
So are Markov Chains good for anything other than generating inedible recipe fragments or producing text that sounds vaguely like it came from famous books?
Of course!
Search engines use them for ranking pages and predicting user behavior. Smartphone keyboards use them for autocomplete. Statisticians use them for studying medicine, economics, sociology and pretty much everything else where things happen in a specific order.
They’re also a handy way to build video game AIs that act in intelligent semi-random ways.
So understanding the basics of Markov Chains will come in handy regardless of whether you plan to try and cure cancer or just program a really awesome giant robot video game.
Or both. Dare to be a modern day Renaissance Man!
The Title Is “Weird” Markov Chains. What is that about?
Like I said, you can use Markov Chains to model anything that has an order. But whether or not said Markov Chain is actually good for anything is a completely separate matter.
So for this Let’s Program we’re going to start with a classic word-based Markov Chain to analyze some public domain books and then try to mimic their writing style. That should be enough to prove we know what we’re doing. But after that we’re going to try a couple of experimental ideas that may or may not produce anything worthwhile.
The first experiment will be to try and grab a bunch of musical scores and then build some Markov Chains that model what notes should come next in a song given the last couple of notes. This might let us generate some interesting-sounding semi-random music. Or it might just generate the sound of a cat walking on a piano.
After that we’ll finish up with an even stranger experiment based around grabbing a bunch of thematically related images and using them to build a Markov Chain that predicts what color/shade a pixel should be based on the pixels around it. We will then use this to generate new images that I am almost certain will look like static but might include recognizable fragments that actually look related to our original images. It’s an experiment I’ve been wanting to try for a while, honestly, and I’m glad to finally be getting around to it.