Let’s Program Some Weird Markov Chains Part 4: Quantifiably Bad

We’re just about done with text-based Markov Chains, but there’s one thing that’s still bothering me. We saw that as our chain got deeper and deeper, it became more and more likely that we would duplicate entire sentences from the original text instead of generating interesting new text.

Last time we dealt with this problem by manually increasing and decreasing the depth of our chain and then going with our gut instinct on what felt “right” and what was “too many” copies of the original text.

But we’re doing computer science here! Surely we can do better than gut instinct. Maybe there’s some way we can statistically or mathematically analyze how good or bad a given Markov Chain is when it comes to generating unique text?

Well… one of the main causes of duplicated text is entries in the Markov Chain that have only one possible next word. If “I” always leads to “like” and “like” always leads to “programming”, then you can’t generate anything except “I like programming”.
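To make that concrete, here’s a tiny hand-built example (toy data of my own, using the same nested dictionary-of-counts structure we’ve been building throughout this series, along with the sentencestart and sentenceend markers). Every word has exactly one option, so the “random” walk has nothing random about it:

# A toy chain where every word has exactly one possible next word.
toyChain = {
    'sentencestart': {'i': 1},
    'i': {'like': 1},
    'like': {'programming': 1},
    'programming': {'sentenceend': 1},
}

word = 'sentencestart'
sentence = []
while word != 'sentenceend':
    # With only one key in each dictionary there is no real choice to make.
    word = list(toyChain[word].keys())[0]
    if word != 'sentenceend':
        sentence.append(word)

print(' '.join(sentence)) # Always prints: i like programming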

So maybe we should take a quick look through our one deep, two deep and three deep Markov Chains and see how often their words have lots of different options versus just one or two. The code for this is pretty easy:

markovChain1OptionCount = {}
totalOptions1 = 0

# For every word in the 1 deep chain, count how many different next
# words it can lead to, then tally how often each count shows up.
for word, options in markovChain.items():
    optionCount = len(options)
    if optionCount not in markovChain1OptionCount:
        markovChain1OptionCount[optionCount] = 0
    markovChain1OptionCount[optionCount] += 1
    totalOptions1 += 1

markovChain2OptionCount = {}
totalOptions2 = 0

# Same idea for the 2 deep chain, except we have to walk through
# two layers of nested dictionaries to reach the options.
for layer1word, layer1options in markovChain2Deep.items():
    for layer2word, layer2options in layer1options.items():
        optionCount = len(layer2options)
        if optionCount not in markovChain2OptionCount:
            markovChain2OptionCount[optionCount] = 0
        markovChain2OptionCount[optionCount] += 1
        totalOptions2 += 1

markovChain3OptionCount = {}
totalOptions3 = 0

# And for the 3 deep chain we walk three layers deep.
for layer1word, layer1options in markovChain3Deep.items():
    for layer2word, layer2options in layer1options.items():
        for layer3word, layer3options in layer2options.items():
            optionCount = len(layer3options)
            if optionCount not in markovChain3OptionCount:
                markovChain3OptionCount[optionCount] = 0
            markovChain3OptionCount[optionCount] += 1
            totalOptions3 += 1
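By the way, those three loops are almost identical, so if you ever want to test even deeper chains you could fold them into a single recursive function. Here’s a rough sketch of that idea (my own refactor, assuming the same nested dictionary structure as above):

def countOptions(chain, depth, optionCounts):
    # Walk the nested dictionaries until we hit the innermost layer,
    # then tally how many next-word options each entry has.
    # Returns the total number of entries visited.
    total = 0
    for word, options in chain.items():
        if depth == 1:
            optionCount = len(options)
            if optionCount not in optionCounts:
                optionCounts[optionCount] = 0
            optionCounts[optionCount] += 1
            total += 1
        else:
            total += countOptions(options, depth - 1, optionCounts)
    return total

# Produces the same numbers as the hand-written loops above, ex:
markovChain3OptionCount = {}
totalOptions3 = countOptions(markovChain3Deep, 3, markovChain3OptionCount)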

Everybody following along? We just go to the end of each entry in our chain and count how many next-word options are available for it. We then keep track of how often that count has shown up. Did our first word have only one possible next word? OptionCount[1] = 1. Did our second word have three possible next words? OptionCount[3] = 1. Another word with only one option? OptionCount[1] = 2.

And then we keep going on and on until we have a complete count for every possible number of next words. We can then look at how often we found words that only had one possible next word:

print("1 deep chains with only 1 option: {0} ({1:.0f}%)".
    format(markovChain1OptionCount[1],markovChain1OptionCount[1]/totalOptions1 * 100))
print("2 deep chains with only 1 option: {0} ({1:.0f}%)".
    format(markovChain2OptionCount[1],markovChain2OptionCount[1]/totalOptions2 * 100))
print("3 deep chains with only 1 option: {0} ({1:.0f}%)".
    format(markovChain3OptionCount[1],markovChain3OptionCount[1]/totalOptions3 * 100))

1 deep chains with only 1 option: 17917 (61%)

2 deep chains with only 1 option: 111426 (86%)

3 deep chains with only 1 option: 189202 (95%)

Hmmm… that is interesting. Pretty clear evidence that the deeper our chain is, the less flexibility there is.

(You can use the markovChainNOptionCount dictionaries to look at how often words have 2 or 3 or however many options too! That won’t help us with our current question, but it can be interesting to explore.)
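For example, a quick loop like this (just a sketch reusing the variables we already calculated) prints the whole spread for the 1 deep chain:

for optionCount in sorted(markovChain1OptionCount):
    wordCount = markovChain1OptionCount[optionCount]
    print("{0} words have {1} next word option(s) ({2:.1f}%)".
        format(wordCount, optionCount, wordCount/totalOptions1 * 100))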

Sadly, this simple experiment doesn’t tell us the whole story.

The fact that 86% of 2 deep chains only have 1 option does NOT mean that there is an 86% chance that any given word has only one option. For instance, the 2 deep chains that come at the beginning of sentences have a very different probability spread:

chain2Starters = markovChain2Deep['sentencestart']
chain2StarterOptionCount = {}
chain2StarterTotalCount = 0

# Count the next-word options for every word pair that starts a sentence.
for word, options in chain2Starters.items():
    optionCount = len(options)
    if optionCount not in chain2StarterOptionCount:
        chain2StarterOptionCount[optionCount] = 0
    chain2StarterOptionCount[optionCount] += 1
    chain2StarterTotalCount += 1

print("2 deep chain starters with only 1 option: {0} ({1:.0f}%)".
    format(chain2StarterOptionCount[1], chain2StarterOptionCount[1]/chain2StarterTotalCount * 100))

2 deep chain starters with only 1 option: 1047 (69%)

Well look at that! While 86% of 2 deep chains have only 1 possible next word, only 69% of starting 2 deep pairs have only one possible next word. This sort of makes intuitive sense. Sentences in English tend to start out in just a handful of ways that can each link to a wide variety of topics. It’s only when you get near the end of a sentence that you start feeling locked into just a few specific possibilities.

Of course, even that isn’t the whole story because not every one of those 2 deep starting possibilities is equally likely. Lots of sentences start with “The” so that word will be randomly chosen a lot more often than something like “Pythagoras”.

So let’s do another experiment to figure out just how likely it is that the first word randomly chosen for our 2 deep Markov Chain will only have one possible next word.

It’s not too hard to do this. We already know how to calculate the probability of a specific starting word pair: compare how often that pair was seen (ex: sentencestart, the) to the total count of all word pairs that start with sentencestart. So to calculate the total probability of all starting word pairs that have only one possible next word, we just count up how many times we saw one-option word pairs and divide that by the count of all sentencestart word pairs.

total = 0
totalWithOnlyOneOption = 0

# The 1 deep chain tells us how often each starting word was seen, and the
# 2 deep chain tells us how many options that starting pair leads to.
for word, count in markovChain['sentencestart'].items():
    total += count
    if len(markovChain2Deep['sentencestart'][word]) == 1:
        totalWithOnlyOneOption += count

print("Odds of starting with a phrase that only has 1 possible next option using 2 deep chain: {0:.0f}%".
    format(totalWithOnlyOneOption/total * 100))

Odds of starting with a phrase that only has 1 possible next option using 2 deep chain: 16%

So that’s a pretty interesting tidbit. Even though 86% of word pairs only have 1 possible next word, the odds of a randomly generated sentence starting with that sort of pair are only 16%. So now we have numbers showing us why a 2 deep chain does a pretty good job of creating different sentence starts.

But wait! Even that’s not the entire story. We may have done some good analysis on how sentences start, but that doesn’t tell us much about the middle or end of sentences. Plus our analysis of how sentences start is still pretty weak. We might know the odds of a sentence starting with only one option of what to do next, but that doesn’t tell us whether that option will also have only one option or whether it might be the gateway to hundreds of different possibilities.
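If we wanted to keep digging, one way to measure exactly that would be to start from a word pair and count how many steps in a row we’re forced along a single option before reaching a pair with real choices. Here’s a rough sketch of what that could look like (assuming the same markovChain2Deep structure and the sentencestart/sentenceend markers from earlier parts):

def forcedRunLength(word1, word2, maxSteps=100):
    # Count how many consecutive steps have exactly one possible next word.
    steps = 0
    while steps < maxSteps:
        options = markovChain2Deep[word1][word2]
        # Stop as soon as there's a real choice or the sentence ends.
        if len(options) != 1 or 'sentenceend' in options:
            break
        word1, word2 = word2, list(options.keys())[0]
        steps += 1
    return steps

print(forcedRunLength('sentencestart', 'the'))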

Now we’re not going to take these statistical experiments any further than that (although we certainly could). But I think this is a really good demonstration of my original point: Markov Chains are easy to understand and program, but they’re also a very powerful tool for capturing and using statistical patterns in data.

I think we’re done with traditional text-based Markov Models. Next time we’ll jump into the world of music. Will that work out well for us? No idea!