Let’s Program A Prisoners Dilemma 5: What Is A Decision Making Process?

The saint and devil prisoners were easy to write but, as we’ve seen, they’re not actually very good at playing the game. The always cooperating saints are wide open to exploitation and the devils will always defect even when cooperating would score them more points in the long run.

We clearly need a prisoner who can actually make choices instead of just doing the same thing again and again.

1d6 Points of SAN Loss

As any game designer can tell you, the easiest way to simulate decision making is with random numbers. Just give the computer a list of decisions and let it pick one at random.

So on that note I give you: The madman.

class MadMan < Prisoner
   def initialize(id)
      super(id)
      @strategy = "MadMan"
   end

   def cooperate?(opponentID)
      choice = rand(2)
      if( choice == 1)
         return true
      else
         return false
      end
   end
end

Ruby weirdness warning: In a lot of languages “0” is the same as false, so you might be tempted to have cooperate? just return the number generated by rand. Don’t do that. In Ruby only “false” and “null” are considered false. Everything else, including the number 0, are considered true. This is useful because it means the number 0 always acts like just a number. On the other hand it messes up a lot of other useful programming shortcuts so all in all it sort of break even.

Anyways, don’t forget to tell our create Prisoners method that there’s a new type of prisoner object for it to work with.

def createPrisoners(saintCount, devilCount, madmanCount)
   prisoners = Array.new
   playerCounter = 0
   saintCount.times{ prisoners.push(Saint.new(playerCounter += 1)) }
   devilCount.times{ prisoners.push(Devil.new(playerCounter += 1)) }
   madmanCount.times{ prisoners.push(MadMan.new(playerCounter += 1)) }
   return prisoners
end

Inmates Are Running The Asylum

Let’s be honest here: Making decisions at random is almost never a good idea. So how does our new class of insane prisoners perform in an actual game?

prisoners = createPrisoners(4, 4, 4)
playPrisonersDilemma(prisoners, 1000)

Ten doesn’t divide evenly into thirds so this time we’ll have four of each type of prisoner. Please not that this changes the perfect score to -12,000.

The Group’s Overall Score was -17905 in 1000 rounds with 12 prisoners

ID: 6 Score: -876 Strategy: Devil

ID: 7 Score: -882 Strategy: Devil

ID: 8 Score: -892 Strategy: Devil

ID: 5 Score: -906 Strategy: Devil

ID: 12 Score: -1499 Strategy: MadMan

ID: 9 Score: -1504 Strategy: MadMan

ID: 10 Score: -1538 Strategy: MadMan

ID: 11 Score: -1564 Strategy: MadMan

ID: 2 Score: -2026 Strategy: Saint

ID: 1 Score: -2060 Strategy: Saint

ID: 4 Score: -2074 Strategy: Saint

ID: 3 Score: -2084 Strategy: Saint

Madmen randomly flip between acting like saints and acting like devils so it makes sense they would wind up scoring squarely in between the two. They don’t just let the devils betray them; sometimes they betray right back. And they alternate between cooperating with saints for small gains and betraying them for big gains.

So all in all it seems like mild insanity is actually a pretty well rounded fit for the cutthroat world of the prisoner’s dilemma.

Also, as promised, the fact that madmen can make actual decisions means that our overall group score now has some variation to it even when running multiple games with the same group.

The Group’s Overall Score was -17995 in 1000 rounds with 12 prisoners

The Group’s Overall Score was -18046 in 1000 rounds with 12 prisoners

The Group’s Overall Score was -17938 in 1000 rounds with 12 prisoners

The Lost And The Damned

So madmen seem to do pretty well in a mixed group… but maybe that’s just because they had some saints to act as backup. What happens when we pair up only devils and madmen?

The Group’s Overall Score was -17523 in 1000 rounds with 10 prisoners

ID: 4 Score: -1440 Strategy: Devil

ID: 1 Score: -1450 Strategy: Devil

ID: 2 Score: -1456 Strategy: Devil

ID: 3 Score: -1456 Strategy: Devil

ID: 5 Score: -1472 Strategy: Devil

ID: 9 Score: -2013 Strategy: MadMan

ID: 7 Score: -2014 Strategy: MadMan

ID: 8 Score: -2068 Strategy: MadMan

ID: 6 Score: -2072 Strategy: MadMan

ID: 10 Score: -2082 Strategy: MadMan

About the same thing as when there were saints, it turns out. The madmen’s habit of cooperating roughly half the time means they still can’t directly compete with the vicious defecting devils, but at least randomly defecting half of the time allows them to sort of defend themselves.

In fact, if you compare this to the time we evenly paired up devils and saints you’ll see that the madmen scored about the same as the saints. But the big difference is that the madmen did much more damage to the devils in the process.

Although it’s up in the air as to whether this is a good thing or not. Taking the devils down a notch is certainly a satisfying feeling but the madmen still lost and the group score is much much worse then when the saints had their match with the devil.

The More Randomness You Have The Less Random It Is

For our final experiment I just want to point out that while I call them “Madmen” the random decision prisoners actually managed to achieve some pretty reliable results, consistently scoring halfway between a saint and a devil.

This is of course because random numbers tend to average out over time. So “cooperates at random” eventually transforms into “consistently cooperates 50% of the time”.

To show this off I’m going to have a bunch of madmen play increasingly long games against each other.

prisoners = createPrisoners(0, 0, 10)
playPrisonersDilemma(prisoners, 10)
playPrisonersDilemma(prisoners, 100)
playPrisonersDilemma(prisoners, 1000)
playPrisonersDilemma(prisoners, 1000000)

The Group’s Overall Score was -141 in 10 rounds with 10 prisoners

ID: 10 Score: -7 Strategy: MadMan

ID: 1 Score: -10 Strategy: MadMan

ID: 6 Score: -12 Strategy: MadMan

ID: 8 Score: -13 Strategy: MadMan

ID: 9 Score: -14 Strategy: MadMan

ID: 5 Score: -15 Strategy: MadMan

ID: 2 Score: -16 Strategy: MadMan

ID: 3 Score: -17 Strategy: MadMan

ID: 7 Score: -18 Strategy: MadMan

ID: 4 Score: -19 Strategy: MadMan

The Group’s Overall Score was -1493 in 100 rounds with 10 prisoners

ID: 8 Score: -132 Strategy: MadMan

ID: 3 Score: -135 Strategy: MadMan

ID: 6 Score: -139 Strategy: MadMan

ID: 5 Score: -144 Strategy: MadMan

ID: 1 Score: -145 Strategy: MadMan

ID: 9 Score: -154 Strategy: MadMan

ID: 7 Score: -156 Strategy: MadMan

ID: 4 Score: -159 Strategy: MadMan

ID: 2 Score: -163 Strategy: MadMan

ID: 10 Score: -166 Strategy: MadMan

The Group’s Overall Score was -14976 in 1000 rounds with 10 prisoners

ID: 3 Score: -1437 Strategy: MadMan

ID: 8 Score: -1457 Strategy: MadMan

ID: 1 Score: -1482 Strategy: MadMan

ID: 4 Score: -1487 Strategy: MadMan

ID: 7 Score: -1491 Strategy: MadMan

ID: 10 Score: -1492 Strategy: MadMan

ID: 5 Score: -1503 Strategy: MadMan

ID: 6 Score: -1514 Strategy: MadMan

ID: 9 Score: -1537 Strategy: MadMan

ID: 2 Score: -1576 Strategy: MadMan

The Group’s Overall Score was -15001829 in 1000000 rounds with 10 prisoners

ID: 3 Score: -1498113 Strategy: MadMan

ID: 10 Score: -1499339 Strategy: MadMan

ID: 1 Score: -1499525 Strategy: MadMan

ID: 9 Score: -1500065 Strategy: MadMan

ID: 7 Score: -1500128 Strategy: MadMan

ID: 2 Score: -1500445 Strategy: MadMan

ID: 4 Score: -1500662 Strategy: MadMan

ID: 5 Score: -1500894 Strategy: MadMan

ID: 8 Score: -1501085 Strategy: MadMan

ID: 6 Score: -1501573 Strategy: MadMan

As you can see, the longer the game lasts the less difference there is in the scores.

After ten rounds the worst score (-19) was almost three times as bad as the best score (-7).

After one hundred rounds the worst score (-166) was only about 25% worse than the best score (-132).

After a thousand rounds there was only a 10% difference between best (-1437) and worst (-1576).

And after a million rounds there was less than a 1% difference between best and worst.

So in the long run cooperating at random is a viable way to take a middle path cooperation and defection.

We Can Be Smarter Than This

Random decision making lead to some interesting outcomes but it failed to come even close to beating the devils. Plus it’s an embarrassingly simple algorithm for AI enthusiasts like ourselves. Surely we can come up with a smarter prisoner. One that actually thinks instead of guessing.

But that’s going to have to wait for next time.

Let’s Program A Prisoners Dilemma Part 4: Heaven or Hell, Let’s Rock!

Last time we wrote enough code to actually play the prisoner’s dilemma. Combined with the Saint and Devil prisoners we wrote we now have enough code to run some interesting experiments.

Heaven

Let’s start by putting together a group of nothing but saints and seeing what happens to them after a thousand rounds.

#A group of all saints
prisoners = createPrisoners(10, 0)
playPrisonersDilemma(prisoners, 1000)

Which results in this:

The Group’s Overall Score was -10000 in 1000 rounds with 10 prisoners

ID: 1 Score: -1000 Strategy: Saint

ID: 2 Score: -1000 Strategy: Saint

ID: 3 Score: -1000 Strategy: Saint

ID: 4 Score: -1000 Strategy: Saint

ID: 5 Score: -1000 Strategy: Saint

ID: 6 Score: -1000 Strategy: Saint

ID: 7 Score: -1000 Strategy: Saint

ID: 8 Score: -1000 Strategy: Saint

ID: 9 Score: -1000 Strategy: Saint

ID: 10 Score: -1000 Strategy: Saint

Because the saints always cooperate they all ended up with the same score. What is much more interesting is their group score, which is actually perfect.

Why is -10,000 perfect? In every round every pair can do one of three things: if both prisoners cooperate the pair loses only 2 points, if one prisoner cooperates but the other defects the pair loses 3 points and if both defect the pair loses 4 points. Ten prisoners means five pairs. Five pairs multiplied by the best case scenario of -2 points gives us -10 points per round. Multiply that by the 1000 rounds we played the game and you find out that the best possible score for a group of this size playing a game this long is in fact -10,000 points.

So, unsurprisingly, when you have a group made up of nothing but saints you get a really really good outcome.

Hell

Now let’s try the inverse and put 10 devil’s together.

#A group of all devils
prisoners = createPrisoners(0, 10)
playPrisonersDilemma(prisoners, 1000)

Which turns out like this:

The Group’s Overall Score was -20000 in 1000 rounds with 10 prisoners

ID: 1 Score: -2000 Strategy: Devil

ID: 2 Score: -2000 Strategy: Devil

ID: 3 Score: -2000 Strategy: Devil

ID: 4 Score: -2000 Strategy: Devil

ID: 5 Score: -2000 Strategy: Devil

ID: 6 Score: -2000 Strategy: Devil

ID: 7 Score: -2000 Strategy: Devil

ID: 8 Score: -2000 Strategy: Devil

ID: 9 Score: -2000 Strategy: Devil

ID: 10 Score: -2000 Strategy: Devil

The devils also wound up with a ten-way tie, but the real story is in that group score: -20,000 points is the absolute worst score a group this size can manage! (-4 points for a double defection * 5 pairs * 1000 rounds)

You probably saw that coming. Of course a big group of devils is going to make an absolute mess of things. I didn’t just draw that name out of a hat.

Wolf Among Sheep

So if you have a group of all saints they will all tie for first place while a group of all devils will tie for last place. That’s good to know but doesn’t do much good for those of us living in a world full of both good and bad people.

So let’s mix things up!

#Saints with one devil among them
prisoners = createPrisoners(9, 1)
playPrisonersDilemma(prisoners, 1000)

The Group’s Overall Score was -11000 in 1000 rounds with 10 prisoners

ID: 10 Score: 0 Strategy: Devil

ID: 5 Score: -1188 Strategy: Saint

ID: 6 Score: -1206 Strategy: Saint

ID: 4 Score: -1210 Strategy: Saint

ID: 2 Score: -1212 Strategy: Saint

ID: 1 Score: -1216 Strategy: Saint

ID: 8 Score: -1220 Strategy: Saint

ID: 9 Score: -1232 Strategy: Saint

ID: 3 Score: -1236 Strategy: Saint

ID: 7 Score: -1280 Strategy: Saint

This test is actually a little interesting because it’s the first one we’ve run where the score will be different every time you play the game. Run the program a few times in a row and see for yourself: The scores will move move up and down a bit. The devil always gets a perfect score but the individual saint scores will go up and down based on how often they had the bad luck of getting stuck with the all-betraying devil.

The fact that the devil always get a perfect score is also pretty interesting, although it isn’t hard to figure out how he pulled it off. The devil never accepts the -1 penalty for cooperating and the saints never penalize their partner with the -2 points of betrayal, so there is literally no way for the devil to lose.

Also, look at the group score. The devil may have messed up the saints ten way tie but at least the group is still within 10% of being perfect.

Diabolic Duo

It doesn’t seem fair to have a single devil running around taking advantage of everything, so let’s give him a taste of his own medicine by introducing a second devil to the mix.

#A group of mostly saints
prisoners = createPrisoners(8, 2)
playPrisonersDilemma(prisoners, 1000)

The Group’s Overall Score was -12000 in 1000 rounds with 10 prisoners

ID: 10 Score: -232 Strategy: Devil

ID: 9 Score: -232 Strategy: Devil

ID: 8 Score: -1408 Strategy: Saint

ID: 7 Score: -1438 Strategy: Saint

ID: 1 Score: -1444 Strategy: Saint

ID: 6 Score: -1446 Strategy: Saint

ID: 2 Score: -1448 Strategy: Saint

ID: 3 Score: -1448 Strategy: Saint

ID: 4 Score: -1452 Strategy: Saint

ID: 5 Score: -1452 Strategy: Saint

Once again the scores will move up and down a bit depending on how often the devils get paired up and betray each other versus how often they get paired up with a defenseless saint, but the overwhelming trend here is for the devils to come out ahead. After all, each devil has an 8 out of 9 chance of getting paired with an easy to exploit saint.

(It is technically possible for the devils to lose if they get paired up a lot… but the odds of that happening a thousand times in a row is really really low. Try running shorter games if you want to see what happens when lightning strikes and devils always get stuck together).

Other interesting things to note include the fact that the devils will always have the same score (since they only lose points when they get paired together and then they always both lose two points). It’s also worth noting that our group score is now 20% worse than perfect. If this was a neighborhood it would still be a nice place to live even if a few of your neighbors smell like sulfur and brimstone and tend to park their cars in front of other people’s driveways.

Wheat and Tears

Let’s level the playing field and mix together an even number of saints and devils.

#An even mix of saints and devils
prisoners = createPrisoners(5, 5)
playPrisonersDilemma(prisoners, 1000)

The Group’s Overall Score was -15000 in 1000 rounds with 10 prisoners

ID: 6 Score: -900 Strategy: Devil

ID: 9 Score: -900 Strategy: Devil

ID: 7 Score: -910 Strategy: Devil

ID: 10 Score: -924 Strategy: Devil

ID: 8 Score: -934 Strategy: Devil

ID: 4 Score: -2030 Strategy: Saint

ID: 2 Score: -2080 Strategy: Saint

ID: 1 Score: -2086 Strategy: Saint

ID: 5 Score: -2116 Strategy: Saint

ID: 3 Score: -2120 Strategy: Saint

About what you would expect. The devils still manage to absolutely destroy the saints. Even worse, our group score is starting to get pretty bad and even after exploiting the saints mercilessly the devils are only a little better off than they would have been if they had decided to cooperate.

I guess the lesson here is that crime only pays when you have a small number of criminals exploiting a large population of good productive citizens. If you have too many criminals there just isn’t enough loot to go around.

A Light in the Darkness

For our last experiment let’s see what happens when you have a group made up mostly of devils and throw in a single saint. Doesn’t take a genius to predict this is going to be pretty bad…

#A single saint in a group of devils
prisoners = createPrisoners(1, 9)
playPrisonersDilemma(prisoners, 1000)

The Group’s Overall Score was -19000 in 1000 rounds with 10 prisoners

ID: 6 Score: -1756 Strategy: Devil

ID: 3 Score: -1766 Strategy: Devil

ID: 10 Score: -1766 Strategy: Devil

ID: 7 Score: -1774 Strategy: Devil

ID: 8 Score: -1774 Strategy: Devil

ID: 4 Score: -1780 Strategy: Devil

ID: 9 Score: -1792 Strategy: Devil

ID: 5 Score: -1794 Strategy: Devil

ID: 2 Score: -1798 Strategy: Devil

ID: 1 Score: -3000 Strategy: Saint

Ouch. Our poor saint wound up with the absolute worst possible individual score this time around. In every single round the saint hit himself with a -1 penalty for cooperating and then got with a -2 penalty from his defecting partner. Doesn’t get any worse. How perfectly terrible.

The group score is also pretty horrible and the devils are now significantly worse off than if they had just cooperated. (Fun experiment, mess with the ratio of saints to devils and find the exact point at which being a devil no longer pays).

One Last Observation

Let’s finish this post off with one last interesting trend you may have noticed: Running an experiment multiple times in a row caused individual scores to bounce around a lot but the group score never changed by even a single point. Why is that?

I’ll give you a few moments to think about it.

The answer is simple enough. Saints always cooperate, which causes the group to lose one point per round. Devils always defect, which causes the group to lose two points per round. Doesn’t matter how you mix up the pairs; the group as a whole will always lose one point per saint and two points per devil.

To get a group score that occasionally changes we would have to have a prisoner that sometimes cooperates (losing the group only one point) and sometimes defects (losing the group two points). Then the group score could go up or down based on how that individual prisoner decided to play.

Which is exactly what we’re going to be doing next time!

Let’s Program A Prisoner’s Dilemma Part 3: Fight In Cell Block D

Last time we programmed a couple prisoners and ran them through their paces. This time we’re going to write the code for the actual prisoner’s dilemma game.

We start by pseudo-coding our algorithm:

  • First: generate a bunch of prisoners objects.
  • Second: decide how many rounds the game should last.
  • During each round:
    •    Randomly pair up prisoners.
    •    Each prisoner is asked whether or not they want to cooperate with their partner
    •    Prisoners lose points based on the decision they and their partner made
    •    Move on to the next round by randomly sorting prisoners into new pairs
  • When all the rounds are done show the stats of every prisoners so we know who won (by losing the least points).
  • Also show the total sum of all the prisoners’ scores so we can compare different mixes of prisoners.

Breaking News: Prisons Filling Up At An Alarming Rate

Let’s start by figuring out how to set up a group of prisoners. We could just hard code an array of prisoner objects but that would leave us in the bad situation of having to rewrite actual code every time we wanted to try a new mix of prisoners. That sounds both boring and error prone so instead let’s come up with a function we can use to create a mix of prisoners on the fly.

def createPrisoners(saintCount, devilCount)
   prisoners = Array.new
   playerCounter = 0
   saintCount.times{ prisoners.push(Saint.new(playerCounter += 1)) }
   devilCount.times{ prisoners.push(Devil.new(playerCounter += 1)) }
   return prisoners
end

We just tell this function how many saints and devils we want and it gives us exactly what we asked for packaged inside a convenient array. It also keeps count of how many prisoners it has created so far and uses that count to make sure every prisoner has a unique ID.

This function also shows off a rather handy Ruby shortcut for writing a classic “for loop”. If you want a loop that runs a certain number of times just take that number, or a variable holding that number, and add .times{ your code here } to the end. Ex: 4.times{ puts “This will print four times” }

Exciting Next-gen Gameplay!

To actually play the prisoners dilemma we need a group of prisoners and an idea of how many rounds we want the game to last, which means our function definition probably needs to look a little like this:

def playPrisonersDilemma(prisoners, rounds)

Then inside the function itself we want to set up a loop that runs once for every round in the game. During those rounds we want to randomly shuffle our prisoners, divide them up into pairs and then ask every prisoner whether they want to cooperate with their temporary partner or not.

At that point we subtract one point from any player who chose to cooperate and subtract two points from any player who was betrayed by their partner (Yes, a player can be hit by both penalties in the same round).

rounds.times{
   pairs = prisoners.shuffle.each_slice(2)
   pairs.each{ |pair|
      firstPlayerCooperates = pair[0].cooperate?(pair[1].id)
      secondPlayerCooperates = pair[1].cooperate?(pair[0].id)

      if(firstPlayerCooperates)
         pair[0].score -= 1
      else
         pair[1].score -= 2
      end

      if(secondPlayerCooperates)
         pair[1].score -= 1
      else
         pair[0].score -= 2
      end

      pair[0].learnResults(pair[1].id, secondPlayerCooperates)
      pair[1].learnResults(pair[0].id, firstPlayerCooperates)
   }
}

Once again we’re using handy Ruby shortcuts to save a bit of typing. We use rounds.times to set up a loop that will play the game the proper number of times. We then use prionser.shuffle to randomly mix up the prisoners and then chain it to each_slice(2) to divide the random mix into random pairs.

Important note: shuffle and slice don’t change the original array. They instead return a transformed copy that has to be assigned to a variable before you can use it. But for us this is actually a good thing because it means we can just shuffle and slice the same prisoners array at the start of each loop without having to worry about it somehow getting messed up between iterations.

Once we’ve got our random pair array we can use pairs.each to write some quick code we want to run once for each piece of data in our array. The each loop starts out by grabbing an item from the array and storing it inside whatever variable we’ve named with the | variable | syntax.

In our case the pairs array is full of tiny two item arrays, so we call our variable pair. From there it’s pretty simple to each half ot he pair whether or not it wants to cooperate with the other half and then we can assign points. Remember our scoring rules: A player who cooperates loses one point. A player who refuses to cooperate forces the other player to lose two points. We also call the learnResults function to let each prisoner know how the round played out.

After the rounds.times loop has finished playing the game all that’s left is to print out the final score.

#Show the stats for the group as a whole as well as for each individual prisoner
groupScore = 0
prisoners.each{ |prisoner| groupScore += prisoner.score }
puts "The Group's Overall Score was #{groupScore} in #{rounds} rounds with #{prisoners.length} prisoners"
prisoners.sort{ |x, y| y.score <=> x.score}.each{ |prisoner| prisoner.report }

Nothing complicated here. We use an each loop to tally up the scores from every player so we can print a group score. Then we sort the prisoners by score and use another each loop to print out the individual stats for every prisoner.

All Together Now

Gluing together all the bits of code we just wrote leaves us with this nice function:

def playPrisonersDilemma(prisoners, rounds)
   if !prisoners.length.even?
      throw "Prisoners Dilemma requires an even number of participants"
   end

   # Make sure each prisoner starts out with a clean slate
   prisoners.each{ |prisoner| prisoner.score = 0}
   
   rounds.times{
      pairs = prisoners.shuffle.each_slice(2)
      pairs.each{ |pair|
         firstPlayerCooperates = pair[0].cooperate?(pair[1].id)
         secondPlayerCooperates = pair[1].cooperate?(pair[0].id)

         if(firstPlayerCooperates)
            pair[0].score -= 1
         else
            pair[1].score -= 2
         end

         if(secondPlayerCooperates)
            pair[1].score -= 1
         else
            pair[0].score -= 2
         end
      }
   }

   #Show the stats for the group as a whole as well as for each individual prisoner
   groupScore = 0
   prisoners.each{ |prisoner| groupScore += prisoner.score }
   puts "The Group's Overall Score was #{groupScore} in #{rounds} rounds with #{prisoners.length} prisoners"
   prisoners.sort{ |x, y| y.score <=> x.score}.each{ |prisoner| prisoner.report }
end

And now we can test it by having a group of prisoners play the game. Let’s try it out with a group of two saints and two devils playing for a thousand rounds:

prisoners = createPrisoners(2, 2)
playPrisonersDilemma(prisoners, 1000)

This should give you output kind of like this:

The Group’s Overall Score was -6000 in 1000 rounds with 4 prisoners

ID: 4 Score: -650 Strategy: Devil

ID: 3 Score: -650 Strategy: Devil

ID: 2 Score: -2350 Strategy: Saint

ID: 1 Score: -2350 Strategy: Saint

Hmm… looks like the Devils had no trouble at all completely crushing the Saints. We’ll look at that match up in more detail next time.