Let’s Program A Prisoner’s Dilemma Part 6: Eye For An Eye

It’s finally time to write a prisoner with some actual brain power. Let’s start by looking at our current prisoners and their unique issues:

Saints always cooperate, making them an easy target for defectors.

Devils always defect, which leads to bad scores for everyone when multiple devils get together.

Madmen act randomly, which means whether or not they make good choices is up to chance.

What we really want is a prisoner that cooperates when cooperating is a good idea and defects in self defense when it suspects its about to be betrayed. A prisoner that can judge whether or not its partner deserves to be trusted.

Trust No One?

But how will our new “judge” type prisoner decide who to trust? The only thing it will know about its partner is their ID. There’s no way to force the other prisoner to reveal their strategy*, or at least there shouldn’t be. After all, the prisoner’s dilemma is meant to model real life and most real humans aren’t mind readers.

So how will our judge decide how to make decisions if he can’t read minds or see the future? The same way we humans do. We trust people who act trustworthy and we are suspicious of people who betray us.

This leads to a strategy called “tit-for-tat”, where you treat your partner the same way they treat you. If they betray you then the next time you meet them you betray them. If they cooperate then the next time you see them you pay them back by cooperating.

To pull this off we’re going to have to give our judges a memory.

Remember that reportResults function that we programed into the original prisoner class? We’ve been ignoring that so far because none of our prisoners needed it. Saints, devils and madmen always act the same way so they don’t really care how their partners act.

But our judge can use the info from reportResults to build a list of people he can and can’t trust. All we have to do is initialize a blank hash when the judge is created. Then every time reportResults is called we update that hash. We then use this hash in cooperate? by taking the provided partner ID, looking up in our hash whether or not they like to cooperate and then matching their behavior.

But wait! What do we do the first time we meet a new player? We can’t copy their behavior if we’ve never seen how they behave!

The answer is to remember the Golden Rule and “Treat others like you want to be treated”. That means the first time a judge meets a new player he will cooperate. Not only is this good manners, it helps make sure that when judges meet each other they will get locked into a loop of infinite cooperation instead of a loop of infinite betrayal.

Designing The Judicial Branch

The two things that make a judge a judge are:

1) It keeps track of how other prisoners treat it

2) If it meets a prisoner more than once it copies their last behavior

So let’s start with feature 1 by giving our prisoner a memory. To do that we’re going to have to create a hash during prisoner initialization and then use the learnResults function to fill it with useful data.

def initialize(id)
   super(id)
   @strategy = "Judge"
   @opponentBehavior = Hash.new()
end

def learnResults(opponentID, opponentCooperated)
   @opponentBehavior[opponentID] = opponentCooperated
end

Now that our opponentBehavior hash is filling up all that’s left is to use it as part of our decision making process. This is a pretty simple two step process. When our judge is asked whether or not it wants to cooperate with a given prisoner it starts by checking whether or not it has an opponentBehvaior entry for that ID. If it does it just mimics back whatever behavior it recorded for that ID, but if it’s empty it instead defaults to cooperation.

def cooperate?(opponentID)
   if( @opponentBehavior.key?(opponentID))
      return @opponentBehavior[opponentID]
   else
      return true
   end
end

Put it all together and what do you get?

class Judge < Prisoner

   def initialize(id)
      super(id)
      @strategy = "Judge"
      @opponentBehavior = Hash.new()
   end

   def cooperate?(opponentID)
      if( @opponentBehavior.key?(opponentID))
         return @opponentBehavior[opponentID]
      else
         return true
      end
   end

   def learnResults(opponentID, opponentCooperated)
      @opponentBehavior[opponentID] = opponentCooperated
   end
end

Featuring Four Different Character Classes!

Now let’s give our code a test by setting up a group with four judges, four saints, four devils and four madmen.

The Group’s Overall Score was -23546 in 1000 rounds with 16 prisoners

ID: 8 Score: -1126 Strategy: Devil

ID: 7 Score: -1170 Strategy: Devil

ID: 6 Score: -1198 Strategy: Devil

ID: 5 Score: -1254 Strategy: Devil

ID: 14 Score: -1391 Strategy: Judge

ID: 16 Score: -1394 Strategy: Judge

ID: 15 Score: -1398 Strategy: Judge

ID: 13 Score: -1425 Strategy: Judge

ID: 12 Score: -1479 Strategy: MadMan

ID: 9 Score: -1483 Strategy: MadMan

ID: 11 Score: -1498 Strategy: MadMan

ID: 10 Score: -1510 Strategy: MadMan

ID: 3 Score: -1766 Strategy: Saint

ID: 2 Score: -1790 Strategy: Saint

ID: 4 Score: -1802 Strategy: Saint

ID: 1 Score: -1862 Strategy: Saint

In our mixed group you can see that the judges outperform both the madmen and the saints and come pretty close to matching the devils.

But that’s not good enough. We didn’t want to just match the devils, we wanted to beat them! It’s absolutely infuriating that our genuinely intelligent judges are somehow being outscored by a bunch of dumb always-defectors.

Grudge Match

OK, time to get to the bottom of this. Why can’t our judges beat the devils? Let’s get rid of all the saints and madmen and isolate our problem.

The Group’s Overall Score was -17859 in 1000 rounds with 10 prisoners

ID: 10 Score: -1558 Strategy: Judge

ID: 7 Score: -1581 Strategy: Judge

ID: 6 Score: -1581 Strategy: Judge

ID: 9 Score: -1592 Strategy: Judge

ID: 8 Score: -1597 Strategy: Judge

ID: 2 Score: -1990 Strategy: Devil

ID: 3 Score: -1990 Strategy: Devil

ID: 4 Score: -1990 Strategy: Devil

ID: 5 Score: -1990 Strategy: Devil

ID: 1 Score: -1990 Strategy: Devil

Well would you look at that. When all you have are devils and judges the judges scream ahead into first place no problem.

It’s not that our judges weren’t smart enough to out-think the devils in the mixed game, it’s that the devils had a bunch of saints sitting around they could exploit for easy points. As soon as we took away the ever-forgiving all cooperators the devils sank like a rock while the judges pulled off a brilliant group victory by cooperating with each other and paying back the deceitful devils with defection after defection.

Impossible To Trust

Let’s finish up with a few more what-if scenarios, like “What if you stuck a bunch of judges with a bunch of saints?”

The Group’s Overall Score was -10000 in 1000 rounds with 10 prisoners

ID: 1 Score: -1000 Strategy: Saint

ID: 2 Score: -1000 Strategy: Saint

ID: 3 Score: -1000 Strategy: Saint

ID: 4 Score: -1000 Strategy: Saint

ID: 5 Score: -1000 Strategy: Saint

ID: 6 Score: -1000 Strategy: Judge

ID: 7 Score: -1000 Strategy: Judge

ID: 8 Score: -1000 Strategy: Judge

ID: 9 Score: -1000 Strategy: Judge

ID: 10 Score: -1000 Strategy: Judge

A ten way tie with a perfect group score, just like when we had a group of pure saints. As long as you’re nice to the judges they’ll be perfectly nice to you.

What if we get rid of the saints and just have judges?

The Group’s Overall Score was -10000 in 1000 rounds with 10 prisoners

ID: 1 Score: -1000 Strategy: Judge

ID: 2 Score: -1000 Strategy: Judge

ID: 3 Score: -1000 Strategy: Judge

ID: 4 Score: -1000 Strategy: Judge

ID: 5 Score: -1000 Strategy: Judge

ID: 6 Score: -1000 Strategy: Judge

ID: 7 Score: -1000 Strategy: Judge

ID: 8 Score: -1000 Strategy: Judge

ID: 9 Score: -1000 Strategy: Judge

ID: 10 Score: -1000 Strategy: Judge

Yet another perfect score! Since judges start off by cooperating they inevitably end up trusting each other and create just as nice a world as a group of pure saints would.

That’s enough happy thoughts for now. Why not try something more sinister? Like a group full of devils with only one judge?

The Group’s Overall Score was -19991 in 1000 rounds with 10 prisoners

ID: 1 Score: -1998 Strategy: Devil

ID: 2 Score: -1998 Strategy: Devil

ID: 3 Score: -1998 Strategy: Devil

ID: 4 Score: -1998 Strategy: Devil

ID: 5 Score: -1998 Strategy: Devil

ID: 9 Score: -1998 Strategy: Devil

ID: 7 Score: -1998 Strategy: Devil

ID: 8 Score: -1998 Strategy: Devil

ID: 6 Score: -1998 Strategy: Devil

ID: 10 Score: -2009 Strategy: Judge

As we’ve seen time and time again there is literally no way to win in a world full of devils. They will always betray you and so you can never get ahead.

But unlike the saint, which was absolutely destroyed by a gang of devils, the judge manages to at least keep his loses pretty low. As you can see from the numbers the judge cooperated exactly once with every single devil and then never trusted them again, thereby preventing them from leeching more points out of him.

* There is actually a variation of the prisoner dilemma where each prisoner is given a complete copy of their partner’s code as an input. We’re not doing that.