Let’s Program A Chatbot 19: Third Time’s The Charm

The Third (And Final?) Test User

 

Last time I tested DELPHI on a human it looked like we were only a few modifications away from an acceptable 80% success rate. Now that we’ve made those modifications, let’s take a look at a third conversation between the chatbot and another holiday visitor. It looks to me like we may have hit our goal:

 

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>What color is the sky?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Is the sky blue or purple?

DELPHI: THE FIRST OPTION. WAIT, NO THE SECOND. WAIT, WAIT, THE FIRST. I MEAN IT THIS TIME!

>Do you like sunsets?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Is wisdom relative?

DELPHI: FATE INDICATES THAT WISDOM IS RELATIVE

>So Athena could be wrong?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Athena wrong about sunsets?

DELPHI: ATHENA CAN’T BE WRONG ABOUT SUNSETS. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Do you like crossword puzzles?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I LIKE CROSSWORD PUZZLES?

>Why are you doing a crossword if you don’t like it?

DELPHI: BECAUSE OF REASONS

>What reason?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Do you often do things you don’t like?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Does Athena do things she doesn’t like?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Do you usually avoid answering questions?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I USUALLY AVOID ANSWERING QUESTIONS?

>Do you usually win arguments?

DELPHI: FATE INDICATES THAT I USUALLY WIN ARGUMENTS

>Do you believe in fate?

DELPHI: I DON’T THINK THAT I BELIEVE IN FATE

>Do you believe in soulmates?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I BELIEVE IN SOULMATES?

>Do you feel lonely?

DELPHI: I DON’T THINK THAT I FEEL LONELY

>What do you think?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Do you value happiness?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I VALUE HAPPINESS

>What makes you happy?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Does food make you happy?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Does talking to people make you happy?

DELPHI: FATE INDICATES THAT TALKING DOES TO PEOPLE MAKE I HAPPY

>Are you enjoying this conversation?

DELPHI: I SEE NO REASON WHY I WOULDN’T BE ENJOYING THIS CONVERSATION

>I like you!

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is the sky blue?

DELPHI: YOU KNOW, I’M REALLY NOT SURE EITHER. LET ME FORWARD THAT QUESTION TO MY SUPERVISOR

>Who is your supervisor?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Scott your supervisor?

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

>Is Scott a good supervisor?

DELPHI: FATE INDICATES THAT SCOTT IS A GOOD SUPERVISOR

>Does he compliment your performance?

DELPHI: MEH, HE PROBABLY DOES COMPLIMENT MINE PERFORMANCE, BUT I’M NOT REALLY INTERESTED IN THAT KIND OF STUFF

>Do you tell jokes?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I TELL JOKES

>Tell me a joke.

DELPHI: I DON’T WANT TO TALK ABOUT THAT. PLEASE FEEL FREE TO ASK ME WHY

>Why?

DELPHI: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

>Why don’t you want to tell a joke?

DELPHI: WHY NOT?

>Jokes are funny.

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

 

Quick Analysis

 

This time around I’m going to count an answer as “good” if DELPHI either provided a correct response or if DELPHI complained and the user immediately thought of a more DELPHI-compatible way to ask their question. I’m only going to count an answer as “bad” if it has a glaring grammar mistake or if the user seemed confused by DELPHI’s response.

 

Going by this very rough metric I get a count of 30 “good” answers against 4 “bad” ones. That’s a whopping 88% acceptable performance. Not bad at all.

 

Overall the user did a good job of picking up quickly on what questions DELPHI could and couldn’t answer. The user occasionally explored new kinds of questions but returned to the tried and true formula every time an error appeared. The only real issue from a design perspective came near the end, when DELPHI accidentally suggested it knew how to tell jokes, leading the user to try a new kind of input that failed horribly.

 

This suggests that it might be a good idea to write a few more rules to help DELPHI talk about itself. A simple “Do you” version of our existing “Can you” rule would have been enough to avoid misleading the user about DELPHI’s joke telling capabilities.
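Such a rule might look something like the sketch below (Python for illustration, since DELPHI itself is Perl; the pattern and the canned response wording are my own inventions, not DELPHI’s actual rule format):

```python
import re

# Hypothetical "Do you X?" rule: catch questions about DELPHI's own habits
# before the generic yes/no logic can claim abilities DELPHI doesn't have.
DO_YOU_PATTERN = re.compile(r'^do you (.+?)\??$', re.IGNORECASE)
DO_YOU_RESPONSE = "I'M JUST A SIMPLE FORTUNE TELLER. WHETHER I {0} IS A MYSTERY EVEN TO ME"

def answer_do_you(user_input):
    match = DO_YOU_PATTERN.match(user_input.strip())
    if match:
        # Echo the user's own words back inside the deflection
        return DO_YOU_RESPONSE.format(match.group(1).upper())
    return None  # no match; fall through to DELPHI's other rules

print(answer_do_you("Do you tell jokes?"))
```

A full version would also run the captured text through the first-and-second-person swap, so that questions like “Do you like my jokes?” come back with the right pronouns.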

 

But I’m going to leave that as an exercise for my wonderful readers. The only problems I’m going to try to solve are a few grammar issues I noticed.

 

My Quick Fix Is Mine

 

Two of DELPHI’s mistakes involved switching “your” to “mine”, resulting in awkward grammar like this:

 

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

 

Obviously that should have been “my supervisor”. In fact, now that I think about it, “your” should always be swapped to “my”; it’s “yours”, with an “s”, that matches “mine”. We can fix this by updating the dictionaries that power switchFirstAndSecondPerson:

 

# "your" now maps to the placeholder for "my"
$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImy';

# only "yours", with an "s", maps to "mine"
$wordsToPlaceholders[6][0]=qr/\byours\b/i;
$wordsToPlaceholders[6][1]='DELPHImine';

 

And of course here are some test cases to make sure the fix really fixed things:

 

$testCases[22][0] = "Is that pizza yours?";
$testCases[22][1] = "FATE INDICATES THAT THAT PIZZA IS MINE";

$testCases[23][0] = "Is that dog your pet?";
$testCases[23][1] = "FATE INDICATES THAT THAT DOG IS MY PET";

$testCases[24][0] = "Is that pizza mine?";
$testCases[24][1] = "FATE INDICATES THAT THAT PIZZA IS YOURS";

$testCases[25][0] = "Is that dog my pet?";
$testCases[25][1] = "FATE INDICATES THAT THAT DOG IS YOUR PET";
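Incidentally, the placeholder indirection (swapping words to DELPHImy-style tokens first, and only then to the final words) is what keeps the swaps from undoing each other: a naive single pass would turn “your” into “my” and then a later rule would turn that “my” right back into “your”. Here’s a rough sketch of the two-pass idea in Python (DELPHI itself is Perl; the word list is trimmed and the details are illustrative, not DELPHI’s actual code):

```python
import re

# (input pattern, unique placeholder, final swapped word)
# Longer words come first so "yours" is handled before "your".
SWAPS = [
    (re.compile(r'\byours\b', re.IGNORECASE), 'DELPHImine',  'MINE'),
    (re.compile(r'\byour\b',  re.IGNORECASE), 'DELPHImy',    'MY'),
    (re.compile(r'\bmine\b',  re.IGNORECASE), 'DELPHIyours', 'YOURS'),
    (re.compile(r'\bmy\b',    re.IGNORECASE), 'DELPHIyour',  'YOUR'),
]

def switch_first_and_second_person(text):
    # Pass 1: words to placeholders. A swapped word can't be swapped
    # again, because no placeholder matches any of the word patterns.
    for pattern, placeholder, _ in SWAPS:
        text = pattern.sub(placeholder, text)
    # Pass 2: placeholders to their final swapped words.
    for _, placeholder, word in SWAPS:
        text = text.replace(placeholder, word)
    return text

print(switch_first_and_second_person("Is that pizza yours"))    # Is that pizza MINE
print(switch_first_and_second_person("your dog ate my pizza"))  # MY dog ate YOUR pizza
```

The second example is the one a single-pass version would get wrong: it would swap “your” to “my” and then immediately swap that “my” back again.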

 

Conclusion

 

With those two fixes DELPHI has now achieved an acceptable response rate of over 90%. That’s really impressive for a simple pattern matching program with no memory, no language parsing abilities and no sense of context. Even better, every user who worked with DELPHI admitted that they had fun with the experience and liked the silly random answers. That means we succeeded at our primary goal of creating an entertaining fortune teller.

 

So I’m done. Bye!

 

What’s that? You’re still not satisfied? Well then, maybe I can fit one more post into this Let’s Program and give you a few suggestions on how a hardcore reader could take this whole project to the next level.