Let’s Program A Chatbot: Index And Code

Posted on December 26, 2013 by Scott

Introduction

Have you ever wondered how chatbots work? Do you want to get some practice with regular expressions? Need an example of test driven development? Want to see some Perl code?

If the answer to any of those questions was “Yes” then you’re in luck because I happen to have just finished a little series of posts on writing pattern matching chatbots using Perl, regular expressions and test driven development. Enjoy!

Index

Let’s Program A Chatbot 1: Introduction And Theory

Let’s Program A Chatbot 2: Design Before You Code

Let’s Program A Chatbot 3: Choosing A Programming Language

Let’s Program A Chatbot 4: Let’s Talk About Test Driven Development

Let’s Program A Chatbot 5: Finally, Code!

Let’s Program A Chatbot 6: Don’t Fear The Regex

Let’s Program A Chatbot 7: To Be Or Not To Be, That Is The Use Case

Let’s Program A Chatbot 8: A Little Housecleaning

Let’s Program A Chatbot 9: The Grammar Police

Let’s Program A Chatbot 10: Chatting With The Bot

Let’s Program A Chatbot 11: Bad Adjectives

Let’s Program A Chatbot 12: When The Answer Key Is Wrong

Let’s Program A Chatbot 13: What’s Mine Is Yours

Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Let’s Program A Chatbot 15: “ELIZA Effect” Should Be A Movie Title

Let’s Program A Chatbot 16: Testing On Live Subjects

Let’s Program A Chatbot 17: Blitzcode!

Let’s Program A Chatbot 18: A Bit Better Than Before

Let’s Program A Chatbot 19: Third Time’s The Charm

Let’s Program A Chatbot 20: What Next?

Complete Code

If you follow along with the posts you should be able to write your own chatbot from scratch. But if you don’t have the time for that or just want some reference code I have also provided my complete chatbot, user interface and testing suite: Complete Chatbot Code

Let’s Program A Chatbot 20: What Next?

Posted on December 26, 2013 by Scott

I’m Done… But Maybe You Aren’t

When I decided to do this whole “Let’s Program A Chatbot” I really only had two goals: Show how to build a very simple chatbot from scratch* and demonstrate the basics of test driven software development. And I feel like I’ve done that. Objective complete, gain 500xp and proceed to the next quest.

But I’m willing to bet that some of my readers aren’t satisfied yet. This humble set of posts was probably enough for those of you where just curious or bored, but what about those of you with very specific chatbot related goals? Goals that you haven’t reached yet. What are you supposed to do now?

That really depends on what exactly your goals are.

If You Want More Practice Writing Software

DELPHI is a good practice project, just complicated enough to require some real thought and development strategies while still being simple enough that you only need a few weekends to get some serious work done. If your main goal is to just get a little bit better at software or practice your Perl and regular expressions then it might make sense to just keep working on improving DELPHI.

Here are some (relatively) simple activities that you might want to try:

Keep Fine Tuning DELPHI: DELPHI is far from perfect and there are lots of ways it could be enhanced or fixed. For example, a lot of my test users asked DELPHI about the weather so it might be interesting to write some high-priority rules that can recognize questions about weather and generate weather themed answers. For further ideas just find a test user, record their conversation and then look for any instance where DELPHI didn’t act as smart as you wanted. Then write up some test cases and see if you can’t get DELPHI acting a little smarter.

Turn DELPHI Into ELIZA: ELIZA is a famous chatbot that pretends to be a psychiatrist by turning user input into questions. With a little research I’m sure you could find a copy or summary of the basic patterns and responses it used to do that. Now erase all of DELPHI’s old chat patterns and create a brand new set based off of ELIZA. Creating new regex patterns to match ELIZA’s strategy will be great practice.

Create A Whole New Set Of Rules And Responses: Instead of replacing DELPHI’s rules with ELIZA’s rules, why not try creating your own brand new set of rules and building a completely unique chatbot? Like baseball? Create a new set of rules oriented around explaining the game and sharing trivia about famous players. Are you a history buff? Create rules for answering questions about your favorite historic event or person. This will give you great practice at designing chatbots, writing test cases and coding up regular expressions.

Of course, if all you ever do is write and replace chat rules you’ll always be limited by the innate weaknesses of simple pattern matching. For a better chatbot and some more serious coding practice why not try tackling one of these projects:

Give DELPHI A Memory: Wouldn’t it be nice if DELPHI could ask the user for their name and then remember it? Or if DELPHI could remember that the user’s last question was about their dog and then use that information to create a personalized error message the next time it get’s confused? (I’M NOT SURE WHAT TO SAY TO THAT. TRY ASKING ME ANOTHER QUESTIONS ABOUT YOUR DOG). To pull this off you’d have to create a new data structure to hold DELPHI’s memories and rewrite the response generation system to figure out what information it should store and when it should retrieve it. Tricky, but satisfying.

Connect Input To Function Calls: Currently DELPHI does the exact same thing for all input patterns: It grabs an output pattern from an array and prints it to the screen. But what if we need more flexibility? Maybe when the user asks “What time is it?” we want to run a special sub-function that figures out what time it is and builds a response around that? Or maybe we want DELPHI to be able to answer “What is X?” questions by searching Wikipedia?

If you want DELPHI to do something different for each type of input you’ll need to find some way to associate input patterns with function calls instead of with simple output arrays. In Perl you can do this with function references. Exactly how would you do this? That’s for you to figure out. This is an advanced exercise after all.

Port DELPHI To A New Language: DELPHI uses regular expressions to match user input to chatbot output. Most modern languages support regular expressions or have easy-to-find regular expressions libraries. So why not recreate DELPHI in C++ or Java or Lisp or Python or Ruby or whatever it is all the cool kids are doing these day. Depending on what language you choose and how unlike Perl it is you may have to redesign major features like how the chat patterns are stored, but that’s what makes this a good advanced exercise.

If You Want To Make Serious Chatbots

What if you aren’t here to practice your software skills? What if you’re here because you really want to build a fully featured chatbot as soon as possible? In that case messing around with a toy like DELPHI is probably a waste of your time. It would make much more sense to use an existing tool that already has all the features you wat.

Now I’ve never had to deploy a professional quality chatbot, but from what I’ve seen ALICE is probably the way to go. ALICE uses the AIML standard to allow for very powerful pattern matching style chatbots. It also seems to already have support for advanced features like context and memory, which would be a challenge to shove into DELPHI.

Even better, there are tons of existing AIML libraries that can be downloaded to make your chatbot an instant expert on a wide variety of subjects. So not only do you not have to write your bot from scratch, you don’t have to write your responses from scratch either. Just grab a library and then tweak it for your specific needs.

So if you need to build a strong chatbot fast I’d recommend learning about AIML and downloading ALICE.

If You Want To Write Programs That Really Understand Language

By this point in the series you’ve probably realized that DELPHI (and most other chatbots) doesn’t actually understand English or how grammar works. DELPHI just looks at input, compares it to a list or patterns and then spits out a response. It’s a glorified “if… then…” statement.

On the one hand it’s pretty cool that such simple tricks can create an almost human-feeling program. On the other hand it’s pretty disappointing that DELPHI is only a trick. That’s like learning that the magicians at Vegas are just using slight of hand. You wanted real magic. You wanted a program that really understands human language.

Lucky for you there is a lot of research going into teaching human language to computers. Just go to your nearest university level library or favorite search engine and do some research on “Natural Language Processing”. This is an extremely large area of research that includes topics like: Labeling words as their part of speech, statistically analyzing books, automatic language translation, parsing sentences, building sentences and using context to deal with ambiguity. Not to mention the endless philosophical debates on what it even means for a computer to “understand” or “know” something.

But be warned that this sort of stuff gets complicated fast. Dive into this topic and you’re very quickly going to find yourself learning more about linguistics, grammar, statistics and logic than you may have ever wanted. It’s a fun ride but don’t go into it thinking you can read a book or two and then slap together an AI that passes the Turing Test with flying colors. This is hard, serious research at the cutting edge of computer science.

But if you manage to push that cutting edge further, even by a little…

Conclusion

DELPHI is done and I hope all my wonderful readers have enjoyed themselves. I’ve done my best to show you a few fun tricks and point the insatiably curious towards more ambitious projects and sources of knowledge. All that’s really left now is clean up work. I should probably make an index page that links all these individual posts together. Probably ought to post my complete code as well.

*As a young college student I had the hardest time finding any books or tutorials on chatbots. That was really frustrating and this whole series has basically just been me writing the sort of article I wish I could have read five years ago.

Christmas In One Minute

Posted on December 25, 2013 by Scott

Don’t you hate waiting for Christmas? Have you ever wished you could make it happen now?

With physics you can! Relativity says that the closer you get to the speed of light the slower your personal time moves compared to stationary objects. Move fast enough and years of Earth time can pass in the literal blink of an eye.

But how fast is fast enough? For example, how fast would you have to move if you wanted it to only be sixty seconds until Christmas? Well, this handly little Perl program can tell you the answer. Be aware that it does use the DateTime module to calculate how many seconds from now until Christmas morning so you may need to download that module before it runs.

#! /usr/bin/perl

# A silly little script for calculating how fast you need to move
# to make Christmas happen in one relativisitc minute

use DateTime;

my $c = 299792458; #speed of light in meters per second

# Calculate how fast you have to be moving to achieve a certain
# normal time to percieved time ratio. A.K.A. the Lorentz factor
sub speedFromTimeDilation{    
    my $timeDilation = $_[0];
    return sqrt($c * $c * (-1/($timeDilation * $timeDilation)+1));
}

#You may need to adjust this for your timezone
$now = DateTime->now;
$now->set_time_zone( 'America/Chicago' );
# In my family Christmas starts at 7 in the morning. Change this if you need
$christmasMorning = DateTime->new(year => $now->year, month => 12, day => 25, hour => 7);
$christmasMorning->set_time_zone( 'America/Chicago' );

#Make sure we are comparing now to next Christmas
if($now > $christmasMorning){
    $christmasMorning->set_year($christmasMorning->year+1);
}

$secondsUntilChristmas = $christmasMorning->epoch - $now->epoch;

#We can't make Christmas come slower
if($secondsUntilChristmas < 60){
    print "It is less than a minute until Christmas morning. You can wait that long\n";
    exit;
}

#Ratio between actual time to Christmas and our desired one minute wait
$desiredTimeDilation = $secondsUntilChristmas/60;

$neededVelocity = speedFromTimeDilation($desiredTimeDilation);

print "It is $secondsUntilChristmas seconds until Christmas morning\n";
print "To compress this into sixty seconds you would need to dilate time to:\n";
print 60/$secondsUntilChristmas*100, "% of normal\n";
print "Which would require a velocity of:\n";
print "$neededVelocity meters per second\n";
print "which is: \n";
print $neededVelocity/$c*100, "% of the speed of light\n";

Here is some sample output

It is 36423 seconds until Christmas morning
To compress this into sixty seconds you would need to dilate time to:
0.164731076517585% of normal
Which would require a velocity of:
299792051.236407 meters per second
which is:
99.9998643182701% of the speed of light

Ouch. It’s only ten hours until Christmas and we still have to go faster than 99% the speed of light. Not exactly a feasible goal. Oh well, looks like we’ll just have to get to Christmas the old fashioned way: by waiting.

Let’s Program A Chatbot 19: Third Time’s The Charm

Posted on December 21, 2013 by Scott

The Third (And Final?) Test User

Last time I tested DELPHI on a human it looked we were only a few modifications away from an acceptable 80% success rate. Now that we’ve made those modifications let’s take a look at a third conversation between the chatbot and another holiday visitor. It looks to me like we may have hit our goal:

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>What color is the sky?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Is the sky blue or purple?

DELPHI: THE FIRST OPTION. WAIT, NO THE SECOND. WAIT, WAIT, THE FIRST. I MEAN IT THIS TIME!

>Do you like sunsets?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Is wisdom relative?

DELPHI: FATE INDICATES THAT WISDOM IS RELATIVE

>So Athena could be wrong?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Athena wrong about sunsets?

DELPHI: ATHENA CAN’T BE WRONG ABOUT SUNSETS. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Do you like crossword puzzles?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I LIKE CROSSWORD PUZZLES?

>Why are you doing a crossword if you don’t like it?

DELPHI: BECAUSE OF REASONS

>What reason?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Do you often do things you don’t like?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Does Athena do things she doesn’t like?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Do you usually avoid answering questions?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I USUALLY AVOID ANSWERING QUESTIONS?

>Do you usually win arguments?

DELPHI: FATE INDICATES THAT I USUALLY WIN ARGUMENTS

>Do you believe in fate?

DELPHI: I DON’T THINK THAT I BELIEVE IN FATE

>Do you believe in soulmates?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I BELIEVE IN SOULMATES?

>Do you feel lonely?

DELPHI: I DON’T THINK THAT I FEEL LONELY

>What do you think?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Do you value happiness?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I VALUE HAPPINESS

>What makes you happy?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Does food make you happy?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Does talking to people make you happy?

DELPHI: FATE INDICATES THAT TALKING DOES TO PEOPLE MAKE I HAPPY

>Are you enjoying this conversation?

DELPHI: I SEE NO REASON WHY I WOULDN’T BE ENJOYING THIS CONVERSATION

>I like you!

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is the sky blue?

DELPHI: YOU KNOW, I’M REALLY NOT SURE EITHER. LET ME FORWARD THAT QUESTION TO MY SUPERVISOR

>Who is your supervisor?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Scott your supervisor?

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

>Is Scott a good supervisor?

DELPHI: FATE INDICATES THAT SCOTT IS A GOOD SUPERVISOR

>Does he compliment your performance?

DELPHI: MEH, HE PROBABLY DOES COMPLIMENT MINE PERFORMANCE, BUT I’M NOT REALLY INTERESTED IN THAT KIND OF STUFF

>Do you tell jokes?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I TELL JOKES

>Tell me a joke.

DELPHI: I DON’T WANT TO TALK ABOUT THAT. PLEASE FEEL FREE TO ASK ME WHY

>Why?

DELPHI: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

>Why don’t you want to tell a joke?

DELPHI: WHY NOT?

>Jokes are funny.

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

Quick Analysis

This time around I’m going to count an answer as “good” if DELPHI either provided a correct response or if DELPHI complained and the user immediately thought of a more DELPHI compatible way to ask their questions. I’m only going to count an answer as “bad” if it has a huge grammar mistake or if the user seemed confused by DELPHI’s responses.

Going by this very rough metric I get a count of 30 “good” answers against 4 bad answers. That’s a whopping 88% acceptable performance. Not bad at all.

Overall the user did a good job of picking up quickly on what questions DELPHI could and couldn’t answer. The user occasionally explored new kinds of questions but returned to the tried and true formula every time an error appeared. The only real issue from a design perspective came near the end when DELPHI accidentally suggested it knew how to tell jokes, leading the user to try some new input that failed horribly.

This suggests that it might be a good idea to write a few more rules to help DELPHI talk about itself. A simple “Do you” version of our existing “Can you” rule would have been enough to avoid misleading the user about DELPHI’s joke telling capabilities.

But I’m going to leave that as an exercise for my wonderful readers. The only problems I’m going to try and solve are a few grammar issues I noticed.

My Quick Fix Is Mine

Two of DELPHI’s mistakes involved switching “your” to “mine”, resulting in awkward grammar like this:

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

Obviously that should have been “my supervisor”. In fact, now that I think about it, “your” should always be swapped to “my”. It’s “yours” with an “s” that matches “mine”. We can fix this by updating the dictionaries we use to power switchFirstAndSecondPerson.

$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImy';

$wordsToPlaceholders[6][0]=qr/\byours\b/i;
$wordsToPlaceholders[6][1]='DELPHImine';

And of course here as some test cases to make sure the fix really fixed things

$testCases[22][0] = "Is that pizza yours?";
$testCases[22][1] = "FATE INDICATES THAT THAT PIZZA IS MINE";

$testCases[23][0] = "Is that dog your pet?";
$testCases[23][1] = "FATE INDICATES THAT THAT DOG IS MY PET";

$testCases[24][0] = "Is that pizza mine?";
$testCases[24][1] = "FATE INDICATES THAT THAT PIZZA IS YOURS";

$testCases[25][0] = "Is that dog my pet?";
$testCases[25][1] = "FATE INDICATES THAT THAT DOG IS YOUR PET";

Conclusion

With those two fixes DELPHI has now achieved an acceptable response rate of over 90%. That’s really impressive for a simple pattern matching program with no memory, no language parsing abilities and no sense of context. Even better, every user who worked with DELPHI admitted that they had fun with the experience and liked the silly random answers. That means we succeeded at our primary goal of creating an entertaining fortune teller.

So I’m done. Bye!

What’s that? You’re still not satisfied? Well then, maybe I can fit one more post into this Let’s Program and give you a few suggestions on how a hardcore reader could take this whole project to the next level.

Let’s Program A Chatbot 18: A Bit Better Than Before

Posted on December 19, 2013 by Scott

A Second Blind Test

Now that I’ve made a few modifications to DELPHI it’s time to run another live test. But I can’t just ask the first test user to swing by for a second go. That user already knows some of DELPHI’s strengths and weaknesses so they can no longer provide a truly unbiased sample of whether or not the code is working. For good test results I’m going to need a second test-user with no experience with DELPHI at all.

Fortunately it’s still the holidays and finding a second user was as easy as snagging a nearby relative. This is their conversation:

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>will i be a white Christmas?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOU WILL NOT BE A WHITE CHRISTMAS

>will IT be a white Christmas?

DELPHI: BASED ON THESE TEA LEAVES IT SEEMS IT WILL NOT BE A WHITE CHRISTMAS

>Will my New Year’s Resolutions last longer this year?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOUR NEW WILL YEAR’S RESOLUTIONS LAST LONGER THIS YEAR

>Does that mean I can resolve to learn how to fly and it still be successful?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Can I resolve to grow wings and learn to fly?

DELPHI: I’M REALLY GOOD AT YES NO QUESTIONS. TRY ONE OF THOSE

>If you attempt to fail and succeed, have you really succeeded?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is the sky blue?

DELPHI: THE SKY CAN’T BE BLUE. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Is the grass green?

DELPHI: FATE INDICATES THAT THE GRASS IS GREEN

>Is sky gray?

DELPHI: SKY CAN’T BE GRAY. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Is the sky red?

DELPHI: HMMM… ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT IS

>Did I break you?

DELPHI: FATE INDICATES THAT YOU DID BREAK I

>I’m sorry. Good Bye

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is he Sky Blue?

DELPHI: WHY NOT?

>Because

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

Conversation Analysis

The first and most important thing I noticed is that this test user actually laughed at some of the things DELPHI did. That is a very good sign and indicates that the personality of the writing is still helping to hide the weaknesses in the code.

Aside from that, the test user asked 14 different questions. Of those questions 6 were answered properly and five were answered improperly. The other 3 were kind of on the fence. Two of the “meh” answers were DELPHI complaining about input it didn’t understand, which is fine if it only happens occasionally. The third borderline answer was this interesting exchange:

>Will my New Year’s Resolutions last longer this year?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOUR NEW WILL YEAR’S RESOLUTIONS LAST LONGER THIS YEAR

>Does that mean I can resolve to learn how to fly and it still be successful?

The pattern didn’t recognize “New Year’s Resolutions” as one big noun so the response stuck a “will” in the wrong place. Normally I’d call this a problem but the user didn’t seem to notice or care about this little mistake. They just moved right on with the conversation.

Actually, this isn’t all that surprising. Humans are very good at expecting a what sentence should sound like so you can get away with mixnig up word order or letter order from time to time. Most people’s brains will just rearrange the sentence without consciously noticing there was a mistake.

So for now I’m going to ignore this and focus on the five real problems.

Easy Fixes

Of our five definite mistakes two are easy to fix. The first was actually just a typo in one of my response patterns:

>Is the sky red?

DELPHI: HMMM… ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT IS

“IS NOT IS”? Where did that come from? Well, looking at my response patterns it turns out I accidentally referenced the user’s verb twice instead of referencing the second half of their input like I should have. Notice that the below output rule has two “UIF0”s but no “UIF2”.

“Hmmm… echoes of cosmic wisdom show that UIF1 UIF0 not UIF0”

Fixing that was as simple as changing the second UIF0 to a UIF2. Still, it’s an embarrassing mistake. I guess that’s what I get for having automated tests that only check the first answer in the random response array!

The next easy problem was this one:

>Can I resolve to grow wings and learn to fly?

DELPHI: I’M REALLY GOOD AT YES NO QUESTIONS. TRY ONE OF THOSE

There is really no difference between a “Can” rule and a “Does” rule or an “Is” rule. So writing a “can” rule shouldn’t be a challenge. The only issue to watch out for is that your generic “Can” rule needs to be a lower priority than the helpful “Can you” rule that we’re using to provide tips on what DELPHI can and can’t do.

Here’s a test case and the code to solve it:

$testCases[21][0] = "Can this code pass all the tests?";
$testCases[21][1] = "FATE INDICATES THAT THIS CODE CAN PASS ALL THE TESTS";

push(@chatPatterns,
   [qr/\ACan ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
      ["Fate indicates that UIF0 can UIF1",
      "My \"Big Book O' Wisdom\" says that UIF0 can't UIF1"]
   ]);

Although to be honest if you just plug that rule in you’ll probably get an error in your tests and find that the input is being caught by the generic “Can you” rule. That’s because the “Can you” rule just looks for the word “Can” followed by the letter “i” without caring whether or not the “i” is an actual word (what we want) or just part of something bigger. In this case, it’s catching the “i” in the middle of “this”. We can fix this with a few word boundary adjustments to the “Can you” regex.

/\ACan.*\bI\b/i

Now the “Can you” rule will only activate when the “I” is on it’s own like it should be after the word “You” has been transformed into first person.

Slightly Less Easy Problems

Complex and compound sentences of all sorts are the natural enemy of pattern matching chatbots. Like these two example:

>If you attempt to fail and succeed, have you really succeeded?

…

>I’m sorry. Good Bye

The first sentence is an ‘If X then Y” pattern. Noticing that a question starts with “If” would be easy, but deciding what to say back is difficult. The main problem is that not all sentences that start with “If” are actually yes or no questions, so we can’t just randomly throw out a yes or no answer like we do for other kinds of input. For example:

If two trains are fifty miles apart and approaching each other a twenty miles per hour, when will they meet?

So if we wanted DELPHI to respond to “If” questions we would need to actually create a system for identifying exactly what sort of “If” question was being asked. That’s quite a bit more complicated than the original scope of this program.

Compared to the “If then” input the second problem is almost trivially easy. All the user did was type in two complete sentences at once. The smartest way to solve that issue would be to split their input into two inputs and then choose one of them to respond to.

In fact, splitting sentences into multiple parts when you see punctuation is a pretty decent way to handle all sorts of complex input. Compound sentences, complex sentences and multiple sentence can all be split along their periods and commas. Then you can compare each sentence fragment to the list of rules and return a response to whichever piece of input matched against a high priority rule.

Or you can be lazy like me and ignore the problem. If the user gets politely scolded for using complex sentences and “if” statements it won’t take long for them to go back to talking in ways that DELPHI already understands. I’m not going to spend too much time worrying about a speech pattern that has only shown up once in all the user tests I’ve run.

Of course, ambitious readers might decide that they do want to tackle this problem. More power to you!

More Hard To Fix Problems: Grammar Be Hard

The second hard to fix problem can be seen here:

>Did I break you?

DELPHI: FATE INDICATES THAT YOU DID BREAK I

The problem here has to do with an annoying bit of non-uniqueness in English pronouns. In first person we have “I” and “Me” but in second person all we have is “You” and another “You”. Observe:

I gave this to you.

You gave this to me.

Our current switchFirstAndSecondPerson function isn’t smart enough to figure out that sometimes “you” should be “me” and sometimes “you” should be “I”. It always changes it to “I”. Once again, this is a problem that technically could be fixed. It is possible to build an automatic grammar parsing system that can identify the part of speech of every word in a sentence. This would then give us enough information to more intelligently swap around “I”s and “You”s and “Me”s.

But the whole point of this Let’s Program was to build a simple pattern matching chatbot and avoid the full complexity of natural language parsing. So once again, this is a problem I’m going to ignore on the principle that being right most of the time is good enough for a chatbot with only two functions a few hundred lines of code and data.

Besides, DELPHI is supposed to guide users to ask questions about the future, not about DELPHI itself. That should hopefully minimize the number of second-to-first person switches we have to make anyways. And if no-one ever sees a certain bug, is it really a bug at all?

80% Is A Decent Grade… Especially When You Haven’t Put In Much Effort

So how successful was DELPHI this time around? Well, if we award ourselves points for the two problems we just fixed and the three slightly wrong but acceptable answers we get this:

6 good answers + 2 fixed answer + 3 borderline answers = 11 out of 14 answers

That means that DELPHI is now 78% acceptable in terms of its ability to talk with real humans. And to be honest, that’s good enough for me. The whole point of this Let’s Program was to demonstrate the bare basics of how to use pattern matching to create a very simple chatbot. I never expected it to perform nearly as well as it does.

But since we’ve come this far we and added a new “Can” rule we might as well try to hunt down one last test user and see if we really are getting an 80% success rate with DELPHI. As all good scientists know an experiment isn’t really done until you’ve repeated it several times and made sure you can get the same answer every time.

Let’s Program A Chatbot 17: Blitzcode!

Posted on December 17, 2013 by Scott

You Know How This Works By Now

If you’ve been following along through this entire Let’s Program you should have a pretty good idea of the process I use to think up new test cases and then write new rules to solve them. So this time around I’m not going to bother explaining every single little thing I’m doing. I’m just going to quickly throw down seven new tests cases along with the code I used to satisfy them. I’m confident you can fill in the gaps in your own now.

Is, Are and Was Were Problems

Let’s start out with those troublesome plurals and past tense versions of “Is”. Here are our new tests:

$testCases[14][0] = "Are plurals working now?";
$testCases[14][1] = "Fate indicates that plurals are working now";

$testCases[15][0] = "Was this tense a problem?";
$testCases[15][1] = "Fate indicates that this tense was a problem";

$testCases[16][0] = "Were the lights left on?";
$testCases[16][1] = "Fate indicates that the lights were left on";

And here is the fix:

push(@chatPatterns, 
        [qr/\A(is|are|am|was|were) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i, 
            ["Fate indicates that UIF1 UIF0 UIF2",
            "Other responses go here"]
        ]);

A few things to look out for here. First, notice that I’m using the /i flag at the end of the regular expression so the user doesn’t have to worry about capitalizing anything. You should also notice that I’m now capturing which version of “Is” the user chose and then inserting it into the output. Since it is now the first capture group I can refer to it as UIF0, although that does mean I need to scoot over the other input. UIF0 is now UIF1 and UIF1 becomes UIF2.

And this very almost worked. Almost.

Test Case 16 Failed!!!

Input: Were the lights left on?

Output: Fate indicates that the lights Were left on

Expected: Fate indicates that the lights were left on

——————–

Passed 11 out of 17 tests

Test Failure!!!

Because we grab the verb from the user’s input a capitalized “Is” will result in strange capitals in the middle of our sentences, like you can see here with this out of place “Were”. Not acceptable. I could probably improve the UIF substitution system to avoid middle word capitalization, but instead I’m going to just blow away every possible capitalization problem I could ever have by making DELPHI speak in robot style ALL CAPS. It’s an easy change to make but it does mean that I’m going to have to update every single test case to expect capital letters. Oh well.

Here is the one line change needed in the return statement of generateResponse to switch to all caps. The uc function makes everything upper case.

return uc($response);

Now please excuse me while I capitalize all 17 of my existing test cases.

(Type type type)

Well that was boring. But DELPHI is now passing 17 out of 17 test cases again so it was worth it.

You’ve Gone Done Did It Now!

Getting DELPHI to handle past tense “Did” as well as present “Does” is basically the same problem and solution as “Is” and “was”above.

$testCases[17][0] = "Did this test pass?";
$testCases[17][1] = "FATE INDICATES THAT THIS TEST DID PASS";

Notice the switch to all capital expected output. Anyways, here is the updated “Does” rule:

push(@chatPatterns,
   [qr/\A(Did|Does) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
      ["Fate indicates that UIF1 UIF0 UIF2",
       "Other responses go here"]
   ]);

Once again we are now pulling the verb out of the user’s input, which means we can include it in the output as UIF0 and have to shift the index of all of our other existing User Input Fragments.

Passed 18 out of 18 tests

All Tests Passed!

Help Me To Help You

One of the bigger problems we saw with the first test user was that they weren’t sure what kind of questions to ask DELPHI. Let’s fix this by jumping into chat.pl and writing up a new self-introduction for DELPHI:

DELPHI: HELLO! MY NAME IS DELPHI

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

That’s much better. And while I’m at it I should probably improve DELPHI’s responses to statements it doesn’t understand. Remember, if you change the first possible response to a rule you will also need to update the test cases associated with that rule.

Here is a sample of some of the new responses I’m going to try:

I’m sorry, could you try rewording that as a yes or no question?

I’m sorry, could you think of a simpler way to ask that question? Maybe as a yes or no question?

I’m confused. Try a simple yes or no question instead

I don’t want to talk about that. Please feel free to ask me why

Self-Knowledge Is The Key To Enlightenment

The final big issue we saw in our human focused test was that DELPHI couldn’t answer simple questions about itself. That’s sad. A program without an easy to use HELP feature has pretty much failed at usability.

So here are a few test cases to help us address this glaring problem:

$testCases[18][0] = "What kind of questions can you answer?";
$testCases[18][1] = "YES OR NO QUESTIONS LIKE \"WILL I HAVE GOOD LUCK TOMORROW?\" ARE THE EASIEST FOR ME TO ANSWER";

$testCases[19][0] = "Can you tell time?";
$testCases[19][1] = "THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE \"WILL IT BE SUNNY TOMORROW?\"";

$testCases[20][0] = "help";
$testCases[20][1] = "JUST TYPE A QUESTION AND HIT ENTER AND I'LL DO MY BEST TO ANSWER IT. YES OR NO QUESTIONS LIKE \"DID I DO WELL ON MY TEST?\" ARE BEST";

Let’s tackle the “help” case first because it’s easiest. We just write a super specific and super high level rule. I didn’t want random help messages, so the response array is only one item long. But it still has to be an array because that’s what the code expects:

push(@chatPatterns,
   [qr/\Ahelp\z/i,
      ["Just type a question and hit enter and I'll do my best to answer it. Yes or No questions like \"Did I do well on my test?\" are best"]]);

Let’s do the “What kind of questions test next”. I don’t have any other rules dealing with “What” style questions so the priority of this rule doesn’t really matter that much. If you’ve written your own “What” rule you’ll probably need to give this help rule higher priority so it doesn’t get overshadowed.

The rule itself is pretty simple. It just looks for any sentence that starts with “What” and later has the word “question”. This might be casting the net a bit wide since this won’t just catch questions like “What kind of questions can you answer?” or “What kind of questions work best?”. It will also catch things like “What do you do with my questions?” where our answer doesn’t really make sense.

So we’re kind of gambling here on what questions the user will and won’t ask. If we find that our users are providing lots of “What questions” input that doesn’t fit this pattern we may have to write som extra rules, but for now this should be fine:

push(@chatPatterns,
   [qr/\AWhat.*questions/i,
      ["Yes or no questions like \"Will I have good luck tomorrow?\" are the easiest for me to answer"]]);

Finally is the “Can you?” rule, which is basically identical to the rule above. Just look for the words “Can” and “You”:

push(@chatPatterns,
   [qr/\ACan.*you/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);

Unfortunately we run into a little problem:

Test Case 19 Failed!!!

Input: Can you tell time?

Output: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

Expected: THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE “WILL IT BE SUNNY TOMORROW?”

Remember how we wrote that little bit of code to transform second person words like “You” into first person words like “I” to help use create more grammatical responses? That same code is breaking our new “Can you” rule by changing the user’s input into “Can I” before it ever gets compared to any of our rules.

There’s probably an elegant solution to this problem, but I don’t have the time for finding it right now. Instead I’m just going to rewrite the rule to look for “Can I” because I know that represents a user asking “Can you” like we really want.

push(@chatPatterns,
   [qr/\ACan.*I/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);

DELPHI V2 Lives!

Passed 21 out of 21 tests

All Tests Passed!

DELPHI can now handle at least the most basic forms of the three big problems we saw in our live user test. Which means it’s time to find a new test user and try again. I’m hoping our code can now generate logical answers to at least 70% of the input the average random user will try to give it. That would give us a nice solid “C” average. Not good enough for a professional chatbot, but a pretty good goal for a practice program built from scratch using nothing but regular expressions and a few foreach loops.

Let’s Program A Chatbot 16: Testing On Live Subjects

Posted on December 9, 2013 by Scott

You Don’t Need To See The Latest Code Updates

Since last we met all I’ve done is add 25 common adjectives to DELPHI’s common adjectives list and write four more possible response patterns for every chatbot rule. These modifications didn’t involve any actual coding tricks and have made DELPHI too long to conveniently embed inside a blog post. I do promise to publish DELPHI’s complete code as soon as this Let’s Program is over but for today there’s nothing worth showing. Those of you following along at home can feel free to write your own response patterns and add your own common adjectives.

Automated Testing Alone Is Not Enough

Our automated tests made it really easy for us to keep track of whether or not DELPHI was doing what we, the programmers, wanted it to do. And that’s very important! It’s hard to write good software if you don’t have some way of keeping track what your goals are and which you have and haven’t met.

But just because a program satisfies the programmer’s list of goals doesn’t mean it will satisfy the customer’s list of demands. Real world users almost always have items on their wish lists that we programmers completely overlooked.

Test users also help us programmers avoid blind spots in our testing. When a developer tries to write tests for his own code he will subconsciously tend to avoid test cases that he knows will break the program. One common example is that we programmers usually forget to write a test case for blank input because accidentally typing in blank input is the sort of mistake we don’t usually make. That makes it easy for us to forget that the problem exists at all, which can lead to fragile programs that break apart the first time a non-programmer hits enter with no input either on accident or out of curiosity. Having a test user break your program like this is much better than waiting for a customer to make a mistake you overlooked.

So there’s really no way around it: We need to get other human beings to test out our programs and find the bugs we forget to include in our automated tests. We can then use their human feedback to set new goals for the next version of our program and to help write new use cases for our tests.

Once we have new and improved tests it’s back to modifying our code until we’ve once again reached a 100% pass rate. Then we find a new test user and have them poke our program until they find another problem and two. And of course that leads to more use cases which can lead to new tests and new software features. Rinse and repeat this cycle until the software finally reaches a level of quality that everyone feels comfortable with (or until you run out of time and have to give up / publish whatever you have). That’s basically the entire test driven development cycle.

DELPHI’s First Human User*

It’s the holidays now and that means friends and family as far as the eye can see. If you’re anything like me, people you haven’t seen for months are suddenly queuing up at your door to “Wish You A Merry Christmas And A Happy New Year”. These lovely people represent a valuable resource to every hobbyist programmer: A large pool of test users that will be more than happy to spend five minutes playing with your latest project.

So for this blog post I grabbed a handy holiday well-wisher and dropped them in front of DELPHI with no explanation of what it did or how it works. This is the conversation they had:

DELPHI: The oracle greets you. What questions are on your mind?

>How much longer is Xenoblade Chronicles**?

DELPHI: I’m sorry, could you try rewording that?

>Are there more than 10 hours left of Xenoblade Chronicles?

DELPHI: I’m sorry, could you try rewording that?

>What kind of questions can you answer?

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

>Is a yes or no question better?

DELPHI: Let me flip a coin… the second one

>Should I have chicken or beef for dinner?

DELPHI: The later, unless it’s actually the former

>How many days until Christmas?

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

>What is the weather like?

DELPHI: I’m sorry, could you try rewording that?

>Is it snowing outside?

DELPHI: Hmmm… echoes of cosmic wisdom show that it is not snowing outside

>Will it snow tomorrow?

DELPHI: it will snow tomorrow, assuming the mole people don’t invade next Tuesday

>Was school cancelled for tomorrow?

DELPHI: I’m sorry, could you try rewording that?

>Will there be school tomorrow?

DELPHI: I lost my tarot cards… but I have some playing cards. They predict that there will be school tomorrow

>What is 4 + 5?

DELPHI: I’m sorry, could you think of a simpler way to ask that question?

>What is the sum of 4 and 5?

DELPHI: That looked like a question, but I’m not sure how to respond

>Can you do math?

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

>Do I go to high school or middle school?

DELPHI: Fate indicates the former

>What is today’s date?

DELPHI: I’m sorry, could you try rewording that?

>Is today Sunday?

DELPHI: I think that today is Sunday, but you should probably double check with someone else

>Is tomorrow Monday?

DELPHI: tomorrow can’t be Monday. That would mess up the crossword puzzle I’m working on

>Is it the 8th?

DELPHI: I see no reason why it wouldn’t be the 8th

>What month is it?

DELPHI: I’m sorry, could you think of a simpler way to ask that question?

>Is it december?

DELPHI: it can’t be december. That would mess up the crossword puzzle I’m working on

>Is it december or january?

DELPHI: Let me flip a coin… the first one

>Is it Sunday and the 8th?

DELPHI: it can’t be Sunday and the 8th. That would mess up the crossword puzzle I’m working on

>Will it be the 9th or he 10th tomorrow?

DELPHI: Whatever you think is correct

>I believe the 9th is correct. Am i correct?

DELPHI: That looked like a question, but I’m not sure how to respond

>Did I miss any responses?

DELPHI: That looked like a question, but I’m not sure how to respond

>goodbye

DELPHI: Come back anytime

Wow! DELPHI did a pretty bad job holding up it’s end of the conversation. On the other hand, it actually did pretty good for a chatbot with only a handful of rules and only a few weekends of development work. So even though DELPHI has pretty poor performance I’m still happy with it’s performance to effort ratio.

What Did We Learn

Time to put on our thinking caps and analyze exactly what went wrong and what went right in this DELPHI test run. Those of you following along at home might want to break out some paper and jot down your own thoughts before reading my conclusions.

Of course, if you’re a programmer you probably have a high reading speed and the ability to recognize the words on your screen even when you aren’t directly looking at them. So you’ve undoubtedly already absorbed at least one or two of the conclusions I’ve written about below. Just think of it as getting a hint on how to start your own list.

BAD: DELPHI Introduction Doesn’t Give Good Enough Instructions

Since users never read the manual (and DELPHI doesn’t have a manual to read anyways) it is very important for DELPHI to provide gentle guidance on the proper way to ask it questions. And I think it’s fair to say I completely failed at this.

I probably should have warned the user to stick to YES/NO questions in the original prompt. Instead I just invited them to ask whatever was on their mind and got an open ended question about the play-time of a video game my user was interested in. Since it wasn’t a yes no question DELPHI gave up. I also could have done a better job of having DELPHI’s confused messages suggest better question formats. Constantly telling the user that DELPHI doesn’t know how to answer their question doesn’t do any good if I’m not also giving hints on what questions they should be asking.

Fortunately the user was pretty clever and figured out on their own that switching their question to a YES/NO format might help. Unfortunately this lead to our next error.

BAD: DELPHI Can’t Handle Plural And Past Tense Versions Of It’s Rules

The user’s second question should have been easy. After all, it was just an “Is X Y?” question and that was one of the first rules we ever wrote.

>Are there more than 10 hours left of Xenoblade Chronicles?

Unfortunately it turns out that DELPHI only has rules specifically for “Is” and doesn’t have nearly enough brainpower to recognize that “Are” should use the same kind of rule. DELPHI also had difficulty later on when the user went first person and tried an conjugated “Is” into “Am”. There were similar problems with past tense conjugations; DELPHI gave up on a “Was” question and a “Did” question even though logically they’re the same as “Is” and “Do”.

So it looks like we’re going to need to do some work buffing DELPHI up to work with a wide range of tenses and pluralizations: Is, Are, Am, Was, Were, Do, Does, Did.

BAD: DELPHI Doesn’t Know How To Talk About Itself

After their first two questions fell apart my clever test user asked an incredibly intelligent third question:

>What kind of questions can you answer?

Unfortunately that isn’t a pattern DELPHI knows how to respond to. Which is a shame because that would have been the perfect opportunity to slip a mini user manual into DELPHI’s output.

GOOD: Humor Made The User Curious

My test user spent a lot longer with DELPHI than I thought they would. When I asked them what they were doing they admitted they were trying to see how many different ways DELPHI could respond to the same type of question. They also explained that they were trying to come up with new types of questions just to double check they weren’t missing an entire group of sort-of-funny chatbot replies.

This means that even though my chatbot was very flawed it made up for those flaws by being interesting enough that the user wanted to keep playing with it to see what it would say and do next. Since DELPHI is basically a toy the fact that the user enjoyed playing with it is a huge success.

GOOD: 50% Success Rate

If you count up the instances where DELPHI gave a good answer to a question compared to when it gave a default confused answer you’ll find it had very close to a 50% success rate. You might argue that a number that low shouldn’t count as a good thing but I think it’s only fair to point out that DELPHI actually did manage to perform as expected in a wide variety of circumstances. No need to focus entirely on it’s mistakes.

I think it’s also interesting to note that the success rate seems higher in the second half of the conversation than the first. This suggests that the user eventually caught on to what kind of questions DELPHI handled best. So if I do a better job of explaining early on in the conversation that DELPHI prefers YES/NO questions the overall success rate should increase a lot.

Conclusion

As predicted DELPHI wasn’t quite ready for human contact. But it did better than I thought it would and now I have lots of data on what problem areas need to be tackled next. Expect my next post to be a rapid fire series of new test cases and the code to fix them.

* You might think I was DELPHI’s first human user, but I don’t count***.

** Xenoblade Chronicles is a Japanese RPG for the Nintendo Wii that has an epic, and rather long, plot. In retrospect it’s not the sort of thing one should try to speed-run during a holiday get together.

*** Because I programmed it. I wasn’t suggesting I don’t count because I’m not human. I’m totally a real human. Really

Let’s Program A Chatbot 15: “ELIZA Effect” Should Be A Movie Title

Posted on December 5, 2013 by Scott

What Is The ELIZA Effect?

The ELIZA Effect refers to the way that humans tend to think machines act and think like humans. We unconsciously like to believe that computers are actually intelligent, that robots have motives and that our favorite gizmos have emotions.

This human instinct to treat machines like people is named after the original ELIZA chatbot, a simple pattern matching program that pretended to be a psychiatrist by turning everything said to it into a question*. The scientist who designed ELIZA considered it nothing more than a clever trick and was surprised to find that many of the humans he had testing ELIZA started to develop emotional reactions towards it, some going so far as to claim that they felt like ELIZA really cared about the topics they were talking about.

Further studies pretty much proved that the ELIZA effect kicks in just about anytime a human sees a computer or machine do anything even vaguely unpredictable or clever. The moment a program does something the user can’t immediately explain he will begin to assume deep logic and complex motives are taking place, even when the “smart” behavior turns out to be nothing more than a ten line script with three if statements and a call to random(). Even after you show the user there is no real intelligence involved he will still tend to see bits of human personality the machine.

For example, just a few weeks ago the Internet was abuzz with stories of a “suicidal robot”, a Roomba vacuum cleaner that apparently was activated while the owners weren’t watching and then got stuck on a hotplate which eventually caused it to burn to “death”.

The interesting part of this story isn’t that a household robot glitched up and got stuck in a dangerous place. That happens all the time. The interesting part is that almost every human who talked about the story phrased it in terms of a robot making a decision to kill itself (a very human, if depressing, behavior). Even technical people who know better than to assign feelings and motivation to a circuit board couldn’t resist framing the event in human terms.

That’s the ELIZA effect.

Exploiting The Eliza Effect

So… humans like to think that other things behave like humans. That’s not really very surprising. Why should we programmers care?

We should care because we can use the ELIZA effect to hack people’s brains into liking our programs better. We can trick them into being patient with load times, forgiving of bugs and sometimes even genuinely loving our products.

Simple example: When Firefox is restarted after a crash it begins with a big “Well this was embarrassing” message that makes it feel like an apologetic friend that really is sorry that he forgot to save the last slice of pizza for you. It’s surprisingly effective at taking the edge of the frustration of suddenly getting kicked off a web-page.

The ELIZA effect is even more important for people who are specifically trying to write programs that mimic human behavior. Like game developers trying to create likable characters or chatbot designers trying to create a bot that is fun to talk to. For these people getting the ELIZA effect to activate isn’t just a useful side goal, it is their primary goal.

Wait a minute, aren’t WE amateur chatbot designers? I guess we should figure out how to integrate this idea into DELPHI.

Simulating Human Humor

In my experience people will forgive a lot of bad behavior as long as they are laughing. A good joke can break the ice after showing up late to a party and a witty one-liner can fix a lot of embarrassing social mistakes**.

That’s why rule #1 for writing DELPHI responses is going to be “make it quirky”. The funnier the responses DELPHI generates the less users are going to care about how precise and correct they are. Bits of strange grammar and weird phrases will be cheerfully ignored as long as the user is having a good time. And since humor is a very human trait this should do a lot to make DELPHI feel more like a real conversation partner and less like the big foreach loop it really is.

So don’t do this:

Fate indicates that your cat does secretly want to kill you.

Do this!

Let me check my Magic 8 ball(tm). Hmm… future looks cloudy so I don’t know if your cat does secretly want to kill you. Ask again later***

Apologize Profusely. For Everything.

This seems like a really good place for a joke about Japanese etiquette compared to American etiquette but I can’t think of anything funny right now. すみません

Anyways, I’ve noticed that when non-technical people have a computer problem one of the first things they always say is “I didn’t break it! It’s not my fault! It just happened on its own!”

This makes sense. People hate feeling like they are responsible for things going wrong. No one wants to take the blame for a broken machine or a program that stopped working. The only thing worse than a broken computer is a broken computer that is scolding you for breaking it.

So if your program is likely to break or get confused, and this simple chatbot certainly is, your top priority should be to reassure the user that the problem isn’t his fault. The problem is that your poor humble program couldn’t quite handle the user’s request and could the user pretty please try again? We really are very very sorry that this happened at all.

Also, apologizing is a very human behavior that will go a long ways towards hiding our dumb code behind an illusion of human intelligence.

So don’t do this:

I don’t recognize that as a question. Try again

Do this!

I’m sorry, I got confused. Could you ask your question again and keep it simple for me?

Teach Your User How To Be A Better User

This final design tip has less to do with the ELIZA effect and more to do with user psychology. The following tip is vitally important to anyone who wants to build user friendly software: Users never read the manual.

I don’t care how well documented your program is or how many full color screen-shots are included in your manual. 95% of your users are just going to start clicking on buttons and typing in words and then get frustrated if things don’t work the way they want them to.

In a perfect world we would solve this problem by convincing everyone in the world to do the responsible thing and read the entire user manual of every product they buy before they try to operate it. But we live in a broken and fallen world so we’re going to have be sneaky about this.

The goal here is that every-time the user causes an error or makes something strange happen we should slip them a quick tip on how to make sure that problem doesn’t happen again. This way we can feed them the entire manual one bite at a time until they finally figure out everything we wish they had known in the first place.

I’m sure you’ve seen this tactic before. Windows warning you that changing a file’s type can be dangerous. Google politely suggesting alternate searches when you don’t get many results. Video games slipping helpful tips into their loading screens. All are just ways to teach the user how to be a better user without every calling him a bad user or forcing him to read a book.

How do we incorporate this into a chatbot like DELPHI? Well, when we detect the user is having trouble we should not only be incredibly apologetic to make him feel safe and incredibly funny to make him feel relaxed, we should also try to show him how to better format his input.

So don’t do this:

I can’t understand what you’re saying

Do this!

I’m having trouble with your last question. Let’s start with something simpler like “Will it rain tomorrow?”

Conclusion

Writing a program that can act like an intelligent human is hard. Luckily for us humans are easy-going lifeforms that are more than happy to project the illusion of human intelligence onto every machine they see. As long as our chatbot is funny and polite most users will be willing to consider it human enough.

Now I’m going to spend the next few days adding new responses to DELPHI. Once that’s finally done I’m going to recruit a friend to test-chat with DELPHI and my next post will be spent analyzing how well (or how poorly) DELPHI did.

I suppose there is a small chance that DELPHI will do perfectly and this Let’s Program will end. But I seriously doubt it. This chatbot doesn’t even have a dozen rules yet. I’m predicting it won’t be able to handle even half the input the tester gives to it.

* You probably remember ELIZA from when I introduced it back at the beginning of this Let’s Program.

** On the other hand, trying to come up with a witty one-liner under pressure is very difficult and a botched one-liner will just make the problem. So if you accidentally insult someone’s religion/parents/favorite OS it might be best to just shut up and retreat.

***If you want to plug this into our “does” rule you’re probably looking for something like “Let me check my Magic 8 ball(tm). Hmm… future looks cloudy so I don’t know if UIF0 does UIF1. Ask again later

Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Posted on December 3, 2013 by Scott

Breaking All Our Tests

Today I’m finally tackling the last item on our chatbot wish-list: Randomized responses. This will give DELPHI the ability to make both yes and no predictions for all questions. Even better, a wide variety or randomized responses will keep DELPHI from repeating itself too often and help make it feel human. Nothing says “computer” quite like repeating the same line again and again. Nothing says “computer” quite like repeating the same line again and again.

Unfortunately this randomness is going to completely break all the automated tests we spent so long satisfying. After all, the fundamental idea behind all of our tests is that every possible user input has exactly one right answer. When the user says “A” the response should always be “B”. But adding a little randomness throws this idea out the window. How are we supposed to test a program that sees the input “A” and sometimes says “B” and sometimes says “C” and sometimes says “D”?

This is one of the great weaknesses of automated testing: It doesn’t work so good with uncertainty.

One possible solution would be to build a much more flexible testing suite. Something that can match one input to multiple possible outputs. If there are three random “good” answers to input A then we consider the test to have passed if we see any one of them. It wouldn’t even be too hard to program. Probably just a lot of “or” statements or maybe a loop that returns “true” as soon as it finds at least one match.

But this may not scale very well. Writing tests to handle a small amount of randomness probably isn’t too bad. You just type in your one input and all three possible good outputs and you’re done. But if you have dozens or even hundreds of potential outputs… well, you probably don’t want to maintain that sort of testing code by hand.

So instead of finding a way to test our randomness I’m just going to provide a mechanism to turn the randomness on and off. This way I can just turn the random responses off and all of the old test cases will still work and I can continue adding new tests the easy way: one input paired with one expected output.

Nested Arrays Are Good For Everything!

That’s enough talk about testing. Time to focus on how we’re going to make DELPHI more random. The chatbot already has a system for associating input patterns with output patterns. All we need to do now is adjust it to associate one input pattern with multiple possible input patterns.

My clever readers* probably remember that DELPHI uses a multi-dimensional array to keep track of which response pattern matches each input pattern. Every top level item in the array represents a different chatbot rule/response pair. Each rule is then divided into a two-item array where the first item is a matching rule and the second item is a response rule.

In order to add some randomness to the system we’re going to replace the singular response rule in slot number 2 with yet another array, this one holding a list of all responses we want to generate. For example, here is what the “catch all” rule looks like after I replaced the single response with a three item array.

push(@chatPatterns,
   [qr/.*/,
      ["I don't want to talk about that. Please ask me a question",
       "I'm confused. Try a simple question instead",
       "I'm really good at yes no questions. Try one of those"]
   ]);

Everbody see what we’re doing here? @chatPatterns is the first array. Inside of it we’re pushing a second array where the first item is the input matching regex /.*/ and the second item is a third array that holds three possible responses.

Eventually we’ll probably want to flesh out DELPHI by attaching a dozen or so responses to every input rule. But for starters let’s just stick to two or three variations for each rule. That should be enough to make sure that our basic random algorithm works like it should.

Ready for a massive code dump?

Random Response Set 1

This code should replace the old @chatPatterns code:

my @chatPatterns;

push(@chatPatterns, 
        [qr/[a-zA-Z]+ or [a-zA-Z]+.*\?\z/,
            ["Fate indicates the former",
            "I have a good feeling about the later"]
        ]);

push(@chatPatterns, 
        [qr/\ADo (.+)\?\z/, 
            ["Fate indicates that UIF0",
            "I don't think that UIF0",
            "Athena doesn't think so"]
        ]);

push(@chatPatterns, 
        [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 does UIF1",
            "The spirits whisper \"UIF0 does not UIF1\""]
        ]);

push(@chatPatterns, 
        [qr/\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 is UIF1",
            "The stars are clear: UIF0 is not UIF1"]
        ]);

push(@chatPatterns, 
        [qr/\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["I predict that UIF0 will UIF1",
            "Based on these tea leaves it seems UIF0 will not UIF1"]
        ]);

push(@chatPatterns,
        [qr/\AWhy (.+)\?\z/,
            ["Because of reasons",
            "For important cosmic reasons"]
        ]);

push(@chatPatterns, 
        [qr/\?/,
            ["I'm sorry, could you try rewording that?",
            "Was that a question?"]
        ]);

push(@chatPatterns, 
        [qr/\A(Why|Is|Are|Do|Does|Will)/,
            ["Did you forget a question mark? Grammar is important!",
            "If you're asking a question, remember to use a question mark"]
        ]);

push(@chatPatterns,
        [qr/.*/,
            ["I don't want to talk about that. Please ask me a question",
            "I'm confused. Try a simple question instead",
            "I'm really good at yes no questions. Try one of those"]
        ]);

Optional Randomness Through Optional Arguments

Now that we have multiple possible responses inside of every single rule we’re going to need to update generateResponse. The first step is to get it to pull responses out of an array instead of reading them directly. After that we’ll also need to write code to randomize which response gets pulled out of the array in the first place.

Also, if we want DELPHI to be random with humans but predictable with tests we’re going to need some way to let DELPHI know when to be random and when to be boring. The simplest way to do this is to just add a second argument to generateResponse. The first argument will still be the user’s input but now we’ll use the second argument to decide whether to choose a random response or just stick to the first response in the array.

But that’s enough about that. I’ll just let the code speak for itself now:

sub generateResponse{
    my $userInput = $_[0];
    my $beRandom = $_[1];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response;
            if($beRandom){
                $numberOfResponses = scalar(@{ $chatPattern->[1] });
                $response = $chatPattern->[1][rand $numberOfResponses];
            }
            else{
                $response = $chatPattern->[1][0];
            }
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

I’m sure everyone can see the basic flow of this updated function. We use the $beRandom variable to help us decide which response pattern to use.

There is a little trickiness to the Perl I used for choosing random responses. The goal is to figure out how many responses are attached to the current rule and then choose one of them at random.

I start by extracting the response array from second half of the chat pattern with $chatPattern->[1]. But it turns out that this is actually just an array reference, not a true array, so before we can use it we have to turn it back into an array with the @{ $reference } syntax.

Finally I extract the length of the response array. In Perl you can get the length of an array by trying to force the array into a single value variable. You can also force this to happen by using the scalar function. Technically this was unnecessary since $numberOfResponses will force the response array to act as a scalar with or without the keyword, but I figured some of my less Perl inclined readers might appreciate having the switch from array to single number pointed out.

After we have the length of the response array everything else is easy. We use rand $numberOfResponses to generate a fraction somewhere between 0 and the maximum number of responses, which then gets automatically rounded down when we use them as an array index.

Updating Our Tests And User Interface

Thanks to Perl’s flexible function calls it turns out we don’t have to do anything at all to get our tests working with the new generateResponse code.

When we call generateResponse with only one argument the function can still reference the second argument with $_[1], it will just be filled with the “undefined” value. And since “undefined” is the same as “false” as far as if statements are concerned not including a $beRandom argument is the same as announcing we don’t want any randomness.

Updating the “chat.pl” user interface to include randomness is almost as easy. All we have to do is pass some sort of “true” value as the second argument to generateResponse. I like to use the value “1”, like this:

my $response = DELPHI::generateResponse($userInput,1);

Automatic And Manual Testing

With our code complete the first thing we should do is run our automatic tests to make sure that the non-random version of our test system still works like we want it to:

Passed 14 out of 14 tests

All Tests Passed!

Good. We didn’t break anything important while adding our new features. Now to boot up chat.pl and see if the random response system is working properly:

DELPHI: The oracle greets you. What questions are on your mind?

>Is DELPHI random now?

DELPHI: Fate indicates that DELPHI is random now

>Is DELPHI complete?

DELPHI: The stars are clear: DELPHI is not complete

>Why isn’t DELPHI complete?

DELPHI: For important cosmic reasons

>Goodbye

DELPHI: Come back anytime

Looks good. That’s two “Is X Y?” questions in a row that both got different responses. For more thorough testing just keep hitting DELPHI again and again with identical input and make sure that DELPHI switches between the various responses.

Conclusion

I’m pretty sure that’s the last bit of Perl in this Let’s Program. We can match input, generate responses and DELPHI works fine with both our test program and our user interface.

Future improvements to DELPHI will probably have much more to do with writing new rules and responses than coming up with new algorithms. In fact, my next post is going to focus entirely on the art of writing computer responses that will convince users your program is almost human.

* That’s all of you.

Let’s Program A Chatbot 13: What’s Mine Is Yours

Posted on November 26, 2013 by Scott

The Last Test Case (For Now…)

We’re down to our final test case. Are you excited? I’m excited!

Test Case 4 Failed!!!

Input: Do my readers enjoy this blog?

Output: Fate indicates that my readers enjoy this blog

Expected: Fate indicates that your readers enjoy this blog

Hey, that’s just a “do” rule. We already solved that problem last time. What’s going on here?

Oh, wait. The problem isn’t the “do”. The problem is that the questions mentioned “my readers” and DELPHI was supposed to be smart enough to switch the answer around to “your readers”. But DELPHI didn’t do that. We should fix that.

1st And 2nd Person Made Easy

The idea of first versus second person is way too complex for a simple pattern matching chatbot like DELPHI. But the idea of replacing word A with word B is simple enough. And it turns out that replacing first person words with second person words, and vice-versa, is good enough for almost every question that DELPHI is going to run into.

But be careful! When trying to swap A to B at the same time you are swapping B to A it is very possible to accidentally end up with all A. What do I mean? Consider this example:

My dog is bigger than your dog.

We switch the first person words to second person words:

Your dog is bigger than your dog.

Then we switch the second person words to first person:

My dog is bigger than my dog.

I’m sure you can see the problem.

The other big issue to look out for is accidentally matching words we don’t want to. Allow me to demonstrate:

You are young

We want to change that to:

I am young

But if all we do is blindly swap “I” for “you” we can easily end up with this:

I am Ing

For an even worse example consider this one:

I think pink is nifty.

You thyounk pyounk yous nyoufty.

Solving The Problems

Switching “you” to “I” while avoiding chaning “young” to “Ing” is pretty simple with regular expressions. All we have to do is use the “word boundary” symbol \b. Like so:

\byou\b

This will automatically skip over any instances of “you” that are directly attached to other letters or symbols.

Making sure that we don’t accidentally switch words from first to second person and then back from second to first will be a little more tricky. There are several possible solutions, some involving some cool regex and Perl tricks, but for now I’m just going to use to something very straightforward.

Basically I’m going to replace every first and second person word with a special placeholder value that I’m relatively certain won’t show up in normal DELPHI conversations. Then I will change all the placeholder values to their final. Here is how this will work with the above example:

My dog is bigger than your dog.

We switch the first person words to placeholders

DELPHIyour dog is bigger than your dog.

Then we switch the second person words to placeholders. Because we used a the placeholder “DELPHIyour” instead of plain “your” we don’t accidentally switch the first word back to “my”.

DELPHIyour dog is bigger than DELPHImy dog

Then we replace the placeholders

Your dog is bigger than my dog.

Here It Is In Code

I like foreach loops, so I’m going to implement this as two arrays and two foreach loops. The first array will contain regular expressions for finding first and second person words along with the place holders we want to replace them with. The second will contain regular expressions for finding placeholders and replacing them with the proper first and second phrases.

To implement this I just drop these variables and this function into DELPHI.pm right after generateResponse. The only new coding trick to look for is the ‘i’ modifier on the end of some of the rgeular expressions. This is the “case insensitive” switch and makes sure that DELPHI can match the words we want whether they are capitalized or not*.

#Dictionaries used to help the switchFirstAndSecondPerson function do its job
my @wordsToPlaceholders;

$wordsToPlaceholders[0][0]=qr/\bI\b/i;
$wordsToPlaceholders[0][1]='DELPHIyou';

$wordsToPlaceholders[1][0]=qr/\bme\b/i;
$wordsToPlaceholders[1][1]='DELPHIyou';

$wordsToPlaceholders[2][0]=qr/\bmine\b/i;
$wordsToPlaceholders[2][1]='DELPHIyours';

$wordsToPlaceholders[3][0]=qr/\bmy\b/i;
$wordsToPlaceholders[3][1]='DELPHIyour';

$wordsToPlaceholders[4][0]=qr/\byou\b/i;
$wordsToPlaceholders[4][1]='DELPHIi';

$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImine';

my @placeholdersToWords;

$placeholdersToWords[0][0]=qr/DELPHIyou/;
$placeholdersToWords[0][1]='you';

$placeholdersToWords[1][0]=qr/DELPHIyour/;
$placeholdersToWords[1][1]='your';

$placeholdersToWords[2][0]=qr/DELPHIyours/;
$placeholdersToWords[2][1]='yours';

$placeholdersToWords[3][0]=qr/DELPHIi/;
$placeholdersToWords[3][1]='I';

$placeholdersToWords[4][0]=qr/DELPHImine/;
$placeholdersToWords[4][1]='mine';

$placeholdersToWords[5][0]=qr/DELPHImy/;
$placeholdersToWords[5][1]='my';

sub switchFirstAndSecondPerson{
    my $input =$_[0];

    foreach my $wordToPlaceholder (@wordsToPlaceholders){
        $input =~ s/$wordToPlaceholder->[0]/$wordToPlaceholder->[1]/g;
    }

    foreach my $placeholderToWord (@placeholdersToWords){
        $input =~ s/$placeholderToWord->[0]/$placeholderToWord->[1]/g;
    }

    return $input;
}

Using The New Function In Generate Response

With that out of the way all that is left is to figure out where inside of generateResponse we should be calling this function. My first thought was to just stick onto the end of the function by finding the original return statement:

return $response;

And replacing it with this:

return switchFirstAndSecondPerson($response);

Now this is where test driven development comes in handy because that simple change did indeed pass test case 4… but it also caused messes like this:

Test Case 0 Failed!!!

Input: Will this test pass?

Output: you predict that this test will pass

Expected: I predict that this test will pass

…

Test Case 8 Failed!!!

Input: Pumpkin mice word salad

Output: you don’t want to talk about that. Please ask you a question

Expected: I don’t want to talk about that. Please ask me a question

We’ve accidentally made it impossible for DELPHI to talk in first person, which wasn’t what we wanted at all. We only wanted to change first and second words from the user’s input fragments, not from our carefully handwritten DELPHI responses. Which is a pretty good hint that we should have called firstToSecondPerson on the users input BEFORE we tried to parse it and generate a response, not after. Maybe right at the beginning of the function:

sub generateResponse{
    my $userInput = $_[0];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response = $chatPattern->[1];
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

The Moment Of Truth

Did we do it? Did we resolve our final use case?

Drum roll please…………

Test Case 0 Passed

Test Case 1 Passed

Test Case 2 Passed

Test Case 3 Passed

Test Case 4 Passed

Test Case 5 Passed

Test Case 6 Passed

Test Case 7 Passed

Test Case 8 Passed

Test Case 9 Passed

Test Case 10 Passed

Test Case 11 Passed

Test Case 12 Passed

Test Case 13 Passed

——————–

Passed 14 out of 14 tests

All Tests Passed!

WHOOO! GO US!

Note To Exceptionally Clever Readers

All my readers are clever, but some of you are exceptionally clever. And you may have noticed that switchFirstAndSecondPerson always returns lowercase words even when the original word was capitalized or at the beginning of the sentence. This isn’t a huge problem, but if you’re a perfectionist it might be bugging you to accidentally change “I care about grammar” to “you care about grammar” instead if “You care about grammar”.

One easy solution would be to update DELPHI to capitalize it’s entire output. People are used to computer programs SPEAKING IN ALL CAPS and it saves us the effort of having to actually teach DELPHI anything about proper capitalization.

If you don’t like the caps lock look you could instead update DELPHI to always make sure output starts with a capital. More often than not this is all it takes to make sentence look like real English.

Or you can do just do what I do and ignore the problem. I’m not going to worry too much about the occasional lowercase “you” or “my” unless users start complaining. And since this program isn’t intended for any real users that’s not likely to ever happen. Customer satisfaction is easy when you have no customers!

Conclusion

That’s it! We’ve passed all of our primary use cases. DELPHI is done.

Or is it? If you can remember all the way back to the original design document one thing we wanted out of DELPHI was the ability to generate random responses to questions. DELPHI currently just guesses “yes” to all questions which is both useless and boring. So while we hit a very important benchmark today we’re still not quite done.

* You know what else case insensitive regular expressions would be good for? Making DELPHI more accepting of user input that isn’t properly capitalized. Expect this to happen in a future blog post.