Let’s Program A Chatbot: Index And Code

Introduction

 

Have you ever wondered how chatbots work? Do you want to get some practice with regular expressions? Need an example of test driven development? Want to see some Perl code?

 

If the answer to any of those questions was “Yes” then you’re in luck because I happen to have just finished a little series of posts on writing pattern matching chatbots using Perl, regular expressions and test driven development. Enjoy!

 

Index

 

Let’s Program A Chatbot 1: Introduction And Theory

Let’s Program A Chatbot 2: Design Before You Code

Let’s Program A Chatbot 3: Choosing A Programming Language

Let’s Program A Chatbot 4: Let’s Talk About Test Driven Development

Let’s Program A Chatbot 5: Finally, Code!

Let’s Program A Chatbot 6: Don’t Fear The Regex

Let’s Program A Chatbot 7: To Be Or Not To Be, That Is The Use Case

Let’s Program A Chatbot 8: A Little Housecleaning

Let’s Program A Chatbot 9: The Grammar Police

Let’s Program A Chatbot 10: Chatting With The Bot

Let’s Program A Chatbot 11: Bad Adjectives

Let’s Program A Chatbot 12: When The Answer Key Is Wrong

Let’s Program A Chatbot 13: What’s Mine Is Yours

Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Let’s Program A Chatbot 15: “ELIZA Effect” Should Be A Movie Title

Let’s Program A Chatbot 16: Testing On Live Subjects

Let’s Program A Chatbot 17: Blitzcode!

Let’s Program A Chatbot 18: A Bit Better Than Before

Let’s Program A Chatbot 19: Third Time’s The Charm

Let’s Program A Chatbot 20: What Next?

 

Complete Code

 

If you follow along with the posts you should be able to write your own chatbot from scratch. But if you don’t have the time for that or just want some reference code I have also provided my complete chatbot, user interface and testing suite: Complete Chatbot Code

Christmas In One Minute

Don’t you hate waiting for Christmas? Have you ever wished you could make it happen now?

With physics you can! Relativity says that the closer you get to the speed of light the slower your personal time moves compared to stationary objects. Move fast enough and years of Earth time can pass in the literal blink of an eye.

But how fast is fast enough? For example, how fast would you have to move if you wanted it to only be sixty seconds until Christmas? Well, this handy little Perl program can tell you the answer. Be aware that it does use the DateTime module to calculate how many seconds from now until Christmas morning, so you may need to download that module before it runs.

 

#! /usr/bin/perl

# A silly little script for calculating how fast you need to move
# to make Christmas happen in one relativistic minute

use DateTime;

my $c = 299792458; #speed of light in meters per second

# Calculate how fast you have to be moving to achieve a certain
# normal time to perceived time ratio. A.K.A. the Lorentz factor
sub speedFromTimeDilation{    
    my $timeDilation = $_[0];
    return sqrt($c * $c * (-1/($timeDilation * $timeDilation)+1));
}

#You may need to adjust this for your timezone
my $now = DateTime->now;
$now->set_time_zone( 'America/Chicago' );
# In my family Christmas starts at 7 in the morning. Change this if you need to
my $christmasMorning = DateTime->new(year => $now->year, month => 12, day => 25, hour => 7);
$christmasMorning->set_time_zone( 'America/Chicago' );

#Make sure we are comparing now to next Christmas
if($now > $christmasMorning){
    $christmasMorning->set_year($christmasMorning->year+1);
}

my $secondsUntilChristmas = $christmasMorning->epoch - $now->epoch;

#We can't make Christmas come slower
if($secondsUntilChristmas < 60){
    print "It is less than a minute until Christmas morning. You can wait that long\n";
    exit;
}

#Ratio between actual time to Christmas and our desired one minute wait
my $desiredTimeDilation = $secondsUntilChristmas/60;

my $neededVelocity = speedFromTimeDilation($desiredTimeDilation);

print "It is $secondsUntilChristmas seconds until Christmas morning\n";
print "To compress this into sixty seconds you would need to dilate time to:\n";
print 60/$secondsUntilChristmas*100, "% of normal\n";
print "Which would require a velocity of:\n";
print "$neededVelocity meters per second\n";
print "which is: \n";
print $neededVelocity/$c*100, "% of the speed of light\n";

Here is some sample output:

It is 36423 seconds until Christmas morning
To compress this into sixty seconds you would need to dilate time to:
0.164731076517585% of normal
Which would require a velocity of:
299792051.236407 meters per second
which is:
99.9998643182701% of the speed of light
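As a quick sanity check on those numbers (using the 36423-second figure from the sample run above), we can recompute the fraction of light speed directly from the Lorentz factor:

```perl
# The required Lorentz factor is the ratio of real time to desired time,
# 36423 seconds / 60 seconds, and the speed should land near 99.99986% of c.
my $gamma = 36423 / 60;
my $fractionOfC = sqrt(1 - 1 / ($gamma ** 2));
printf "%.10f%% of the speed of light\n", $fractionOfC * 100;
```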

Ouch. It’s only ten hours until Christmas and we still have to go faster than 99% of the speed of light. Not exactly a feasible goal. Oh well, looks like we’ll just have to get to Christmas the old fashioned way: by waiting.

Let’s Program A Chatbot 19: Third Time’s The Charm

The Third (And Final?) Test User

 

Last time I tested DELPHI on a human it looked like we were only a few modifications away from an acceptable 80% success rate. Now that we’ve made those modifications let’s take a look at a third conversation between the chatbot and another holiday visitor. It looks to me like we may have hit our goal:

 

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>What color is the sky?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Is the sky blue or purple?

DELPHI: THE FIRST OPTION. WAIT, NO THE SECOND. WAIT, WAIT, THE FIRST. I MEAN IT THIS TIME!

>Do you like sunsets?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Is wisdom relative?

DELPHI: FATE INDICATES THAT WISDOM IS RELATIVE

>So Athena could be wrong?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Athena wrong about sunsets?

DELPHI: ATHENA CAN’T BE WRONG ABOUT SUNSETS. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Do you like crossword puzzles?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I LIKE CROSSWORD PUZZLES?

>Why are you doing a crossword if you don’t like it?

DELPHI: BECAUSE OF REASONS

>What reason?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Do you often do things you don’t like?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Does Athena do things she doesn’t like?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Do you usually avoid answering questions?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I USUALLY AVOID ANSWERING QUESTIONS?

>Do you usually win arguments?

DELPHI: FATE INDICATES THAT I USUALLY WIN ARGUMENTS

>Do you believe in fate?

DELPHI: I DON’T THINK THAT I BELIEVE IN FATE

>Do you believe in soulmates?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I BELIEVE IN SOULMATES?

>Do you feel lonely?

DELPHI: I DON’T THINK THAT I FEEL LONELY

>What do you think?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Do you value happiness?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I VALUE HAPPINESS

>What makes you happy?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Does food make you happy?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Does talking to people make you happy?

DELPHI: FATE INDICATES THAT TALKING DOES TO PEOPLE MAKE I HAPPY

>Are you enjoying this conversation?

DELPHI: I SEE NO REASON WHY I WOULDN’T BE ENJOYING THIS CONVERSATION

>I like you!

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is the sky blue?

DELPHI: YOU KNOW, I’M REALLY NOT SURE EITHER. LET ME FORWARD THAT QUESTION TO MY SUPERVISOR

>Who is your supervisor?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Scott your supervisor?

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

>Is Scott a good supervisor?

DELPHI: FATE INDICATES THAT SCOTT IS A GOOD SUPERVISOR

>Does he compliment your performance?

DELPHI: MEH, HE PROBABLY DOES COMPLIMENT MINE PERFORMANCE, BUT I’M NOT REALLY INTERESTED IN THAT KIND OF STUFF

>Do you tell jokes?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I TELL JOKES

>Tell me a joke.

DELPHI: I DON’T WANT TO TALK ABOUT THAT. PLEASE FEEL FREE TO ASK ME WHY

>Why?

DELPHI: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

>Why don’t you want to tell a joke?

DELPHI: WHY NOT?

>Jokes are funny.

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

 

Quick Analysis

 

This time around I’m going to count an answer as “good” if DELPHI either provided a correct response or if DELPHI complained and the user immediately thought of a more DELPHI compatible way to ask their questions. I’m only going to count an answer as “bad” if it has a huge grammar mistake or if the user seemed confused by DELPHI’s responses.

 

Going by this very rough metric I get a count of 30 “good” answers against 4 bad answers. That’s a whopping 88% acceptable performance. Not bad at all.

 

Overall the user did a good job of picking up quickly on what questions DELPHI could and couldn’t answer. The user occasionally explored new kinds of questions but returned to the tried and true formula every time an error appeared. The only real issue from a design perspective came near the end when DELPHI accidentally suggested it knew how to tell jokes, leading the user to try some new input that failed horribly.

 

This suggests that it might be a good idea to write a few more rules to help DELPHI talk about itself. A simple “Do you” version of our existing “Can you” rule would have been enough to avoid misleading the user about DELPHI’s joke telling capabilities.

 

But I’m going to leave that as an exercise for my wonderful readers. The only problems I’m going to try and solve are a few grammar issues I noticed.

 

My Quick Fix Is Mine

 

Two of DELPHI’s mistakes involved switching “your” to “mine”, resulting in awkward grammar like this:

 

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

 

Obviously that should have been “my supervisor”. In fact, now that I think about it, “your” should always be swapped to “my”. It’s “yours” with an “s” that matches “mine”. We can fix this by updating the dictionaries we use to power switchFirstAndSecondPerson.

 

$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImy';

$wordsToPlaceholders[6][0]=qr/\byours\b/i;
$wordsToPlaceholders[6][1]='DELPHImine';

 

And of course here are some test cases to make sure the fix really fixed things:

 

$testCases[22][0] = "Is that pizza yours?";
$testCases[22][1] = "FATE INDICATES THAT THAT PIZZA IS MINE";

$testCases[23][0] = "Is that dog your pet?";
$testCases[23][1] = "FATE INDICATES THAT THAT DOG IS MY PET";

$testCases[24][0] = "Is that pizza mine?";
$testCases[24][1] = "FATE INDICATES THAT THAT PIZZA IS YOURS";

$testCases[25][0] = "Is that dog my pet?";
$testCases[25][1] = "FATE INDICATES THAT THAT DOG IS YOUR PET";

 

Conclusion

 

With those two fixes DELPHI has now achieved an acceptable response rate of over 90%. That’s really impressive for a simple pattern matching program with no memory, no language parsing abilities and no sense of context. Even better, every user who worked with DELPHI admitted that they had fun with the experience and liked the silly random answers. That means we succeeded at our primary goal of creating an entertaining fortune teller.

 

So I’m done. Bye!

 

What’s that? You’re still not satisfied? Well then, maybe I can fit one more post into this Let’s Program and give you a few suggestions on how a hardcore reader could take this whole project to the next level.

Let’s Program A Chatbot 18: A Bit Better Than Before

A Second Blind Test

 

Now that I’ve made a few modifications to DELPHI it’s time to run another live test. But I can’t just ask the first test user to swing by for a second go. That user already knows some of DELPHI’s strengths and weaknesses so they can no longer provide a truly unbiased sample of whether or not the code is working. For good test results I’m going to need a second test user with no experience with DELPHI at all.

 

Fortunately it’s still the holidays and finding a second user was as easy as snagging a nearby relative. This is their conversation:

 

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>will i be a white Christmas?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOU WILL NOT BE A WHITE CHRISTMAS

>will IT be a white Christmas?

DELPHI: BASED ON THESE TEA LEAVES IT SEEMS IT WILL NOT BE A WHITE CHRISTMAS

>Will my New Year’s Resolutions last longer this year?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOUR NEW WILL YEAR’S RESOLUTIONS LAST LONGER THIS YEAR

>Does that mean I can resolve to learn how to fly and it still be successful?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Can I resolve to grow wings and learn to fly?

DELPHI: I’M REALLY GOOD AT YES NO QUESTIONS. TRY ONE OF THOSE

>If you attempt to fail and succeed, have you really succeeded?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is the sky blue?

DELPHI: THE SKY CAN’T BE BLUE. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Is the grass green?

DELPHI: FATE INDICATES THAT THE GRASS IS GREEN

>Is sky gray?

DELPHI: SKY CAN’T BE GRAY. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Is the sky red?

DELPHI: HMMM… ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT IS

>Did I break you?

DELPHI: FATE INDICATES THAT YOU DID BREAK I

>I’m sorry. Good Bye

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is he Sky Blue?

DELPHI: WHY NOT?

>Because

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

 

Conversation Analysis

 

The first and most important thing I noticed is that this test user actually laughed at some of the things DELPHI did. That is a very good sign and indicates that the personality of the writing is still helping to hide the weaknesses in the code.

 

Aside from that, the test user asked 14 different questions. Of those questions, 6 were answered properly and 5 were answered improperly. The other 3 were kind of on the fence. Two of the “meh” answers were DELPHI complaining about input it didn’t understand, which is fine if it only happens occasionally. The third borderline answer was this interesting exchange:

 

 >Will my New Year’s Resolutions last longer this year?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOUR NEW WILL YEAR’S RESOLUTIONS LAST LONGER THIS YEAR

>Does that mean I can resolve to learn how to fly and it still be successful?

 

The pattern didn’t recognize “New Year’s Resolutions” as one big noun so the response stuck a “will” in the wrong place. Normally I’d call this a problem but the user didn’t seem to notice or care about this little mistake. They just moved right on with the conversation.

 

Actually, this isn’t all that surprising. Humans are very good at anticipating what a sentence should sound like, so you can get away with mixing up word order or letter order from time to time. Most people’s brains will just rearrange the sentence without consciously noticing there was a mistake.

 

So for now I’m going to ignore this and focus on the five real problems.

 

Easy Fixes

 

Of our five definite mistakes two are easy to fix. The first was actually just a typo in one of my response patterns:

 

>Is the sky red?

DELPHI: HMMM… ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT IS

 

“IS NOT IS”? Where did that come from? Well, looking at my response patterns it turns out I accidentally referenced the user’s verb twice instead of referencing the second half of their input like I should have. Notice that the below output rule has two “UIF0”s but no “UIF2”.

 

“Hmmm… echoes of cosmic wisdom show that UIF1 UIF0 not UIF0”

 

Fixing that was as simple as changing the second UIF0 to a UIF2. Still, it’s an embarrassing mistake. I guess that’s what I get for having automated tests that only check the first answer in the random response array!
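To make that fix concrete, here’s a tiny stand-alone demo of the corrected template. The @fragments values are hypothetical stand-ins for the rule’s capture groups, not DELPHI’s actual substitution code:

```perl
# Stand-ins for the capture groups from "Is the sky red?"
my @fragments = ("is", "the sky", "red");   # UIF0, UIF1, UIF2
# The corrected template: the second UIF0 is now UIF2
my $template = "Hmmm... echoes of cosmic wisdom show that UIF1 UIF0 not UIF2";
# Replace each UIF marker with the matching captured fragment
(my $response = $template) =~ s/UIF(\d)/$fragments[$1]/g;
print uc($response), "\n";
# prints "HMMM... ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT RED"
```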

 

The next easy problem was this one:

 

>Can I resolve to grow wings and learn to fly?

DELPHI: I’M REALLY GOOD AT YES NO QUESTIONS. TRY ONE OF THOSE

 

There is really no difference between a “Can” rule and a “Does” rule or an “Is” rule. So writing a “can” rule shouldn’t be a challenge. The only issue to watch out for is that your generic “Can” rule needs to be a lower priority than the helpful “Can you” rule that we’re using to provide tips on what DELPHI can and can’t do.

 

Here’s a test case and the code to solve it:

 

$testCases[21][0] = "Can this code pass all the tests?";
$testCases[21][1] = "FATE INDICATES THAT THIS CODE CAN PASS ALL THE TESTS";

 

push(@chatPatterns,
   [qr/\ACan ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
      ["Fate indicates that UIF0 can UIF1",
      "My \"Big Book O' Wisdom\" says that UIF0 can't UIF1"]
   ]);

 

Although to be honest if you just plug that rule in you’ll probably get an error in your tests and find that the input is being caught by the generic “Can you” rule. That’s because the “Can you” rule just looks for the word “Can” followed by the letter “i” without caring whether or not the “i” is an actual word (what we want) or just part of something bigger. In this case, it’s catching the “i” in the middle of “this”. We can fix this with a few word boundary adjustments to the “Can you” regex.

 

/\ACan.*\bI\b/i

 

Now the “Can you” rule will only activate when the “I” is on its own, like it should be after the word “You” has been transformed into first person.

 

Slightly Less Easy Problems

 

Complex and compound sentences of all sorts are the natural enemy of pattern matching chatbots. Like these two examples:

 

>If you attempt to fail and succeed, have you really succeeded?

>I’m sorry. Good Bye

 

The first sentence is an ‘If X then Y” pattern. Noticing that a question starts with “If” would be easy, but deciding what to say back is difficult. The main problem is that not all sentences that start with “If” are actually yes or no questions, so we can’t just randomly throw out a yes or no answer like we do for other kinds of input. For example:

 

If two trains are fifty miles apart and approaching each other at twenty miles per hour, when will they meet?

 

So if we wanted DELPHI to respond to “If” questions we would need to actually create a system for identifying exactly what sort of “If” question was being asked. That’s quite a bit more complicated than the original scope of this program.

 

Compared to the “If then” input the second problem is almost trivially easy. All the user did was type in two complete sentences at once. The smartest way to solve that issue would be to split their input into two inputs and then choose one of them to respond to.

 

In fact, splitting sentences into multiple parts when you see punctuation is a pretty decent way to handle all sorts of complex input. Compound sentences, complex sentences and multiple sentences can all be split along their periods and commas. Then you can compare each sentence fragment to the list of rules and return a response to whichever piece of input matched against a high priority rule.
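As a rough sketch of that idea (this is an illustration, not part of DELPHI), the splitting step might look like:

```perl
# Split multi-sentence input into fragments at sentence punctuation,
# dropping any empty pieces. Each fragment could then be matched against
# the rule list in priority order.
my $input = "I'm sorry. Good Bye";
my @fragments = grep { /\S/ } split /[.,!?]+\s*/, $input;
print join(" | ", @fragments), "\n";
# prints "I'm sorry | Good Bye"
```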

 

Or you can be lazy like me and ignore the problem. If the user gets politely scolded for using complex sentences and “if” statements it won’t take long for them to go back to talking in ways that DELPHI already understands. I’m not going to spend too much time worrying about a speech pattern that has only shown up once in all the user tests I’ve run.

 

Of course, ambitious readers might decide that they do want to tackle this problem. More power to you!

 

More Hard To Fix Problems: Grammar Be Hard

 

The second hard to fix problem can be seen here:

 

>Did I break you?

DELPHI: FATE INDICATES THAT YOU DID BREAK I

 

The problem here has to do with an annoying bit of non-uniqueness in English pronouns. In first person we have “I” and “Me” but in second person all we have is “You” and another “You”. Observe:

 

I gave this to you.

You gave this to me.

 

Our current switchFirstAndSecondPerson function isn’t smart enough to figure out that sometimes “you” should be “me” and sometimes “you” should be “I”. It always changes it to “I”. Once again, this is a problem that technically could be fixed. It is possible to build an automatic grammar parsing system that can identify the part of speech of every word in a sentence. This would then give us enough information to more intelligently swap around “I”s and “You”s and “Me”s.
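Here’s a toy illustration of the trap. This is a simplified stand-in, not the real switchFirstAndSecondPerson, but it makes the same mistake: every “you” becomes “I” regardless of its position in the sentence.

```perl
# Naive first/second person swap: every "you" becomes "I" and vice versa,
# with no regard for subject vs object position.
my %swap = (you => "I", i => "you");
my $input = "Did I break you?";
(my $output = $input) =~ s/\b(you|i)\b/$swap{lc $1}/gie;
print "$output\n";
# prints "Did you break I?" -- the object-position "you" should have become "me"
```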

 

But the whole point of this Let’s Program was to build a simple pattern matching chatbot and avoid the full complexity of natural language parsing. So once again, this is a problem I’m going to ignore on the principle that being right most of the time is good enough for a chatbot with only two functions and a few hundred lines of code and data.

 

Besides, DELPHI is supposed to guide users to ask questions about the future, not about DELPHI itself. That should hopefully minimize the number of second-to-first person switches we have to make anyways. And if no-one ever sees a certain bug, is it really a bug at all?

 

80% Is A Decent Grade… Especially When You Haven’t Put In Much Effort

 

So how successful was DELPHI this time around? Well, if we award ourselves points for the two problems we just fixed and the three slightly wrong but acceptable answers we get this:

 

6 good answers + 2 fixed answers + 3 borderline answers = 11 out of 14 answers

 

That means that DELPHI is now 78% acceptable in terms of its ability to talk with real humans. And to be honest, that’s good enough for me. The whole point of this Let’s Program was to demonstrate the bare basics of how to use pattern matching to create a very simple chatbot. I never expected it to perform nearly as well as it does.

 

But since we’ve come this far and added a new “Can” rule we might as well try to hunt down one last test user and see if we really are getting an 80% success rate with DELPHI. As all good scientists know an experiment isn’t really done until you’ve repeated it several times and made sure you can get the same answer every time.

Let’s Program A Chatbot 17: Blitzcode!

You Know How This Works By Now

 

If you’ve been following along through this entire Let’s Program you should have a pretty good idea of the process I use to think up new test cases and then write new rules to solve them. So this time around I’m not going to bother explaining every single little thing I’m doing. I’m just going to quickly throw down seven new test cases along with the code I used to satisfy them. I’m confident you can fill in the gaps on your own now.

 

Is, Are and Was Were Problems

 

Let’s start out with those troublesome plurals and past tense versions of “Is”. Here are our new tests:

 

$testCases[14][0] = "Are plurals working now?";
$testCases[14][1] = "Fate indicates that plurals are working now";

$testCases[15][0] = "Was this tense a problem?";
$testCases[15][1] = "Fate indicates that this tense was a problem";

$testCases[16][0] = "Were the lights left on?";
$testCases[16][1] = "Fate indicates that the lights were left on";

 

And here is the fix:

 

push(@chatPatterns, 
        [qr/\A(is|are|am|was|were) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i, 
            ["Fate indicates that UIF1 UIF0 UIF2",
            "Other responses go here"]
        ]);

 

A few things to look out for here. First, notice that I’m using the /i flag at the end of the regular expression so the user doesn’t have to worry about capitalizing anything. You should also notice that I’m now capturing which version of “Is” the user chose and then inserting it into the output. Since it is now the first capture group I can refer to it as UIF0, although that does mean I need to scoot over the other input. UIF0 is now UIF1 and UIF1 becomes UIF2.

 

And this very nearly worked. Almost.

 

Test Case 16 Failed!!!

 

Input: Were the lights left on?

 

Output: Fate indicates that the lights Were left on

 

Expected: Fate indicates that the lights were left on

 

——————–

 

Passed 11 out of 17 tests

 

Test Failure!!!

 

 

Because we grab the verb from the user’s input, a capitalized “Is” will result in strange capitals in the middle of our sentences, like you can see here with this out of place “Were”. Not acceptable. I could probably improve the UIF substitution system to avoid middle word capitalization, but instead I’m going to just blow away every possible capitalization problem I could ever have by making DELPHI speak in robot style ALL CAPS. It’s an easy change to make but it does mean that I’m going to have to update every single test case to expect capital letters. Oh well.

 

Here is the one line change needed in the return statement of generateResponse to switch to all caps. The uc function makes everything upper case.

 

return uc($response);

 

Now please excuse me while I capitalize all 17 of my existing test cases.

 

(Type type type)

 

Well that was boring. But DELPHI is now passing 17 out of 17 test cases again so it was worth it.

 

You’ve Gone Done Did It Now!

 

Getting DELPHI to handle past tense “Did” as well as present “Does” is basically the same problem and solution as “Is” and “Was” above.

 

$testCases[17][0] = "Did this test pass?";
$testCases[17][1] = "FATE INDICATES THAT THIS TEST DID PASS";

 

Notice the switch to all capital expected output. Anyways, here is the updated “Does” rule:

 

push(@chatPatterns,
   [qr/\A(Did|Does) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
      ["Fate indicates that UIF1 UIF0 UIF2",
       "Other responses go here"]
   ]);

 

Once again we are now pulling the verb out of the user’s input, which means we can include it in the output as UIF0 and have to shift the index of all of our other existing User Input Fragments.

 

Passed 18 out of 18 tests

All Tests Passed!

 

Help Me To Help You

 

One of the bigger problems we saw with the first test user was that they weren’t sure what kind of questions to ask DELPHI. Let’s fix this by jumping into chat.pl and writing up a new self-introduction for DELPHI:

 

DELPHI: HELLO! MY NAME IS DELPHI

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

 

That’s much better. And while I’m at it I should probably improve DELPHI’s responses to statements it doesn’t understand. Remember, if you change the first possible response to a rule you will also need to update the test cases associated with that rule.

 

Here is a sample of some of the new responses I’m going to try:

 

I’m sorry, could you try rewording that as a yes or no question?

I’m sorry, could you think of a simpler way to ask that question? Maybe as a yes or no question?

I’m confused. Try a simple yes or no question instead

I don’t want to talk about that. Please feel free to ask me why

 

Self-Knowledge Is The Key To Enlightenment

 

The final big issue we saw in our human focused test was that DELPHI couldn’t answer simple questions about itself. That’s sad. A program without an easy to use HELP feature has pretty much failed at usability.

 

So here are a few test cases to help us address this glaring problem:

 

$testCases[18][0] = "What kind of questions can you answer?";
$testCases[18][1] = "YES OR NO QUESTIONS LIKE \"WILL I HAVE GOOD LUCK TOMORROW?\" ARE THE EASIEST FOR ME TO ANSWER";

$testCases[19][0] = "Can you tell time?";
$testCases[19][1] = "THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE \"WILL IT BE SUNNY TOMORROW?\"";

$testCases[20][0] = "help";
$testCases[20][1] = "JUST TYPE A QUESTION AND HIT ENTER AND I'LL DO MY BEST TO ANSWER IT. YES OR NO QUESTIONS LIKE \"DID I DO WELL ON MY TEST?\" ARE BEST";

 

Let’s tackle the “help” case first because it’s easiest. We just write a super specific, super high priority rule. I didn’t want random help messages, so the response array is only one item long. But it still has to be an array because that’s what the code expects:

 

push(@chatPatterns,
   [qr/\Ahelp\z/i,
      ["Just type a question and hit enter and I'll do my best to answer it. Yes or No questions like \"Did I do well on my test?\" are best"]]);

 

Let’s do the “What kind of questions” test next. I don’t have any other rules dealing with “What” style questions so the priority of this rule doesn’t really matter that much. If you’ve written your own “What” rule you’ll probably need to give this help rule higher priority so it doesn’t get overshadowed.

 

The rule itself is pretty simple. It just looks for any sentence that starts with “What” and later has the word “question”. This might be casting the net a bit wide since this won’t just catch questions like “What kind of questions can you answer?” or “What kind of questions work best?”. It will also catch things like “What do you do with my questions?” where our answer doesn’t really make sense.

 

So we’re kind of gambling here on what questions the user will and won’t ask. If we find that our users are providing lots of “What questions” input that doesn’t fit this pattern we may have to write some extra rules, but for now this should be fine:

 

push(@chatPatterns,
   [qr/\AWhat.*questions/i,
      ["Yes or no questions like \"Will I have good luck tomorrow?\" are the easiest for me to answer"]]);

 

Finally there’s the “Can you?” rule, which is basically identical to the rule above. Just look for the words “Can” and “You”:

 

push(@chatPatterns,
   [qr/\ACan.*you/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);

 

Unfortunately we run into a little problem:

 

Test Case 19 Failed!!!

Input: Can you tell time?

Output: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

Expected: THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE “WILL IT BE SUNNY TOMORROW?”

 

Remember how we wrote that little bit of code to transform second person words like “You” into first person words like “I” to help us create more grammatical responses? That same code is breaking our new “Can you” rule by changing the user’s input into “Can I” before it ever gets compared to any of our rules.

 

There’s probably an elegant solution to this problem, but I don’t have the time to find it right now. Instead I’m just going to rewrite the rule to look for “Can I” because I know that represents a user asking “Can you” like we really want.

 

push(@chatPatterns,
   [qr/\ACan.*I/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);

 

 

 

DELPHI V2 Lives!

 

Passed 21 out of 21 tests

All Tests Passed!

 

DELPHI can now handle at least the most basic forms of the three big problems we saw in our live user test. Which means it’s time to find a new test user and try again. I’m hoping our code can now generate logical answers to at least 70% of the input the average random user will try to give it. That would give us a nice solid “C” average. Not good enough for a professional chatbot, but a pretty good goal for a practice program built from scratch using nothing but regular expressions and a few foreach loops.

Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Breaking All Our Tests

 

Today I’m finally tackling the last item on our chatbot wish-list: Randomized responses. This will give DELPHI the ability to make both yes and no predictions for all questions. Even better, a wide variety of randomized responses will keep DELPHI from repeating itself too often and help make it feel human. Nothing says “computer” quite like repeating the same line again and again. Nothing says “computer” quite like repeating the same line again and again.

 

Unfortunately this randomness is going to completely break all the automated tests we spent so long satisfying. After all, the fundamental idea behind all of our tests is that every possible user input has exactly one right answer. When the user says “A” the response should always be “B”. But adding a little randomness throws this idea out the window. How are we supposed to test a program that sees the input “A” and sometimes says “B” and sometimes says “C” and sometimes says “D”?

 

This is one of the great weaknesses of automated testing: It doesn’t work so well with uncertainty.

 

One possible solution would be to build a much more flexible testing suite. Something that can match one input to multiple possible outputs. If there are three random “good” answers to input A then we consider the test to have passed if we see any one of them. It wouldn’t even be too hard to program. Probably just a lot of “or” statements or maybe a loop that returns “true” as soon as it finds at least one match.
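A sketch of that “any match passes” idea (the sub name and arguments here are my own invention, not part of the real test suite):

```perl
use strict;
use warnings;

# A test passes if the chatbot's output matches ANY of the acceptable answers.
sub matchesAnyAnswer {
    my ($output, @acceptableAnswers) = @_;
    foreach my $answer (@acceptableAnswers) {
        return 1 if $output eq $answer;   # pass on the first match we find
    }
    return 0;                             # no acceptable answer matched
}

print matchesAnyAnswer("B", "B", "C", "D"), "\n";   # prints 1
print matchesAnyAnswer("E", "B", "C", "D"), "\n";   # prints 0
```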

 

But this may not scale very well. Writing tests to handle a small amount of randomness probably isn’t too bad. You just type in your one input and all three possible good outputs and you’re done. But if you have dozens or even hundreds of potential outputs… well, you probably don’t want to maintain that sort of testing code by hand.

 

So instead of finding a way to test our randomness I’m just going to provide a mechanism to turn the randomness on and off. This way I can just turn the random responses off and all of the old test cases will still work and I can continue adding new tests the easy way: one input paired with one expected output.

 

Nested Arrays Are Good For Everything!

 

That’s enough talk about testing. Time to focus on how we’re going to make DELPHI more random. The chatbot already has a system for associating input patterns with output patterns. All we need to do now is adjust it to associate one input pattern with multiple possible output patterns.

 

My clever readers* probably remember that DELPHI uses a multi-dimensional array to keep track of which response pattern matches each input pattern. Every top level item in the array represents a different chatbot rule/response pair. Each rule is then divided into a two-item array where the first item is a matching rule and the second item is a response rule.

 

In order to add some randomness to the system we’re going to replace the singular response rule in slot number 2 with yet another array, this one holding a list of all responses we want to generate. For example, here is what the “catch all” rule looks like after I replaced the single response with a three item array.

 

push(@chatPatterns,
   [qr/.*/,
      ["I don't want to talk about that. Please ask me a question",
       "I'm confused. Try a simple question instead",
       "I'm really good at yes no questions. Try one of those"]
   ]);

 

 

Everybody see what we’re doing here? @chatPatterns is the first array. Inside of it we’re pushing a second array where the first item is the input matching regex /.*/ and the second item is a third array that holds three possible responses.
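To make the three levels concrete, here is the same catch-all rule as a standalone snippet, with each level of the structure pulled out by hand:

```perl
use strict;
use warnings;

my @chatPatterns;                         # level 1: the list of all rules
push(@chatPatterns,
   [qr/.*/,                               # level 2, item 0: the input pattern
      ["I don't want to talk about that. Please ask me a question",
       "I'm confused. Try a simple question instead",
       "I'm really good at yes no questions. Try one of those"]
   ]);                                    # level 2, item 1: the response list

my $regex     = $chatPatterns[0][0];      # grab the rule's regex
my $responses = $chatPatterns[0][1];      # grab a reference to its response array
print $responses->[1], "\n";              # level 3: the second possible response
```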

 

Eventually we’ll probably want to flesh out DELPHI by attaching a dozen or so responses to every input rule. But for starters let’s just stick to two or three variations for each rule. That should be enough to make sure that our basic random algorithm works like it should.

 

Ready for a massive code dump?

 

Random Response Set 1

 

This code should replace the old @chatPatterns code:

 

my @chatPatterns;

push(@chatPatterns, 
        [qr/[a-zA-Z]+ or [a-zA-Z]+.*\?\z/,
            ["Fate indicates the former",
            "I have a good feeling about the latter"]
        ]);

push(@chatPatterns, 
        [qr/\ADo (.+)\?\z/, 
            ["Fate indicates that UIF0",
            "I don't think that UIF0",
            "Athena doesn't think so"]
        ]);

push(@chatPatterns, 
        [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 does UIF1",
            "The spirits whisper \"UIF0 does not UIF1\""]
        ]);

push(@chatPatterns, 
        [qr/\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 is UIF1",
            "The stars are clear: UIF0 is not UIF1"]
        ]);

push(@chatPatterns, 
        [qr/\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["I predict that UIF0 will UIF1",
            "Based on these tea leaves it seems UIF0 will not UIF1"]
        ]);

push(@chatPatterns,
        [qr/\AWhy (.+)\?\z/,
            ["Because of reasons",
            "For important cosmic reasons"]
        ]);

push(@chatPatterns, 
        [qr/\?/,
            ["I'm sorry, could you try rewording that?",
            "Was that a question?"]
        ]);

push(@chatPatterns, 
        [qr/\A(Why|Is|Are|Do|Does|Will)/,
            ["Did you forget a question mark? Grammar is important!",
            "If you're asking a question, remember to use a question mark"]
        ]);

push(@chatPatterns,
        [qr/.*/,
            ["I don't want to talk about that. Please ask me a question",
            "I'm confused. Try a simple question instead",
            "I'm really good at yes no questions. Try one of those"]
        ]);

 

Optional Randomness Through Optional Arguments

 

Now that we have multiple possible responses inside of every single rule we’re going to need to update generateResponse. The first step is to get it to pull responses out of an array instead of reading them directly. After that we’ll also need to write code to randomize which response gets pulled out of the array in the first place.

 

Also, if we want DELPHI to be random with humans but predictable with tests we’re going to need some way to let DELPHI know when to be random and when to be boring. The simplest way to do this is to just add a second argument to generateResponse. The first argument will still be the user’s input but now we’ll use the second argument to decide whether to choose a random response or just stick to the first response in the array.

 

But that’s enough about that. I’ll just let the code speak for itself now:

 

sub generateResponse{
    my $userInput = $_[0];
    my $beRandom = $_[1];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response;
            if($beRandom){
                my $numberOfResponses = scalar(@{ $chatPattern->[1] });
                $response = $chatPattern->[1][rand $numberOfResponses];
            }
            else{
                $response = $chatPattern->[1][0];
            }
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

 

I’m sure everyone can see the basic flow of this updated function. We use the $beRandom variable to help us decide which response pattern to use.

 

There is a little trickiness to the Perl I used for choosing random responses. The goal is to figure out how many responses are attached to the current rule and then choose one of them at random.

 

I start by extracting the response array from the second half of the chat pattern with $chatPattern->[1]. But it turns out that this is actually just an array reference, not a true array, so before we can use it we have to turn it back into an array with the @{ $reference } syntax.

 

Finally I extract the length of the response array. In Perl you can get the length of an array by forcing it into a single value variable, and you can make that happen explicitly with the scalar function. Technically the function call was unnecessary, since assigning to $numberOfResponses already forces the response array to act as a scalar with or without the keyword, but I figured some of my less Perl-inclined readers might appreciate having the switch from array to single number pointed out.

 

After we have the length of the response array everything else is easy. We use rand $numberOfResponses to generate a fraction somewhere between 0 and the number of responses, which then gets automatically rounded down when we use it as an array index.
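Here’s a tiny demonstration of that truncation behavior, separate from the DELPHI code:

```perl
use strict;
use warnings;

my @responses = ("first", "second", "third");

# rand(3) returns a fraction in the range [0, 3). A fractional array
# index gets silently truncated, so any value rand returns is valid:
print $responses[2.9], "\n";                    # prints "third", same as $responses[2]

# The same pattern DELPHI uses: pick a random entry from the array.
my $pick = $responses[rand scalar @responses];
print "$pick\n";
```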

 

Updating Our Tests And User Interface

 

Thanks to Perl’s flexible function calls it turns out we don’t have to do anything at all to get our tests working with the new generateResponse code.

 

When we call generateResponse with only one argument the function can still reference the second argument with $_[1]; it will just be filled with the “undefined” value. And since “undefined” is the same as “false” as far as if statements are concerned, not including a $beRandom argument is the same as announcing we don’t want any randomness.
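You can see the undefined-is-false behavior in a tiny standalone function (my own example, not DELPHI code):

```perl
use strict;
use warnings;

# A sketch of an optional second argument, DELPHI-style.
sub describeMode {
    my $beRandom = $_[1];   # undef when the caller leaves it out
    return $beRandom ? "random" : "predictable";
}

print describeMode("some input"), "\n";     # prints "predictable" (undef is false)
print describeMode("some input", 1), "\n";  # prints "random"
```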

 

Updating the “chat.pl” user interface to include randomness is almost as easy. All we have to do is pass some sort of “true” value as the second argument to generateResponse. I like to use the value “1”, like this:

 

my $response = DELPHI::generateResponse($userInput,1);

 

Automatic And Manual Testing

 

With our code complete the first thing we should do is run our automatic tests to make sure that the non-random version of our test system still works like we want it to:

 

Passed 14 out of 14 tests

All Tests Passed!

 

Good. We didn’t break anything important while adding our new features. Now to boot up chat.pl and see if the random response system is working properly:

 

DELPHI: The oracle greets you. What questions are on your mind?

>Is DELPHI random now?

DELPHI: Fate indicates that DELPHI is random now

>Is DELPHI complete?

DELPHI: The stars are clear: DELPHI is not complete

>Why isn’t DELPHI complete?

DELPHI: For important cosmic reasons

>Goodbye

DELPHI: Come back anytime

 

Looks good. That’s two “Is X Y?” questions in a row that both got different responses. For more thorough testing just keep hitting DELPHI again and again with identical input and make sure that DELPHI switches between the various responses.
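If you’d rather not mash the same question in by hand, the same sanity check can be scripted. This sketch uses a stand-in response list in place of the real generateResponse, just to show the idea:

```perl
use strict;
use warnings;

# Feed the "same input" many times and count distinct replies.
# @responses stands in for one rule's response list from the real chatbot.
my @responses = ("Fate indicates that DELPHI is random now",
                 "The stars are clear: DELPHI is not random now");
my %seen;
for (1 .. 100) {
    my $reply = $responses[rand @responses];   # same pick logic as generateResponse
    $seen{$reply} = 1;
}
print scalar(keys %seen), " distinct responses in 100 tries\n";
```

With a working random picker you should see every response in the list show up.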

 

Conclusion

 

I’m pretty sure that’s the last bit of Perl in this Let’s Program. We can match input, generate responses and DELPHI works fine with both our test program and our user interface.

 

Future improvements to DELPHI will probably have much more to do with writing new rules and responses than coming up with new algorithms. In fact, my next post is going to focus entirely on the art of writing computer responses that will convince users your program is almost human.

 

 

 

* That’s all of you.

Let’s Program A Chatbot 13: What’s Mine Is Yours

The Last Test Case (For Now…)

 

We’re down to our final test case. Are you excited? I’m excited!

 

Test Case 4 Failed!!!

Input: Do my readers enjoy this blog?

Output: Fate indicates that my readers enjoy this blog

Expected: Fate indicates that your readers enjoy this blog

 

Hey, that’s just a “do” rule. We already solved that problem last time. What’s going on here?

 

Oh, wait. The problem isn’t the “do”. The problem is that the question mentioned “my readers” and DELPHI was supposed to be smart enough to switch the answer around to “your readers”. But DELPHI didn’t do that. We should fix that.

 

1st And 2nd Person Made Easy

 

The idea of first versus second person is way too complex for a simple pattern matching chatbot like DELPHI. But the idea of replacing word A with word B is simple enough. And it turns out that replacing first person words with second person words, and vice-versa, is good enough for almost every question that DELPHI is going to run into.

 

But be careful! When trying to swap A to B at the same time you are swapping B to A it is very possible to accidentally end up with all A. What do I mean? Consider this example:

 

My dog is bigger than your dog.

 

We switch the first person words to second person words:

 

 Your dog is bigger than your dog.

 

Then we switch the second person words to first person:

 

My dog is bigger than my dog.

 

I’m sure you can see the problem.

 

The other big issue to look out for is accidentally matching words we don’t want to. Allow me to demonstrate:

 

You are young

 

We want to change that to:

 

I am young

 

But if all we do is blindly swap “I” for “you” we can easily end up with this:

 

I am Ing

 

For an even worse example consider this one:

 

I think pink is nifty.

You thyounk pyounk yous nyoufty.

 

Solving The Problems

 

Switching “you” to “I” while avoiding changing “young” to “Ing” is pretty simple with regular expressions. All we have to do is use the “word boundary” symbol \b. Like so:

 

\byou\b

 

This will automatically skip over any instances of “you” that are directly attached to other letters or symbols.
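Here’s a minimal demonstration of the word boundary at work:

```perl
use strict;
use warnings;

# \b only matches at a word boundary, so the "you" inside "young" is left alone.
my $original = "you are young";
(my $switched = $original) =~ s/\byou\b/I/g;
print "$switched\n";   # prints "I are young"
```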

 

Making sure that we don’t accidentally switch words from first to second person and then back from second to first will be a little more tricky. There are several possible solutions, some involving some cool regex and Perl tricks, but for now I’m just going to use something very straightforward.

 

Basically I’m going to replace every first and second person word with a special placeholder value that I’m relatively certain won’t show up in normal DELPHI conversations. Then I will change all the placeholder values to their final words. Here is how this will work with the above example:

 

My dog is bigger than your dog.

 

We switch the first person words to placeholders

 

DELPHIyour dog is bigger than your dog.

 

Then we switch the second person words to placeholders. Because we used the placeholder “DELPHIyour” instead of plain “your” we don’t accidentally switch the first word back to “my”.

 

DELPHIyour dog is bigger than DELPHImy dog

 

Then we replace the placeholders

 

Your dog is bigger than my dog.

 

Here It Is In Code

 

I like foreach loops, so I’m going to implement this as two arrays and two foreach loops. The first array will contain regular expressions for finding first and second person words along with the placeholders we want to replace them with. The second will contain regular expressions for finding placeholders and replacing them with the proper first and second person words.

 

To implement this I just drop these variables and this function into DELPHI.pm right after generateResponse. The only new coding trick to look for is the ‘i’ modifier on the end of some of the regular expressions. This is the “case insensitive” switch and makes sure that DELPHI can match the words we want whether they are capitalized or not*.

 

#Dictionaries used to help the switchFirstAndSecondPerson function do its job
my @wordsToPlaceholders;

$wordsToPlaceholders[0][0]=qr/\bI\b/i;
$wordsToPlaceholders[0][1]='DELPHIyou';

$wordsToPlaceholders[1][0]=qr/\bme\b/i;
$wordsToPlaceholders[1][1]='DELPHIyou';

$wordsToPlaceholders[2][0]=qr/\bmine\b/i;
$wordsToPlaceholders[2][1]='DELPHIyours';

$wordsToPlaceholders[3][0]=qr/\bmy\b/i;
$wordsToPlaceholders[3][1]='DELPHIyour';

$wordsToPlaceholders[4][0]=qr/\byou\b/i;
$wordsToPlaceholders[4][1]='DELPHIi';

$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImine';

my @placeholdersToWords;

$placeholdersToWords[0][0]=qr/DELPHIyou/;
$placeholdersToWords[0][1]='you';

$placeholdersToWords[1][0]=qr/DELPHIyour/;
$placeholdersToWords[1][1]='your';

$placeholdersToWords[2][0]=qr/DELPHIyours/;
$placeholdersToWords[2][1]='yours';

$placeholdersToWords[3][0]=qr/DELPHIi/;
$placeholdersToWords[3][1]='I';

$placeholdersToWords[4][0]=qr/DELPHImine/;
$placeholdersToWords[4][1]='mine';

$placeholdersToWords[5][0]=qr/DELPHImy/;
$placeholdersToWords[5][1]='my';

sub switchFirstAndSecondPerson{
    my $input =$_[0];

    foreach my $wordToPlaceholder (@wordsToPlaceholders){
        $input =~ s/$wordToPlaceholder->[0]/$wordToPlaceholder->[1]/g;
    }

    foreach my $placeholderToWord (@placeholdersToWords){
        $input =~ s/$placeholderToWord->[0]/$placeholderToWord->[1]/g;
    }

    return $input;
}

 

Using The New Function In Generate Response

 

With that out of the way all that is left is to figure out where inside of generateResponse we should be calling this function. My first thought was to just stick it onto the end of the function by finding the original return statement:

 

return $response;

 

And replacing it with this:

 

return switchFirstAndSecondPerson($response);

 

Now this is where test driven development comes in handy because that simple change did indeed pass test case 4… but it also caused messes like this:

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output: you predict that this test will pass

Expected: I predict that this test will pass

Test Case 8 Failed!!!

Input: Pumpkin mice word salad

Output: you don’t want to talk about that. Please ask you a question

Expected: I don’t want to talk about that. Please ask me a question

 

We’ve accidentally made it impossible for DELPHI to talk in first person, which wasn’t what we wanted at all. We only wanted to change first and second person words from the user’s input fragments, not from our carefully handwritten DELPHI responses. Which is a pretty good hint that we should have called switchFirstAndSecondPerson on the user’s input BEFORE we tried to parse it and generate a response, not after. Maybe right at the beginning of the function:

 

sub generateResponse{
    my $userInput = $_[0];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response = $chatPattern->[1];
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

 

The Moment Of Truth

 

Did we do it? Did we resolve our final use case?

 

Drum roll please…………

 

Test Case 0 Passed

Test Case 1 Passed

Test Case 2 Passed

Test Case 3 Passed

Test Case 4 Passed

Test Case 5 Passed

Test Case 6 Passed

Test Case 7 Passed

Test Case 8 Passed

Test Case 9 Passed

Test Case 10 Passed

Test Case 11 Passed

Test Case 12 Passed

Test Case 13 Passed

——————–

Passed 14 out of 14 tests

All Tests Passed!

 

WHOOO! GO US!

 

Note To Exceptionally Clever Readers

 

All my readers are clever, but some of you are exceptionally clever. And you may have noticed that switchFirstAndSecondPerson always returns lowercase words even when the original word was capitalized or at the beginning of the sentence. This isn’t a huge problem, but if you’re a perfectionist it might be bugging you to accidentally change “I care about grammar” to “you care about grammar” instead of “You care about grammar”.

 

One easy solution would be to update DELPHI to capitalize its entire output. People are used to computer programs SPEAKING IN ALL CAPS and it saves us the effort of having to actually teach DELPHI anything about proper capitalization.

 

If you don’t like the caps lock look you could instead update DELPHI to always make sure output starts with a capital. More often than not this is all it takes to make a sentence look like real English.
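In Perl that second approach is nearly free thanks to the built-in ucfirst function. A sketch of what the fix would look like (I’m not actually adding this to DELPHI):

```perl
use strict;
use warnings;

# ucfirst capitalizes only the first character, leaving the rest untouched.
my $response = "you care about grammar";
print ucfirst($response), "\n";   # prints "You care about grammar"
```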

 

Or you can just do what I do and ignore the problem. I’m not going to worry too much about the occasional lowercase “you” or “my” unless users start complaining. And since this program isn’t intended for any real users that’s not likely to ever happen. Customer satisfaction is easy when you have no customers!

 

Conclusion

 

That’s it! We’ve passed all of our primary use cases. DELPHI is done.

 

Or is it? If you can remember all the way back to the original design document one thing we wanted out of DELPHI was the ability to generate random responses to questions. DELPHI currently just guesses “yes” to all questions which is both useless and boring. So while we hit a very important benchmark today we’re still not quite done.

 

 

 

* You know what else case insensitive regular expressions would be good for? Making DELPHI more accepting of user input that isn’t properly capitalized. Expect this to happen in a future blog post.

Let’s Program A Chatbot 12: When The Answer Key Is Wrong

Unrealistic Expectations

 

Sometimes you get halfway through a project only to realize you don’t have the time or money to do what you originally planned to do*. When that happens you have no choice but to rethink your plans, either lowering your expectations or setting a new deadline. Admittedly both approaches generally involve getting frowned at by both management and your customers but sometimes you really have no choice. Even the best of developers have limits.

 

Why am I bringing this up? You’ll understand in a minute, but I will tell you that it involves these still unresolved use cases:

 

Test Case 2 Failed!!!

Input: Does this program work?

Output: I’m sorry, could you try rewording that?

Expected: Fate indicates that this program works

Test Case 3 Failed!!!

Input: Do computers compute?

Output: I’m sorry, could you try rewording that?

Expected: Fate indicates that computers compute

 

At first this doesn’t look so bad. The use cases are “Do X Y?” and “Does X Y?” and all DELPHI has to do is respond back “Yes X Y”. Hardly seems like a challenge. We’ll just slip this new rule into our list after the “or” rule and right before the “is” rule.

 

push(@chatPatterns,
   [qr/\A(?:Do|Does) (.+)\?\z/,
      "Fate indicates that UIF0"]);

 

Very simple. We look for any question that starts with some form of “Do” (notice the non-capture ?: symbol) and then we just replace that one question word with our “Fate indicates that” prediction. Is that really all it took?

 

Test Case 2 Failed!!!

Input: Does this program work?

Output: Fate indicates that this program work

Expected: Fate indicates that this program works

Test Case 3 Passed

 

A success and a failure is still an overall failure. So now we need to find out what went wrong with Test Case 2 that didn’t go wrong with Test Case 3. If you look closely at the expected vs actual output the only issue is verb agreement. It should be “program works”, with an ‘s’, but all we got was the original “program work” from the question.

 

This problem really only shows up in the third person where the question is phrased as “Does X VERB” and the answer needs to be in form “X VERBs”. It’s really a pretty simple grammar rule. At least, it’s simple for a human. DELPHI is going to need a lot of help.

 

Hmmm… maybe we can solve this by just slipping an ‘s’ onto the end of our response. Of course, since this only applies to third person questions we’ll have to split the original rule into two rules. Notice that only the “does” version glues a final s onto the end of the User Input Fragment from the original input:

 

push(@chatPatterns,
   [qr/\ADo (.+)\?\z/,
      "Fate indicates that UIF0"]);

push(@chatPatterns,
   [qr/\ADoes (.+)\?\z/,
      "Fate indicates that UIF0s"]);

 

Test Case 2 Passed

 

I’m Still Not Sure This Is Really Working

 

Just gluing an ‘s’ to the end of the input doesn’t seem very sophisticated. Sure, it passed our test case but I’m not sure it will really work in all scenarios. So how about we write a new test case just to make extra sure we really solved our problem?

 

$testCases[13][0] = "Does adding an s work well?";
$testCases[13][1] = "Fate indicates that adding an s works well";

 

Nope!

 

Test Case 13 Failed!!!

Input: Does adding an s work well?

Output: Fate indicates that adding an s work wells

Expected: Fate indicates that adding an s works well

 

Adding an ‘s’ to the end of the sentence isn’t enough because what we truly want is an ‘s’ on the end of the verb and there is no guarantee that the verb will be the last word in the sentence. So to fix this problem we are going to need to either:

 

      1. Develop a complex system for identifying the verb in an arbitrary sentence
      2. Decide that we don’t care about adding ‘s’s to verbs

 

I’m going to go with option number 2 and come up with a new definition of what is considered a “correct” answer to a “does” question.

 

The New Test Case

 

There is an easy way around having to reformat our verbs and that is by including the word “does” inside the response. For instance, these two sentences basically mean the same thing:

 

This sentence looks equal to the other sentence

This sentence does look equal to the other sentence

 

This means that we can change the response to “Does X Y?” from “Yes, X Ys” to the much simpler “X does Y”. Now we are dealing with the exact same problem we already solved for “X is Y” and “X will Y”.

 

Here are our updated test cases:

 

$testCases[2][0] = "Does this program work?";
$testCases[2][1] = "Fate indicates that this program does work";

$testCases[13][0] = "Does this approach work better?";
$testCases[13][1] = "Fate indicates that this approach does work better";

 

And here is our updated “does” rule (the “do” rule can stay the same):

 

push(@chatPatterns,
   [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/,
      "Fate indicates that UIF0 does UIF1"]);

 

And, finally, here are the results

 

Passed 13 out of 14 tests

Test Failure!!!

 

Did We Learn Anything Useful Today?

 

The moral of today’s story is that sometimes a test case that is really hard to solve represents a problem with your expectations as much as your program. If you’re on a tight budget or schedule** sometimes it makes sense to stop and ask yourself “Can we downgrade this requirement to something simpler? Can we delay this requirement until a later release?”

 

After all, good software today and the promise of great software tomorrow is better than insisting on great software today and never getting it.

 

Although sometimes you can manage to deliver great software today and that’s even better. Reach for the stars, bold readers. I have faith in your skills!

 

Conclusion

 

Did you notice that the success rate on our last testing run was 13 out of 14? That means we’re almost done! At least, we’re almost done with the first test version of the code. I’m sure the instant we ask a human tester to talk to DELPHI we’re going to find all sorts of new test cases that we need to include.

 

But future test cases are a problem for the future. For now we’re only one test case away from a significant milestone in our project. So join me next time as I do my best to get the DELPHI test suite to finally announce “All Tests Passed!”

 

 

 

* Even worse, sometimes you’ll find out that what you want to do is mathematically impossible. This is generally a bad thing, especially if you’ve already spent a lot of money on the project.

 

** Or if you’re writing a piece of demo software for your blog and don’t feel like spending more than a few dozen hours on what is essentially a useless toy program

Let’s Program A Chatbot 11: Bad Adjectives

Not As Easy As It Looked

 

Eeny meeny miny moe, which test case do I want to show?

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output: I’m sorry, could you try rewording that?

Expected: I predict that this test will pass

 

This doesn’t look so bad. We already wrote a rule for “Is X Y?” so writing a rule for “Will X Y?” should be as easy as copy pasting and switching a few words. Behold!

 

push(@chatPatterns,
   [qr/\AWill ([a-zA-Z]+) (.+)\?\z/,
      "I predict that UIF0 will UIF1"]);

 

I’ll just drop that into the rules list right after the “Is X Y?” rule and we should be good to go.

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output: I predict that this will test pass

Expected: I predict that this test will pass

 

Uh oh. That didn’t quite work. DELPHI did manage to figure out that test 0 was a “Will X Y?” style question but when generating the answer it put the “will” in the wrong place. Can you figure out why?

 

[Please use this break in the blog’s flow to consider why this happened.]

 

The problem here has to do with how we defined the rule. We’ve been calling it “Will X Y?” but the rule is actually more like “Will Noun Verb?” or “Will Noun-Phrase Verb-Phrase?”.

 

Our current dumb rule assumes that the noun will always be the first word after the word “Will” and that everything else will be part of the verb phrase. This works out great for sentences like “Will Batman catch the villain?” but completely falls apart when you start adding adjectives to the noun and get things like “Will the police catch the villain?”

 

So what we really need is a “Will” rule that is smart enough to group common adjectives with the noun and treat them all like one big super-noun. Here is a quick first pass (WARNING: WEIRD REGULAR EXPRESSION AHEAD):

 

/\AWill ((?:(?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/

 

Don’t panic just yet, this rule is actually a lot simpler than it looks. But first you need to understand what all those “?:” symbols are doing. Hopefully you remember that parentheses create capture groups that group patterns together and then store their matches for future use. But sometimes you want to group patterns together without storing them for later. You can accomplish this by starting your group with the special symbols “?:”, which turns off the capturing and lets you use the parentheses as a simple grouping tool.

 

This is important for our “Will” rule because we want to capture the entire noun-phrase and the entire verb-phrase but we don’t want to capture any of the individual parts of those phrases. For example, we have improved our noun-phrase by adding in two groups of nested parentheses for handling common article adjectives. The inner parentheses match common adjectives and the outer parentheses make sure there is a space following each adjective. We mark both of these groups as “?:” noncapturing because, while we certainly do want to match nouns that start with a series of adjectives, we only want to capture those adjectives as part of the noun and not on their own.

 

What would happen without those noncapturing symbols? Well, the first big parenthesis set would capture the entire noun-phrase and substitute it into the output just like we want. But the second capture group wouldn’t be the verb-phrase like we originally wanted. Instead the second capture group would be the inner parentheses matching the articles, leading to all sorts of problems. See for yourself:

 

Input: Will this test pass?

Output: I predict that this test will this

Expected: I predict that this test will pass

 

See what I mean? We successfully grabbed “this test” and put it into the answer as a noun-phrase but we then grabbed “this ” as our second capture group while the verb-phrase “pass” got pushed into a later capture group slot. Not what we wanted at all.

 

Instead we’ll just tell the inner parentheses not to capture. Now the noun-phrase always goes in slot one and the verb-phrase always goes in slot two and everything works wonderfully.
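Here are both versions of the rule side by side in a standalone snippet, showing exactly what each one captures:

```perl
use strict;
use warnings;

my $input = "Will this test pass?";

# Without ?: the nested article groups steal capture slots 2 and 3,
# pushing the verb-phrase all the way down to slot 4:
my @bad = ($input =~ /\AWill (((this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/);
# @bad is ("this test", "this ", "this", "pass")

# With ?: only the noun-phrase and verb-phrase are captured:
my @good = ($input =~ /\AWill ((?:(?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/);
# @good is ("this test", "pass")

print "without ?: -> @bad\n";
print "with ?:    -> @good\n";
```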

 

Test Case 0 Passed

Passed 7 out of 11 tests

 

Wait A Minute, Isn’t This A Problem For “Is” Rules Too?

 

Clever readers might be asking themselves “If adjectives broke our simple “Will X Y?” rule, then won’t they break our old “Is X Y?” rule too?” Well good job for noticing that clever readers, because that’s the exact problem we see in our next test case:

 

Test Case 1 Failed!!!

Input: Is the sky blue?

Output: Fate indicates that the is sky blue

Expected: Fate indicates that the sky is blue

 

Fortunately we can fix it the exact same way:

 

/\AIs ((?:(?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/

 

Test Case 1 Passed

Passed 8 out of 11 tests

 

You Shouldn’t Copy Paste Code

 

There is one little problem with this approach to adjectives: I’m hard coding a big list of words and then copy pasting it into multiple functions. This will be a real pain if we ever have to update the list in the future. For instance, if we wanted to add possessive adjectives into the list (my, your, his, her, their) we would have to rewrite two different rules. And if we ever decide a third rule needs access to the list we’ll have to copy paste the whole thing.

 

Much better to turn that portion of the rules into a separate variable that can be included in multiple functions. Which in Perl you can do like this:

 

#put this before the code starts to build the pattern and response array
my $commonAdjectives=qr/(?:this|the|that|a|an)/;
my $noncaptureAdjectiveChain=qr/(?:$commonAdjectives )*/;

 

And now we can just update the rules to use these handy variables anywhere we want to match an arbitrarily long chain of adjectives with a single space after every word.

 

/\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/

 

/\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/
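If you want to convince yourself that the interpolation actually works, here is a self-contained sketch using the variables and the “Is” rule from above (the sample question is just an example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same precompiled sub-patterns as above
my $commonAdjectives = qr/(?:this|the|that|a|an)/;
my $noncaptureAdjectiveChain = qr/(?:$commonAdjectives )*/;

# qr// patterns interpolate into a larger regex just like ordinary variables
my ($noun, $verb) =
    "Is the sky blue?" =~ /\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/;
print "noun-phrase: $noun, verb-phrase: $verb\n";   # noun-phrase: the sky, verb-phrase: blue
```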

 

Those of you following along in a language other than Perl will have to figure out on your own whether and how your language supports inserting variables into a regular expression. If all else fails you can always just go back to the copy pasting thing.

 

Let’s Test By Adding Some More Adjectives

 

Now that we can add new adjectives to two different rules by just updating a single variable we should write a few new tests and make sure it works. How about these?

 

$testCases[11][0] = "Will his code compile?";
$testCases[11][1] = "I predict that his code will compile";

$testCases[12][0] = "Is this big blue box actually a time machine?";
$testCases[12][1] = "Fate indicates that this big blue box is actually a time machine";

 

The first test is a straightforward check that we can add possessives to the adjective list. The second test is a little more complex, requiring us not only to add two new adjectives to our list (big and blue) but also to make sure the code can chain multiple adjectives together in a row.

 

Of course, right now they both fail. The first test doesn’t recognize “his” as an adjective so it assumes it is a noun and puts the “will” in the wrong place. The second test recognizes “this” as an adjective but not “big” and does the same thing.

 

Test Case 11 Failed!!!

Input: Will his code compile?

Output: I predict that his will code compile

Expected: I predict that his code will compile

Test Case 12 Failed!!!

Input: Is this big blue box actually a time machine?

Output: Fate indicates that this big is blue box actually a time machine

Expected: Fate indicates that this big blue box is actually a time machine

 

But after updating our list of adjectives:

 

my $commonAdjectives=qr/(?:this|the|that|a|an|his|her|my|your|their|big|blue)/;

 

Test Case 11 Passed

Test Case 12 Passed

——————–

Passed 10 out of 13 tests

Test Failure!!!
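For anyone who wants to poke at the chained-adjective matching outside the test suite, here is a standalone sketch using the updated adjective list (only the print formatting is mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The expanded adjective list from above
my $commonAdjectives = qr/(?:this|the|that|a|an|his|her|my|your|their|big|blue)/;
my $noncaptureAdjectiveChain = qr/(?:$commonAdjectives )*/;

# Multiple adjectives chain together into a single noun-phrase capture
my ($noun, $rest) = "Is this big blue box actually a time machine?"
    =~ /\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/;
print "Fate indicates that $noun is $rest\n";
# Fate indicates that this big blue box is actually a time machine
```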

 

How Many Adjectives Do We Need?

 

DELPHI now knows how to handle 12 different adjectives. And while that is pretty nifty it’s worth pointing out that the English language has a lot more than just 12 adjectives. In fact, English is one of the world’s largest languages* and easily has several tens of thousands of adjectives. Even worse, English allows you to “adjectivify”** other words to create new adjectives on the spot, like so:

 

“These new computery phones have a real future-licious feel to them but with the default battery they’re actually kind of brickish.”

 

My spell checker is convinced that sample sentence shouldn’t exist but even so you probably understood what I meant. Which just goes to show the huge gap between how good humans are at flexible language processing and how bad computers still are.

 

But what does this mean for DELPHI? Do we need to generate a giant adjective list? Do we need to teach it how to handle nouns and verbs that have been modified to act like adjectives? Do we need to spend twelve years earning multiple PhDs in computer science and linguistics in order to build a more flexible generateResponse function?

 

Well… no. Remember, our goal isn’t to create a program that can fully understand the human language. We just want a bot that can answer simple questions in an amusing way like some sort of high-tech magic eight ball. As long as DELPHI can handle simple input and gracefully reject complex input it should feel plenty intelligent to the casual user.

 

Furthermore, we can actually depend on users to play nice with DELPHI. Most people, after being scolded by DELPHI once or twice for trying to be clever, will start to automatically pick up on what sorts of inputs do and don’t work. The fact that DELPHI can’t handle obscure adjectives will eventually teach users to stick to straightforward questions.

 

All things considered we can probably “solve” the adjective problem by teaching DELPHI the hundred most common adjectives in the English language and then hoping that users never bother going beyond that. Later on we can have some test users talk to DELPHI and use their experiences to decide whether or not we need to add more adjectives or build a more complex system.
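One way to grow toward a hundred-word list without an unreadable regex literal is to build the alternation from a plain Perl array. To be clear, this is not how DELPHI does it in the series; it is just a sketch of one maintainable approach:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: keep the adjectives in an easy-to-edit list...
my @adjectives = qw(this the that a an his her my your their big blue);

# ...and compile them into the same alternation pattern the rules expect
my $commonAdjectives = do {
    my $alternation = join '|', @adjectives;
    qr/(?:$alternation)/;
};
my $noncaptureAdjectiveChain = qr/(?:$commonAdjectives )*/;

my ($noun, $verb) = "Will their code compile?"
    =~ /\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/;
print "I predict that $noun will $verb\n";   # I predict that their code will compile
```

Adding a new adjective then means appending one word to @adjectives instead of editing a regex by hand.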

 

Conclusion

 

Today we caught a glimpse of how simple pattern matching chatbots can completely fall apart when confronted with real English. But we also saw a quick way to band-aid over the worst of these problems, and we have hope that our bot can be written in such a way that users never notice that DELPHI is too dumb to understand that “house” and “that big house over there” are actually the same thing.

 

Next time, more test cases and more examples of English language features that are annoying to program around.

 

 

* As the popular saying goes: English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.

 

** Look, I just verbed a noun!

Let’s Program A Chatbot 10: Chatting With The Bot

There Is More To Life Than Tests

 

So far we’ve focused entirely on running our chatbot through automated tests. But eventually we’ll want some way for actual users to talk to DELPHI too. And since I just recently finished separating the chat code from the testing code I figure now is a great time to also introduce some user focused code.

 

Getting DELPHI to talk to a human is pretty easy. The generateResponse function already knows how to… well… generate responses to input. All that’s left is figuring out how to feed it human input instead of test input. Perl lets us do this in under ten lines (which I put in a file named “chat.pl”):

 

#! /usr/bin/perl -w

use strict;

require 'DELPHI.pm';

while(<>){
   chomp;
   print "DELPHI: ", DELPHI::generateResponse($_), "\n";
}

 

You Promised No Tricky Perl!

 

Oh, I did promise that. So I guess the only honorable thing to do is to write a new version of “chat.pl” that doesn’t use quite so many shortcuts.

 

#! /usr/bin/perl -w

use strict;

require 'DELPHI.pm';

while( my $userInput = <STDIN> ){
   chomp($userInput); #Remove trailing newline character with chomp
   my $response = DELPHI::generateResponse($userInput);
   print "DELPHI: ", $response, "\n";
}

 

There, that’s better. Everything is much easier to understand now. We have a simple while loop that grabs lines of input from standard input, letting the user type questions for DELPHI. And then, since DELPHI doesn’t like newlines, we use the handy Perl function chomp to remove them from the input. Now that we have a user input string with no nasty newline at the end we pass it to DELPHI::generateResponse and finally print out DELPHI’s reply for the user to read.

 

This Is A Horrible User Interface

 

If you tried to run “chat.pl” as is you probably noticed that it’s not very user friendly. When you first start the program it just sits on the command line and hopes that the user will eventually figure out he’s supposed to type something. And the only way to break out of the loop is to send the end of file character (ctrl-z or ctrl-d depending on your OS). We can do better than this!

 

#! /usr/bin/perl -w

use strict;

require 'DELPHI.pm';

print "DELPHI: The oracle greets you. What questions are on your mind?\n";
print '>';

while(my $userInput = <STDIN>){
    chomp($userInput); #Remove trailing newline character with chomp
    
    if(lc($userInput) eq 'goodbye' || lc($userInput) eq 'quit'){
        last; #Break out of while loop
    }
    
    my $response = DELPHI::generateResponse($userInput);
    print "DELPHI: ", $response, "\n";
    print '>';
}

print "DELPHI: Come back anytime\n";

 

Mostly cosmetic changes here. We start the program with some quick text that lets the user know they are dealing with a question answering bot and we end with a friendly farewell. We also put a helpful ‘>’ at the start of every input line so the user knows when they’re supposed to be typing.

 

I’ve also added the ability to exit the loop by typing “goodbye” or “quit”. To help with this I use the function lc, which returns an all lowercase version of whatever the user types. This way we don’t have to worry about users who capitalize “Goodbye” or people who try to “QUIT” with their caps lock key on. When we see one of these farewells we immediately break out of the while loop with the last command.

 

DELPHI’s First Words

 

Now, DELPHI is currently only passing 6 out of 11 test cases, so it’s really not ready for genuine human conversation. But we can still fire up our chat program and see what happens.

 

DELPHI: The oracle greets you. What questions are on your mind?

>Is this working?

DELPHI: Fate indicates that this is working

>That’s nice.

DELPHI: I don’t want to talk about that. Please ask me a question

>Goodbye

DELPHI: Come back anytime

 

Not a wonderful conversation, but it does prove that DELPHI works just as well interactively as it does during tests. Now we can get back to writing more rules and responses in hope of getting DELPHI to perform better the next time we decide to talk to it directly.

 

Is DELPHI Too Strict?

 

If you’ve been playing with “chat.pl” you might have noticed that DELPHI is currently really stuck up about how you do or don’t capitalize words. For example:

 

DELPHI: The oracle greets you. What questions are on your mind?

>is this working?

DELPHI: I’m sorry, could you try rewording that?

>Is this working?

DELPHI: Fate indicates that this is working

 

I don’t plan to do anything about this problem right now. I just wanted to point it out to everyone so that you won’t think I overlooked this fairly big usability flaw. I’ll (probably) eventually fix it by rewriting DELPHI to care less about case, but for now I’m just going to go back to writing rules.
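For the curious, the simplest fix will probably involve Perl’s /i modifier, which makes an entire pattern case-insensitive. Whether DELPHI ends up using exactly this approach is undecided; this is just a sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $commonAdjectives = qr/(?:this|the|that|a|an)/;
my $noncaptureAdjectiveChain = qr/(?:$commonAdjectives )*/;

# The trailing /i makes the whole match ignore capitalization,
# so a lowercase "is" matches the rule's leading "Is"
my ($noun, $verb) = "is this working?"
    =~ /\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i;
print "Fate indicates that $noun is $verb\n";   # Fate indicates that this is working
```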