Let’s Program A Chatbot 19: Third Time’s The Charm

The Third (And Final?) Test User

 

Last time I tested DELPHI on a human it looked like we were only a few modifications away from an acceptable 80% success rate. Now that we’ve made those modifications let’s take a look at a third conversation between the chatbot and another holiday visitor. It looks to me like we may have hit our goal:

 

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>What color is the sky?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Is the sky blue or purple?

DELPHI: THE FIRST OPTION. WAIT, NO THE SECOND. WAIT, WAIT, THE FIRST. I MEAN IT THIS TIME!

>Do you like sunsets?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Is wisdom relative?

DELPHI: FATE INDICATES THAT WISDOM IS RELATIVE

>So Athena could be wrong?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Athena wrong about sunsets?

DELPHI: ATHENA CAN’T BE WRONG ABOUT SUNSETS. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Do you like crossword puzzles?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I LIKE CROSSWORD PUZZLES?

>Why are you doing a crossword if you don’t like it?

DELPHI: BECAUSE OF REASONS

>What reason?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Do you often do things you don’t like?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Does Athena do things she doesn’t like?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Do you usually avoid answering questions?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I USUALLY AVOID ANSWERING QUESTIONS?

>Do you usually win arguments?

DELPHI: FATE INDICATES THAT I USUALLY WIN ARGUMENTS

>Do you believe in fate?

DELPHI: I DON’T THINK THAT I BELIEVE IN FATE

>Do you believe in soulmates?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I BELIEVE IN SOULMATES?

>Do you feel lonely?

DELPHI: I DON’T THINK THAT I FEEL LONELY

>What do you think?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Do you value happiness?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I VALUE HAPPINESS

>What makes you happy?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Does food make you happy?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Does talking to people make you happy?

DELPHI: FATE INDICATES THAT TALKING DOES TO PEOPLE MAKE I HAPPY

>Are you enjoying this conversation?

DELPHI: I SEE NO REASON WHY I WOULDN’T BE ENJOYING THIS CONVERSATION

>I like you!

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is the sky blue?

DELPHI: YOU KNOW, I’M REALLY NOT SURE EITHER. LET ME FORWARD THAT QUESTION TO MY SUPERVISOR

>Who is your supervisor?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Scott your supervisor?

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

>Is Scott a good supervisor?

DELPHI: FATE INDICATES THAT SCOTT IS A GOOD SUPERVISOR

>Does he compliment your performance?

DELPHI: MEH, HE PROBABLY DOES COMPLIMENT MINE PERFORMANCE, BUT I’M NOT REALLY INTERESTED IN THAT KIND OF STUFF

>Do you tell jokes?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I TELL JOKES

>Tell me a joke.

DELPHI: I DON’T WANT TO TALK ABOUT THAT. PLEASE FEEL FREE TO ASK ME WHY

>Why?

DELPHI: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

>Why don’t you want to tell a joke?

DELPHI: WHY NOT?

>Jokes are funny.

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

 

Quick Analysis

 

This time around I’m going to count an answer as “good” if DELPHI either provided a correct response or if DELPHI complained and the user immediately thought of a more DELPHI-compatible way to ask their question. I’m only going to count an answer as “bad” if it has a huge grammar mistake or if the user seemed confused by DELPHI’s responses.

 

Going by this very rough metric I get a count of 30 “good” answers against 4 “bad” answers. That’s a whopping 88% acceptable performance. Not bad at all.

 

Overall the user did a good job of picking up quickly on what questions DELPHI could and couldn’t answer. The user occasionally explored new kinds of questions but returned to the tried and true formula every time an error appeared. The only real issue from a design perspective came near the end when DELPHI accidentally suggested it knew how to tell jokes, leading the user to try some new input that failed horribly.

 

This suggests that it might be a good idea to write a few more rules to help DELPHI talk about itself. A simple “Do you” version of our existing “Can you” rule would have been enough to avoid misleading the user about DELPHI’s joke telling capabilities.

 

But I’m going to leave that as an exercise for my wonderful readers. The only problems I’m going to try to solve are a few grammar issues I noticed.
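
If you want a head start on that exercise, here is a rough sketch of what such a rule might look like. The response text is just a placeholder of my own, and remember that by the time the rules run switchFirstAndSecondPerson has already turned the user’s “you” into “I”, so the pattern really matches “Do … I …”. It also needs to be pushed onto @chatPatterns before the generic “Do” rule so it wins the priority race.

push(@chatPatterns,
   [qr/\ADo.*\bI\b/i,
      ["I mostly just predict the future. Try asking me something like \"Will it rain tomorrow?\""]]);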

 

My Quick Fix Is Mine

 

Two of DELPHI’s mistakes involved switching “your” to “mine”, resulting in awkward grammar like this:

 

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

 

Obviously that should have been “my supervisor”. In fact, now that I think about it, “your” should always be swapped to “my”. It’s “yours” with an “s” that matches “mine”. We can fix this by updating the dictionaries we use to power switchFirstAndSecondPerson.

 

$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImy';

$wordsToPlaceholders[6][0]=qr/\byours\b/i;
$wordsToPlaceholders[6][1]='DELPHImine';
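
In case you’re writing your own switchFirstAndSecondPerson from scratch, here is a minimal sketch of the two-pass swap that tables like these are presumably driving: first every person word is replaced with a unique placeholder so it can’t accidentally get swapped back, then every placeholder is replaced with its target word. The @placeholdersToWords companion table and the loop structure are my own guesses, not DELPHI’s actual code.

sub switchFirstAndSecondPersonSketch{
    my $input = $_[0];
    # Pass 1: swap person words for unique placeholders so they can't be re-swapped
    foreach my $pair (@wordsToPlaceholders){
        $input =~ s/$pair->[0]/$pair->[1]/g;
    }
    # Pass 2: turn each placeholder into its target word,
    # e.g. ['DELPHImy', 'my'] and ['DELPHImine', 'mine']
    foreach my $pair (@placeholdersToWords){
        $input =~ s/$pair->[0]/$pair->[1]/g;
    }
    return $input;
}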

 

And of course here are some test cases to make sure the fix really fixed things:

 

$testCases[22][0] = "Is that pizza yours?";
$testCases[22][1] = "FATE INDICATES THAT THAT PIZZA IS MINE";

$testCases[23][0] = "Is that dog your pet?";
$testCases[23][1] = "FATE INDICATES THAT THAT DOG IS MY PET";

$testCases[24][0] = "Is that pizza mine?";
$testCases[24][1] = "FATE INDICATES THAT THAT PIZZA IS YOURS";

$testCases[25][0] = "Is that dog my pet?";
$testCases[25][1] = "FATE INDICATES THAT THAT DOG IS YOUR PET";

 

Conclusion

 

With those two fixes DELPHI has now achieved an acceptable response rate of over 90%. That’s really impressive for a simple pattern matching program with no memory, no language parsing abilities and no sense of context. Even better, every user who worked with DELPHI admitted that they had fun with the experience and liked the silly random answers. That means we succeeded at our primary goal of creating an entertaining fortune teller.

 

So I’m done. Bye!

 

What’s that? You’re still not satisfied? Well then, maybe I can fit one more post into this Let’s Program and give you a few suggestions on how a hardcore reader could take this whole project to the next level.

Let’s Program A Chatbot 18: A Bit Better Than Before

A Second Blind Test

 

Now that I’ve made a few modifications to DELPHI it’s time to run another live test. But I can’t just ask the first test user to swing by for a second go. That user already knows some of DELPHI’s strengths and weaknesses so they can no longer provide a truly unbiased sample of whether or not the code is working. For good test results I’m going to need a second test-user with no experience with DELPHI at all.

 

Fortunately it’s still the holidays and finding a second user was as easy as snagging a nearby relative. This is their conversation:

 

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>will i be a white Christmas?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOU WILL NOT BE A WHITE CHRISTMAS

>will IT be a white Christmas?

DELPHI: BASED ON THESE TEA LEAVES IT SEEMS IT WILL NOT BE A WHITE CHRISTMAS

>Will my New Year’s Resolutions last longer this year?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOUR NEW WILL YEAR’S RESOLUTIONS LAST LONGER THIS YEAR

>Does that mean I can resolve to learn how to fly and it still be successful?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Can I resolve to grow wings and learn to fly?

DELPHI: I’M REALLY GOOD AT YES NO QUESTIONS. TRY ONE OF THOSE

>If you attempt to fail and succeed, have you really succeeded?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is the sky blue?

DELPHI: THE SKY CAN’T BE BLUE. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Is the grass green?

DELPHI: FATE INDICATES THAT THE GRASS IS GREEN

>Is sky gray?

DELPHI: SKY CAN’T BE GRAY. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Is the sky red?

DELPHI: HMMM… ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT IS

>Did I break you?

DELPHI: FATE INDICATES THAT YOU DID BREAK I

>I’m sorry. Good Bye

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is he Sky Blue?

DELPHI: WHY NOT?

>Because

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

 

Conversation Analysis

 

The first and most important thing I noticed is that this test user actually laughed at some of the things DELPHI did. That is a very good sign and indicates that the personality of the writing is still helping to hide the weaknesses in the code.

 

Aside from that, the test user asked 14 different questions. Of those questions six were answered properly and five were answered improperly. The other three were kind of on the fence. Two of the “meh” answers were DELPHI complaining about input it didn’t understand, which is fine if it only happens occasionally. The third borderline answer was this interesting exchange:

 

 >Will my New Year’s Resolutions last longer this year?

DELPHI: I LOST MY TAROT CARDS… BUT I HAVE SOME PLAYING CARDS. THEY PREDICT THAT YOUR NEW WILL YEAR’S RESOLUTIONS LAST LONGER THIS YEAR

>Does that mean I can resolve to learn how to fly and it still be successful?

 

The pattern didn’t recognize “New Year’s Resolutions” as one big noun so the response stuck a “will” in the wrong place. Normally I’d call this a problem but the user didn’t seem to notice or care about this little mistake. They just moved right on with the conversation.

 

Actually, this isn’t all that surprising. Humans are very good at predicting what a sentence should sound like, so you can get away with mixing up word order or letter order from time to time. Most people’s brains will just rearrange the sentence without consciously noticing there was a mistake.

 

So for now I’m going to ignore this and focus on the five real problems.

 

Easy Fixes

 

Of our five definite mistakes two are easy to fix. The first was actually just a typo in one of my response patterns:

 

>Is the sky red?

DELPHI: HMMM… ECHOES OF COSMIC WISDOM SHOW THAT THE SKY IS NOT IS

 

“IS NOT IS”? Where did that come from? Well, looking at my response patterns it turns out I accidentally referenced the user’s verb twice instead of referencing the second half of their input like I should have. Notice that the below output rule has two “UIF0”s but no “UIF2”.

 

“Hmmm… echoes of cosmic wisdom show that UIF1 UIF0 not UIF0”

 

Fixing that was as simple as changing the second UIF0 to a UIF2. Still, it’s an embarrassing mistake. I guess that’s what I get for having automated tests that only check the first answer in the random response array!
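
For the record, the repaired response pattern (my reconstruction, using the same UIF numbering as the rest of the rule) reads:

“Hmmm… echoes of cosmic wisdom show that UIF1 UIF0 not UIF2”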

 

The next easy problem was this one:

 

>Can I resolve to grow wings and learn to fly?

DELPHI: I’M REALLY GOOD AT YES NO QUESTIONS. TRY ONE OF THOSE

 

There is really no difference between a “Can” rule and a “Does” rule or an “Is” rule. So writing a “can” rule shouldn’t be a challenge. The only issue to watch out for is that your generic “Can” rule needs to be a lower priority than the helpful “Can you” rule that we’re using to provide tips on what DELPHI can and can’t do.

 

Here’s a test case and the code to solve it:

 

$testCases[21][0] = "Can this code pass all the tests?";
$testCases[21][1] = "FATE INDICATES THAT THIS CODE CAN PASS ALL THE TESTS";

 

push(@chatPatterns,
   [qr/\ACan ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
      ["Fate indicates that UIF0 can UIF1",
      "My \"Big Book O' Wisdom\" says that UIF0 can't UIF1"]
   ]);

 

Although to be honest if you just plug that rule in you’ll probably get an error in your tests and find that the input is being caught by the generic “Can you” rule. That’s because the “Can you” rule just looks for the word “Can” followed by the letter “i” without caring whether or not the “i” is an actual word (what we want) or just part of something bigger. In this case, it’s catching the “i” in the middle of “this”. We can fix this with a few word boundary adjustments to the “Can you” regex.

 

/\ACan.*\bI\b/i

 

Now the “Can you” rule will only activate when the “I” is on its own, like it should be after the word “You” has been transformed into first person.
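
Plugged back into the “Can you” rule from last time, the updated entry would presumably look like this (the response text is unchanged; only the regex has grown its word boundaries):

push(@chatPatterns,
   [qr/\ACan.*\bI\b/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);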

 

Slightly Less Easy Problems

 

Complex and compound sentences of all sorts are the natural enemy of pattern matching chatbots. Like these two examples:

 

>If you attempt to fail and succeed, have you really succeeded?

>I’m sorry. Good Bye

 

The first sentence is an “If X then Y” pattern. Noticing that a question starts with “If” would be easy, but deciding what to say back is difficult. The main problem is that not all sentences that start with “If” are actually yes or no questions, so we can’t just randomly throw out a yes or no answer like we do for other kinds of input. For example:

 

If two trains are fifty miles apart and approaching each other at twenty miles per hour, when will they meet?

 

So if we wanted DELPHI to respond to “If” questions we would need to actually create a system for identifying exactly what sort of “If” question was being asked. That’s quite a bit more complicated than the original scope of this program.

 

Compared to the “If then” input the second problem is almost trivially easy. All the user did was type in two complete sentences at once. The smartest way to solve that issue would be to split their input into two inputs and then choose one of them to respond to.

 

In fact, splitting sentences into multiple parts when you see punctuation is a pretty decent way to handle all sorts of complex input. Compound sentences, complex sentences and multiple sentences can all be split along their periods and commas. Then you can compare each sentence fragment to the list of rules and return a response to whichever piece of input matched against a high priority rule.
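
Here’s a bare-bones illustration of that splitting idea, just to make the approach concrete. This is a sketch of my own, not something DELPHI actually does:

# One way to carve compound input into pieces while keeping the punctuation:
my $userInput = "I'm sorry. Good Bye";
my @fragments = split /(?<=[.!?,;])\s+/, $userInput;
# @fragments is now ("I'm sorry.", "Good Bye"); each fragment could then be
# run through @chatPatterns on its own and the best match's response returned.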

 

Or you can be lazy like me and ignore the problem. If the user gets politely scolded for using complex sentences and “if” statements it won’t take long for them to go back to talking in ways that DELPHI already understands. I’m not going to spend too much time worrying about a speech pattern that has only shown up once in all the user tests I’ve run.

 

Of course, ambitious readers might decide that they do want to tackle this problem. More power to you!

 

More Hard To Fix Problems: Grammar Be Hard

 

The second hard to fix problem can be seen here:

 

>Did I break you?

DELPHI: FATE INDICATES THAT YOU DID BREAK I

 

The problem here has to do with an annoying bit of non-uniqueness in English pronouns. In first person we have “I” and “Me” but in second person all we have is “You” and another “You”. Observe:

 

I gave this to you.

You gave this to me.

 

Our current switchFirstAndSecondPerson function isn’t smart enough to figure out that sometimes “you” should be “me” and sometimes “you” should be “I”. It always changes it to “I”. Once again, this is a problem that technically could be fixed. It is possible to build an automatic grammar parsing system that can identify the part of speech of every word in a sentence. This would then give us enough information to more intelligently swap around “I”s and “You”s and “Me”s.

 

But the whole point of this Let’s Program was to build a simple pattern matching chatbot and avoid the full complexity of natural language parsing. So once again, this is a problem I’m going to ignore on the principle that being right most of the time is good enough for a chatbot with only two functions and a few hundred lines of code and data.
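
That said, if the “BREAK I” answers really bother you there is a crude middle ground that falls well short of real parsing: guess that “you” is an object (and so should become “me”) when it ends the question or follows a preposition. The snippet below is only a rough positional test of my own invention; wiring it into switchFirstAndSecondPerson without the placeholder pass undoing its work is left as its own puzzle.

# Rough heuristic: "you" is probably an object, not a subject,
# when it ends the input or trails a common preposition.
my $youLooksLikeObject = ($input =~ /\b(?:to|at|for|about|with)\s+you\b/i
                          || $input =~ /\byou\W*\z/i);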

 

Besides, DELPHI is supposed to guide users to ask questions about the future, not about DELPHI itself. That should hopefully minimize the number of second-to-first person switches we have to make anyways. And if no one ever sees a certain bug, is it really a bug at all?

 

80% Is A Decent Grade… Especially When You Haven’t Put In Much Effort

 

So how successful was DELPHI this time around? Well, if we award ourselves points for the two problems we just fixed and the three slightly wrong but acceptable answers we get this:

 

6 good answers + 2 fixed answers + 3 borderline answers = 11 out of 14 answers

 

That means that DELPHI is now 78% acceptable in terms of its ability to talk with real humans. And to be honest, that’s good enough for me. The whole point of this Let’s Program was to demonstrate the bare basics of how to use pattern matching to create a very simple chatbot. I never expected it to perform nearly as well as it does.

 

But since we’ve come this far and added a new “Can” rule we might as well try to hunt down one last test user and see if we really are getting an 80% success rate with DELPHI. As all good scientists know an experiment isn’t really done until you’ve repeated it several times and made sure you can get the same answer every time.

Let’s Program A Chatbot 17: Blitzcode!

You Know How This Works By Now

 

If you’ve been following along through this entire Let’s Program you should have a pretty good idea of the process I use to think up new test cases and then write new rules to solve them. So this time around I’m not going to bother explaining every single little thing I’m doing. I’m just going to quickly throw down seven new test cases along with the code I used to satisfy them. I’m confident you can fill in the gaps on your own now.

 

Is, Are and Was Were Problems

 

Let’s start out with those troublesome plurals and past tense versions of “Is”. Here are our new tests:

 

$testCases[14][0] = "Are plurals working now?";
$testCases[14][1] = "Fate indicates that plurals are working now";

$testCases[15][0] = "Was this tense a problem?";
$testCases[15][1] = "Fate indicates that this tense was a problem";

$testCases[16][0] = "Were the lights left on?";
$testCases[16][1] = "Fate indicates that the lights were left on";

 

And here is the fix:

 

push(@chatPatterns, 
        [qr/\A(is|are|am|was|were) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i, 
            ["Fate indicates that UIF1 UIF0 UIF2",
            "Other responses go here"]
        ]);

 

A few things to look out for here. First, notice that I’m using the /i flag at the end of the regular expression so the user doesn’t have to worry about capitalizing anything. You should also notice that I’m now capturing which version of “Is” the user chose and then inserting it into the output. Since it is now the first capture group I can refer to it as UIF0, although that does mean I need to scoot over the other input: the old UIF0 is now UIF1 and the old UIF1 becomes UIF2.

 

And this very nearly worked. Almost.

 

Test Case 16 Failed!!!

 

Input: Were the lights left on?

 

Output: Fate indicates that the lights Were left on

 

Expected: Fate indicates that the lights were left on

 

——————–

 

Passed 11 out of 17 tests

 

Test Failure!!!

 

 

Because we grab the verb from the user’s input a capitalized “Is” will result in strange capitals in the middle of our sentences, like you can see here with this out-of-place “Were”. Not acceptable. I could probably improve the UIF substitution system to avoid capital letters in the middle of sentences, but instead I’m going to just blow away every possible capitalization problem I could ever have by making DELPHI speak in robot-style ALL CAPS. It’s an easy change to make but it does mean that I’m going to have to update every single test case to expect capital letters. Oh well.

 

Here is the one line change needed in the return statement of generateResponse to switch to all caps. The uc function makes everything upper case.

 

return uc($response);

 

Now please excuse me while I capitalize all 17 of my existing test cases.

 

(Type type type)

 

Well that was boring. But DELPHI is now passing 17 out of 17 test cases again so it was worth it.

 

You’ve Gone Done Did It Now!

 

Getting DELPHI to handle past tense “Did” as well as present “Does” is basically the same problem and solution as “Is” and “Was” above.

 

$testCases[17][0] = "Did this test pass?";
$testCases[17][1] = "FATE INDICATES THAT THIS TEST DID PASS";

 

Notice the switch to all capital expected output. Anyways, here is the updated “Does” rule:

 

push(@chatPatterns,
   [qr/\A(Did|Does) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
      ["Fate indicates that UIF1 UIF0 UIF2",
       "Other responses go here"]
   ]);

 

Once again we are now pulling the verb out of the user’s input, which means we can include it in the output as UIF0 and have to shift the index of all of our other existing User Input Fragments.

 

Passed 18 out of 18 tests

All Tests Passed!

 

Help Me To Help You

 

One of the bigger problems we saw with the first test user was that they weren’t sure what kind of questions to ask DELPHI. Let’s fix this by jumping into chat.pl and writing up a new self-introduction for DELPHI:

 

DELPHI: HELLO! MY NAME IS DELPHI

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

 

That’s much better. And while I’m at it I should probably improve DELPHI’s responses to statements it doesn’t understand. Remember, if you change the first possible response to a rule you will also need to update the test cases associated with that rule.

 

Here is a sample of some of the new responses I’m going to try:

 

I’m sorry, could you try rewording that as a yes or no question?

I’m sorry, could you think of a simpler way to ask that question? Maybe as a yes or no question?

I’m confused. Try a simple yes or no question instead

I don’t want to talk about that. Please feel free to ask me why

 

Self-Knowledge Is The Key To Enlightenment

 

The final big issue we saw in our human focused test was that DELPHI couldn’t answer simple questions about itself. That’s sad. A program without an easy to use HELP feature has pretty much failed at usability.

 

So here are a few test cases to help us address this glaring problem:

 

$testCases[18][0] = "What kind of questions can you answer?";
$testCases[18][1] = "YES OR NO QUESTIONS LIKE \"WILL I HAVE GOOD LUCK TOMORROW?\" ARE THE EASIEST FOR ME TO ANSWER";

$testCases[19][0] = "Can you tell time?";
$testCases[19][1] = "THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE \"WILL IT BE SUNNY TOMORROW?\"";

$testCases[20][0] = "help";
$testCases[20][1] = "JUST TYPE A QUESTION AND HIT ENTER AND I'LL DO MY BEST TO ANSWER IT. YES OR NO QUESTIONS LIKE \"DID I DO WELL ON MY TEST?\" ARE BEST";

 

Let’s tackle the “help” case first because it’s easiest. We just write a super specific and super high level rule. I didn’t want random help messages, so the response array is only one item long. But it still has to be an array because that’s what the code expects:

 

push(@chatPatterns,
   [qr/\Ahelp\z/i,
      ["Just type a question and hit enter and I'll do my best to answer it. Yes or No questions like \"Did I do well on my test?\" are best"]]);

 

Let’s do the “What kind of questions” test next. I don’t have any other rules dealing with “What” style questions so the priority of this rule doesn’t really matter that much. If you’ve written your own “What” rule you’ll probably need to give this help rule higher priority so it doesn’t get overshadowed.

 

The rule itself is pretty simple. It just looks for any sentence that starts with “What” and later has the word “question”. This might be casting the net a bit wide since this won’t just catch questions like “What kind of questions can you answer?” or “What kind of questions work best?”. It will also catch things like “What do you do with my questions?” where our answer doesn’t really make sense.

 

So we’re kind of gambling here on what questions the user will and won’t ask. If we find that our users are providing lots of “What questions” input that doesn’t fit this pattern we may have to write some extra rules, but for now this should be fine:

 

push(@chatPatterns,
   [qr/\AWhat.*questions/i,
      ["Yes or no questions like \"Will I have good luck tomorrow?\" are the easiest for me to answer"]]);

 

Finally there’s the “Can you?” rule, which is basically identical to the rule above. Just look for the words “Can” and “You”:

 

push(@chatPatterns,
   [qr/\ACan.*you/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);

 

Unfortunately we run into a little problem:

 

Test Case 19 Failed!!!

Input: Can you tell time?

Output: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

Expected: THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE “WILL IT BE SUNNY TOMORROW?”

 

Remember how we wrote that little bit of code to transform second person words like “You” into first person words like “I” to help us create more grammatical responses? That same code is breaking our new “Can you” rule by changing the user’s input into “Can I” before it ever gets compared to any of our rules.

 

There’s probably an elegant solution to this problem, but I don’t have the time to find it right now. Instead I’m just going to rewrite the rule to look for “Can I” because I know that represents a user asking “Can you” like we really want.

 

push(@chatPatterns,
   [qr/\ACan.*I/i,
      ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);

 

 

 

DELPHI V2 Lives!

 

Passed 21 out of 21 tests

All Tests Passed!

 

DELPHI can now handle at least the most basic forms of the three big problems we saw in our live user test. Which means it’s time to find a new test user and try again. I’m hoping our code can now generate logical answers to at least 70% of the input the average random user will try to give it. That would give us a nice solid “C” average. Not good enough for a professional chatbot, but a pretty good goal for a practice program built from scratch using nothing but regular expressions and a few foreach loops.

Let’s Program A Chatbot 16: Testing On Live Subjects

You Don’t Need To See The Latest Code Updates

 

 

Since last we met all I’ve done is add 25 common adjectives to DELPHI’s common adjectives list and write four more possible response patterns for every chatbot rule. These modifications didn’t involve any actual coding tricks and have made DELPHI too long to conveniently embed inside a blog post. I do promise to publish DELPHI’s complete code as soon as this Let’s Program is over but for today there’s nothing worth showing. Those of you following along at home can feel free to write your own response patterns and add your own common adjectives.

 

 

Automated Testing Alone Is Not Enough

 

 

Our automated tests made it really easy for us to keep track of whether or not DELPHI was doing what we, the programmers, wanted it to do. And that’s very important! It’s hard to write good software if you don’t have some way of keeping track of what your goals are and which ones you have and haven’t met.

 

 

But just because a program satisfies the programmer’s list of goals doesn’t mean it will satisfy the customer’s list of demands. Real world users almost always have items on their wish lists that we programmers completely overlooked.

 

 

Test users also help us programmers avoid blind spots in our testing. When a developer tries to write tests for his own code he will subconsciously tend to avoid test cases that he knows will break the program. One common example is that we programmers usually forget to write a test case for blank input because accidentally typing in blank input is the sort of mistake we don’t usually make. That makes it easy for us to forget that the problem exists at all, which can lead to fragile programs that break apart the first time a non-programmer hits enter with no input either by accident or out of curiosity. Having a test user break your program like this is much better than waiting for a customer to make a mistake you overlooked.
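
For DELPHI that kind of blank-input test would just be a two-line addition in the style of the existing test cases. The index and the expected answer below are placeholders of mine; the expected string simply needs to match whatever the catch-all rule’s first response happens to be:

$testCases[99][0] = "";    # the user just hit enter
$testCases[99][1] = "I don't want to talk about that. Please ask me a question";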

 

 

So there’s really no way around it: We need to get other human beings to test out our programs and find the bugs we forget to include in our automated tests. We can then use their human feedback to set new goals for the next version of our program and to help write new use cases for our tests.

 

 

Once we have new and improved tests it’s back to modifying our code until we’ve once again reached a 100% pass rate. Then we find a new test user and have them poke our program until they find another problem or two. And of course that leads to more use cases which can lead to new tests and new software features. Rinse and repeat this cycle until the software finally reaches a level of quality that everyone feels comfortable with (or until you run out of time and have to give up / publish whatever you have). That’s basically the entire test driven development cycle.

 

 

DELPHI’s First Human User*

 

 

It’s the holidays now and that means friends and family as far as the eye can see. If you’re anything like me, people you haven’t seen for months are suddenly queuing up at your door to “Wish You A Merry Christmas And A Happy New Year”. These lovely people represent a valuable resource to every hobbyist programmer: A large pool of test users that will be more than happy to spend five minutes playing with your latest project.

 

 

So for this blog post I grabbed a handy holiday well-wisher and dropped them in front of DELPHI with no explanation of what it did or how it works. This is the conversation they had:

 

 

DELPHI: The oracle greets you. What questions are on your mind?

 

>How much longer is Xenoblade Chronicles**?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Are there more than 10 hours left of Xenoblade Chronicles?

 

DELPHI: I’m sorry, could you try rewording that?

 

>What kind of questions can you answer?

 

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

 

>Is a yes or no question better?

 

DELPHI: Let me flip a coin… the second one

 

>Should I have chicken or beef for dinner?

 

DELPHI: The later, unless it’s actually the former

 

>How many days until Christmas?

 

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

 

>What is the weather like?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Is it snowing outside?

 

DELPHI: Hmmm… echoes of cosmic wisdom show that it is not snowing outside

 

>Will it snow tomorrow?

 

DELPHI: it will snow tomorrow, assuming the mole people don’t invade next Tuesday

 

>Was school cancelled for tomorrow?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Will there be school tomorrow?

 

DELPHI: I lost my tarot cards… but I have some playing cards. They predict that there will be school tomorrow

 

>What is 4 + 5?

 

DELPHI: I’m sorry, could you think of a simpler way to ask that question?

 

>What is the sum of 4 and 5?

 

DELPHI: That looked like a question, but I’m not sure how to respond

 

>Can you do math?

 

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

 

>Do I go to high school or middle school?

 

DELPHI: Fate indicates the former

 

>What is today’s date?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Is today Sunday?

 

DELPHI: I think that today is Sunday, but you should probably double check with someone else

 

>Is tomorrow Monday?

 

DELPHI: tomorrow can’t be Monday. That would mess up the crossword puzzle I’m working on

 

>Is it the 8th?

 

DELPHI: I see no reason why it wouldn’t be the 8th

 

>What month is it?

 

DELPHI: I’m sorry, could you think of a simpler way to ask that question?

 

>Is it december?

 

DELPHI: it can’t be december. That would mess up the crossword puzzle I’m working on

 

>Is it december or january?

 

DELPHI: Let me flip a coin… the first one

 

>Is it Sunday and the 8th?

 

DELPHI: it can’t be Sunday and the 8th. That would mess up the crossword puzzle I’m working on

 

>Will it be the 9th or he 10th tomorrow?

 

DELPHI: Whatever you think is correct

 

>I believe the 9th is correct. Am i correct?

 

DELPHI: That looked like a question, but I’m not sure how to respond

 

>Did I miss any responses?

 

DELPHI: That looked like a question, but I’m not sure how to respond

 

>goodbye

 

DELPHI: Come back anytime

 

 

Wow! DELPHI did a pretty bad job holding up its end of the conversation. On the other hand, it actually did pretty well for a chatbot with only a handful of rules and only a few weekends of development work. So even though DELPHI has pretty poor performance I’m still happy with its performance-to-effort ratio.

 

 

What Did We Learn

 

 

Time to put on our thinking caps and analyze exactly what went wrong and what went right in this DELPHI test run. Those of you following along at home might want to break out some paper and jot down your own thoughts before reading my conclusions.

 

 

Of course, if you’re a programmer you probably have a high reading speed and the ability to recognize the words on your screen even when you aren’t directly looking at them. So you’ve undoubtedly already absorbed at least one or two of the conclusions I’ve written about below. Just think of it as getting a hint on how to start your own list.

 

 

BAD: DELPHI Introduction Doesn’t Give Good Enough Instructions

 

 

Since users never read the manual (and DELPHI doesn’t have a manual to read anyways) it is very important for DELPHI to provide gentle guidance on the proper way to ask it questions. And I think it’s fair to say I completely failed at this.

 

 

I probably should have warned the user to stick to YES/NO questions in the original prompt. Instead I just invited them to ask whatever was on their mind and got an open ended question about the play-time of a video game my user was interested in. Since it wasn’t a yes no question DELPHI gave up. I also could have done a better job of having DELPHI’s confused messages suggest better question formats. Constantly telling the user that DELPHI doesn’t know how to answer their question doesn’t do any good if I’m not also giving hints on what questions they should be asking.

 

 

Fortunately the user was pretty clever and figured out on their own that switching their question to a YES/NO format might help. Unfortunately this led to our next error.

 

 

BAD: DELPHI Can’t Handle Plural And Past Tense Versions Of Its Rules

 

 

The user’s second question should have been easy. After all, it was just an “Is X Y?” question and that was one of the first rules we ever wrote.

 

>Are there more than 10 hours left of Xenoblade Chronicles?

 

 

Unfortunately it turns out that DELPHI only has rules specifically for “Is” and doesn’t have nearly enough brainpower to recognize that “Are” should use the same kind of rule. DELPHI also had difficulty later on when the user went first person and conjugated “Is” into “Am”. There were similar problems with past tense conjugations; DELPHI gave up on a “Was” question and a “Did” question even though logically they’re the same as “Is” and “Do”.

 

 

So it looks like we’re going to need to do some work buffing DELPHI up to work with a wide range of tenses and pluralizations: Is, Are, Am, Was, Were, Do, Does, Did.

 

 

BAD: DELPHI Doesn’t Know How To Talk About Itself

 

 

After their first two questions fell apart my clever test user asked an incredibly intelligent third question:

 

 

>What kind of questions can you answer?

 

 

Unfortunately that isn’t a pattern DELPHI knows how to respond to. Which is a shame because that would have been the perfect opportunity to slip a mini user manual into DELPHI’s output.

 

 

GOOD: Humor Made The User Curious

 

 

My test user spent a lot longer with DELPHI than I thought they would. When I asked them what they were doing they admitted they were trying to see how many different ways DELPHI could respond to the same type of question. They also explained that they were trying to come up with new types of questions just to double check they weren’t missing an entire group of sort-of-funny chatbot replies.

 

 

This means that even though my chatbot was very flawed it made up for those flaws by being interesting enough that the user wanted to keep playing with it to see what it would say and do next. Since DELPHI is basically a toy the fact that the user enjoyed playing with it is a huge success.

 

 

GOOD: 50% Success Rate

 

 

If you count up the instances where DELPHI gave a good answer to a question compared to when it gave a default confused answer you’ll find it had very close to a 50% success rate. You might argue that a number that low shouldn’t count as a good thing but I think it’s only fair to point out that DELPHI actually did manage to perform as expected in a wide variety of circumstances. No need to focus entirely on its mistakes.

 

 

I think it’s also interesting to note that the success rate seems higher in the second half of the conversation than the first. This suggests that the user eventually caught on to what kind of questions DELPHI handled best. So if I do a better job of explaining early on in the conversation that DELPHI prefers YES/NO questions the overall success rate should increase a lot.

 

 

Conclusion

 

 

As predicted DELPHI wasn’t quite ready for human contact. But it did better than I thought it would and now I have lots of data on what problem areas need to be tackled next. Expect my next post to be a rapid fire series of new test cases and the code to fix them.

 

 

 

 

* You might think I was DELPHI’s first human user, but I don’t count***.

 

 

** Xenoblade Chronicles is a Japanese RPG for the Nintendo Wii that has an epic, and rather long, plot. In retrospect it’s not the sort of thing one should try to speed-run during a holiday get together.

 

 

*** Because I programmed it. I wasn’t suggesting I don’t count because I’m not human. I’m totally a real human. Really

 

Let’s Program A Chatbot 15: “ELIZA Effect” Should Be A Movie Title

What Is The ELIZA Effect?

 

The ELIZA Effect refers to the way that humans tend to think machines act and think like humans. We unconsciously like to believe that computers are actually intelligent, that robots have motives and that our favorite gizmos have emotions.

 

This human instinct to treat machines like people is named after the original ELIZA chatbot, a simple pattern matching program that pretended to be a psychiatrist by turning everything said to it into a question*. The scientist who designed ELIZA considered it nothing more than a clever trick and was surprised to find that many of the humans he had testing ELIZA started to develop emotional reactions towards it, some going so far as to claim that they felt like ELIZA really cared about the topics they were talking about.

 

Further studies pretty much proved that the ELIZA effect kicks in just about anytime a human sees a computer or machine do anything even vaguely unpredictable or clever. The moment a program does something the user can’t immediately explain he will begin to assume deep logic and complex motives are taking place, even when the “smart” behavior turns out to be nothing more than a ten line script with three if statements and a call to random(). Even after you show the user there is no real intelligence involved he will still tend to see bits of human personality in the machine.

 

For example, just a few weeks ago the Internet was abuzz with stories of a “suicidal robot”, a Roomba vacuum cleaner that apparently was activated while the owners weren’t watching and then got stuck on a hotplate which eventually caused it to burn to “death”.

 

The interesting part of this story isn’t that a household robot glitched up and got stuck in a dangerous place. That happens all the time. The interesting part is that almost every human who talked about the story phrased it in terms of a robot making a decision to kill itself (a very human, if depressing, behavior). Even technical people who know better than to assign feelings and motivation to a circuit board couldn’t resist framing the event in human terms.

 

That’s the ELIZA effect.

 

Exploiting The Eliza Effect

 

So… humans like to think that other things behave like humans. That’s not really very surprising. Why should we programmers care?

 

We should care because we can use the ELIZA effect to hack people’s brains into liking our programs better. We can trick them into being patient with load times, forgiving of bugs and sometimes even genuinely loving our products.

 

Simple example: When Firefox is restarted after a crash it begins with a big “Well this was embarrassing” message that makes it feel like an apologetic friend that really is sorry that he forgot to save the last slice of pizza for you. It’s surprisingly effective at taking the edge off the frustration of suddenly getting kicked off a web-page.

 

The ELIZA effect is even more important for people who are specifically trying to write programs that mimic human behavior. Like game developers trying to create likable characters or chatbot designers trying to create a bot that is fun to talk to. For these people getting the ELIZA effect to activate isn’t just a useful side goal, it is their primary goal.

 

Wait a minute, aren’t WE amateur chatbot designers? I guess we should figure out how to integrate this idea into DELPHI.

 

Simulating Human Humor

 

In my experience people will forgive a lot of bad behavior as long as they are laughing. A good joke can break the ice after showing up late to a party and a witty one-liner can fix a lot of embarrassing social mistakes**.

 

That’s why rule #1 for writing DELPHI responses is going to be “make it quirky”. The funnier the responses DELPHI generates the less users are going to care about how precise and correct they are. Bits of strange grammar and weird phrases will be cheerfully ignored as long as the user is having a good time. And since humor is a very human trait this should do a lot to make DELPHI feel more like a real conversation partner and less like the big foreach loop it really is.

 

So don’t do this:

 

Fate indicates that your cat does secretly want to kill you.

 

Do this!

 

Let me check my Magic 8 ball(tm). Hmm… future looks cloudy so I don’t know if your cat does secretly want to kill you. Ask again later***

 

Apologize Profusely. For Everything.

 

This seems like a really good place for a joke about Japanese etiquette compared to American etiquette but I can’t think of anything funny right now. すみません

 

Anyways, I’ve noticed that when non-technical people have a computer problem one of the first things they always say is “I didn’t break it! It’s not my fault! It just happened on its own!”

 

This makes sense. People hate feeling like they are responsible for things going wrong. No one wants to take the blame for a broken machine or a program that stopped working. The only thing worse than a broken computer is a broken computer that is scolding you for breaking it.

 

So if your program is likely to break or get confused, and this simple chatbot certainly is, your top priority should be to reassure the user that the problem isn’t his fault. The problem is that your poor humble program couldn’t quite handle the user’s request and could the user pretty please try again? We really are very very sorry that this happened at all.

 

Also, apologizing is a very human behavior that will go a long ways towards hiding our dumb code behind an illusion of human intelligence.

 

So don’t do this:

 

I don’t recognize that as a question. Try again

 

Do this!

 

I’m sorry, I got confused. Could you ask your question again and keep it simple for me?

 

Teach Your User How To Be A Better User

 

This final design tip has less to do with the ELIZA effect and more to do with user psychology. The following tip is vitally important to anyone who wants to build user friendly software: Users never read the manual.

 

I don’t care how well documented your program is or how many full color screen-shots are included in your manual. 95% of your users are just going to start clicking on buttons and typing in words and then get frustrated if things don’t work the way they want them to.

 

In a perfect world we would solve this problem by convincing everyone in the world to do the responsible thing and read the entire user manual of every product they buy before they try to operate it. But we live in a broken and fallen world so we’re going to have to be sneaky about this.

 

The goal here is that every time the user causes an error or makes something strange happen we should slip them a quick tip on how to make sure that problem doesn’t happen again. This way we can feed them the entire manual one bite at a time until they finally figure out everything we wish they had known in the first place.

 

I’m sure you’ve seen this tactic before. Windows warning you that changing a file’s type can be dangerous. Google politely suggesting alternate searches when you don’t get many results. Video games slipping helpful tips into their loading screens. All are just ways to teach the user how to be a better user without ever calling him a bad user or forcing him to read a book.

 

How do we incorporate this into a chatbot like DELPHI? Well, when we detect the user is having trouble we should not only be incredibly apologetic to make him feel safe and incredibly funny to make him feel relaxed, we should also try to show him how to better format his input.

 

So don’t do this:

 

I can’t understand what you’re saying

 

Do this!

 

I’m having trouble with your last question. Let’s start with something simpler like “Will it rain tomorrow?”

 

Conclusion

 

Writing a program that can act like an intelligent human is hard. Luckily for us humans are easy-going lifeforms that are more than happy to project the illusion of human intelligence onto every machine they see. As long as our chatbot is funny and polite most users will be willing to consider it human enough.

 

Now I’m going to spend the next few days adding new responses to DELPHI. Once that’s finally done I’m going to recruit a friend to test-chat with DELPHI and my next post will be spent analyzing how well (or how poorly) DELPHI did.

 

I suppose there is a small chance that DELPHI will do perfectly and this Let’s Program will end. But I seriously doubt it. This chatbot doesn’t even have a dozen rules yet. I’m predicting it won’t be able to handle even half the input the tester gives to it.

 

 

 

* You probably remember ELIZA from when I introduced it back at the beginning of this Let’s Program.

 

** On the other hand, trying to come up with a witty one-liner under pressure is very difficult and a botched one-liner will just make the problem worse. So if you accidentally insult someone’s religion/parents/favorite OS it might be best to just shut up and retreat.

 

***If you want to plug this into our “does” rule you’re probably looking for something like “Let me check my Magic 8 ball(tm). Hmm… future looks cloudy so I don’t know if UIF0 does UIF1. Ask again later”

Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Breaking All Our Tests

 

Today I’m finally tackling the last item on our chatbot wish-list: Randomized responses. This will give DELPHI the ability to make both yes and no predictions for all questions. Even better, a wide variety of randomized responses will keep DELPHI from repeating itself too often and help make it feel human. Nothing says “computer” quite like repeating the same line again and again. Nothing says “computer” quite like repeating the same line again and again.

 

Unfortunately this randomness is going to completely break all the automated tests we spent so long satisfying. After all, the fundamental idea behind all of our tests is that every possible user input has exactly one right answer. When the user says “A” the response should always be “B”. But adding a little randomness throws this idea out the window. How are we supposed to test a program that sees the input “A” and sometimes says “B” and sometimes says “C” and sometimes says “D”?

 

This is one of the great weaknesses of automated testing: It doesn’t work so good with uncertainty.

 

One possible solution would be to build a much more flexible testing suite. Something that can match one input to multiple possible outputs. If there are three random “good” answers to input A then we consider the test to have passed if we see any one of them. It wouldn’t even be too hard to program. Probably just a lot of “or” statements or maybe a loop that returns “true” as soon as it finds at least one match.
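
That loop really would only be a few lines. Something like this hypothetical helper (responseMatchesAny is my own name, not part of the actual test script) would do the trick:

# Pass the test if the chatbot's response matches any of the acceptable answers.
sub responseMatchesAny{
    my ($response, @acceptableAnswers) = @_;
    foreach my $answer (@acceptableAnswers){
        return 1 if $response eq $answer;
    }
    return 0;
}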

 

But this may not scale very well. Writing tests to handle a small amount of randomness probably isn’t too bad. You just type in your one input and all three possible good outputs and you’re done. But if you have dozens or even hundreds of potential outputs… well, you probably don’t want to maintain that sort of testing code by hand.

 

So instead of finding a way to test our randomness I’m just going to provide a mechanism to turn the randomness on and off. This way I can just turn the random responses off and all of the old test cases will still work and I can continue adding new tests the easy way: one input paired with one expected output.

 

Nested Arrays Are Good For Everything!

 

That’s enough talk about testing. Time to focus on how we’re going to make DELPHI more random. The chatbot already has a system for associating input patterns with output patterns. All we need to do now is adjust it to associate one input pattern with multiple possible output patterns.

 

My clever readers* probably remember that DELPHI uses a multi-dimensional array to keep track of which response pattern matches each input pattern. Every top level item in the array represents a different chatbot rule/response pair. Each rule is then divided into a two-item array where the first item is a matching rule and the second item is a response rule.

 

In order to add some randomness to the system we’re going to replace the singular response rule in slot number 2 with yet another array, this one holding a list of all responses we want to generate. For example, here is what the “catch all” rule looks like after I replaced the single response with a three item array.

 

push(@chatPatterns,
   [qr/.*/,
      ["I don't want to talk about that. Please ask me a question",
       "I'm confused. Try a simple question instead",
       "I'm really good at yes no questions. Try one of those"]
   ]);

 

 

Everybody see what we’re doing here? @chatPatterns is the first array. Inside of it we’re pushing a second array where the first item is the input matching regex /.*/ and the second item is a third array that holds three possible responses.

 

Eventually we’ll probably want to flesh out DELPHI by attaching a dozen or so responses to every input rule. But for starters let’s just stick to two or three variations for each rule. That should be enough to make sure that our basic random algorithm works like it should.

 

Ready for a massive code dump?

 

Random Response Set 1

 

This code should replace the old @chatPatterns code:

 

my @chatPatterns;

push(@chatPatterns, 
        [qr/[a-zA-Z]+ or [a-zA-Z]+.*\?\z/,
            ["Fate indicates the former",
            "I have a good feeling about the later"]
        ]);

push(@chatPatterns, 
        [qr/\ADo (.+)\?\z/, 
            ["Fate indicates that UIF0",
            "I don't think that UIF0",
            "Athena doesn't think so"]
        ]);

push(@chatPatterns, 
        [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 does UIF1",
            "The spirits whisper \"UIF0 does not UIF1\""]
        ]);

push(@chatPatterns, 
        [qr/\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 is UIF1",
            "The stars are clear: UIF0 is not UIF1"]
        ]);

push(@chatPatterns, 
        [qr/\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["I predict that UIF0 will UIF1",
            "Based on these tea leaves it seems UIF0 will not UIF1"]
        ]);

push(@chatPatterns,
        [qr/\AWhy (.+)\?\z/,
            ["Because of reasons",
            "For important cosmic reasons"]
        ]);

push(@chatPatterns, 
        [qr/\?/,
            ["I'm sorry, could you try rewording that?",
            "Was that a question?"]
        ]);

push(@chatPatterns, 
        [qr/\A(Why|Is|Are|Do|Does|Will)/,
            ["Did you forget a question mark? Grammar is important!",
            "If you're asking a question, remember to use a question mark"]
        ]);

push(@chatPatterns,
        [qr/.*/,
            ["I don't want to talk about that. Please ask me a question",
            "I'm confused. Try a simple question instead",
            "I'm really good at yes no questions. Try one of those"]
        ]);

 

Optional Randomness Through Optional Arguments

 

Now that we have multiple possible responses inside of every single rule we’re going to need to update generateResponse. The first step is to get it to pull responses out of an array instead of reading them directly. After that we’ll also need to write code to randomize which response gets pulled out of the array in the first place.

 

Also, if we want DELPHI to be random with humans but predictable with tests we’re going to need some way to let DELPHI know when to be random and when to be boring. The simplest way to do this is to just add a second argument to generateResponse. The first argument will still be the user’s input but now we’ll use the second argument to decide whether to choose a random response or just stick to the first response in the array.

 

But that’s enough about that. I’ll just let the code speak for itself now:

 

sub generateResponse{
    my $userInput = $_[0];
    my $beRandom = $_[1];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response;
            if($beRandom){
                # Pick one of this rule's possible responses at random
                my $numberOfResponses = scalar(@{ $chatPattern->[1] });
                $response = $chatPattern->[1][rand $numberOfResponses];
            }
            else{
                $response = $chatPattern->[1][0];
            }
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

 

I’m sure everyone can see the basic flow of this updated function. We use the $beRandom variable to help us decide which response pattern to use.

 

There is a little trickiness to the Perl I used for choosing random responses. The goal is to figure out how many responses are attached to the current rule and then choose one of them at random.

 

I start by extracting the response array from the second half of the chat pattern with $chatPattern->[1]. But it turns out that this is actually just an array reference, not a true array, so before we can use it we have to turn it back into an array with the @{ $reference } syntax.

 

Finally I extract the length of the response array. In Perl you can get the length of an array by forcing it into a single-value (scalar) context, and you can make that happen explicitly with the scalar function. Technically the function call was unnecessary, since assigning to the single-value variable $numberOfResponses already forces the response array into scalar context, but I figured some of my less Perl-inclined readers might appreciate having the switch from array to single number pointed out.

 

After we have the length of the response array everything else is easy. We use rand $numberOfResponses to generate a random fraction somewhere between 0 and the number of responses, which then gets automatically rounded down when we use it as an array index.
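
If the scalar context and the automatic rounding still feel mysterious, here is a tiny stand-alone sketch of the same trick. The responses in this array are made up; only the technique matches the code above:

my @responses = ("Signs point to yes", "Outlook not so good", "Ask again later");

my $numberOfResponses = scalar(@responses);   # 3
my $randomIndex = rand $numberOfResponses;    # some fraction like 1.7294
print "$responses[$randomIndex]\n";           # index gets truncated to 1: "Outlook not so good"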

 

Updating Our Tests And User Interface

 

Thanks to Perl’s flexible function calls it turns out we don’t have to do anything at all to get our tests working with the new generateResponse code.

 

When we call generateResponse with only one argument the function can still reference the second argument with $_[1]; it will just be filled with the “undefined” value. And since “undefined” counts as “false” as far as if statements are concerned, not including a $beRandom argument is the same as announcing we don’t want any randomness.
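
If you would like to see that behavior in isolation, here is a minimal sketch with a throwaway subroutine. This is not DELPHI code, just a demonstration of missing arguments being undefined:

sub demo{
    my $input = $_[0];
    my $beRandom = $_[1];   # undef when the caller only passes one argument

    return $beRandom ? "random mode" : "predictable mode";
}

print demo("Is this a test?"), "\n";      # prints "predictable mode"
print demo("Is this a test?", 1), "\n";   # prints "random mode"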

 

Updating the “chat.pl” user interface to include randomness is almost as easy. All we have to do is pass some sort of “true” value as the second argument to generateResponse. I like to use the value “1”, like this:

 

my $response = DELPHI::generateResponse($userInput,1);
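
For anyone who hasn’t been keeping their own chat.pl up to date, the call above lives inside a simple read-and-respond loop along these lines. This is only a rough sketch; your loop, greeting and goodbye handling may look a little different, but the important part is that extra “1”:

use DELPHI;   # assumes DELPHI.pm can be found, e.g. in the same directory

print "DELPHI: The oracle greets you. What questions are on your mind?\n";

while(my $userInput = <STDIN>){
    chomp $userInput;
    if($userInput =~ /\AGoodbye\z/i){
        print "DELPHI: Come back anytime\n";
        last;
    }
    my $response = DELPHI::generateResponse($userInput, 1);
    print "DELPHI: $response\n";
}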

 

Automatic And Manual Testing

 

With our code complete the first thing we should do is run our automatic tests to make sure that the non-random version of our test system still works like we want it to:

 

Passed 14 out of 14 tests

All Tests Passed!

 

Good. We didn’t break anything important while adding our new features. Now to boot up chat.pl and see if the random response system is working properly:

 

DELPHI: The oracle greets you. What questions are on your mind?

>Is DELPHI random now?

DELPHI: Fate indicates that DELPHI is random now

>Is DELPHI complete?

DELPHI: The stars are clear: DELPHI is not complete

>Why isn’t DELPHI complete?

DELPHI: For important cosmic reasons

>Goodbye

DELPHI: Come back anytime

 

Looks good. That’s two “Is X Y?” questions in a row that both got different responses. For more thorough testing just keep hitting DELPHI again and again with identical input and make sure that DELPHI switches between the various responses.
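
If you would rather not retype the same question twenty times, a throwaway script can do the hammering for you. This is just a sketch that reuses the same DELPHI::generateResponse call that chat.pl makes:

use DELPHI;

my %responseCounts;

for (1 .. 20){
    my $response = DELPHI::generateResponse("Is DELPHI random now?", 1);
    $responseCounts{$response}++;
}

print "$_ => seen $responseCounts{$_} times\n" for keys %responseCounts;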

 

Conclusion

 

I’m pretty sure that’s the last bit of Perl in this Let’s Program. We can match input, generate responses and DELPHI works fine with both our test program and our user interface.

 

Future improvements to DELPHI will probably have much more to do with writing new rules and responses than coming up with new algorithms. In fact, my next post is going to focus entirely on the art of writing computer responses that will convince users your program is almost human.

 

 

 

* That’s all of you.

Let’s Program A Chatbot 13: What’s Mine Is Yours

The Last Test Case (For Now…)

 

We’re down to our final test case. Are you excited? I’m excited!

 

Test Case 4 Failed!!!

Input: Do my readers enjoy this blog?

Output: Fate indicates that my readers enjoy this blog

Expected: Fate indicates that your readers enjoy this blog

 

Hey, that’s just a “do” rule. We already solved that problem last time. What’s going on here?

 

Oh, wait. The problem isn’t the “do”. The problem is that the question mentioned “my readers” and DELPHI was supposed to be smart enough to switch the answer around to “your readers”. But DELPHI didn’t do that. We should fix that.

 

1st And 2nd Person Made Easy

 

The idea of first versus second person is way too complex for a simple pattern matching chatbot like DELPHI. But the idea of replacing word A with word B is simple enough. And it turns out that replacing first person words with second person words, and vice-versa, is good enough for almost every question that DELPHI is going to run into.

 

But be careful! When you try to swap A to B at the same time as you are swapping B to A it is very easy to accidentally end up with all A. What do I mean? Consider this example:

 

My dog is bigger than your dog.

 

We switch the first person words to second person words:

 

 Your dog is bigger than your dog.

 

Then we switch the second person words to first person:

 

My dog is bigger than my dog.

 

I’m sure you can see the problem.
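
Here is that failure in miniature, using the naive approach of two back-to-back substitutions. This is deliberately not the code we are going to end up with:

my $sentence = "My dog is bigger than your dog.";

$sentence =~ s/\bmy\b/your/gi;    # first person to second person
$sentence =~ s/\byour\b/my/gi;    # second person to first person... including the "your"s we just created

print "$sentence\n";              # prints "my dog is bigger than my dog."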

 

The other big issue to look out for is accidentally matching words we don’t want to. Allow me to demonstrate:

 

You are young

 

We want to change that to:

 

I am young

 

But if all we do is blindly swap “I” for “you” we can easily end up with this:

 

I am Ing

 

For an even worse example consider this one:

 

I think pink is nifty.

You thyounk pyounk yous nyoufty.

 

Solving The Problems

 

Switching “you” to “I” while avoiding changing “young” to “Ing” is pretty simple with regular expressions. All we have to do is use the “word boundary” symbol \b. Like so:

 

\byou\b

 

This will automatically skip over any instances of “you” that are directly attached to other letters or symbols.
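
A quick two-line experiment makes the difference obvious. The sentences here are just for illustration:

my $withBoundary = "You are young";
my $withoutBoundary = "You are young";

$withBoundary =~ s/\byou\b/I/gi;     # only the whole word changes: "I are young"
$withoutBoundary =~ s/you/I/gi;      # every match changes: "I are Ing"

print "$withBoundary\n$withoutBoundary\n";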

 

Making sure that we don’t accidentally switch words from first to second person and then back from second to first will be a little more tricky. There are several possible solutions, some involving cool regex and Perl tricks, but for now I’m just going to use something very straightforward.

 

Basically I’m going to replace every first and second person word with a special placeholder value that I’m relatively certain won’t show up in normal DELPHI conversations. Then I will change all the placeholder values to their final form. Here is how this will work with the above example:

 

My dog is bigger than your dog.

 

We switch the first person words to placeholders

 

DELPHIyour dog is bigger than your dog.

 

Then we switch the second person words to placeholders. Because we used the placeholder “DELPHIyour” instead of plain “your” we don’t accidentally switch the first word back to “my”.

 

DELPHIyour dog is bigger than DELPHImy dog

 

Then we replace the placeholders

 

Your dog is bigger than my dog.

 

Here It Is In Code

 

I like foreach loops, so I’m going to implement this as two arrays and two foreach loops. The first array will contain regular expressions for finding first and second person words along with the placeholders we want to replace them with. The second will contain regular expressions for finding placeholders and replacing them with the proper first and second person words.

 

To implement this I just drop these variables and this function into DELPHI.pm right after generateResponse. The only new coding trick to look for is the ‘i’ modifier on the end of some of the regular expressions. This is the “case insensitive” switch and makes sure that DELPHI can match the words we want whether they are capitalized or not*.

 

#Dictionaries used to help the switchFirstAndSecondPerson function do its job
my @wordsToPlaceholders;

$wordsToPlaceholders[0][0]=qr/\bI\b/i;
$wordsToPlaceholders[0][1]='DELPHIyou';

$wordsToPlaceholders[1][0]=qr/\bme\b/i;
$wordsToPlaceholders[1][1]='DELPHIyou';

$wordsToPlaceholders[2][0]=qr/\bmine\b/i;
$wordsToPlaceholders[2][1]='DELPHIyours';

$wordsToPlaceholders[3][0]=qr/\bmy\b/i;
$wordsToPlaceholders[3][1]='DELPHIyour';

$wordsToPlaceholders[4][0]=qr/\byou\b/i;
$wordsToPlaceholders[4][1]='DELPHIi';

$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImine';

my @placeholdersToWords;

$placeholdersToWords[0][0]=qr/DELPHIyou/;
$placeholdersToWords[0][1]='you';

$placeholdersToWords[1][0]=qr/DELPHIyour/;
$placeholdersToWords[1][1]='your';

$placeholdersToWords[2][0]=qr/DELPHIyours/;
$placeholdersToWords[2][1]='yours';

$placeholdersToWords[3][0]=qr/DELPHIi/;
$placeholdersToWords[3][1]='I';

$placeholdersToWords[4][0]=qr/DELPHImine/;
$placeholdersToWords[4][1]='mine';

$placeholdersToWords[5][0]=qr/DELPHImy/;
$placeholdersToWords[5][1]='my';

sub switchFirstAndSecondPerson{
    my $input =$_[0];

    foreach my $wordToPlaceholder (@wordsToPlaceholders){
        $input =~ s/$wordToPlaceholder->[0]/$wordToPlaceholder->[1]/g;
    }

    foreach my $placeholderToWord (@placeholdersToWords){
        $input =~ s/$placeholderToWord->[0]/$placeholderToWord->[1]/g;
    }

    return $input;
}
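
Before wiring the new function into generateResponse you can sanity check it on its own, for example by temporarily dropping a couple of print statements into DELPHI.pm right below the function:

print switchFirstAndSecondPerson("Do my readers enjoy this blog?"), "\n";
# prints: Do your readers enjoy this blog?

print switchFirstAndSecondPerson("Will you help me?"), "\n";
# prints: Will I help you?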

 

Using The New Function In Generate Response

 

With that out of the way all that is left is to figure out where inside of generateResponse we should be calling this function. My first thought was to just stick it onto the end of the function by finding the original return statement:

 

return $response;

 

And replacing it with this:

 

return switchFirstAndSecondPerson($response);

 

Now this is where test driven development comes in handy because that simple change did indeed pass test case 4… but it also caused messes like this:

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output: you predict that this test will pass

Expected: I predict that this test will pass

Test Case 8 Failed!!!

Input: Pumpkin mice word salad

Output: you don’t want to talk about that. Please ask you a question

Expected: I don’t want to talk about that. Please ask me a question

 

We’ve accidentally made it impossible for DELPHI to talk in first person, which wasn’t what we wanted at all. We only wanted to change first and second person words in the user’s input fragments, not in our carefully handwritten DELPHI responses. Which is a pretty good hint that we should have called switchFirstAndSecondPerson on the user’s input BEFORE we tried to parse it and generate a response, not after. Maybe right at the beginning of the function:

 

sub generateResponse{
    my $userInput = $_[0];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response = $chatPattern->[1];
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

 

The Moment Of Truth

 

Did we do it? Did we resolve our final use case?

 

Drum roll please…………

 

Test Case 0 Passed

Test Case 1 Passed

Test Case 2 Passed

Test Case 3 Passed

Test Case 4 Passed

Test Case 5 Passed

Test Case 6 Passed

Test Case 7 Passed

Test Case 8 Passed

Test Case 9 Passed

Test Case 10 Passed

Test Case 11 Passed

Test Case 12 Passed

Test Case 13 Passed

——————–

Passed 14 out of 14 tests

All Tests Passed!

 

WHOOO! GO US!

 

Note To Exceptionally Clever Readers

 

All my readers are clever, but some of you are exceptionally clever. And you may have noticed that switchFirstAndSecondPerson always returns lowercase words even when the original word was capitalized or at the beginning of the sentence. This isn’t a huge problem, but if you’re a perfectionist it might be bugging you to accidentally change “I care about grammar” to “you care about grammar” instead of “You care about grammar”.

 

One easy solution would be to update DELPHI to capitalize its entire output. People are used to computer programs SPEAKING IN ALL CAPS and it saves us the effort of having to actually teach DELPHI anything about proper capitalization.

 

If you don’t like the caps lock look you could instead update DELPHI to always make sure its output starts with a capital letter. More often than not this is all it takes to make a sentence look like real English.
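
If you do want to go down either of those roads, both fixes are one-liners in Perl. Here is a sketch of each; neither is actually part of DELPHI as written:

my $response = "you care about grammar";

my $allCaps = uc($response);            # "YOU CARE ABOUT GRAMMAR"
my $capitalized = ucfirst($response);   # "You care about grammar"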

 

Or you can just do what I do and ignore the problem. I’m not going to worry too much about the occasional lowercase “you” or “my” unless users start complaining. And since this program isn’t intended for any real users that’s not likely to ever happen. Customer satisfaction is easy when you have no customers!

 

Conclusion

 

That’s it! We’ve passed all of our primary use cases. DELPHI is done.

 

Or is it? If you can remember all the way back to the original design document one thing we wanted out of DELPHI was the ability to generate random responses to questions. DELPHI currently just guesses “yes” to all questions which is both useless and boring. So while we hit a very important benchmark today we’re still not quite done.

 

 

 

* You know what else case insensitive regular expressions would be good for? Making DELPHI more accepting of user input that isn’t properly capitalized. Expect this to happen in a future blog post.

Let’s Program A Chatbot 12: When The Answer Key Is Wrong

Unrealistic Expectations

 

Sometimes you get halfway through a project only to realize you don’t have the time or money to do what you originally planned to do*. When that happens you have no choice but to rethink your plans, either lowering your expectations or setting a new deadline. Admittedly both approaches generally involve getting frowned at by both management and your customers but sometimes you really have no choice. Even the best of developers have limits.

 

Why am I bringing this up? You’ll understand in a minute, but I will tell you that it involves these still unresolved use cases:

 

Test Case 2 Failed!!!

Input: Does this program work?

Output: I’m sorry, could you try rewording that?

Expected: Fate indicates that this program works

Test Case 3 Failed!!!

Input: Do computers compute?

Output: I’m sorry, could you try rewording that?

Expected: Fate indicates that computers compute

 

At first this doesn’t look so bad. The use cases are “Do X Y?” and “Does X Y?” and all DELPHI has to do is respond back “Yes X Y”. Hardly seems like a challenge. We’ll just slip this new rule into our list after the “or” rule and right before the “is” rule.

 

push(@chatPatterns,
   [qr/\A(?:Do|Does) (.+)\?\z/,
      "Fate indicates that UIF0"]);

 

Very simple. We look for any question that starts with some form of “Do” (notice the non-capture ?: symbol) and then we just replace that one question word with our “Fate indicates that” prediction. Is that really all it took?

 

Test Case 2 Failed!!!

Input: Does this program work?

Output: Fate indicates that this program work

Expected: Fate indicates that this program works

Test Case 3 Passed

 

A success and a failure is still an overall failure. So now we need to find out what went wrong with Test Case 2 that didn’t go wrong with Test Case 3. If you look closely at the expected vs actual output the only issue is verb agreement. It should be “program works”, with an ‘s’, but all we got was the original “program work” from the question.

 

This problem really only shows up in the third person where the question is phrased as “Does X VERB” and the answer needs to be in form “X VERBs”. It’s really a pretty simple grammar rule. At least, it’s simple for a human. DELPHI is going to need a lot of help.

 

Hmmm… maybe we can solve this by just slipping an ‘s’ onto the end of our response. Of course, since this only applies to third person questions we’ll have to split the original rule into two rules. Notice that only the “does” version glues a final s onto the end of the User Input Fragment from the original input:

 

push(@chatPatterns,
   [qr/\ADo (.+)\?\z/,
      "Fate indicates that UIF0"]);

push(@chatPatterns,
   [qr/\ADoes (.+)\?\z/,
      "Fate indicates that UIF0s"]);

 

Test Case 2 Passed

 

I’m Still Not Sure This Is Really Working

 

Just gluing an ‘s’ to the end of the input doesn’t seem very sophisticated. Sure, it passed our test case but I’m not sure it will really work in all scenarios. So how about we write a new test case just to make extra sure we really solved our problem?

 

$testCases[13][0] = "Does adding an s work well?";
$testCases[13][1] = "Fate indicates that adding an s works well";

 

Nope!

 

Test Case 13 Failed!!!

Input: Does adding an s work well?

Output: Fate indicates that adding an s work wells

Expected: Fate indicates that adding an s works well

 

Adding an ‘s’ to the end of the sentence isn’t enough because what we truly want is an ‘s’ on the end of the verb and there is no guarantee that the verb will be the last word in the sentence. So to fix this problem we are going to need to either:

 

      1. Develop a complex system for identifying the verb in an arbitrary sentence
      2. Decide that we don’t care about adding ‘s’s to verbs

 

I’m going to go with option number 2 and come up with a new definition of what is considered a “correct” answer to a “does” question.

 

The New Test Case

 

There is an easy way around having to reformat our verbs and that is by including the word “does” inside the response. For instance, these two sentences basically mean the same thing:

 

This sentence looks equal to the other sentence

This sentence does look equal to the other sentence

 

This means that we can change the response to “Does X Y?” from “Yes, X Ys” to the much simpler “X does Y”. Now we are dealing with the exact same problem we already solved for “X is Y” and “X will Y”.

 

Here are our updated test cases:

 

$testCases[2][0] = "Does this program work?";
$testCases[2][1] = "Fate indicates that this program does work";

$testCases[13][0] = "Does this approach work better?";
$testCases[13][1] = "Fate indicates that this approach does work better";

 

And here is our updated “does” rule (the “do” rule can stay the same):

 

push(@chatPatterns,
   [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/,
      "Fate indicates that UIF0 does UIF1"]);

 

And, finally, here are the results

 

Passed 13 out of 14 tests

Test Failure!!!

 

Did We Learn Anything Useful Today?

 

The moral of today’s story is that sometimes a test case that is really hard to solve represents a problem with your expectations as much as your program. If you’re on a tight budget or schedule** sometimes it makes sense to stop and ask yourself “Can we downgrade this requirement to something simpler? Can we delay this requirement until a later release?”

 

After all, good software today and the promise of great software tomorrow is better than insisting on great software today and never getting it.

 

Although sometimes you can manage to deliver great software today and that’s even better. Reach for the stars, bold readers. I have faith in your skills!

 

Conclusion

 

Did you notice that the success rate on our last testing run was 13 out of 14? That means we’re almost done! At least, we’re almost done with the first test version of the code. I’m sure the instant we ask a human tester to talk to DELPHI we’re going to find all sorts of new test cases that we need to include.

 

But future test cases are a problem for the future. For now we’re only one test case away from a significant milestone in our project. So join me next time as I do my best to get the DELPHI test suite to finally announce “All Tests Passed!”

 

 

 

* Even worse, sometimes you’ll find out that what you want to do is mathematically impossible. This is generally a bad thing, especially if you’ve already spent a lot of money on the project.

 

** Or if you’re writing a piece of demo software for your blog and don’t feel like spending more than a few dozen hours on what is essentially a useless toy program

Book Review: Implementing Responsive Design by Tim Kadlec

You Need Me To Do What?!

 

Let’s say that you’re a programmer with no real talent for or interest in web design. But the startup you work at really needs someone to redesign their product to be more mobile friendly and they don’t have time to hunt for and hire a real designer. What is a programmer to do?

 

For me the answer was “Buy a book”. Ideally something simple enough that you don’t need to be an expert designer, deep enough to give you a real understanding of the field and short enough that you can finish the material and get back to work fast.

 

Implementing Responsive Design by Tim Kadlec turned out to be just about perfect.

 

Programmer Friendly

 

Implementing Responsive Design seems to have been directed more towards designers than developers but is overall easy to follow as long as you know the technical basics of HTML and CSS (and as a web programmer you probably do). There is no tricky vocabulary and you aren’t expected to be a graphical wizard who Photoshops in their sleep. There is nothing in the book that requires any existing experience with design or any special software and as a programmer in a hurry I really appreciated that.

 

Even better, the book has half a dozen practical examples complete with screen shots and sample code showing how different techniques lead to different looks on both desktop and mobile. The book also does a good job of covering the theory behind responsive mobile-first design which really helped me get into the head of how designers think. Learning how to properly think about mobile design is much more useful than just memorizing a few CSS rules.

 

Covers A Lot Of Territory Very Quickly

 

The book weighs in at a slim 250 pages making it the sort of thing you can read in one or two evenings. It starts with the absolute basics of “What is responsive design?” (creating web pages that change their layout depending on screen size) and then spends a few chapters tackling both the basic tools of reactive design and the thought process behind deciding how to design a reactive page in the first place.

 

After that is taken care of the book spends a little time exploring some more advanced techniques for optimizing loading times and enhancing the user experience for specific platforms. It then briefly covers some promising responsive technologies being developed, muses a bit about the future of web design and before you know it the book is done, having covered a lot of valuable information in a very short amount of time. Once again this is a very good thing for people like me who need to learn a lot of new things very quickly.

 

A Starting Point, Not A Reference Book

 

The one thing you should be aware of is that Implementing Responsive Design doesn’t have all the answers. And some of the answers it does have will probably be obsolete by the time you buy the book. Web technology is changing fast!

 

But you don’t really need all the answers. As long as you know what questions to ask you can find pretty much anything on the Internet. What this book is for is teaching you enough about responsive design to figure out what questions to ask in the first place. It helps you understand fundamental theories and techniques and any programmer worth his salt should be able to use that as a springboard to start researching specific solutions to their own specific problem.

 

Final Thoughts: A Good Buy For People Who Don’t Know Anything And Want To Fix That

 

Before this book all I knew about mobile design was that you could theoretically get a page to render differently based on whether it was on a phone or on a computer. 250 pages later I have a big grab-bag of common techniques for making this happen and, more importantly, I feel like I understand the motivation behind responsive web design. It changed how I look at putting content together and in an age of smartphones and tablets I think that developing an expanded and more flexible idea of what layout means is an invaluable skill.

 

On the other hand, if you already have some experience with designing pages that work well on both mobile and desktop you probably won’t find too much in this book you don’t already know.

 

But as a programmer I thought Implementing Responsive Design was a worthwhile read, even if I never have to program a mobile website by hand again. After all, the better we programmers understand how the user hopes to browse our websites and how the designers hope to style them the better job we can do of making sure our code and data supports a future full of diverse devices.

Let’s Program A Chatbot 11: Bad Adjectives

Not As Easy As It Looked

 

Eeny meeny miny moe, which test case do I want to show?

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output: I’m sorry, could you try rewording that?

Expected: I predict that this test will pass

 

This doesn’t look so bad. We already wrote a rule for “Is X Y?” so writing a rule for “Will X Y?” should be as easy as copy pasting and switching a few words. Behold!

 

push(@chatPatterns,
   [qr/\AWill ([a-zA-Z]+) (.+)\?\z/,
      "I predict that UIF0 will UIF1"]);

 

I’ll just drop that into the rules list right after the “Is X Y?” rule and we should be good to go.

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output: I predict that this will test pass

Expected: I predict that this test will pass

 

Uh oh. That didn’t quite work. DELPHI did manage to figure out that test 0 was a “Will X Y?” style question but when generating the answer it put the “will” in the wrong place. Can you figure out why?

 

[Please use this break in the blog’s flow to consider why this happened.]

 

The problem here has to do with how we defined the rule. We’ve been calling it “Will X Y?” but the rule is actually more like “Will Noun Verb?” or “Will Noun-Phrase Verb-Phrase?”.

 

Our current dumb rule assumes that the noun will always be the first word after the word “Will” and that everything else will be part of the verb phrase. This works out great for sentences like “Will Batman catch the villain?” but completely falls apart when you start adding adjectives to the noun and get things like “Will the police catch the villain?”

 

So what we really need is a “Will” rule that is smart enough to group common adjectives with the noun and treat them all like one big super-noun. Here is a quick first pass (WARNING: WEIRD REGULAR EXPRESSION AHEAD):

 

/\AWill ((?:(?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/

 

Don’t panic just yet, this rule is actually a lot simpler than it looks. But first you need to understand what all those “?:” symbols are doing. Hopefully you remember that parentheses create capture groups that group patterns together and then store their matches for future use. But sometimes you want to group patterns together without storing them for later. You can accomplish this by starting your capture group with the special symbols “?:”, which turns off the capturing and lets you use the parentheses as a simple grouping tool.

 

This is important for our “Will” rule because we want to capture the entire noun-phrase and the entire verb-phrase but we don’t want to capture any of the individual parts of those phrases. For example, we have improved our noun-phrase by adding in two groups of nested parentheses for handling common article adjectives. The inner parentheses match common adjectives and the outer parentheses make sure there is a space following each adjective. We mark both these groups as “?:” noncapturing because, while we certainly do want to match nouns that start with a series of adjectives, we only want to capture those adjectives as part of the noun and not on their own.

 

What would happen without those noncapturing symbols? Well, the first big parenthesis set would capture the entire noun-phrase and substitute it into the output just like we want. But the second capture group wouldn’t be the verb-phrase like we originally wanted. Instead the second capture group would be the inner parentheses matching the articles, leading to all sorts of problems. See for yourself:

 

Input: Will this test pass?

Output: I predict that this test will this

Expected: I predict that this test will pass

 

See what I mean? We successfully grabbed “this test” and put it into the answer as a noun-phrase but we then grabbed “this ” as our second capture group while the verb-phrase “pass” got pushed into a later capture group slot. Not what we wanted at all.

 

Instead we’ll just tell the inner parentheses not to capture. Now the noun-phrase always goes in slot one and the verb-phrase always goes in slot two and everything works wonderfully.
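
If you want to see the difference for yourself, run both flavors of the pattern against the same question and look at what lands in the capture list. This is just a scratch-pad sketch, not part of DELPHI:

my $question = "Will this test pass?";

# Every set of parentheses captures, so the inner article group leaks into the results
my @leaky = $question =~ /\AWill (((?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/;
# @leaky is ("this test", "this ", "pass")

# With ?: on the inner groups only the noun-phrase and the verb-phrase are kept
my @clean = $question =~ /\AWill ((?:(?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/;
# @clean is ("this test", "pass")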

 

Test Case 0 Passed

Passed 7 out of 11 tests

 

Wait A Minute, Isn’t This A Problem For “Is” Rules Too?

 

Clever readers might be asking themselves “If adjectives broke our simple “Will X Y?” rule, then won’t they break our old “Is X Y?” rule too?” Well good job for noticing that clever readers, because that’s the exact problem we see in our next test case:

 

Test Case 1 Failed!!!

Input: Is the sky blue?

Output: Fate indicates that the is sky blue

Expected: Fate indicates that the sky is blue

 

Fortunately we can fix it the exact same way:

 

/\AIs ((?:(?:this|the|that|a|an) )*[a-zA-Z]+) (.+)\?\z/

 

Test Case 1 Passed

Passed 8 out of 11 tests

 

You Shouldn’t Copy Paste Code

 

There is one little problem with this approach to adjectives: I’m hard coding a big list of words and then copy pasting it into multiple functions. This will be a real pain if we ever have to update the list in the future. For instance, if we wanted to add possessive adjectives into the list (my, your, his, her, their) we would have to rewrite two different rules. And if we ever decide a third rule needs access to the list we’ll have to copy paste the whole thing.

 

Much better to turn that portion of the rules into a separate variable that can be included in multiple functions. Which in Perl you can do like this:

 

#put this before the code starts to build the pattern and response array
my $commonAdjectives=qr/(?:this|the|that|a|an)/;
my $noncaptureAdjectiveChain=qr/(?:$commonAdjectives )*/;

 

And now we can just update the rules to use these handy variables anywhere we want to match an arbitrarily long chain of adjectives with a single space after every word.

 

/\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/

 

/\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/

 

Those of you following along in a language other than Perl will have to figure out on your own how and if your language handles inserting variables into a regular expression. If all else fails you can always just go back to the copy pasting thing.

 

Let’s Test By Adding Some More Adjectives

 

Now that we can add new adjectives to two different rules by just updating a single variable we should write a few new tests and make sure it works. How about these?

 

$testCases[11][0] = "Will his code compile?";
$testCases[11][1] = "I predict that his code will compile";

$testCases[12][0] = "Is this big blue box actually a time machine?";
$testCases[12][1] = "Fate indicates that this big blue box is actually a time machine";

 

The first test is a straightforward check that we can add possessives to the adjective list. The second test is a little bit more complex, requiring us to not only add two new adjectives to our list (big and blue) but also to make sure the code can chain multiple adjectives together in a row.

 

Of course, right now they both fail. In the first test DELPHI doesn’t recognize “his” as an adjective, so it assumes it is a noun and puts the “will” in the wrong place. In the second test DELPHI recognizes “this” as an adjective but not “big” and does the same thing.

 

Test Case 11 Failed!!!

Input: Will his code compile?

Output: I predict that his will code compile

Expected: I predict that his code will compile

Test Case 12 Failed!!!

Input: Is this big blue box actually a time machine?

Output: Fate indicates that this big is blue box actually a time machine

Expected: Fate indicates that this big blue box is actually a time machine

 

But after updating our list of adjectives:

 

my $commonAdjectives=qr/(?:this|the|that|a|an|his|her|my|your|their|big|blue)/;

 

Test Case 11 Passed

Test Case 12 Passed

——————–

Passed 10 out of 13 tests

Test Failure!!!

 

How Many Adjectives Do We Need?

 

DELPHI now knows how to handle 12 different adjectives. And while that is pretty nifty it’s worth pointing out that the English language has a lot more than just 12 adjectives. In fact, English is one of the world’s largest languages* and easily has several tens of thousands of adjectives. Even worse, English allows you to “adjectivify”** other words, creating new adjectives on the spot, like so:

 

“These new computery phones have a real future-licious feel to them but with the default battery they’re actually kind of brickish.”

 

My spell checker is convinced that sample sentence shouldn’t exist but even so you probably understood what I meant. Which just goes to show the huge gap between how good humans are at flexible language processing and how bad computers still are.

 

But what does this mean for DELPHI? Do we need to generate a giant adjective list? Do we need to teach it how to handle nouns and verbs that have been modified to act like adjectives? Do we need to spend twelve years earning multiple PhDs in computer science and linguistics in order to build a more flexible generateResponse function?

 

Well… no. Remember, our goal isn’t to create a program that can fully understand the human language. We just want a bot that can answer simple questions in an amusing way like some sort of high-tech magic eight ball. As long as DELPHI can handle simple input and gracefully reject complex input it should feel plenty intelligent to the casual user.

 

Furthermore, we can actually depend on users to play nice with DELPHI. Most people, after being scolded by DELPHI once or twice for trying to be clever, will start to automatically pick up on what sorts of inputs do and don’t work. The fact that DELPHI can’t handle obscure adjectives will eventually teach users to stick to straightforward questions.

 

All things considered we can probably “solve” the adjective problem by teaching DELPHI the hundred most common adjectives in the English language and then hoping that users never bother going beyond that. Later on we can have some test users talk to DELPHI and use their experiences to decide whether or not we need to add more adjectives or build a more complex system.
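
One low-effort way to manage a list that big would be to build $commonAdjectives from a plain Perl array instead of typing the alternation out by hand. This is only a sketch of the idea, and the extra adjectives are just examples; DELPHI doesn’t currently do this:

my @adjectiveList = qw(this the that a an his her my your their big blue
                       small red new old good bad happy little);

my $adjectiveAlternation = join '|', @adjectiveList;
my $commonAdjectives = qr/(?:$adjectiveAlternation)/;
my $noncaptureAdjectiveChain = qr/(?:$commonAdjectives )*/;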

 

Conclusion

 

Today we caught a glimpse of how simple pattern matching chatbots can completely fall apart when confronted with real English. But we also saw a quick way to band-aid over the worst of these problems and we have hope that our bot can be written in such a way that users never notice that DELPHI is too dumb to understand that “house” and “that big house over there” are actually the same thing.

 

Next time, more test cases and more examples of English language features that are annoying to program around.

 

 

* As the popular saying goes: English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.

 

** Look, I just verbed a noun!