You Know How This Works By Now
If you’ve been following along through this entire Let’s Program you should have a pretty good idea of the process I use to think up new test cases and then write new rules to solve them. So this time around I’m not going to bother explaining every single little thing I’m doing. I’m just going to quickly throw down seven new test cases along with the code I used to satisfy them. I’m confident you can fill in the gaps on your own by now.
Is, Are and Was Were Problems
Let’s start out with those troublesome plurals and past tense versions of “Is”. Here are our new tests:
$testCases[14][0] = "Are plurals working now?";
$testCases[14][1] = "Fate indicates that plurals are working now";
$testCases[15][0] = "Was this tense a problem?";
$testCases[15][1] = "Fate indicates that this tense was a problem";
$testCases[16][0] = "Were the lights left on?";
$testCases[16][1] = "Fate indicates that the lights were left on";
And here is the fix:
push(@chatPatterns,
    [qr/\A(is|are|am|was|were) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
        ["Fate indicates that UIF1 UIF0 UIF2",
         "Other responses go here"]]);
A few things to look out for here. First, notice that I’m using the /i flag at the end of the regular expression so the user doesn’t have to worry about capitalizing anything. Second, notice that I’m now capturing which version of “Is” the user chose and then inserting it into the output. Since it is now the first capture group I can refer to it as UIF0, but that means the other fragments have to scoot over: the old UIF0 is now UIF1 and the old UIF1 becomes UIF2.
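To make that capture-group shuffle concrete, here is a standalone sketch of how the new rule’s three groups map onto the UIF placeholders. The `$noncaptureAdjectiveChain` definition below is just an illustrative stand-in (the real one was defined earlier in the series), and the substitution loop is a simplified version of what generateResponse actually does:

```perl
use strict;
use warnings;

# Stand-in for the real $noncaptureAdjectiveChain defined earlier in the series
my $noncaptureAdjectiveChain = qr/(?:(?:the|a|an|this|that|my|your) )*/;

my $input    = "Were the lights left on?";
my $response = "no match";

if ($input =~ /\A(is|are|am|was|were) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i) {
    # UIF0 = "Were", UIF1 = "the lights", UIF2 = "left on"
    my @fragments = ($1, $2, $3);
    $response = "Fate indicates that UIF1 UIF0 UIF2";
    # Swap every UIFn placeholder for the matching capture group
    $response =~ s/UIF(\d+)/$fragments[$1]/g;
}

print "$response\n";   # Fate indicates that the lights Were left on
```

Note the capital “W” that survives the substitution: that is exactly the bug the next failing test is about to expose.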
And this very nearly worked. Almost.
Test Case 16 Failed!!!
Input: Were the lights left on?
Output: Fate indicates that the lights Were left on
Expected: Fate indicates that the lights were left on
Passed 11 out of 17 tests
Because we grab the verb from the user’s input, a capitalized verb like “Were” will result in strange capitals in the middle of our sentences, like the out-of-place “Were” you can see here. Not acceptable. I could probably improve the UIF substitution system to avoid mid-sentence capitalization, but instead I’m going to just blow away every possible capitalization problem I could ever have by making DELPHI speak in robot-style ALL CAPS. It’s an easy change to make, but it does mean that I’m going to have to update every single test case to expect capital letters. Oh well.
Here is the one-line change needed in the return statement of generateResponse to switch to all caps. The uc function makes everything uppercase.
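A sketch of the change, assuming your generateResponse ends by returning the assembled reply in a variable like `$response` (your variable name may differ):

```perl
use strict;
use warnings;

# Inside generateResponse the final line just gains a uc() wrapper:
#
#     return uc($response);
#
# uc() in action on its own:
print uc("Fate indicates that the lights were left on"), "\n";
# FATE INDICATES THAT THE LIGHTS WERE LEFT ON
```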
Now please excuse me while I capitalize all 17 of my existing test cases.
(Type type type)
Well that was boring. But DELPHI is now passing 17 out of 17 test cases again so it was worth it.
You’ve Gone Done Did It Now!
Getting DELPHI to handle past tense “Did” as well as present tense “Does” is basically the same problem and solution as “Is” and “Was” above.
$testCases[17][0] = "Did this test pass?";
$testCases[17][1] = "FATE INDICATES THAT THIS TEST DID PASS";
Notice the switch to all-capital expected output. Anyway, here is the updated “Does” rule:
push(@chatPatterns,
    [qr/\A(Did|Does) ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/i,
        ["Fate indicates that UIF1 UIF0 UIF2",
         "Other responses go here"]]);
Once again we are now pulling the verb out of the user’s input, which means we can include it in the output as UIF0 and have to shift the index of all of our other existing User Input Fragments.
Passed 18 out of 18 tests
All Tests Passed!
Help Me To Help You
One of the bigger problems we saw with the first test user was that they weren’t sure what kind of questions to ask DELPHI. Let’s fix this by jumping into chat.pl and writing up a new self-introduction for DELPHI:
DELPHI: HELLO! MY NAME IS DELPHI
DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE
DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”
DELPHI: WHAT WOULD YOU LIKE TO KNOW?
That’s much better. And while I’m at it I should probably improve DELPHI’s responses to statements it doesn’t understand. Remember, if you change the first possible response to a rule you will also need to update the test cases associated with that rule.
Here is a sample of some of the new responses I’m going to try:
I’m sorry, could you try rewording that as a yes or no question?
I’m sorry, could you think of a simpler way to ask that question? Maybe as a yes or no question?
I’m confused. Try a simple yes or no question instead
I don’t want to talk about that. Please feel free to ask me why
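If your fallback rule looks like the other entries in @chatPatterns, swapping in the new responses is just a matter of replacing its response array. A sketch, assuming a match-anything pattern kept as the last rule in the list:

```perl
use strict;
use warnings;

my @chatPatterns;

# Catch-all fallback rule. Keep it as the LAST entry in @chatPatterns
# so every real rule gets a chance to match first.
push(@chatPatterns,
    [qr/.*/,
        ["I'm sorry, could you try rewording that as a yes or no question?",
         "I'm sorry, could you think of a simpler way to ask that question? Maybe as a yes or no question?",
         "I'm confused. Try a simple yes or no question instead",
         "I don't want to talk about that. Please feel free to ask me why"]]);

print scalar @{ $chatPatterns[0][1] }, " fallback responses loaded\n";   # 4 fallback responses loaded
```

Remember that the testing code always checks against the first response in the array, so changing that first string means updating the matching test case too.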
Self-Knowledge Is The Key To Enlightenment
The final big issue we saw in our human-focused test was that DELPHI couldn’t answer simple questions about itself. That’s sad. A program without an easy-to-use HELP feature has pretty much failed at usability.
So here are a few test cases to help us address this glaring problem:
$testCases[18][0] = "What kind of questions can you answer?";
$testCases[18][1] = "YES OR NO QUESTIONS LIKE \"WILL I HAVE GOOD LUCK TOMORROW?\" ARE THE EASIEST FOR ME TO ANSWER";
$testCases[19][0] = "Can you tell time?";
$testCases[19][1] = "THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE \"WILL IT BE SUNNY TOMORROW?\"";
$testCases[20][0] = "help";
$testCases[20][1] = "JUST TYPE A QUESTION AND HIT ENTER AND I'LL DO MY BEST TO ANSWER IT. YES OR NO QUESTIONS LIKE \"DID I DO WELL ON MY TEST?\" ARE BEST";
Let’s tackle the “help” case first because it’s the easiest: we just write a super-specific, high-priority rule. I didn’t want random help messages, so the response array is only one item long. It still has to be an array, though, because that’s what the code expects:
push(@chatPatterns, [qr/\Ahelp\z/i, ["Just type a question and hit enter and I'll do my best to answer it. Yes or No questions like \"Did I do well on my test?\" are best"]]);
Let’s do the “What kind of questions” test next. I don’t have any other rules dealing with “What”-style questions, so the priority of this rule doesn’t really matter that much. If you’ve written your own “What” rule you’ll probably need to give this help rule higher priority so it doesn’t get overshadowed.
The rule itself is pretty simple. It just looks for any sentence that starts with “What” and later includes the word “question”. This might be casting the net a bit wide: it won’t just catch questions like “What kind of questions can you answer?” or “What kind of questions work best?”. It will also catch things like “What do you do with my questions?”, where our answer doesn’t really make sense.
So we’re kind of gambling here on what questions the user will and won’t ask. If we find that our users are providing lots of “What” input that doesn’t fit this pattern we may have to write some extra rules, but for now this should be fine:
push(@chatPatterns, [qr/\AWhat.*questions/i, ["Yes or no questions like \"Will I have good luck tomorrow?\" are the easiest for me to answer"]]);
Finally, there’s the “Can you?” rule, which is basically identical to the rule above. Just look for the words “Can” and “you”:
push(@chatPatterns, [qr/\ACan.*you/i, ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);
Unfortunately we run into a little problem:
Test Case 19 Failed!!!
Input: Can you tell time?
Output: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?
Expected: THE ONLY THING I CAN REALLY DO IS ANSWER SIMPLE QUESTIONS LIKE “WILL IT BE SUNNY TOMORROW?”
Remember how we wrote that little bit of code to transform second-person words like “You” into first-person words like “I” to help us create more grammatical responses? That same code is breaking our new “Can you” rule by changing the user’s input into “Can I” before it ever gets compared to any of our rules.
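For anyone who skipped that earlier post, here is a minimal sketch of the kind of person-swapping preprocessing that’s getting in our way (DELPHI’s real swap list is longer and handles more words than just “you”):

```perl
use strict;
use warnings;

# Minimal sketch of the second-person-to-first-person preprocessing step.
# Because this runs BEFORE pattern matching, "Can you" never reaches
# the rules; they see "Can I" instead.
my $input = "Can you tell time?";
$input =~ s/\byou\b/I/gi;   # whole-word, case-insensitive swap
print "$input\n";   # Can I tell time?
```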
There’s probably an elegant solution to this problem, but I don’t have time to find it right now. Instead I’m just going to rewrite the rule to look for “Can I”, because I know that’s what a user asking “Can you” turns into, which is what we really want to catch.
push(@chatPatterns, [qr/\ACan.*I/i, ["The only thing I can really do is answer simple questions like \"Will it be sunny tomorrow?\""]]);
DELPHI V2 Lives!
Passed 21 out of 21 tests
All Tests Passed!
DELPHI can now handle at least the most basic forms of the three big problems we saw in our live user test. Which means it’s time to find a new test user and try again. I’m hoping our code can now generate logical answers to at least 70% of the input the average random user will try to give it. That would give us a nice solid “C” average. Not good enough for a professional chatbot, but a pretty good goal for a practice program built from scratch using nothing but regular expressions and a few foreach loops.