Let’s Program A Chatbot 5: Finally, Code!

When Perl != Perl

 

It just occurred to me that before I go any further, I should probably mention that when I say I’m using “Perl”, I mean Perl 5. More specifically, Perl 5 version 14 subversion 2 (5.14.2).

 

This is important because there is a major project called Perl 6 in the works that will change so many features of Perl that it could really be considered a new language instead of an upgrade. This means that Perl 5 code probably won’t work with a Perl 6 system.

 

So if you’re reading this in a Perl 6 dominated future, be warned that you’ll also need Perl 5 to follow along with my sample code. Or you could just rewrite all the sample code in Perl 6 as you read along.

 

Starting The Chatbot With generateResponse

 

We’re finally finished with all the design work, tests and background knowledge needed to really start the programming portion of this Let’s Program. So let’s get started!

 

The core of our system is going to be a function called “generateResponse”. It will take a string of user input, generate a response based on DELPHI’s pattern matching rules, and then return that response to whatever piece of code called “generateResponse” in the first place.

 

Last post, as part of our test driven development, we created an empty “generateResponse” function for the testing software to talk to. Today’s goal is to fill that function in.

 

The Basics Of A Perl Function

 

One of the more unusual aspects of writing Perl functions is that you don’t have to explicitly list the arguments that will be passed to the function. Instead you are given the freedom to pass as many arguments as you want. Perl then packages those arguments into an array called “@_” for the function to use.

 

Here’s an example of what I mean:

 

In C you would define an addition function like this:

 

int add (int arg1, int arg2)
{
    return arg1 + arg2;
}

But in Perl you would do something more like this:

 

sub add{
    my $num1 = $_[0];
    my $num2 = $_[1];
    return $num1 + $num2;
}

or maybe even this

 

sub add{
    return $_[0] + $_[1];
}

 

Packaging arguments into an array allows for a lot of really cool tricks, like writing flexible functions that can sort or process unlimited numbers of arguments. But none of that really matters right now because “generateResponse” needs to be passed exactly one input. No flexibility needed (although it’s good to know that’s an option if we change our mind).
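Here is a small sketch of that flexibility. The function name `addAll` is only an illustration and is not part of DELPHI. It sums any number of arguments by looping over @_. Beside it is the common one-line idiom for unpacking a fixed number of arguments, which does the same job as the `$_[0]`/`$_[1]` version above.

```perl
use strict;
use warnings;

# Hypothetical example: adds up however many numbers it is given,
# since every argument lands in @_.
sub addAll {
    my $total = 0;
    foreach my $num (@_) {
        $total += $num;
    }
    return $total;
}

# Common idiom for a fixed number of arguments: unpack @_ in one line.
sub add {
    my ($num1, $num2) = @_;
    return $num1 + $num2;
}

print addAll(1, 2, 3, 4), "\n";   # prints 10
print add(5, 7), "\n";            # prints 12
```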

 

To function properly, “generateResponse” will also need access to our list of DELPHI input patterns and response patterns. For now I’m just going to include those patterns inside of the function*. This will probably change later on, especially as the list starts getting bigger.

 

sub generateResponse{
    my $userInput = $_[0];
    my @chatPatterns;

    $chatPatterns[0][0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
    $chatPatterns[0][1]="Fate indicates that UIF0 is UIF1";

    $chatPatterns[1][0]=qr/\AWhy (.+)\?\z/;
    $chatPatterns[1][1]="Because I said so";

    $chatPatterns[2][0]=qr/.*/;
    $chatPatterns[2][1]="I don't want to talk about that. Please ask me a question";

    #Pattern Processing Code Goes Here
}

Following along so far? We grab the string input argument with $_[0] and set up our array of regular expressions and output patterns. Don’t worry if the regular expressions still look like gibberish; I’ll go over them in more depth next time.
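@chatPatterns is an array of arrays. Each `$chatPatterns[$i]` is really a reference to a two-element array, and Perl creates that reference automatically (autovivification) the first time we assign to `$chatPatterns[$i][0]`. The sketch below builds the same structure with anonymous array literals and shows how the two halves are read back. It illustrates the data structure only; DELPHI itself still uses the element-by-element version above.

```perl
use strict;
use warnings;

# Equivalent to the element-by-element assignments: each entry is an
# anonymous array reference holding [regex, output pattern].
my @chatPatterns = (
    [ qr/\AIs ([a-zA-Z]+) (.+)\?\z/, "Fate indicates that UIF0 is UIF1" ],
    [ qr/\AWhy (.+)\?\z/,            "Because I said so" ],
    [ qr/.*/, "I don't want to talk about that. Please ask me a question" ],
);

# $chatPatterns[1][1] is shorthand for $chatPatterns[1]->[1].
print $chatPatterns[1][1], "\n";     # prints "Because I said so"
print scalar(@chatPatterns), "\n";   # prints 3
print ref($chatPatterns[0]), "\n";   # prints "ARRAY"
```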

 

Looping And Comparing

 

We have the user input and the list of patterns to match. All that’s left to do is compare the input to the patterns until we find a match, which makes this an obvious place for a foreach loop.

 

As for building the response, we actually already wrote code for that back in the experimental “Should we use Perl?” stage of this project. It only takes a little modification to make that code fit perfectly.

 

sub generateResponse{
    my $userInput = $_[0];
    my @chatPatterns;

    $chatPatterns[0][0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
    $chatPatterns[0][1]="Fate indicates that UIF0 is UIF1";

    $chatPatterns[1][0]=qr/\AWhy (.+)\?\z/;
    $chatPatterns[1][1]="Because I said so";

    $chatPatterns[2][0]=qr/.*/;
    $chatPatterns[2][1]="I don't want to talk about that. Please ask me a question";

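    # Try each pattern in priority order; the first match wins
    foreach my $chatPattern (@chatPatterns){

        # A successful match fills @UIF with the captured fragments
        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response = $chatPattern->[1];
            # Swap each UIFn placeholder for the nth captured fragment
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            # Returning also ends the loop, so lower priority patterns are never tried
            return $response;
        }
    }
    # Only reachable if even the catch-all pattern failed to match
    return "Base Case Failure Error!";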
}

Let’s walk through this code really quickly. The foreach loop grabs items out of the @chatPatterns array one at a time and stores them in the $chatPattern variable. We then use an if-statement and the regex binding operator (=~) to try to match the regex in the first half of $chatPattern against the user’s input stored in $userInput.

 

If we don’t find a match, we move on to the next pattern in the array and try again. If we go through the entire list without ever finding a match, we warn the user that something has gone wrong, since the pattern matching list should have at least one emergency base case that will match anything.

 

Things get a bit more complicated when there is a match. First, we pull some User Input Fragments out of the input and place them in the @UIF array using some regular expression magic that I’ll cover in the next post. We then look for UIF keywords in our output pattern and replace them with the proper bits of user input. For example, if the output pattern has the string “UIF0” inside of it, we replace it with the first entry in the @UIF array. This lets us create chatbot responses that include some of the same words as the user’s input. Once we’re finished searching and substituting, we return the now complete response string, which also breaks us out of the loop.
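Here is a quick preview of that “magic”. The condition rests on one Perl rule: a match evaluated in list context returns the captured groups, or an empty list if it fails. A pattern with no capture groups, like our catch-all qr/.*/, returns the one-element list (1) when it succeeds. An array assignment used as a condition counts the elements on its right-hand side, so `if(my @UIF = ...)` is true exactly when the match succeeds. The standalone sketch below demonstrates all three cases with the patterns from this post.

```perl
use strict;
use warnings;

my $input = "Is Perl a good choice for this program?";

# Two capture groups: the match returns both captured fragments.
my @UIF = ($input =~ qr/\AIs ([a-zA-Z]+) (.+)\?\z/);
print scalar(@UIF), ": $UIF[0] | $UIF[1]\n";
# prints "2: Perl | a good choice for this program"

# No capture groups: a successful match returns the one-element list (1).
my @catchAll = ($input =~ qr/.*/);
print "@catchAll\n";   # prints "1"

# A failed match returns the empty list, so the assignment counts as false.
if (my @none = ($input =~ qr/\AWhy (.+)\?\z/)) {
    print "matched\n";
} else {
    print "no match\n";   # this branch runs
}
```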

 

Breaking out of the loop is very important. Remember that DELPHI is supposed to use prioritized pattern matching to break ties. We can achieve this by placing high priority items near the beginning of our array and then stopping the pattern search the first time we find a match. This means that when an input matches more than one pattern it will naturally match against the highest priority pattern without ever even seeing the lower priority pattern it also would have matched.
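The sketch below shows first-match-wins priority in isolation. It uses a hypothetical helper, `firstMatch`, that reports which rule answered instead of building a response. “Is Perl fun?” satisfies both the yes/no pattern and the catch-all, but only whichever is listed first ever gets consulted. Moving the catch-all to the front would swallow every input.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the early return in generateResponse:
# returns the label of the first [regex, label] pair that matches.
sub firstMatch {
    my ($input, @patterns) = @_;
    foreach my $pattern (@patterns) {
        return $pattern->[1] if $input =~ $pattern->[0];
    }
    return "no match";
}

my $yesNo    = [ qr/\AIs ([a-zA-Z]+) (.+)\?\z/, "yes/no rule" ];
my $catchAll = [ qr/.*/,                         "catch-all rule" ];

print firstMatch("Is Perl fun?", $yesNo, $catchAll), "\n";   # prints "yes/no rule"
print firstMatch("Is Perl fun?", $catchAll, $yesNo), "\n";   # prints "catch-all rule"
print firstMatch("Pumpkin mice", $yesNo, $catchAll), "\n";   # prints "catch-all rule"
```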

 

The Sweet Smell of Successful Tests

 

Now that we’ve filled in generateResponse with some guts, it’s time to run our tests…

 

Passed 2 out of 11 tests
Test Failure!!!

Two successes! More specifically, the tests we are passing are the “nonsense” test and the basic “Yes/No” question test.

 

Input: Pumpkin mice word salad

Output: I don’t want to talk about that. Please ask me a question

Input: Is Perl a good choice for this program?

Output: Fate indicates that Perl is a good choice for this program

 

Anyone who was confused by the User Input Fragments idea during response generation should take a close look at the second test case. When the user asks DELPHI “Is Perl a good choice for this program?” their input gets split into fragments: “Perl” and “a good choice for this program”. We can then glue those into the output pattern of “Fate indicates that UIF0 is UIF1” to create an intelligent response of “Fate indicates that Perl is a good choice for this program”.
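The search-and-replace loop in generateResponse has two small edge cases:

* The search string “UIF1” would also match the start of “UIF10” if a rule ever had ten or more fragments.
* Text inserted from one fragment could be rewritten by a later pass.

A possible alternative avoids both. It is a different technique from the loop above and is offered only as a sketch. It uses a single substitution with the /e modifier: each UIF number is captured and looked up in @UIF. The name `fillTemplate` is hypothetical. Tokens without a matching fragment are left untouched, as they are in the original loop.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every UIF<number> token in one pass.
# \d+ grabs the whole number, so UIF10 is never mistaken for UIF1, and
# text inserted from a fragment is never rescanned for more tokens.
sub fillTemplate {
    my ($template, @UIF) = @_;
    $template =~ s/UIF(\d+)/defined $UIF[$1] ? $UIF[$1] : "UIF$1"/ge;
    return $template;
}

my $input = "Is Perl a good choice for this program?";
my @UIF   = ($input =~ qr/\AIs ([a-zA-Z]+) (.+)\?\z/);

print fillTemplate("Fate indicates that UIF0 is UIF1", @UIF), "\n";
# prints "Fate indicates that Perl is a good choice for this program"
```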

 

Conclusion

 

Believe it or not, I’m now done with almost 50% of this project’s Perl code. Most of the real work on DELPHI will actually come from thinking up all the regular expressions that we’re going to use to pattern-match the user input.

 

That’s why the next post is going to jump into regular expressions, analyzing the three rules I’ve already presented and writing one or two more.

 

 

* Bonus points** to anyone who can point out why declaring and populating a predictable array in the middle of a function that’s going to be called dozens or hundreds of times is a bad idea.

 

** Bonus points are not redeemable for cash or prizes and do not, in fact, exist.