Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Scott Cornaby

Breaking All Our Tests

Today I’m finally tackling the last item on our chatbot wish-list: Randomized responses. This will give DELPHI the ability to make both yes and no predictions for all questions. Even better, a wide variety of randomized responses will keep DELPHI from repeating itself too often and help make it feel human. Nothing says “computer” quite like repeating the same line again and again.

Unfortunately this randomness is going to completely break all the automated tests we spent so long satisfying. After all, the fundamental idea behind all of our tests is that every possible user input has exactly one right answer. When the user says “A” the response should always be “B”. But adding a little randomness throws this idea out the window. How are we supposed to test a program that sees the input “A” and sometimes says “B” and sometimes says “C” and sometimes says “D”?

This is one of the great weaknesses of automated testing: It doesn’t work so good with uncertainty.

One possible solution would be to build a much more flexible testing suite. Something that can match one input to multiple possible outputs. If there are three random “good” answers to input A then we consider the test to have passed if we see any one of them. It wouldn’t even be too hard to program. Probably just a lot of “or” statements or maybe a loop that returns “true” as soon as it finds at least one match.
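
The loop version of that idea could look something like the sketch below. The helper name matchesAny and the sample answers are made up for illustration; nothing like this exists in DELPHI’s actual test code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DELPHI): returns 1 as soon as $actual
# equals one of the acceptable outputs, and 0 if none of them match.
sub matchesAny {
    my ($actual, @acceptable) = @_;
    foreach my $candidate (@acceptable) {
        return 1 if $actual eq $candidate;
    }
    return 0;
}

# One input, three acceptable "good" outputs.
my @goodAnswers = ("B", "C", "D");
print matchesAny("C", @goodAnswers) ? "Passed\n" : "Failed\n";
print matchesAny("E", @goodAnswers) ? "Passed\n" : "Failed\n";
```

Every test would need its own hand-typed list of acceptable answers, which is exactly the part that gets painful as the lists grow.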

But this may not scale very well. Writing tests to handle a small amount of randomness probably isn’t too bad. You just type in your one input and all three possible good outputs and you’re done. But if you have dozens or even hundreds of potential outputs… well, you probably don’t want to maintain that sort of testing code by hand.

So instead of finding a way to test our randomness I’m just going to provide a mechanism to turn the randomness on and off. This way I can just turn the random responses off and all of the old test cases will still work and I can continue adding new tests the easy way: one input paired with one expected output.

Nested Arrays Are Good For Everything!

That’s enough talk about testing. Time to focus on how we’re going to make DELPHI more random. The chatbot already has a system for associating input patterns with output patterns. All we need to do now is adjust it to associate one input pattern with multiple possible output patterns.

My clever readers* probably remember that DELPHI uses a multi-dimensional array to keep track of which response pattern matches each input pattern. Every top level item in the array represents a different chatbot rule/response pair. Each rule is then divided into a two-item array where the first item is a matching rule and the second item is a response rule.

In order to add some randomness to the system we’re going to replace the singular response rule in slot number 2 with yet another array, this one holding a list of all responses we want to generate. For example, here is what the “catch all” rule looks like after I replaced the single response with a three item array.

push(@chatPatterns,
   [qr/.*/,
      ["I don't want to talk about that. Please ask me a question",
       "I'm confused. Try a simple question instead",
       "I'm really good at yes no questions. Try one of those"]
   ]);

Everybody see what we’re doing here? @chatPatterns is the first array. Inside of it we’re pushing a second array where the first item is the input matching regex /.*/ and the second item is a third array that holds three possible responses.
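
If the three layers are hard to picture, here is a small sketch that walks down through them one subscript at a time. The variable names $rule, $matcher and $responses are only for illustration and don’t appear anywhere in DELPHI.

```perl
use strict;
use warnings;

my @chatPatterns;
push(@chatPatterns,
   [qr/.*/,
      ["I don't want to talk about that. Please ask me a question",
       "I'm confused. Try a simple question instead",
       "I'm really good at yes no questions. Try one of those"]
   ]);

my $rule      = $chatPatterns[0];   # second array: one rule/response pair
my $matcher   = $rule->[0];         # slot 1: the input matching regex
my $responses = $rule->[1];         # slot 2: the third array, full of responses

print "Regex matches\n" if "anything at all" =~ $matcher;
print $responses->[0], "\n";        # first response of the rule
print $chatPatterns[0][1][2], "\n"; # same structure, all subscripts chained
```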

Eventually we’ll probably want to flesh out DELPHI by attaching a dozen or so responses to every input rule. But for starters let’s just stick to two or three variations for each rule. That should be enough to make sure that our basic random algorithm works like it should.

Ready for a massive code dump?

Random Response Set 1

This code should replace the old @chatPatterns code:

my @chatPatterns;

push(@chatPatterns, 
        [qr/[a-zA-Z]+ or [a-zA-Z]+.*\?\z/,
            ["Fate indicates the former",
            "I have a good feeling about the latter"]
        ]);

push(@chatPatterns, 
        [qr/\ADo (.+)\?\z/, 
            ["Fate indicates that UIF0",
            "I don't think that UIF0",
            "Athena doesn't think so"]
        ]);

push(@chatPatterns, 
        [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 does UIF1",
            "The spirits whisper \"UIF0 does not UIF1\""]
        ]);

push(@chatPatterns, 
        [qr/\AIs ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["Fate indicates that UIF0 is UIF1",
            "The stars are clear: UIF0 is not UIF1"]
        ]);

push(@chatPatterns, 
        [qr/\AWill ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/, 
            ["I predict that UIF0 will UIF1",
            "Based on these tea leaves it seems UIF0 will not UIF1"]
        ]);

push(@chatPatterns,
        [qr/\AWhy (.+)\?\z/,
            ["Because of reasons",
            "For important cosmic reasons"]
        ]);

push(@chatPatterns, 
        [qr/\?/,
            ["I'm sorry, could you try rewording that?",
            "Was that a question?"]
        ]);

push(@chatPatterns, 
        [qr/\A(Why|Is|Are|Do|Does|Will)/,
            ["Did you forget a question mark? Grammar is important!",
            "If you're asking a question, remember to use a question mark"]
        ]);

push(@chatPatterns,
        [qr/.*/,
            ["I don't want to talk about that. Please ask me a question",
            "I'm confused. Try a simple question instead",
            "I'm really good at yes no questions. Try one of those"]
        ]);

Optional Randomness Through Optional Arguments

Now that we have multiple possible responses inside of every single rule we’re going to need to update generateResponse. The first step is to get it to pull responses out of an array instead of reading them directly. After that we’ll also need to write code to randomize which response gets pulled out of the array in the first place.

Also, if we want DELPHI to be random with humans but predictable with tests we’re going to need some way to let DELPHI know when to be random and when to be boring. The simplest way to do this is to just add a second argument to generateResponse. The first argument will still be the user’s input but now we’ll use the second argument to decide whether to choose a random response or just stick to the first response in the array.

But that’s enough about that. I’ll just let the code speak for itself now:

sub generateResponse{
    my $userInput = $_[0];
    my $beRandom = $_[1];
    $userInput = switchFirstAndSecondPerson($userInput);

    foreach my $chatPattern (@chatPatterns){

        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response;
            if($beRandom){
                # Count the responses for this rule, then pick one at random
                my $numberOfResponses = scalar(@{ $chatPattern->[1] });
                $response = $chatPattern->[1][rand $numberOfResponses];
            }
            else{
                $response = $chatPattern->[1][0];
            }
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            }
            return $response;
        }
    }
    return "Base Case Failure Error!";
}

I’m sure everyone can see the basic flow of this updated function. We use the $beRandom variable to help us decide which response pattern to use.

There is a little trickiness to the Perl I used for choosing random responses. The goal is to figure out how many responses are attached to the current rule and then choose one of them at random.

I start by extracting the response array from the second half of the chat pattern with $chatPattern->[1]. But it turns out that this is actually just an array reference, not a true array, so before we can use it we have to turn it back into an array with the @{ $reference } syntax.

Finally I extract the length of the response array. In Perl you can get the length of an array by evaluating it in scalar context, for example by assigning it to a single value variable. You can also force this explicitly with the scalar function. Technically the keyword was unnecessary here, since assigning to $numberOfResponses already forces the response array to act as a scalar, but I figured some of my less Perl inclined readers might appreciate having the switch from array to single number pointed out.

After we have the length of the response array everything else is easy. We use rand $numberOfResponses to generate a fraction that is at least 0 and strictly less than the number of responses, which then gets automatically truncated to a whole number when we use it as an array index.
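
Here is the same trick pulled out into a tiny standalone sketch. The helper pickIndex and the sample list are invented for this example, and I call int explicitly so the truncation that an array index does behind the scenes is visible.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a whole number from 0 to $count - 1.
# rand $count is always >= 0 and < $count, and int() drops the fraction.
sub pickIndex {
    my ($count) = @_;
    return int(rand $count);
}

my $responses = ["first", "second", "third"];   # an array reference
my $numberOfResponses = @{ $responses };        # scalar context gives the length

for (1 .. 5) {
    my $index = pickIndex($numberOfResponses);
    print "$index: $responses->[$index]\n";
}
```

Since the result is always smaller than the array length, the index can never run off the end of the response list.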

Updating Our Tests And User Interface

Thanks to Perl’s flexible function calls it turns out we don’t have to do anything at all to get our tests working with the new generateResponse code.

When we call generateResponse with only one argument the function can still reference the second argument with $_[1]; it will just be filled with the “undefined” value. And since “undefined” counts as “false” as far as if statements are concerned, leaving out the $beRandom argument is the same as announcing we don’t want any randomness.
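
To see the optional argument behavior on its own, here is a stripped down sketch. describeMode is a made-up stand-in for generateResponse that only reports which branch it would take.

```perl
use strict;
use warnings;

# Hypothetical stand-in for generateResponse: ignores the input and just
# reports whether the second argument asked for randomness.
sub describeMode {
    my $userInput = $_[0];
    my $beRandom  = $_[1];   # undefined when the caller passes one argument
    return $beRandom ? "random" : "predictable";
}

print describeMode("Is this a test?"), "\n";      # one argument, like the tests
print describeMode("Is this a test?", 1), "\n";   # true value, like chat.pl
print describeMode("Is this a test?", 0), "\n";   # explicit false value
```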

Updating the “chat.pl” user interface to include randomness is almost as easy. All we have to do is pass some sort of “true” value as the second argument to generateResponse. I like to use the value “1”, like this:

my $response = DELPHI::generateResponse($userInput,1);

Automatic And Manual Testing

With our code complete the first thing we should do is run our automatic tests to make sure that the non-random version of our test system still works like we want it to:

Passed 14 out of 14 tests

All Tests Passed!

Good. We didn’t break anything important while adding our new features. Now to boot up chat.pl and see if the random response system is working properly:

DELPHI: The oracle greets you. What questions are on your mind?

>Is DELPHI random now?

DELPHI: Fate indicates that DELPHI is random now

>Is DELPHI complete?

DELPHI: The stars are clear: DELPHI is not complete

>Why isn’t DELPHI complete?

DELPHI: For important cosmic reasons

>Goodbye

DELPHI: Come back anytime

Looks good. That’s two “Is X Y?” questions in a row that got different responses. For more thorough testing just keep hitting DELPHI again and again with identical input and make sure that DELPHI switches between the various responses.
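
If typing the same question over and over sounds tedious, a short loop can do the hammering for you. This is only a sketch: pickRandomResponse is a made-up stand-in for calling DELPHI::generateResponse with randomness turned on, so the example runs without the DELPHI module.

```perl
use strict;
use warnings;

# Stand-in response list, copied from the "Is" rule.
my @isResponses = ("Fate indicates that UIF0 is UIF1",
                   "The stars are clear: UIF0 is not UIF1");

# Hypothetical stand-in for DELPHI::generateResponse($userInput, 1).
sub pickRandomResponse {
    return $isResponses[rand @isResponses];
}

# Ask 100 times and count how often each response shows up.
my %seen;
$seen{ pickRandomResponse() }++ for (1 .. 100);

foreach my $response (sort keys %seen) {
    print "$seen{$response} x $response\n";
}
```

If only one response ever appears in the tally, the random branch probably isn’t being reached.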

Conclusion

I’m pretty sure that’s the last bit of Perl in this Let’s Program. We can match input, generate responses and DELPHI works fine with both our test program and our user interface.

Future improvements to DELPHI will probably have much more to do with writing new rules and responses than coming up with new algorithms. In fact, my next post is going to focus entirely on the art of writing computer responses that will convince users your program is almost human.

* That’s all of you.