Let’s Program A Chatbot 8: A Little Housecleaning

Just Because It Works Doesn’t Mean It’s Good


I’ll be honest, I’m a neat freak. And our current code is starting to get a little messy. Keeping our chatbot code in the same file as all our test code is not neat. Creating our array of chat patterns inside of the generateResponse method is not good. So let’s take just a few minutes and fix both of those problems.


The first thing I’m going to do is separate the test code from the chatbot code by moving generateResponse into a file named DELPHI.pm. Then I’m going to wrap it all inside of a package called DELPHI, which is basically just a way to attach a prefix to a bunch of Perl variables and functions. This will transform generateResponse into DELPHI::generateResponse.


Then I’m going to clean up generateResponse by moving the array of input patterns and responses outside of the function. Now the chat array will be built, once, when the DELPHI.pm file is first referenced. This helps keep generateResponse small and readable; no matter how many rules we add to our chatbot generateResponse will always remain short and easy to understand.


Updating our tests to work with these changes is a pretty simple two line change. First, we need to tell it to load the DELPHI.pm file by adding this simple command near the top of our file:


require 'DELPHI.pm';


The second change is just as easy. Our test code used to directly reference generateResponse, which worked because they were both part of the same file and package. Now that generateResponse lives in it’s own package we’ll have to change the line:


my $output = generateResponse($test->[0]);


Now it needs to be:


my $output = DELPHI::generateResponse($test->[0]);


And that’s all there is to it. Now our test program knows to look for generateResponse inside of the DELPHI package that it grabbed out of the DELPHI.pm file.


Now for the moment of truth… did it work? Did we successfully clean our code without breaking anything? Sounds like a job for our automated tests.


Passed 4 out of 11 tests

Test Failure!!!


That’s a relief! Even after modifying all that code the tests still run, no errors are thrown and we’re passing and failing all the same tests we used to.


The Code So Far


I figure now is a good time for a complete code dump in case anyone wants to run my code themselves and see what I’m doing. On the other hand… maybe you’re from the future where I’ve already finished DELPHI and released the complete code. In that case this old incomplete DELPHI code might be worth a laugh:


First up is DELPHI.pm. Note the “1” at the end of the file. Perl expects all packages to end with something that evaluates to true and the number 1 is the traditional way to do this.


package DELPHI;

my @chatPatterns;

        [qr/[a-zA-Z]+ or [a-zA-Z]+.*\?\z/,
            "Fate indicates the former"]);

        [qr/\AIs ([a-zA-Z]+) (.+)\?\z/, 
            "Fate indicates that UIF0 is UIF1"]);

        [qr/\AWhy (.+)\?\z/,
            "Because of reasons"]);

            "I don't want to talk about that. Please ask me a question"]);

sub generateResponse{
    my $userInput = $_[0];

    foreach my $chatPattern (@chatPatterns){
        if(my @UIF = ($userInput =~ $chatPattern->[0])){
            my $response = $chatPattern->[1];
            for(my $i=0; $i<@UIF; $i++){
                my $find = "UIF$i";
                my $replace = $UIF[$i];
                $response =~ s/$find/$replace/g;
            return $response;
    return "Base Case Failure Error!";



And then here is test.pl


#! /usr/bin/perl -w

use strict;

require 'DELPHI.pm';

my @testCases;

$testCases[0][0] = "Will this test pass?";
$testCases[0][1] = "I predict that this test will pass";

$testCases[1][0] = "Is the sky blue?";
$testCases[1][1] = "Fate indicates that the sky is blue";

$testCases[2][0] = "Does this program work?";
$testCases[2][1] = "Fate indicates that this program works";

$testCases[3][0] = "Do computers compute?";
$testCases[3][1] = "Fate indicates that computers compute";

$testCases[4][0] = "Do my readers enjoy this blog?";
$testCases[4][1] = "Fate indicates that your readers enjoy this blog";

$testCases[5][0] = "Is it better to be loved or feared?";
$testCases[5][1] = "Fate indicates the former";

$testCases[6][0] = "Why is natural language processing so hard?";
$testCases[6][1] = "Because of reasons";

$testCases[7][0] = "Pumpkin mice word salad?";
$testCases[7][1] = "I'm sorry, could you try rewording that?";

$testCases[8][0] = "Pumpkin mice word salad";
$testCases[8][1] = "I don't want to talk about that. Please ask me a question";

$testCases[9][0] = "Why do you say things like that";
$testCases[9][1] = "Did you forget a question mark? Grammar is important!";

$testCases[10][0] = "Is Perl a good choice for this program?";
$testCases[10][1] = "Fate indicates that Perl is a good choice for this program";

my $testCount=0;
my $successCount=0;

foreach my $test (@testCases){
    my $output = DELPHI::generateResponse($test->[0]);
    if( $output ne $test->[1] ){
        print "Test Case $testCount Failed!!!\n";
        print "Input: ".$test->[0]."\n";
        print "Output: $output\n";
        print "Expected: ".$test->[1]."\n";
        print "Test Case $testCount Passed\n";

print "--------------------";
print "\n";
print "Passed $successCount out of $testCount tests\n";
if($testCount == $successCount){
    print "All Tests Passed!\n";
    print "Test Failure!!!\n";


A Note On Performance


My main goal here was to make the code easier to read and manage by separating my chatbot code from my test code and then further separating my response generation code from the response data. Not that my little 150 line script was actually that hard to read and manage in the first place. I probably could have safely procrastinated cleaning it up until I had a few dozen more test cases and a few dozen more rules.


But I was also curious if there would be any performance gain from moving the array creation outside of the generateResponse function. As things were the code had to rebuild the same nested array every time generateResponse was called. Sounds wasteful! Much better to move that array somewhere where it only has to be built once no matter how often generateResponse is called.


On the other hand, people who write compilers and interpreters* are really quite frightfully smart. One of their favorite optimization tricks is to find repetitive bits of code and then turn them into static data. Sometimes you can get away with something stupid like building a static array again and again and just rely on the compiler to fix it for you.


Which made me wonder: Would cleaning up generateResponse speed up my code or was the the system already optimizing my poor design decision? Let’s run some tests!


My particular test was pretty simple. I created a quick little file that included both version of generateResponse, one building the response array inside the method and one building the response array ahead of time. I then had both functions generate a response to a short phrase a million times ina row and kept track of how long it took for them to complete. Here are the results:


The method with array creation took 19.758241891861 seconds

The method without array creation took 9.73499703407288 seconds


So by moving array creation outside of the function we managed to double how fast the function runs. Apparently this particular mistake was too stupid for the compiler to fix on it’s own. Whoops.




A few quick changes and now our code is both cleaner and more efficient. Now to get back to adding new response patterns to DELPHI.



* Perl 5 is both interpreted and compiled depending on how you define interpret and compile.