» Let’s Program A Chatbot 4: Let’s Talk About Test Driven DevelopmentScott Cornaby

Get it? Let’s “talk” about test driven development? Because we’re designing a chatbot. It’s funny*! Yes? No. Okay. Moving on.

What Is Test Driven Development?

Testing is the process of proving that a program can do what it is supposed to do without crashing or generating incorrect output. Good tests also help you find bugs in your programs before your users do. It is very embarrassing to deliver a piece of software that freezes your customer’s computer the first time they start it up. So testing is definitely important.

Software testing is an art unto itself, but the general idea is to come up with a list of sample program inputs that match what you expect real users to try. Then you figure out, by hand, what the program should do for each of those inputs. Finally you feed your inputs into the program one item at a time and double check that the computer does the right thing.

For example, suppose you are programming a bank calculator that figures out monthly payments for car loans. You set up your first test by talking to an accountant and finding out that a $10,000 loan should have a $300 monthly payment. So you feed $10,000 into the loan calculator and make sure it answers correctly. If it doesn’t generate the correct $300 payment then you know you have a bug.

Once you have run the test and fixed any bugs that show up you move on to your next test by talking to your accountant again and generating a new test case. Maybe your bank doesn’t give loans for more than $100,000 at a time so the calculator should return a “Too Large Loan” warning if the user asks for $150,000. So you go back to your calculator, plug in $150,000 and then make sure it prints the warning.

Then it’s back to your accountant, boss or customer for a few dozen more tests to run.

You might have noticed that this sounds really boring and tedious. Who wants to spend an hour feeding input into a program and then going over the output line by line looking for bugs? I don’t!

That’s where automated testing comes in. Instead of running your tests by hand you write a new test program that knows how to talk to your software. You then give your test input and expected output to the test program and let it run all the tests for you. It feeds the input to your program, checks the output for accuracy and then prints up a pretty report letting you know if there were any problems. You still have to come up with the tests on your own, but at least you don’t have to run them.

Automated testing can run thousands of tests with a single click. It’s easier than testing by hand. It’s faster than testing by hand. It’s more accurate than testing by hand. It’s much much less boring then testing by hand. The only real weakness is that it’s hard to automate UI testing or certain types of database driven programs.

You Still Haven’t Mentioned What Test Driven Development Is

Oh, right. My bad. I was having too much fun talking about automated software testing.

Test Driven Development is just the idea that you should setup your automated testing software before you start writing your actual program. You should then run your automated test at least once per day so you can keep track of exactly how much progress you’re making.

It is called “test driven” because the tests are the main driver and motivator of your software project. Your first goal is to write good tests and then the rest of your project focuses on writing code that can pass those tests. This is the opposite of code first development where your first goal is to write your program and only then do you start worrying about how to test it.

Of course, writing the tests before you write the software to be tested means that you are going to be seeing a lot of “errors” the first few times you run your tests. In fact, a test that doesn’t show 100% errors on a blank program probably has a few errors of its own**.

But the 100% error stage doesn’t last long. Once you know your testing software works you can start writing your actual program and before you know it you’ll pass your first use case and change from 100% failure to 1% success. And then you just keep writing and testing your software until the tests finally return zero errors. At that point you can feel very confident that your program really works, that you didn’t forget any features and that your code is as bug free as possible.

Why Would I Want To Use Test Driven Development?

Automatic tests sound cool, but why would anyone build the test before the thing to be tested? Isn’t that a little backwards? Why bother writing a test if you know it’s going to just return 100% failure? Although not a good fit for ALL programming tasks there are several advantages to starting with tests:

First, it lets you catch mistakes as soon as you make them. If your code used to have a 60% success rate but your latest “improvement” dropped that down to 40% then you know there is a big bug somewhere in your most recent code. This makes it easy to find and fix the bug because you only have a few dozen lines to examine. If you had waited to test until your program was “done” you would have had to search the entire code base to find that bug.

Second, writing tests is a good way to double check that your design document is complete. Imagine that you are writing a test to make sure that the program can handle negative numbers in the input. You flip to the design document to look up the “negative input” use case and realize that you forget to decide what should happen. Whoops! Better go back and discuss that with your manager / customer / alter-ego before you go any further.

Third, testing can help schedule your programing. Not sure exactly what to program next? Just find a test that is failing and write the code it needs to succeed.

Finally, test driven development can give you an emotional boost by letting you see progress as it happens. Sometimes in software we can spend weeks writing code without feeling like any progress is being made. This is especially bad if your boss also thinks progress isn’t being made. Having a set of automated tests lets you watch the completion rate climb with every new function and gives you something to show management. “Sure, the user interface is still incomplete but these tests show that we have made significant improvement in the invisible database layer.”

Tools For Tests

Testing software brings with it all the questions associated with normal software. Should you program your own testing suite or use an existing tool? Open source or proprietary? Do you need a powerful tool with all the bells and whistles or will a simple testing tool be enough? Do you want your testing tool to integrate with your programming environment or be a standalone program?

You get the idea. Lots of options to fit every coding scenario you run into.

As for this Let’s Program, we probably don’t need much power. DELPHI is going to be a very simple program that does nothing but listen to user input and then generate output. So instead of messing around with existing testing tools I’m just going to write my own mini-test. Shouldn’t take more than an hour.

The DELPHI Test Suite

As I mentioned, DELPHI only does one thing: process user text input and generate response text. So to test DELPHI all we need to do is come up with a list of sample input along with the DELPHI response we hope to get. Then we just cycle through all the input and raise a warning every time DELPHI says the wrong thing.

Doing the same thing again and again on slightly different pieces of data suggests our test program should involve some sort of loop. And for the sake of convenience it would be great if we could put all the test input and expected responses into a big list.

After thinking about that a little I came up with the idea of putting all of the input/response pairs into one big two dimensional array; basically a two column table where every row will be a different test. The first item in each row will be the sample input and the second item will be the expected response.

Now we can run all of our tests from inside of a single loop. I’ll be using a foreach loop that will run our test code once for every single row in our test array.

Inside the testing loop I will ask DELPHI to come up with a reply based on the input from the current test row. I’ll then compare that response to the expected response from the test row. If they’re the same I’ll tally up a success for DELPHI. If they’re different I’ll print a nice error message to the screen that lets me know which input failed, what response I was expecting and what DELPHI actually said.

With that background knowledge even a non-Perl programmer should be able to make sense of the following test code. A few Perl tricks to look out for though:

“use strict” tells the compiler to yell at me if I bend the rules. Without it Perl will let you get away with bad code. Ignoring strict is useful for quick experiments, but on a serious project you always want “use strict”
$ indicates a variable with only one value, like a number or string or an individual item in an array
@ indicates an entire array
You’ll notice that I create the array with the @ symbol and then switch to the singular $ syntax when filling it’s individual members with data. This is because individual array slots only have one value
In Perl strings and numbers have different comparison operators. ‘ne’ is the string version of ‘!=’
You might notice that within the for loop I access array values with $test->[0] instead of just $test[0]. This is because $test is actually a reference to an array instead of being a true array. Don’t worry about it too much.

With that Perl trivia out of the way here is Version 1.0 of the DELPHI Tester:

#! /usr/bin/perl -w

use strict;

my @testCases;

$testCases[0][0] = "Will this test pass?";
$testCases[0][1] = "I predict that this test will pass";

$testCases[1][0] = "Is the sky blue?";
$testCases[1][1] = "Fate indicates that the sky is blue";

$testCases[2][0] = "Does this program work?";
$testCases[2][1] = "Fate indicates that this program works";

$testCases[3][0] = "Do computers compute?";
$testCases[3][1] = "Fate indicates that computers compute";

$testCases[4][0] = "Do my readers enjoy this blog?";
$testCases[4][1] = "Fate indicates that your readers enjoy this blog";

$testCases[5][0] = "Is it better to be loved or feared?";
$testCases[5][1] = "Fate indicates the former";

$testCases[6][0] = "Why is natural language processing so hard?";
$testCases[6][1] = "Because of reasons";

$testCases[7][0] = "Pumpkin mice word salad?";
$testCases[7][1] = "I'm sorry, could you try rewording that?";

$testCases[8][0] = "Pumpkin mice word salad";
$testCases[8][1] = "I don't want to talk about that. Please ask me a question";

$testCases[9][0] = "Why do you say things like that";
$testCases[9][1] = "Did you forget a question mark? Grammar is important!";

my $testCount=0;
my $successCount=0;

foreach my $test (@testCases){
    my $output = generateResponse($test->[0]);
    if( $output ne $test->[1] ){
        print "Test Case $testCount Failed!!!\n";
        print "Input: ".$test->[0]."\n";
        print "Output: $output\n";
        print "Expected: ".$test->[1]."\n";
    }
    else{
        print "Test Case $testCount Passed\n";
        $successCount++;
    }

    $testCount++;
}

print "--------------------";
print "\n";
print "Passed $successCount out of $testCount tests\n";
if($testCount == $successCount){
    print "All Tests Passed!\n";
}
else{
    print "Test Failure!!!\n";
}

sub generateResponse{
    return "";
}

The ten test cases in this first test represent a pretty good sample of yes/no questions, either or questions, why questions and non-question input. I also tried to get a good mix of singular, plural, first person and third person questions. I’ll probably add a few more tests as the project continues and I realize new conversation patterns that need to be supported.

The First Run

Now that I have a test I should run it and make sure it works. Except that it obviously won’t.

Why not?

See that line inside the foreach loop where it asks for DELPHI to “generateResponse”? That function doesn’t exist yet so my test code won’t even compile.

The best way around this is to write a temporary place-holder function that will pretend to be DELPHI until we can write some actual DELPHI code. Place holder and prototype functions are the only bits of code you are allowed to write before your tests in Test Driven Development. For example, a test driven loan calculator would probably start out with an empty “caluculatePayment”.

Anyways, here is our DELPHI place holder.

sub generateResponse{
    return "";
}

This DELPHI dummy just responds with a blank string no matter what you say to it. Obviously worthless, but it gives the tests something to talk to and allows our code to compile. And now that the code compiles we can run our first test:

Test Case 0 Failed!!!

Input: Will this test pass?

Output:

Expected: I predict that this test will pass

Test Case 1 Failed!!!

Input: Is the sky blue?

Output:

Expected: Fate indicates that the sky is blue

Test Case 2 Failed!!!

Input: Does this program work?

Output:

Expected: Fate indicates that this program works

Test Case 3 Failed!!!

Input: Do computers compute?

Output:

Expected: Fate indicates that computers compute

Test Case 4 Failed!!!

Input: Do my readers enjoy this blog?

Output:

Expected: Fate indicates that your readers enjoy this blog

Test Case 5 Failed!!!

Input: Is it better to be loved or feared?

Output:

Expected: Fate indicates the former

Test Case 6 Failed!!!

Input: Why is natural language processing so hard?

Output:

Expected: Because of reasons

Test Case 7 Failed!!!

Input: Pumpkin mice word salad?

Output:

Expected: I’m sorry, could you try rewording that?

Test Case 8 Failed!!!

Input: Pumpkin mice word salad

Output:

Expected: I don’t want to talk about that. Please ask me a question

Test Case 9 Failed!!!

Input: Why do you say things like that

Output:

Expected: Did you forget a question mark? Grammar is important!

——————–

Passed 0 out of 10 tests

Test Failure!!!

We failed all the tests! Which means we succeeded! All those blank output lines show that the DELPHI placeholder is doing it’s job and the 100% failure rate means that our test code is correctly flagging mistakes for us.

Conclusion

Now that the testing framework is done the stage is set for starting to actually work on the chatbot. Finally, after four posts, we’re going to “Let’s Program A Chatbot” for real.

* Bad jokes are an important part of computer programming. If you can’t handle this you may want to consider a career in a different field. Like accounting.

** Yes, you have to test your test programs before using them. But please try to avoid falling into an infinite loop of testing the tests that test your tests for testing tests.