Let’s Program A Chatbot 3: Choosing A Programming Language

How To Choose A Programming Language

 

Modern programming languages are 99% interchangeable. If you can do something in C you can also do it in Java, Lisp, Visual Basic, Python and so on. There are very few scenarios where you absolutely “need” to use a specific language.

 

But that doesn’t change the fact that every language has strengths and weaknesses. A program that would be difficult to write in Java might be easy to write in Python. A program that runs slowly in Lisp might be easy to optimize in C.

 

But the language’s strengths and weaknesses aren’t the only thing you need to think about when starting a project. You, as a programmer, have strengths and weaknesses too. If you have ten years of experience with C++ but have never touched Ruby, then odds are you should stick to C++, especially if you have a deadline coming up and can’t spare the time to learn a new language*.

 

So when trying to choose a programming language you need to ask yourself three questions:

      1. How well does this language match my problem?
      2. How comfortable am I with this language?
      3. How much time can I spare for learning new language features?

 

Sometimes you get lucky and find out that your favorite language is a perfect match for the problem you need to solve. Hooray!

 

But other times you’ll have to make a tough choice between a familiar language you know you can *eventually* succeed with and a less familiar language that has some really great features that would instantly solve all your problems if you could just get your code to stop throwing weird errors.

 

And sometimes the choice is so hard you just give up, eat a gallon of ice cream and decide to join an isolated community where computers are illegal and speaking jargon is punishable by death.

 

Perl: A Good Pattern Matching Language

 

With all that theory out of the way we can move on to choosing a language for our chatbot. Since our chatbot is going to be based primarily on pattern matching, we’re going to want a programming language that makes matching patterns easy. And pattern matching should make you think of regular expressions**. And regular expressions should make you think of Perl.

 

I can see a few of you getting a little nervous. Doesn’t Perl have a reputation for being a hard-to-read language? And aren’t regular expressions famous for causing more problems than they solve? Weren’t we supposed to choose a language we feel comfortable with?

 

Well, don’t worry. I use both Perl and regular expressions at work, and while I’m no guru, I can at least get my code to work 9 times out of 10. Furthermore, I promise to write clean code and will do my best to avoid the Perl code shortcuts that are responsible for making it hard for newcomers to understand.

 

Side note: Although Perl and regular expressions work together really well, I should point out that you can also use regular expressions with other languages. In fact, most languages have either built-in support for regular expressions or easy-to-find regex libraries.

 

So if you like C# you can regex in C#. If you’re a Java guy you can regex in Java. Just because I chose Perl for my chatbot doesn’t mean you have to. In fact, porting my chatbot to your favorite language might be a fun exercise for beginning programmers looking for a challenge.

 

Although I suppose I’ll have to actually write this thing before anybody can port anything.

 

Proof of Concept: Can We Really Perl Up A Chatbot?

 

On paper, Perl looks like a really good language for a pattern matching chatbot. It has built-in support for regular expressions, tons of text processing functions, and cross-platform support that makes it easy to share code with other people (like you, my wonderful readers).

 

But I still feel a little twinge of doubt. Is this really a good idea? I figure the best way to find out is to write some Perl and see if I can make it do what I want.

 

Spoilers: The answer is yes. You can now safely skip to the next post without missing anything. But if you want to see the tests I hacked together to prove this, feel free to read on. Just don’t be surprised if the code is hard to follow. This isn’t production code or even reader education code, just a quick and sloppy experiment.

 

Test 1: Pattern Matching And Response Generation… in Perl

 

The core feature of our chatbot will be the ability to check whether or not the user’s input matches a specific pattern and then build an appropriate response. So that seems like the most logical thing to test first. And so here is my first test:

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";

if($testInput =~ /\AIs ([a-zA-Z]+) (.+)\?\z/){
   print "DELPHI: Fate confirms that $1 is $2\n";
}
else{
   print "Didn't work\n";
}

 

This also marks the first bit of code in the Let’s Program and OH MY WHAT IS WRONG WITH THAT IF STATEMENT!?

 

Well, wonderful reader, that if statement happens to be a regular expression. I’ll talk about those more later on. For now just trust me when I say that that bizarre list of characters and symbols translates to “Match a sentence that begins with ‘Is’, ends with ‘?’ and has at least two words in between them”.

 

That regular expression also gives us copies of the text it found between the ‘Is’ and the ‘?’: $1 holds the first word and $2 holds everything after it. We then slip those two pieces into the output.

 

Don’t worry if that didn’t make sense. This is just a test. I’ll explain things more in depth when I start actually programming the chatbot. For now the important thing is that running this program produces this output:

 

DELPHI: Fate confirms that Perl is a good choice for this program

 

Test 1 is a success. We managed to write Perl code that matched user input and transformed it into an appropriate chatbot response.
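If you want to poke at the Test 1 pattern yourself, here’s a small sketch (not part of the chatbot) that runs a few sample sentences through the same regex and prints what ends up in $1 and $2. The captures_for helper and the sample sentences are made up for illustration.

```perl
#! /usr/bin/perl
use strict;
use warnings;

# Same pattern as Test 1: "Is", one word, a space, then anything up to a final "?"
my $pattern = qr/\AIs ([a-zA-Z]+) (.+)\?\z/;

# Hypothetical helper: returns the captured fragments ($1, $2),
# or an empty list when the input doesn't match
sub captures_for {
    my ($input) = @_;
    return $input =~ $pattern;
}

for my $input ("Is Perl a good choice for this program?",
               "Is Perl fun?",
               "Is Perl?",
               "Why is Perl a good choice?") {
    my @parts = captures_for($input);
    if (@parts) {
        print "MATCH:    $input  \$1=[$parts[0]]  \$2=[$parts[1]]\n";
    }
    else {
        print "NO MATCH: $input\n";
    }
}
```

Notice that “Is Perl?” doesn’t match: the pattern needs a space after the first word, which is where the rough “at least two words” rule comes from. “Why is…” fails too, because the match is case sensitive and has to start with “Is”.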

 

Test 2: Can We Make A List Of Regular Expressions… In Perl?

 

Now we know that Perl can help us match user input to one pattern. But for our chatbot we’re going to need to try and match the user’s input against at least a dozen different patterns. Is there an easy way to do this or is our program going to turn into a giant pile of if and elsif? Time to find out:

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";
$testInput2 = "Why is Perl a good choice for this program?";

$inputPatterns[0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
$inputPatterns[1]=qr/\AWhy (.+)\?\z/;

if($testInput =~ $inputPatterns[0]){
   print "DELPHI: Fate confirms that $1 is $2\n";
}
else{
   print "Didn't work\n";
}

if($testInput2 =~ $inputPatterns[1]){
   print "DELPHI: Because I said so\n";
}

if($testInput =~ $inputPatterns[1]){
   print "This shouldn't match!\n";
}

if($testInput2 =~ $inputPatterns[0]){
   print "This shouldn't match either!\n";
}

 

Once again, don’t worry if you didn’t catch all that. In this test I basically just stored the regular expressions inside an array instead of writing them directly inside of the if statements. If this works then we can write our chatbot with a nice, clean pattern matching loop instead of endless if statements. But does it work?

 

DELPHI: Fate confirms that Perl is a good choice for this program
DELPHI: Because I said so

 

Success!
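The “nice, clean pattern matching loop” could look something like this rough sketch. It keeps the two Test 2 patterns in @inputPatterns and reports which one fires first. The first_matching_pattern name is made up, and returning -1 for “no match” is just one convention picked for illustration.

```perl
#! /usr/bin/perl
use strict;
use warnings;

# Same two patterns as Test 2, stored in an array
my @inputPatterns = (
    qr/\AIs ([a-zA-Z]+) (.+)\?\z/,
    qr/\AWhy (.+)\?\z/,
);

# Hypothetical helper: index of the first pattern that matches, or -1 if none do
sub first_matching_pattern {
    my ($input) = @_;
    for my $i (0 .. $#inputPatterns) {
        return $i if $input =~ $inputPatterns[$i];
    }
    return -1;
}

for my $input ("Is Perl a good choice for this program?",
               "Why is Perl a good choice for this program?",
               "Hello there") {
    my $index = first_matching_pattern($input);
    print "$input => pattern $index\n";
}
```

Adding a third pattern only means adding a line to @inputPatterns; the loop itself stays the same.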

 

Test 3: Connecting Output Patterns To Input Patterns… In Perl!

 

The last test proved that we can move our regular expressions out of the if statements and into a nice, clean array. Can we do the same thing with our responses? Here goes nothing…

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";
$testInput2 = "Why is Perl a good choice for this program?";

$chatPatterns[0][0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
$chatPatterns[0][1]="DELPHI: Fate confirms that $1 is $2\n";

$chatPatterns[1][0]=qr/\AWhy (.+)\?\z/;
$chatPatterns[1][1]="DELPHI: Because I said so\n";

if($testInput =~ $chatPatterns[0][0]){
   print $chatPatterns[0][1];
}

if($testInput2 =~ $chatPatterns[1][0]){
   print $chatPatterns[1][1];
}

 

Which produces this output:

 

DELPHI: Fate confirms that is
DELPHI: Because I said so

 

Uh oh. Everything matched up properly but something went wrong with the response generation. I was actually expecting this. I want to build DELPHI’s responses using information from the user’s input, but the response array is being built before the user gets a chance to say anything.
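To see that timing problem on its own, this little sketch (the variable names are made up) builds one response string with double quotes, where Perl fills in $1 and $2 immediately, and one with single quotes, where the characters $1 and $2 survive untouched. Single quotes are just a convenient way to show the difference; they are not the approach the next test takes.

```perl
#! /usr/bin/perl
use strict;
use warnings;
no warnings 'uninitialized';   # $1 and $2 are deliberately still undefined below

# Double quotes interpolate right now, before any match has happened
my $interpolatedNow = "DELPHI: Fate confirms that $1 is $2";

# Single quotes keep the literal characters $1 and $2
my $keptForLater = 'DELPHI: Fate confirms that $1 is $2';

if ("Is Perl a good choice for this program?" =~ /\AIs ([a-zA-Z]+) (.+)\?\z/) {
    print "After the match, \$1 is: $1\n";
    print "Double-quoted string: $interpolatedNow\n";   # the captures arrived too late
    print "Single-quoted string: $keptForLater\n";      # still holds the literal text
}
```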

 

So if I want to store response patterns in an array I’m going to need to add a little extra code in order to splice the user’s input into the response after it is pulled out of the array but before it gets printed to the screen. Hmm… let’s try this:

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";
$testInput2 = "Why is Perl a good choice for this program?";

$chatPatterns[0][0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
$chatPatterns[0][1]="DELPHI: Fate confirms that UIF0 is UIF1\n";

$chatPatterns[1][0]=qr/\AWhy (.+)\?\z/;
$chatPatterns[1][1]="DELPHI: Because I said so\n";

if(@UIF = ($testInput =~ $chatPatterns[0][0])){

   $response = $chatPatterns[0][1];
   for($i=0; $i<@UIF; $i++){
      $find = "UIF$i";
      $replace = $UIF[$i];
      $response =~ s/$find/$replace/g;
   }

   print $response;
}

if(@UIF = ($testInput2 =~ $chatPatterns[1][0])){

   $response = $chatPatterns[1][1];
   for($i=0; $i<@UIF; $i++){
      $find = "UIF$i";
      $replace = $UIF[$i];
      $response =~ s/$find/$replace/g;
   }

   print $response;
}

 

You’re still not allowed to panic, this is just a test. What I’ve basically done is change the code to generate a list of individual pieces from the original input (which I call User Input Fragments, or UIF). When a match is found, the program uses a substitution regex to find every place where the response contains a special UIF placeholder (UIF0, UIF1 and so on) and replace it with the matching fragment from the actual input.

 

Don’t look at me like that. I said I’ll explain it better later. Just wait one or two more posts. For now the important thing is that running my new test code produces this beautiful output:

DELPHI: Fate confirms that Perl is a good choice for this program
DELPHI: Because I said so

 

Success! I can store responses in an array right alongside the input patterns they are related to. This means that I can teach the chatbot new conversation tactics by just adding new patterns to the master array. No need to write new code!
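Putting Test 2 and Test 3 together, here’s a rough sketch of where this could end up: one loop over a table of [pattern, response] pairs, so teaching DELPHI something new means adding a row. This is only a sketch, not the chatbot’s final design. The third pattern, the fallback reply, and the names fill_response and respond are all made up for illustration. It also swaps the UIF loop for a single s///ge substitution (the /e flag evaluates the replacement as Perl code).

```perl
#! /usr/bin/perl
use strict;
use warnings;

# Same layout as $chatPatterns[i][0] / $chatPatterns[i][1] in Test 3,
# written as a list of [pattern, response template] pairs.
my @chatPatterns = (
    [ qr/\AIs ([a-zA-Z]+) (.+)\?\z/, "DELPHI: Fate confirms that UIF0 is UIF1\n" ],
    [ qr/\AWhy (.+)\?\z/,            "DELPHI: Because I said so\n" ],
    # Hypothetical new tactic: adding a row is the only change needed
    [ qr/\AI feel (.+)\z/,           "DELPHI: Why do you feel UIF0?\n" ],
);

# Hypothetical helper: replace every UIFn placeholder with the nth fragment
# in a single pass. Unknown placeholders are left untouched.
sub fill_response {
    my ($template, @fragments) = @_;
    (my $response = $template) =~
        s/UIF(\d+)/defined $fragments[$1] ? $fragments[$1] : "UIF$1"/ge;
    return $response;
}

# Try each pattern in order and answer with the first one that matches
sub respond {
    my ($input) = @_;
    for my $pair (@chatPatterns) {
        my ($pattern, $template) = @$pair;
        if (my @UIF = ($input =~ $pattern)) {
            return fill_response($template, @UIF);
        }
    }
    return "DELPHI: The future is cloudy\n";   # hypothetical fallback reply
}

print respond($_) for (
    "Is Perl a good choice for this program?",
    "Why is Perl a good choice for this program?",
    "I feel curious",
    "Hello",
);
```

Doing the replacement in one pass sidesteps two quirks of replacing placeholders in a loop: once there are eleven or more fragments, replacing UIF1 would also chew the front off UIF10, and a fragment whose text happens to contain a UIF placeholder could get replaced again on a later pass.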

 

Conclusion

 

Our tests have all passed, and the language of this Let’s Program is going to be Perl. With that final piece in place we can finally jump into some actual coding. Are you excited? I’m excited!

 

 

Please be excited.

 

 

* I once failed a mildly important college project because I decided it would be fun to code everything in a new language that I knew almost nothing about. By the time I realized I would have been better off sticking with a language I knew, it was too late. Don’t let the same thing happen to you!

 

** Regular expressions are a sort of miniature programming language that specializes in pattern matching. They are a powerful tool for all sorts of text analysis programs.