Let’s Program A Swarm Intelligence 1: Introducing Particle Swarm Optimizers

Welcome Back Dear Readers!

That last Let’s Program went pretty well. Built a nifty little chatbot and got some good reader feedback. So let’s give this a second go. Our topic this time? Swarm Intelligences!

What Is A Swarm Intelligence?

A swarm intelligence is a large-scale alien brain capable of controlling billions and billions of deadly warriors at one time. They grow stronger and smarter by absorbing the DNA and memories of other species. They are ruthless, efficient and unstoppable.

I’m Pretty Sure That Isn’t Right

Oh, sorry. Looks like my old Starcraft manual got mixed in with my research material. My bad. Good game though. I really ought to get around to buying the sequel. Rumor has it this one has 3D graphics and everything!

Anyways, let’s start over!

What Is A Swarm Intelligence?

The phrase “Swarm Intelligence” refers to a bunch of different problem-solving algorithms that were all inspired by watching how real-life animals make group decisions. Things like flocks of birds, schools of fish, colonies of ants and swarms of bees have all inspired their own types of swarm intelligence. While each swarm intelligence algorithm is different, most are built around the idea of creating a large number of semi-independent problem-solving programs that work together and share information. Why have just one AI working on your problem when you can have 100?

Of course, the fact that there are so many different kinds of swarm intelligences means we can’t really just program a generic “Swarm Intelligence”. We’re going to have to choose a specific kind. After a little research I decided that this Let’s Program is going to focus on a type of AI called a “Particle Swarm Optimizer”.

So for the rest of this series when I say “Swarm Intelligence” know that I’m probably referring to a “Particle Swarm Optimizer”. But always remember, there are lots of different kinds of swarm intelligence and they all work differently.

What Is A Particle Swarm Optimizer?

A “Particle Swarm Optimizer” is an optimizer that uses a particle swarm to do its optimizing.

That last sentence was very very accurate but it honestly wasn’t very useful. Let’s break it down a little further. First off let’s cover what an “Optimizer” is.

When we talk about “solving” a problem we usually think of finding one right answer. But not all problems have one right answer. Lots of real world problems actually have an infinite number of solutions that are all “right”. But even if they are all “right” some of those solutions will be better than others.

For example, imagine you run an oil refinery. By adjusting the refinery’s temperature and pressure you discover you can increase or decrease the amount of oil you refine per hour. Any temperature and pressure setting that refines enough oil to pay your bills is a “right” answer, but there are “better” settings that will refine more oil per hour and earn you enough cash to not just pay the bills but also give your employees a big bonus and buy yourself a Ferrari.

Obviously we aren’t going to settle for just any old “right” answer when there is a “better” answer and a Ferrari on the line. This process of trying to find better solutions even after finding one “right” answer is known as “Optimization” and there are lots of different ways to do it.

The “Particle Swarm Optimizer” approach to optimization is basically to make a whole bunch of random guesses at possible solutions, look at which guess performed the best, and then use that information to adjust all of your guesses. Then you check which adjusted guess performed best and do the whole thing over again. After several thousand or million rounds of guessing (depending on how much time you can spare) the swarm should eventually settle on one best guess.
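
Just to make that concrete, here is a bare-bones sketch of the idea in Perl. Be warned that everything in it is invented for illustration: the “refinery equation”, the constants and the nudge factor are all made up, and real particle swarms also give every particle a velocity and a memory of its own best guess. We’ll worry about those details when we actually build one.

#! /usr/bin/perl -w

use strict;

# A made-up stand-in for the refinery equation: oil refined per hour
# peaks at temperature 350 and pressure 20
sub oilPerHour{
    my ($temp, $pressure) = @_;
    return -( ($temp - 350)**2 + ($pressure - 20)**2 );
}

# Start with a cloud of twenty completely random guesses
my @particles;
push @particles, { temp => rand(500), pressure => rand(50) } for 1 .. 20;

for my $round (1 .. 1000){
    # Find the best guess in the swarm so far
    my ($best) = sort { oilPerHour($b->{temp}, $b->{pressure})
                    <=> oilPerHour($a->{temp}, $a->{pressure}) } @particles;

    # Nudge every guess a little toward the best one, plus some randomness
    # so the swarm keeps exploring instead of instantly collapsing
    for my $p (@particles){
        $p->{temp}     += 0.05 * ($best->{temp}     - $p->{temp})     + rand(2) - 1;
        $p->{pressure} += 0.05 * ($best->{pressure} - $p->{pressure}) + rand(0.2) - 0.1;
    }
}

my ($best) = sort { oilPerHour($b->{temp}, $b->{pressure})
                <=> oilPerHour($a->{temp}, $a->{pressure}) } @particles;
printf "Best guess: temperature %.1f, pressure %.1f\n", $best->{temp}, $best->{pressure};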

But why is this called a “Particle Swarm Optimizer” and not an “Intelligent Mass Guessing Optimizer”? (Besides the obvious fact that the first name is much much cooler)

Well, let’s go back to our oil refinery example where we can adjust the temperature and pressure. Every guess we make will just be a temperature paired with a pressure. Now let’s make a graph where the X axis is temperature and the Y axis is pressure. Then let’s mark down every guess we’ve made so far. You should end up with a big cloud of random dots.

Now every time we update our guess we also update our graph. The cloud of random dots will start to move around and eventually drift towards good answers. If you squint a little it will look like the dots are working together to explore new possible solutions. In fact, it will look like a big swarm of particles drifting through space and gathering around good solutions.

Hence “Particle Swarm Optimizer”.

The difference between a bunch of guesses and a swarm of particles is just a matter of perspective

Particle Swarm Weaknesses

Now that you know how particle swarm optimizers more or less work you can probably start to see a few potential problems.

First off, particle swarms only work in situations where you can make random guesses and get an answer back immediately. Making millions of guesses isn’t a good strategy if you have to wait several hours or days to find out which guesses did well and which flopped. Usually this means that you can only use a particle swarm to optimize problems that you understand well enough to program into your computer.

For instance, if there is a complex equation explaining how temperature and pressure influence your hypothetical oil refinery you could use a particle swarm to optimize that equation. On the other hand, if you’re really not sure how your oil refinery works*, particle swarm optimization would be a really bad idea. Every time your program made a new guess you would have to physically adjust the refinery, take some physical measurements and then type them into your computer. Yuck.

So particle swarms only work with complete equations or with automated testing equipment that can perform experiments and report results really really fast.

Also, because particle swarms operate by making intelligent guesses there is no actual guarantee that they will find the “best” solution available. There is always a risk that the swarm will drift past the “best” answer and instead get attached to a “good” answer. This is actually a common problem with lots of AI techniques though, not just particle swarms. And as you’ll see in a few paragraphs this is actually less of a problem for particle swarms than it is for many other algorithms.

Why Use A Particle Swarm Optimizer?

So why would we want to use an AI that requires a full mathematical description of the problem we want to solve, especially if it can’t even guarantee a truly optimal solution? If we’ve already managed to reduce our problem to an equation can’t we just use calculus to find the true best value?

Excellent question, clever reader. If you are just trying to optimize a two variable equation you probably are better off using calculus and just outright solving the thing. Not every problem needs an AI.

But lots of problems are too complicated to solve with calculus, or at least too complicated to solve quickly. Using calculus to solve a seventeen variable equation is really really hard. If some of the variables are dependent on each other it gets even harder. We’re talking about the sort of problem that would be worth a PhD.

But the particle swarm’s guessing approach works just as well in seventeen dimensions** as it does in two. It doesn’t really care how hard it is to derive or integrate an equation because it never has to do either of those things. It just guesses, checks and adjusts.

So you could spend ten years trying to solve an impossible calculus problem… or you could just plug it into a swarm intelligence, wait an hour or two and get an answer that’s probably close enough to the true optimum for all practical purposes.

But what if you are really worried about missing out on the true optimum? Well, particle swarms are less likely to miss true optimums than many other types of AI. This is because the particles start out spread all over the search space. This increases the chance that at least one particle will notice really good data and let the rest of the swarm know that they are looking in the wrong place. A less swarmy algorithm that starts looking in the wrong place is more likely to just get stuck there.

The particles help each other avoid getting stuck on a “merely good” answer

Finally, particle swarms are useful because of their ability to handle “rough” data. Imagine a step function where you use one equation when temperature is below 100 degrees and another equation when it is above 100 degrees. Lots of problem solving algorithms will throw a fit when they hit that gap between equations. But since the particle swarm is just guessing and checking as it flies through problem-space it doesn’t really care that the data does really weird things around 100 degrees. It just notes whether the end result is better or worse than before and keeps going.
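
To see what I mean by “rough”, imagine the refinery model looked something like this (a completely made-up piecewise function with a sudden jump right at 100 degrees):

# A made-up "rough" refinery model: one equation below 100 degrees,
# a different one above, and a discontinuity right at the boundary
sub roughOilPerHour{
    my ($temp) = @_;
    return $temp < 100
        ? 2 * $temp                        # low-temperature regime
        : 500 - ($temp - 150)**2 / 10;     # high-temperature regime
}

A calculus-based optimizer chokes on the missing derivative at 100 degrees, but a particle that guesses 99 and then 101 just sees two numbers, notes which one was better and keeps flying.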

The particle swarm has no problem with rough data that would destroy a calculus-based AI

Conclusion

Now you know what a swarm intelligence is and have been introduced to particle swarm optimization. Congratulations! You are now part of the less than 1% of the population that has ever bothered to study AI in any depth.

But why settle for that? Let’s take it to the next level by actually programming one, putting ourselves in the 1% of the 1% of the population that enjoys amateur AI programming. I have no idea if this is a particularly useful super-minority to be part of… but why let that stop us?

* Why did you buy an oil refinery if you didn’t understand how it worked? Invest more responsibly in the future!

** Have fun imagining a seventeen dimensional swarm of particles flying through seventeen dimensional space.

Let’s Program A Chatbot: Index And Code

Introduction

 

Have you ever wondered how chatbots work? Do you want to get some practice with regular expressions? Need an example of test driven development? Want to see some Perl code?

 

If the answer to any of those questions was “Yes” then you’re in luck because I happen to have just finished a little series of posts on writing pattern matching chatbots using Perl, regular expressions and test driven development. Enjoy!

 

Index

 

Let’s Program A Chatbot 1: Introduction And Theory

Let’s Program A Chatbot 2: Design Before You Code

Let’s Program A Chatbot 3: Choosing A Programming Language

Let’s Program A Chatbot 4: Let’s Talk About Test Driven Development

Let’s Program A Chatbot 5: Finally, Code!

Let’s Program A Chatbot 6: Don’t Fear The Regex

Let’s Program A Chatbot 7: To Be Or Not To Be, That Is The Use Case

Let’s Program A Chatbot 8: A Little Housecleaning

Let’s Program A Chatbot 9: The Grammar Police

Let’s Program A Chatbot 10: Chatting With The Bot

Let’s Program A Chatbot 11: Bad Adjectives

Let’s Program A Chatbot 12: When The Answer Key Is Wrong

Let’s Program A Chatbot 13: What’s Mine Is Yours

Let’s Program A Chatbot 14: Variety Is The Spice Of Life

Let’s Program A Chatbot 15: “ELIZA Effect” Should Be A Movie Title

Let’s Program A Chatbot 16: Testing On Live Subjects

Let’s Program A Chatbot 17: Blitzcode!

Let’s Program A Chatbot 18: A Bit Better Than Before

Let’s Program A Chatbot 19: Third Time’s The Charm

Let’s Program A Chatbot 20: What Next?

 

Complete Code

 

If you follow along with the posts you should be able to write your own chatbot from scratch. But if you don’t have the time for that or just want some reference code I have also provided my complete chatbot, user interface and testing suite: Complete Chatbot Code

Let’s Program A Chatbot 19: Third Time’s The Charm

The Third (And Final?) Test User

 

Last time I tested DELPHI on a human it looked like we were only a few modifications away from an acceptable 80% success rate. Now that we’ve made those modifications let’s take a look at a third conversation between the chatbot and another holiday visitor. It looks to me like we may have hit our goal:

 

DELPHI: HELLO! MY NAME IS DELPHI.

DELPHI: I CAN USE MY MYSTERIOUS POWER TO ANSWER YES OR NO QUESTIONS LIKE:

DELPHI: “WILL IT RAIN TOMORROW?” OR “DID I PASS MY LAST TEST?”

DELPHI: WHAT WOULD YOU LIKE TO KNOW?

>What color is the sky?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Is the sky blue or purple?

DELPHI: THE FIRST OPTION. WAIT, NO THE SECOND. WAIT, WAIT, THE FIRST. I MEAN IT THIS TIME!

>Do you like sunsets?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Is wisdom relative?

DELPHI: FATE INDICATES THAT WISDOM IS RELATIVE

>So Athena could be wrong?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Athena wrong about sunsets?

DELPHI: ATHENA CAN’T BE WRONG ABOUT SUNSETS. THAT WOULD MESS UP THE CROSSWORD PUZZLE I’M WORKING ON

>Do you like crossword puzzles?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I LIKE CROSSWORD PUZZLES?

>Why are you doing a crossword if you don’t like it?

DELPHI: BECAUSE OF REASONS

>What reason?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Do you often do things you don’t like?

DELPHI: ATHENA DOESN’T THINK SO, AND SHE’S THE GREEK GOD OF WISDOM!

>Does Athena do things she doesn’t like?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Do you usually avoid answering questions?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I USUALLY AVOID ANSWERING QUESTIONS?

>Do you usually win arguments?

DELPHI: FATE INDICATES THAT I USUALLY WIN ARGUMENTS

>Do you believe in fate?

DELPHI: I DON’T THINK THAT I BELIEVE IN FATE

>Do you believe in soulmates?

DELPHI: OF COURSE NOT. WHY WOULD YOU EVEN THINK THAT I BELIEVE IN SOULMATES?

>Do you feel lonely?

DELPHI: I DON’T THINK THAT I FEEL LONELY

>What do you think?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Do you value happiness?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I VALUE HAPPINESS

>What makes you happy?

DELPHI: I’M SORRY, COULD YOU THINK OF A SIMPLER WAY TO ASK THAT QUESTION? MAYBE AS A YES OR NO QUESTION?

>Does food make you happy?

DELPHI: TRUST YOUR INSTINCTS ON THIS ONE

>Does talking to people make you happy?

DELPHI: FATE INDICATES THAT TALKING DOES TO PEOPLE MAKE I HAPPY

>Are you enjoying this conversation?

DELPHI: I SEE NO REASON WHY I WOULDN’T BE ENJOYING THIS CONVERSATION

>I like you!

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>Why is the sky blue?

DELPHI: YOU KNOW, I’M REALLY NOT SURE EITHER. LET ME FORWARD THAT QUESTION TO MY SUPERVISOR

>Who is your supervisor?

DELPHI: THAT LOOKED LIKE A QUESTION, BUT I’M NOT SURE HOW TO RESPOND. MAYBE YOU COULD SIMPLIFY IT TO A YES OR NO QUESTION

>Is Scott your supervisor?

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

>Is Scott a good supervisor?

DELPHI: FATE INDICATES THAT SCOTT IS A GOOD SUPERVISOR

>Does he compliment your performance?

DELPHI: MEH, HE PROBABLY DOES COMPLIMENT MINE PERFORMANCE, BUT I’M NOT REALLY INTERESTED IN THAT KIND OF STUFF

>Do you tell jokes?

DELPHI: I THINK IT SHOULD BE OBVIOUS THAT I TELL JOKES

>Tell me a joke.

DELPHI: I DON’T WANT TO TALK ABOUT THAT. PLEASE FEEL FREE TO ASK ME WHY

>Why?

DELPHI: I’M SORRY, COULD YOU TRY REWORDING THAT AS A YES OR NO QUESTION?

>Why don’t you want to tell a joke?

DELPHI: WHY NOT?

>Jokes are funny.

DELPHI: I’M SORRY, I’M NOT SURE WHAT TO SAY ABOUT THAT. TRY A QUESTION LIKE ‘WHY IS THE SKY BLUE?’

>goodbye

DELPHI: Come back anytime

 

Quick Analysis

 

This time around I’m going to count an answer as “good” if DELPHI either provided a correct response or if DELPHI complained and the user immediately thought of a more DELPHI compatible way to ask their questions. I’m only going to count an answer as “bad” if it has a huge grammar mistake or if the user seemed confused by DELPHI’s responses.

 

Going by this very rough metric I get a count of 30 “good” answers against 4 bad answers. That’s a whopping 88% acceptable performance. Not bad at all.

 

Overall the user did a good job of picking up quickly on what questions DELPHI could and couldn’t answer. The user occasionally explored new kinds of questions but returned to the tried and true formula every time an error appeared. The only real issue from a design perspective came near the end when DELPHI accidentally suggested it knew how to tell jokes, leading the user to try some new input that failed horribly.

 

This suggests that it might be a good idea to write a few more rules to help DELPHI talk about itself. A simple “Do you” version of our existing “Can you” rule would have been enough to avoid misleading the user about DELPHI’s joke telling capabilities.

 

But I’m going to leave that as an exercise for my wonderful readers. The only problems I’m going to try and solve are a few grammar issues I noticed.

 

My Quick Fix Is Mine

 

Two of DELPHI’s mistakes involved switching “your” to “mine”, resulting in awkward grammar like this:

 

DELPHI: I THINK THAT SCOTT IS MINE SUPERVISOR, BUT YOU SHOULD PROBABLY DOUBLE CHECK WITH SOMEONE ELSE

 

Obviously that should have been “my supervisor”. In fact, now that I think about it, “your” should always be swapped to “my”. It’s “yours” with an “s” that matches “mine”. We can fix this by updating the dictionaries we use to power switchFirstAndSecondPerson.

 

# Dictionary slots 5 and 6: 'your' now always maps to 'my',
# while 'yours' (with an 's') maps to 'mine'
$wordsToPlaceholders[5][0]=qr/\byour\b/i;
$wordsToPlaceholders[5][1]='DELPHImy';

$wordsToPlaceholders[6][0]=qr/\byours\b/i;
$wordsToPlaceholders[6][1]='DELPHImine';
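
As a quick refresher on how those dictionaries get used, here is a rough sketch of the two-pass swap. The real switchFirstAndSecondPerson lives in the complete code download; @placeholdersToWords is just my shorthand here for the matching placeholder-to-word dictionary.

# Sketch only: first replace words with unique placeholders like
# 'DELPHImy' so that turning "your" into "my" can't immediately get
# re-swapped back into "your" by a later rule
sub switchFirstAndSecondPerson{
    my ($input) = @_;
    for my $pair (@wordsToPlaceholders){
        $input =~ s/$pair->[0]/$pair->[1]/g;
    }
    # Second pass: turn the placeholders into the final words
    for my $pair (@placeholdersToWords){
        $input =~ s/$pair->[0]/$pair->[1]/g;
    }
    return $input;
}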

 

And of course here are some test cases to make sure the fix really fixed things:

 

$testCases[22][0] = "Is that pizza yours?";
$testCases[22][1] = "FATE INDICATES THAT THAT PIZZA IS MINE";

$testCases[23][0] = "Is that dog your pet?";
$testCases[23][1] = "FATE INDICATES THAT THAT DOG IS MY PET";

$testCases[24][0] = "Is that pizza mine?";
$testCases[24][1] = "FATE INDICATES THAT THAT PIZZA IS YOURS";

$testCases[25][0] = "Is that dog my pet?";
$testCases[25][1] = "FATE INDICATES THAT THAT DOG IS YOUR PET";

 

Conclusion

 

With those two fixes DELPHI has now achieved an acceptable response rate of over 90%. That’s really impressive for a simple pattern matching program with no memory, no language parsing abilities and no sense of context. Even better, every user who worked with DELPHI admitted that they had fun with the experience and liked the silly random answers. That means we succeeded at our primary goal of creating an entertaining fortune teller.

 

So I’m done. Bye!

 

What’s that? You’re still not satisfied? Well then, maybe I can fit one more post into this Let’s Program and give you a few suggestions on how a hardcore reader could take this whole project to the next level.

Let’s Program A Chatbot 16: Testing On Live Subjects

You Don’t Need To See The Latest Code Updates

 

 

Since last we met all I’ve done is add 25 common adjectives to DELPHI’s common adjectives list and write four more possible response patterns for every chatbot rule. These modifications didn’t involve any actual coding tricks and have made DELPHI too long to conveniently embed inside a blog post. I do promise to publish DELPHI’s complete code as soon as this Let’s Program is over but for today there’s nothing worth showing. Those of you following along at home can feel free to write your own response patterns and add your own common adjectives.

 

 

Automated Testing Alone Is Not Enough

 

 

Our automated tests made it really easy for us to keep track of whether or not DELPHI was doing what we, the programmers, wanted it to do. And that’s very important! It’s hard to write good software if you don’t have some way of keeping track of what your goals are and which ones you have and haven’t met.

 

 

But just because a program satisfies the programmer’s list of goals doesn’t mean it will satisfy the customer’s list of demands. Real world users almost always have items on their wish lists that we programmers completely overlooked.

 

 

Test users also help us programmers avoid blind spots in our testing. When a developer tries to write tests for his own code he will subconsciously tend to avoid test cases that he knows will break the program. One common example is that we programmers usually forget to write a test case for blank input, because accidentally typing in blank input is the sort of mistake we don’t usually make. That makes it easy for us to forget that the problem exists at all, which can lead to fragile programs that break apart the first time a non-programmer hits enter with no input, either by accident or out of curiosity. Having a test user break your program like this is much better than waiting for a customer to make a mistake you overlooked.
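
For the record, a blank-input test in the style of our DELPHI test suite would only take two lines. The index and the expected wording below are placeholders I made up, not DELPHI’s actual rule:

$testCases[26][0] = "";
$testCases[26][1] = "I'm sorry, I didn't catch that. Could you ask me a yes or no question?";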

 

 

So there’s really no way around it: We need to get other human beings to test out our programs and find the bugs we forget to include in our automated tests. We can then use their human feedback to set new goals for the next version of our program and to help write new use cases for our tests.

 

 

Once we have new and improved tests it’s back to modifying our code until we’ve once again reached a 100% pass rate. Then we find a new test user and have them poke our program until they find another problem or two. And of course that leads to more use cases which can lead to new tests and new software features. Rinse and repeat this cycle until the software finally reaches a level of quality that everyone feels comfortable with (or until you run out of time and have to give up / publish whatever you have). That’s basically the entire test driven development cycle.

 

 

DELPHI’s First Human User*

 

 

It’s the holidays now and that means friends and family as far as the eye can see. If you’re anything like me, people you haven’t seen for months are suddenly queuing up at your door to “Wish You A Merry Christmas And A Happy New Year”. These lovely people represent a valuable resource to every hobbyist programmer: A large pool of test users that will be more than happy to spend five minutes playing with your latest project.

 

 

So for this blog post I grabbed a handy holiday well-wisher and dropped them in front of DELPHI with no explanation of what it did or how it works. This is the conversation they had:

 

 

DELPHI: The oracle greets you. What questions are on your mind?

 

>How much longer is Xenoblade Chronicles**?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Are there more than 10 hours left of Xenoblade Chronicles?

 

DELPHI: I’m sorry, could you try rewording that?

 

>What kind of questions can you answer?

 

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

 

>Is a yes or no question better?

 

DELPHI: Let me flip a coin… the second one

 

>Should I have chicken or beef for dinner?

 

DELPHI: The later, unless it’s actually the former

 

>How many days until Christmas?

 

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

 

>What is the weather like?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Is it snowing outside?

 

DELPHI: Hmmm… echoes of cosmic wisdom show that it is not snowing outside

 

>Will it snow tomorrow?

 

DELPHI: it will snow tomorrow, assuming the mole people don’t invade next Tuesday

 

>Was school cancelled for tomorrow?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Will there be school tomorrow?

 

DELPHI: I lost my tarot cards… but I have some playing cards. They predict that there will be school tomorrow

 

>What is 4 + 5?

 

DELPHI: I’m sorry, could you think of a simpler way to ask that question?

 

>What is the sum of 4 and 5?

 

DELPHI: That looked like a question, but I’m not sure how to respond

 

>Can you do math?

 

DELPHI: I saw a question mark, but I didn’t understand the question. I’m very sorry

 

>Do I go to high school or middle school?

 

DELPHI: Fate indicates the former

 

>What is today’s date?

 

DELPHI: I’m sorry, could you try rewording that?

 

>Is today Sunday?

 

DELPHI: I think that today is Sunday, but you should probably double check with someone else

 

>Is tomorrow Monday?

 

DELPHI: tomorrow can’t be Monday. That would mess up the crossword puzzle I’m working on

 

>Is it the 8th?

 

DELPHI: I see no reason why it wouldn’t be the 8th

 

>What month is it?

 

DELPHI: I’m sorry, could you think of a simpler way to ask that question?

 

>Is it december?

 

DELPHI: it can’t be december. That would mess up the crossword puzzle I’m working on

 

>Is it december or january?

 

DELPHI: Let me flip a coin… the first one

 

>Is it Sunday and the 8th?

 

DELPHI: it can’t be Sunday and the 8th. That would mess up the crossword puzzle I’m working on

 

>Will it be the 9th or he 10th tomorrow?

 

DELPHI: Whatever you think is correct

 

>I believe the 9th is correct. Am i correct?

 

DELPHI: That looked like a question, but I’m not sure how to respond

 

>Did I miss any responses?

 

DELPHI: That looked like a question, but I’m not sure how to respond

 

>goodbye

 

DELPHI: Come back anytime

 

 

Wow! DELPHI did a pretty bad job holding up its end of the conversation. On the other hand, it actually did pretty well for a chatbot with only a handful of rules and only a few weekends of development work. So even though DELPHI has pretty poor performance I’m still happy with its performance to effort ratio.

 

 

What Did We Learn

 

 

Time to put on our thinking caps and analyze exactly what went wrong and what went right in this DELPHI test run. Those of you following along at home might want to break out some paper and jot down your own thoughts before reading my conclusions.

 

 

Of course, if you’re a programmer you probably have a high reading speed and the ability to recognize the words on your screen even when you aren’t directly looking at them. So you’ve undoubtedly already absorbed at least one or two of the conclusions I’ve written about below. Just think of it as getting a hint on how to start your own list.

 

 

BAD: DELPHI Introduction Doesn’t Give Good Enough Instructions

 

 

Since users never read the manual (and DELPHI doesn’t have a manual to read anyways) it is very important for DELPHI to provide gentle guidance on the proper way to ask it questions. And I think it’s fair to say I completely failed at this.

 

 

I probably should have warned the user to stick to YES/NO questions in the original prompt. Instead I just invited them to ask whatever was on their mind and got an open-ended question about the play-time of a video game my user was interested in. Since it wasn’t a yes/no question DELPHI gave up. I also could have done a better job of having DELPHI’s confused messages suggest better question formats. Constantly telling the user that DELPHI doesn’t know how to answer their question doesn’t do any good if I’m not also giving hints on what questions they should be asking.

 

 

Fortunately the user was pretty clever and figured out on their own that switching their question to a YES/NO format might help. Unfortunately this led to our next error.

 

 

BAD: DELPHI Can’t Handle Plural And Past Tense Versions Of Its Rules

 

 

The user’s second question should have been easy. After all, it was really just a plural version of the “Is X Y?” question, and that was one of the first rules we ever wrote.

 

>Are there more than 10 hours left of Xenoblade Chronicles?

 

 

Unfortunately it turns out that DELPHI only has rules specifically for “Is” and doesn’t have nearly enough brainpower to recognize that “Are” should use the same kind of rule. DELPHI also had difficulty later on when the user went first person and conjugated “Is” into “Am”. There were similar problems with past tense conjugations; DELPHI gave up on a “Was” question and a “Did” question even though logically they’re the same as “Is” and “Do”.

 

 

So it looks like we’re going to need to do some work buffing DELPHI up to work with a wide range of tenses and pluralizations: Is, Are, Am, Was, Were, Do, Does, Did.
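
The recognition half of that fix should be cheap, since our rules are just regular expressions. Here is a sketch of the idea (the response templates would still need per-verb attention, so don’t mistake this for the finished rule):

# Group the verb forms that ought to share a rule
my $isLikeVerbs = qr/(?:Is|Are|Am|Was|Were)/i;
my $doLikeVerbs = qr/(?:Do|Does|Did)/i;

# ...so a pattern like qr/\AIs (.+)\?\z/ becomes qr/\A$isLikeVerbs (.+)\?\z/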

 

 

BAD: DELPHI Doesn’t Know How To Talk About Itself

 

 

After their first two questions fell apart my clever test user asked an incredibly intelligent third question:

 

 

>What kind of questions can you answer?

 

 

Unfortunately that isn’t a pattern DELPHI knows how to respond to. Which is a shame because that would have been the perfect opportunity to slip a mini user manual into DELPHI’s output.
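
A rule for that would only cost us one pattern and one line of mini user manual. Something like this hypothetical addition (both the pattern and the wording are invented here, not part of DELPHI yet):

push(@chatPatterns,
   [qr/\AWhat kind of questions can you answer\?\z/i,
      "I do best with yes or no questions, like 'Will it rain tomorrow?'"]);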

 

 

GOOD: Humor Made The User Curious

 

 

My test user spent a lot longer with DELPHI than I thought they would. When I asked them what they were doing they admitted they were trying to see how many different ways DELPHI could respond to the same type of question. They also explained that they were trying to come up with new types of questions just to double check they weren’t missing an entire group of sort-of-funny chatbot replies.

 

 

This means that even though my chatbot was very flawed it made up for those flaws by being interesting enough that the user wanted to keep playing with it to see what it would say and do next. Since DELPHI is basically a toy the fact that the user enjoyed playing with it is a huge success.

 

 

GOOD: 50% Success Rate

 

 

If you count up the instances where DELPHI gave a good answer to a question compared to when it gave a default confused answer you’ll find it had very close to a 50% success rate. You might argue that a number that low shouldn’t count as a good thing but I think it’s only fair to point out that DELPHI actually did manage to perform as expected in a wide variety of circumstances. No need to focus entirely on its mistakes.

 

 

I think it’s also interesting to note that the success rate seems higher in the second half of the conversation than the first. This suggests that the user eventually caught on to what kind of questions DELPHI handled best. So if I do a better job of explaining early on in the conversation that DELPHI prefers YES/NO questions the overall success rate should increase a lot.

 

 

Conclusion

 

 

As predicted DELPHI wasn’t quite ready for human contact. But it did better than I thought it would and now I have lots of data on what problem areas need to be tackled next. Expect my next post to be a rapid fire series of new test cases and the code to fix them.

 

 

 

 

* You might think I was DELPHI’s first human user, but I don’t count***.

 

 

** Xenoblade Chronicles is a Japanese RPG for the Nintendo Wii that has an epic, and rather long, plot. In retrospect it’s not the sort of thing one should try to speed-run during a holiday get together.

 

 

*** Because I programmed it. I wasn’t suggesting I don’t count because I’m not human. I’m totally a real human. Really.

 

Let’s Program A Chatbot 15: “ELIZA Effect” Should Be A Movie Title

What Is The ELIZA Effect?

 

The ELIZA Effect refers to the way that humans tend to think machines act and think like humans. We unconsciously like to believe that computers are actually intelligent, that robots have motives and that our favorite gizmos have emotions.

 

This human instinct to treat machines like people is named after the original ELIZA chatbot, a simple pattern matching program that pretended to be a psychiatrist by turning everything said to it into a question*. The scientist who designed ELIZA considered it nothing more than a clever trick and was surprised to find that many of the humans he had testing ELIZA started to develop emotional reactions towards it, some going so far as to claim that they felt like ELIZA really cared about the topics they were talking about.

 

Further studies pretty much proved that the ELIZA effect kicks in just about anytime a human sees a computer or machine do anything even vaguely unpredictable or clever. The moment a program does something the user can’t immediately explain he will begin to assume deep logic and complex motives are taking place, even when the “smart” behavior turns out to be nothing more than a ten line script with three if statements and a call to random(). Even after you show the user there is no real intelligence involved he will still tend to see bits of human personality in the machine.

 

For example, just a few weeks ago the Internet was abuzz with stories of a “suicidal robot”, a Roomba vacuum cleaner that apparently was activated while the owners weren’t watching and then got stuck on a hotplate which eventually caused it to burn to “death”.

 

The interesting part of this story isn’t that a household robot glitched up and got stuck in a dangerous place. That happens all the time. The interesting part is that almost every human who talked about the story phrased it in terms of a robot making a decision to kill itself (a very human, if depressing, behavior). Even technical people who know better than to assign feelings and motivation to a circuit board couldn’t resist framing the event in human terms.

 

That’s the ELIZA effect.

 

Exploiting The ELIZA Effect

 

So… humans like to think that other things behave like humans. That’s not really very surprising. Why should we programmers care?

 

We should care because we can use the ELIZA effect to hack people’s brains into liking our programs better. We can trick them into being patient with load times, forgiving of bugs and sometimes even genuinely loving our products.

 

Simple example: When Firefox is restarted after a crash it begins with a big “Well this was embarrassing” message that makes it feel like an apologetic friend that really is sorry that he forgot to save the last slice of pizza for you. It’s surprisingly effective at taking the edge off the frustration of suddenly getting kicked off a web-page.

 

The ELIZA effect is even more important for people who are specifically trying to write programs that mimic human behavior. Like game developers trying to create likable characters or chatbot designers trying to create a bot that is fun to talk to. For these people getting the ELIZA effect to activate isn’t just a useful side goal, it is their primary goal.

 

Wait a minute, aren’t WE amateur chatbot designers? I guess we should figure out how to integrate this idea into DELPHI.

 

Simulating Human Humor

 

In my experience people will forgive a lot of bad behavior as long as they are laughing. A good joke can break the ice after showing up late to a party and a witty one-liner can fix a lot of embarrassing social mistakes**.

 

That’s why rule #1 for writing DELPHI responses is going to be “make it quirky”. The funnier the responses DELPHI generates the less users are going to care about how precise and correct they are. Bits of strange grammar and weird phrases will be cheerfully ignored as long as the user is having a good time. And since humor is a very human trait this should do a lot to make DELPHI feel more like a real conversation partner and less like the big foreach loop it really is.

 

So don’t do this:

 

Fate indicates that your cat does secretly want to kill you.

 

Do this!

 

Let me check my Magic 8 ball(tm). Hmm… future looks cloudy so I don’t know if your cat does secretly want to kill you. Ask again later***

 

Apologize Profusely. For Everything.

 

This seems like a really good place for a joke about Japanese etiquette compared to American etiquette but I can’t think of anything funny right now. すみません (sumimasen — “I’m sorry”)

 

Anyways, I’ve noticed that when non-technical people have a computer problem one of the first things they always say is “I didn’t break it! It’s not my fault! It just happened on its own!”

 

This makes sense. People hate feeling like they are responsible for things going wrong. No one wants to take the blame for a broken machine or a program that stopped working. The only thing worse than a broken computer is a broken computer that is scolding you for breaking it.

 

So if your program is likely to break or get confused, and this simple chatbot certainly is, your top priority should be to reassure the user that the problem isn’t his fault. The problem is that your poor humble program couldn’t quite handle the user’s request and could the user pretty please try again? We really are very very sorry that this happened at all.

 

Also, apologizing is a very human behavior that will go a long way towards hiding our dumb code behind an illusion of human intelligence.

 

So don’t do this:

 

I don’t recognize that as a question. Try again

 

Do this!

 

I’m sorry, I got confused. Could you ask your question again and keep it simple for me?

 

Teach Your User How To Be A Better User

 

This final design tip has less to do with the ELIZA effect and more to do with user psychology. The following tip is vitally important to anyone who wants to build user friendly software: Users never read the manual.

 

I don’t care how well documented your program is or how many full color screen-shots are included in your manual. 95% of your users are just going to start clicking on buttons and typing in words and then get frustrated if things don’t work the way they want them to.

 

In a perfect world we would solve this problem by convincing everyone in the world to do the responsible thing and read the entire user manual of every product they buy before they try to operate it. But we live in a broken and fallen world so we’re going to have to be sneaky about this.

 

The goal here is that every time the user causes an error or makes something strange happen we should slip them a quick tip on how to make sure that problem doesn’t happen again. This way we can feed them the entire manual one bite at a time until they finally figure out everything we wish they had known in the first place.

 

I’m sure you’ve seen this tactic before. Windows warning you that changing a file’s type can be dangerous. Google politely suggesting alternate searches when you don’t get many results. Video games slipping helpful tips into their loading screens. All are just ways to teach the user how to be a better user without ever calling him a bad user or forcing him to read a book.

 

How do we incorporate this into a chatbot like DELPHI? Well, when we detect the user is having trouble we should not only be incredibly apologetic to make him feel safe and incredibly funny to make him feel relaxed, we should also try to show him how to better format his input.

 

So don’t do this:

 

I can’t understand what you’re saying

 

Do this!

 

I’m having trouble with your last question. Let’s start with something simpler like “Will it rain tomorrow?”
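
Putting all three tips together, every confused response should apologize, stay playful and smuggle in a tip. A hypothetical pool of fallback responses (all wording invented) might look like:

my @confusedResponses = (
    "I'm sorry, I got confused. Yes or no questions are my specialty",
    "My crystal ball is all foggy. Could you try something like 'Will it rain tomorrow?'",
    "A thousand apologies! Maybe reword that as a yes or no question?",
);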

 

Conclusion

 

Writing a program that can act like an intelligent human is hard. Luckily for us humans are easy-going lifeforms that are more than happy to project the illusion of human intelligence onto every machine they see. As long as our chatbot is funny and polite most users will be willing to consider it human enough.

 

Now I’m going to spend the next few days adding new responses to DELPHI. Once that’s finally done I’m going to recruit a friend to test-chat with DELPHI and my next post will be spent analyzing how well (or how poorly) DELPHI did.

 

I suppose there is a small chance that DELPHI will do perfectly and this Let’s Program will end. But I seriously doubt it. This chatbot doesn’t even have a dozen rules yet. I’m predicting it won’t be able to handle even half the input the tester gives to it.

 

 

 

* You probably remember ELIZA from when I introduced it back at the beginning of this Let’s Program.

 

** On the other hand, trying to come up with a witty one-liner under pressure is very difficult and a botched one-liner will just make the problem worse. So if you accidentally insult someone’s religion/parents/favorite OS it might be best to just shut up and retreat.

 

*** If you want to plug this into our “does” rule you’re probably looking for something like “Let me check my Magic 8 ball(tm). Hmm… future looks cloudy so I don’t know if UIF0 does UIF1. Ask again later”

Let’s Program A Chatbot 12: When The Answer Key Is Wrong

Unrealistic Expectations

 

Sometimes you get halfway through a project only to realize you don’t have the time or money to do what you originally planned to do*. When that happens you have no choice but to rethink your plans, either lowering your expectations or setting a new deadline. Admittedly both approaches generally involve getting frowned at by both management and your customers but sometimes you really have no choice. Even the best of developers have limits.

 

Why am I bringing this up? You’ll understand in a minute, but I will tell you that it involves these still unresolved use cases:

 

Test Case 2 Failed!!!

Input: Does this program work?

Output: I’m sorry, could you try rewording that?

Expected: Fate indicates that this program works

Test Case 3 Failed!!!

Input: Do computers compute?

Output: I’m sorry, could you try rewording that?

Expected: Fate indicates that computers compute

 

At first this doesn’t look so bad. The use cases are “Do X Y?” and “Does X Y?” and all DELPHI has to do is respond back “Yes X Y”. Hardly seems like a challenge. We’ll just slip this new rule into our list after the “or” rule and right before the “is” rule.

 

push(@chatPatterns,
   [qr/\A(?:Do|Does) (.+)\?\z/,
      "Fate indicates that UIF0"]);

 

Very simple. We look for any question that starts with some form of “Do” (notice the non-capture ?: symbol) and then we just replace that one question word with our “Fate indicates that” prediction. Is that really all it took?

 

Test Case 2 Failed!!!

Input: Does this program work?

Output: Fate indicates that this program work

Expected: Fate indicates that this program works

Test Case 3 Passed

 

A success and a failure is still an overall failure. So now we need to find out what went wrong with Test Case 2 that didn’t go wrong with Test Case 3. If you look closely at the expected vs actual output the only issue is verb agreement. It should be “program works”, with an ‘s’, but all we got was the original “program work” from the question.

 

This problem really only shows up in the third person where the question is phrased as “Does X VERB” and the answer needs to be in the form “X VERBs”. It’s really a pretty simple grammar rule. At least, it’s simple for a human. DELPHI is going to need a lot of help.

 

Hmmm… maybe we can solve this by just slipping an ‘s’ onto the end of our response. Of course, since this only applies to third person questions we’ll have to split the original rule into two rules. Notice that only the “does” version glues a final s onto the end of the User Input Fragment from the original input:

 

push(@chatPatterns,
   [qr/\ADo (.+)\?\z/,
      "Fate indicates that UIF0"]);

push(@chatPatterns,
   [qr/\ADoes (.+)\?\z/,
      "Fate indicates that UIF0s"]);

 

Test Case 2 Passed

 

I’m Still Not Sure This Is Really Working

 

Just gluing an ‘s’ to the end of the input doesn’t seem very sophisticated. Sure, it passed our test case but I’m not sure it will really work in all scenarios. So how about we write a new test case just to make extra sure we really solved our problem?

 

$testCases[13][0] = "Does adding an s work well?";
$testCases[13][1] = "Fate indicates that adding an s works well";

 

Nope!

 

Test Case 13 Failed!!!

Input: Does adding an s work well?

Output: Fate indicates that adding an s work wells

Expected: Fate indicates that adding an s works well

 

Adding an ‘s’ to the end of the sentence isn’t enough because what we truly want is an ‘s’ on the end of the verb and there is no guarantee that the verb will be the last word in the sentence. So to fix this problem we are going to need to either:

 

      1. Develop a complex system for identifying the verb in an arbitrary sentence
      2. Decide that we don’t care about adding ‘s’s to verbs

 

I’m going to go with option number 2 and come up with a new definition of what is considered a “correct” answer to a “does” question.

 

The New Test Case

 

There is an easy way around having to reformat our verbs and that is by including the word “does” inside the response. For instance, these two sentences basically mean the same thing:

 

This sentence looks equal to the other sentence

This sentence does look equal to the other sentence

 

This means that we can change the response to “Does X Y?” from “Yes, X Ys” to the much simpler “X does Y”. Now we are dealing with the exact same problem we already solved for “X is Y” and “X will Y”.

 

Here are our updated test cases:

 

$testCases[2][0] = "Does this program work?";
$testCases[2][1] = "Fate indicates that this program does work";

$testCases[13][0] = "Does this approach work better?";
$testCases[13][1] = "Fate indicates that this approach does work better";

 

And here is our updated “does” rule (the “do” rule can stay the same):

 

# $noncaptureAdjectiveChain (defined in an earlier post) matches an
# optional chain of adjectives without creating an extra capture group
push(@chatPatterns,
   [qr/\ADoes ($noncaptureAdjectiveChain[a-zA-Z]+) (.+)\?\z/,
      "Fate indicates that UIF0 does UIF1"]);

 

And, finally, here are the results

 

Passed 13 out of 14 tests

Test Failure!!!

 

Did We Learn Anything Useful Today?

 

The moral of today’s story is that sometimes a test case that is really hard to solve represents a problem with your expectations as much as your program. If you’re on a tight budget or schedule** sometimes it makes sense to stop and ask yourself “Can we downgrade this requirement to something simpler? Can we delay this requirement until a later release?”

 

After all, good software today and the promise of great software tomorrow is better than insisting on great software today and never getting it.

 

Although sometimes you can manage to deliver great software today and that’s even better. Reach for the stars, bold readers. I have faith in your skills!

 

Conclusion

 

Did you notice that the success rate on our last testing run was 13 out of 14? That means we’re almost done! At least, we’re almost done with the first test version of the code. I’m sure the instant we ask a human tester to talk to DELPHI we’re going to find all sorts of new test cases that we need to include.

 

But future test cases are a problem for the future. For now we’re only one test case away from a significant milestone in our project. So join me next time as I do my best to get the DELPHI test suite to finally announce “All Tests Passed!”

 

 

 

* Even worse, sometimes you’ll find out that what you want to do is mathematically impossible. This is generally a bad thing, especially if you’ve already spent a lot of money on the project.

 

** Or if you’re writing a piece of demo software for your blog and don’t feel like spending more than a few dozen hours on what is essentially a useless toy program

Let’s Program A Chatbot 4: Let’s Talk About Test Driven Development

Get it? Let’s “talk” about test driven development? Because we’re designing a chatbot. It’s funny*! Yes? No. Okay. Moving on.

 

What Is Test Driven Development?

 

Testing is the process of proving that a program can do what it is supposed to do without crashing or generating incorrect output. Good tests also help you find bugs in your programs before your users do. It is very embarrassing to deliver a piece of software that freezes your customer’s computer the first time they start it up. So testing is definitely important.

 

Software testing is an art unto itself, but the general idea is to come up with a list of sample program inputs that match what you expect real users to try. Then you figure out, by hand, what the program should do for each of those inputs. Finally you feed your inputs into the program one item at a time and double check that the computer does the right thing.

 

For example, suppose you are programming a bank calculator that figures out monthly payments for car loans. You set up your first test by talking to an accountant and finding out that a $10,000 loan should have a $300 monthly payment. So you feed $10,000 into the loan calculator and make sure it answers correctly. If it doesn’t generate the correct $300 payment then you know you have a bug.
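
In code, that one test might be nothing more than this little sketch (calculatePayment is the hypothetical function under test, stubbed out here so the example runs; the numbers come from the example above):

# Hypothetical function under test; the real one would do actual loan math
sub calculatePayment{ my ($loan) = @_; return 300; }

my $payment = calculatePayment(10_000);
print $payment == 300 ? "Test passed\n" : "Test FAILED: got $payment, expected 300\n";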

 

Once you have run the test and fixed any bugs that show up you move on to your next test by talking to your accountant again and generating a new test case. Maybe your bank doesn’t give loans for more than $100,000 at a time so the calculator should return a “Too Large Loan” warning if the user asks for $150,000. So you go back to your calculator, plug in $150,000 and then make sure it prints the warning.

 

Then it’s back to your accountant, boss or customer for a few dozen more tests to run.

 

You might have noticed that this sounds really boring and tedious. Who wants to spend an hour feeding input into a program and then going over the output line by line looking for bugs? I don’t!

 

That’s where automated testing comes in. Instead of running your tests by hand you write a new test program that knows how to talk to your software. You then give your test input and expected output to the test program and let it run all the tests for you. It feeds the input to your program, checks the output for accuracy and then prints up a pretty report letting you know if there were any problems. You still have to come up with the tests on your own, but at least you don’t have to run them.

 

Automated testing can run thousands of tests with a single click. It’s easier than testing by hand. It’s faster than testing by hand. It’s more accurate than testing by hand. It’s much much less boring than testing by hand. The only real weakness is that it’s hard to automate UI testing or certain types of database driven programs.

 

You Still Haven’t Mentioned What Test Driven Development Is

 

Oh, right. My bad. I was having too much fun talking about automated software testing.

 

Test Driven Development is just the idea that you should set up your automated testing software before you start writing your actual program. You should then run your automated test at least once per day so you can keep track of exactly how much progress you’re making.

 

It is called “test driven” because the tests are the main driver and motivator of your software project. Your first goal is to write good tests and then the rest of your project focuses on writing code that can pass those tests. This is the opposite of code first development where your first goal is to write your program and only then do you start worrying about how to test it.

 

Of course, writing the tests before you write the software to be tested means that you are going to be seeing a lot of “errors” the first few times you run your tests. In fact, a test that doesn’t show 100% errors on a blank program probably has a few errors of its own**.

 

But the 100% error stage doesn’t last long. Once you know your testing software works you can start writing your actual program and before you know it you’ll pass your first use case and change from 100% failure to 1% success. And then you just keep writing and testing your software until the tests finally return zero errors. At that point you can feel very confident that your program really works, that you didn’t forget any features and that your code is as bug free as possible.

 

Why Would I Want To Use Test Driven Development?

 

Automatic tests sound cool, but why would anyone build the test before the thing to be tested? Isn’t that a little backwards? Why bother writing a test if you know it’s going to just return 100% failure? Although not a good fit for ALL programming tasks, there are several advantages to starting with tests:

 

First, it lets you catch mistakes as soon as you make them. If your code used to have a 60% success rate but your latest “improvement” dropped that down to 40% then you know there is a big bug somewhere in your most recent code. This makes it easy to find and fix the bug because you only have a few dozen lines to examine. If you had waited to test until your program was “done” you would have had to search the entire code base to find that bug.

 

Second, writing tests is a good way to double check that your design document is complete. Imagine that you are writing a test to make sure that the program can handle negative numbers in the input. You flip to the design document to look up the “negative input” use case and realize that you forgot to decide what should happen. Whoops! Better go back and discuss that with your manager / customer / alter-ego before you go any further.

 

Third, testing can help schedule your programing. Not sure exactly what to program next? Just find a test that is failing and write the code it needs to succeed.

 

Finally, test driven development can give you an emotional boost by letting you see progress as it happens. Sometimes in software we can spend weeks writing code without feeling like any progress is being made. This is especially bad if your boss also thinks progress isn’t being made. Having a set of automated tests lets you watch the completion rate climb with every new function and gives you something to show management. “Sure, the user interface is still incomplete but these tests show that we have made significant improvement in the invisible database layer.”

 

Tools For Tests

 

Testing software brings with it all the questions associated with normal software. Should you program your own testing suite or use an existing tool? Open source or proprietary? Do you need a powerful tool with all the bells and whistles or will a simple testing tool be enough? Do you want your testing tool to integrate with your programming environment or be a standalone program?

 

You get the idea. Lots of options to fit every coding scenario you run into.

 

As for this Let’s Program, we probably don’t need much power. DELPHI is going to be a very simple program that does nothing but listen to user input and then generate output. So instead of messing around with existing testing tools I’m just going to write my own mini-test. Shouldn’t take more than an hour.

 

The DELPHI Test Suite

 

As I mentioned, DELPHI only does one thing: process user text input and generate response text. So to test DELPHI all we need to do is come up with a list of sample input along with the DELPHI response we hope to get. Then we just cycle through all the input and raise a warning every time DELPHI says the wrong thing.

 

Doing the same thing again and again on slightly different pieces of data suggests our test program should involve some sort of loop. And for the sake of convenience it would be great if we could put all the test input and expected responses into a big list.

 

After thinking about that a little I came up with the idea of putting all of the input/response pairs into one big two dimensional array; basically a two column table where every row will be a different test. The first item in each row will be the sample input and the second item will be the expected response.

 

Now we can run all of our tests from inside of a single loop. I’ll be using a foreach loop that will run our test code once for every single row in our test array.

 

Inside the testing loop I will ask DELPHI to come up with a reply based on the input from the current test row. I’ll then compare that response to the expected response from the test row. If they’re the same I’ll tally up a success for DELPHI. If they’re different I’ll print a nice error message to the screen that lets me know which input failed, what response I was expecting and what DELPHI actually said.

 

With that background knowledge even a non-Perl programmer should be able to make sense of the following test code. A few Perl tricks to look out for though:

    • “use strict” tells the compiler to yell at me if I bend the rules. Without it Perl will let you get away with bad code. Ignoring strict is useful for quick experiments, but on a serious project you always want “use strict”
    • $ indicates a variable with only one value, like a number or string or an individual item in an array
    • @ indicates an entire array
    • You’ll notice that I create the array with the @ symbol and then switch to the singular $ syntax when filling its individual members with data. This is because individual array slots only have one value
    • In Perl strings and numbers have different comparison operators. ‘ne’ is the string version of ‘!=’
    • You might notice that within the for loop I access array values with $test->[0] instead of just $test[0]. This is because $test is actually a reference to an array instead of being a true array. Don’t worry about it too much; there’s a tiny sketch of the idea right after this list
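If references make your head spin, here is a tiny throwaway sketch (an illustration only, not part of the DELPHI tester) of the arrow syntax in action:

#! /usr/bin/perl

use strict;

my @pairs;
$pairs[0][0] = "sample input";
$pairs[0][1] = "expected response";

foreach my $pair (@pairs){
    # foreach hands us a reference to each row, so we use the arrow to look inside
    print "First column: ".$pair->[0]."\n";
    print "Second column: ".$pair->[1]."\n";
}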

 

With that Perl trivia out of the way here is Version 1.0 of the DELPHI Tester:

 

#! /usr/bin/perl -w

use strict;

my @testCases;

$testCases[0][0] = "Will this test pass?";
$testCases[0][1] = "I predict that this test will pass";

$testCases[1][0] = "Is the sky blue?";
$testCases[1][1] = "Fate indicates that the sky is blue";

$testCases[2][0] = "Does this program work?";
$testCases[2][1] = "Fate indicates that this program works";

$testCases[3][0] = "Do computers compute?";
$testCases[3][1] = "Fate indicates that computers compute";

$testCases[4][0] = "Do my readers enjoy this blog?";
$testCases[4][1] = "Fate indicates that your readers enjoy this blog";

$testCases[5][0] = "Is it better to be loved or feared?";
$testCases[5][1] = "Fate indicates the former";

$testCases[6][0] = "Why is natural language processing so hard?";
$testCases[6][1] = "Because of reasons";

$testCases[7][0] = "Pumpkin mice word salad?";
$testCases[7][1] = "I'm sorry, could you try rewording that?";

$testCases[8][0] = "Pumpkin mice word salad";
$testCases[8][1] = "I don't want to talk about that. Please ask me a question";

$testCases[9][0] = "Why do you say things like that";
$testCases[9][1] = "Did you forget a question mark? Grammar is important!";

my $testCount=0;
my $successCount=0;

foreach my $test (@testCases){
    my $output = generateResponse($test->[0]);
    if( $output ne $test->[1] ){
        print "Test Case $testCount Failed!!!\n";
        print "Input: ".$test->[0]."\n";
        print "Output: $output\n";
        print "Expected: ".$test->[1]."\n";
    }
    else{
        print "Test Case $testCount Passed\n";
        $successCount++;
    }

    $testCount++;
}

print "--------------------";
print "\n";
print "Passed $successCount out of $testCount tests\n";
if($testCount == $successCount){
    print "All Tests Passed!\n";
}
else{
    print "Test Failure!!!\n";
}


 

The ten test cases in this first test represent a pretty good sample of yes/no questions, either or questions, why questions and non-question input. I also tried to get a good mix of singular, plural, first person and third person questions. I’ll probably add a few more tests as the project continues and I realize new conversation patterns that need to be supported.

 

The First Run

 

Now that I have a test I should run it and make sure it works. Except that it obviously won’t.

 

Why not?

 

See that line inside the foreach loop where it asks DELPHI to “generateResponse”? That function doesn’t exist yet, so my test code will crash the instant Perl tries to call it.

 

The best way around this is to write a temporary placeholder function that will pretend to be DELPHI until we can write some actual DELPHI code. Placeholder and prototype functions are the only bits of code you are allowed to write before your tests in Test Driven Development. For example, a test driven loan calculator would probably start out with an empty “calculatePayment”.
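To borrow that hypothetical loan calculator for a second, its very first “function” might be nothing more than this:

sub calculatePayment{
    return 0;    # placeholder number; the real payment math comes after the tests
}

Completely useless as a loan calculator, but it is enough to give a test suite something to call.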

 

Anyways, here is our DELPHI place holder.

 

sub generateResponse{
    return "";
}

 

This DELPHI dummy just responds with a blank string no matter what you say to it. Obviously worthless, but it gives the tests something to talk to and lets the test suite run without crashing. And now that the script actually runs we can try our first test:

 

Test Case 0 Failed!!!

Input: Will this test pass?

Output:

Expected: I predict that this test will pass

Test Case 1 Failed!!!

Input: Is the sky blue?

Output:

Expected: Fate indicates that the sky is blue

Test Case 2 Failed!!!

Input: Does this program work?

Output:

Expected: Fate indicates that this program works

Test Case 3 Failed!!!

Input: Do computers compute?

Output:

Expected: Fate indicates that computers compute

Test Case 4 Failed!!!

Input: Do my readers enjoy this blog?

Output:

Expected: Fate indicates that your readers enjoy this blog

Test Case 5 Failed!!!

Input: Is it better to be loved or feared?

Output:

Expected: Fate indicates the former

Test Case 6 Failed!!!

Input: Why is natural language processing so hard?

Output:

Expected: Because of reasons

Test Case 7 Failed!!!

Input: Pumpkin mice word salad?

Output:

Expected: I'm sorry, could you try rewording that?

Test Case 8 Failed!!!

Input: Pumpkin mice word salad

Output:

Expected: I don't want to talk about that. Please ask me a question

Test Case 9 Failed!!!

Input: Why do you say things like that

Output:

Expected: Did you forget a question mark? Grammar is important!

--------------------

Passed 0 out of 10 tests

Test Failure!!!

 

We failed all the tests! Which means we succeeded! All those blank output lines show that the DELPHI placeholder is doing its job and the 100% failure rate means that our test code is correctly flagging mistakes for us.

 

Conclusion

 

Now that the testing framework is done the stage is set for starting to actually work on the chatbot. Finally, after four posts, we’re going to “Let’s Program A Chatbot” for real.

 

 

* Bad jokes are an important part of computer programming. If you can’t handle this you may want to consider a career in a different field. Like accounting.

 

** Yes, you have to test your test programs before using them. But please try to avoid falling into an infinite loop of testing the tests that test your tests for testing tests.

Let’s Program A Chatbot 3: Choosing A Programming Language

How To Choose A Programming Language

 

Modern programming languages are 99% interchangeable. If you can do something in C you can also do it in Java, Lisp, Visual Basic, Python and so on. There are very few scenarios where you absolutely “need” to use a specific language.

 

But that doesn’t change the fact that every language has strengths and weaknesses. A program that would be difficult to write in Java might be easy to write in Python. A program that runs slow in Lisp might be easy to optimize in C.

 

But the language’s strengths and weaknesses aren’t the only thing you need to think about when starting a project. You, as a programmer, have strengths and weaknesses too. If you have ten years of experience with C++ but have never touched Ruby then odds are you should stick to C++, especially if you have a deadline coming up and can’t spare the time to learn a new language*.

 

So when trying to choose a programming language you need to ask yourself three questions:

      1. How well does this language match my problem?
      2. How comfortable am I with this language?
      3. How much time can I spare for learning new language features?

 

Sometimes you get lucky and find out that your favorite language is a perfect match for the problem you need to solve. Hooray!

 

But other times you’ll have to make a tough choice between a familiar language you know you can *eventually* succeed with and a less familiar language that has some really great features that would instantly solve all your problems if you could just get your code to stop throwing weird errors.

 

And sometimes the choice is so hard you just give up, eat a gallon of ice cream and decide to join an isolated community where computers are illegal and speaking jargon is punishable by death.

 

Perl: A Good Pattern Matching Language

 

With all that theory out of the way we can move on to choosing a language for our chatbot. Since our chatbot is going to be based primarily off of pattern matching we’re going to want a programming language that makes matching patterns easy. And pattern matching should make you think of regular expressions**. And regular expressions should make you think of Perl.

 

I can see a few of you getting a little nervous. Doesn’t Perl have a reputation for being a hard to read language? And aren’t regular expressions famous for causing more problems than they solve? Weren’t we supposed to choose a language we feel comfortable with?

 

Well don’t worry. I use both Perl and regular expressions at work and while I’m no guru I can at least get my code to work 9 times out of 10. Furthermore, I promise to write clean code and will do my best to avoid the Perl code shortcuts that are responsible for making it hard for newcomers to understand.

 

Side note: Although Perl and regular expressions work together really well I should point out that you can also use regular expressions with other languages. In fact, most languages have either built in support for regular expressions or easy to find regex libraries.

 

So if you like C# you can regex in C#. If you’re a Java guy you can regex in Java. Just because I chose Perl for my chatbot doesn’t mean you have to. In fact, porting my chatbot to your favorite language might be a fun exercise for beginning programmers looking for a challenge.

 

Although I suppose I’ll have to actually write this thing before anybody can port anything.

 

Proof of Concept: Can We Really Perl Up A Chatbot?

 

On paper Perl looks like a really good pattern matching chatbot language. It has built in support for regular expressions and tons of text processing functions, and its cross platform support makes it easy to share code with other people (like you, my wonderful readers).

 

But I still feel a little twinge of doubt. Is this really a good idea? I figure the best way to find out is to write some Perl and see if I can make it do what I want.

 

Spoilers: The answer is yes. You can now safely skip to the next post without missing anything. But if you want to see the tests I hacked together to prove this, feel free to read on. Just don’t be surprised if the code is hard to follow. This isn’t production code or even reader education code, just a quick and sloppy experiment.

 

Test 1: Pattern Matching And Response Generation… in Perl

 

The core feature of our chatbot will be the ability to check whether or not the user’s input matches a specific pattern and then build an appropriate response. So that seems like the most logical thing to test first. And so here is my first test:

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";

if($testInput =~ /\AIs ([a-zA-Z]+) (.+)\?\z/){
   print "DELPHI: Fate confirms that $1 is $2\n";
}
else{
   print "Didn't work\n";
}

 

This also marks the first bit of code in the Let’s Program and OH MY WHAT IS WRONG WITH THAT IF STATEMENT!?

 

Well, wonderful reader, that if statement happens to be a regular expression. I’ll talk about those more later on. For now just trust me when I say that that bizarre list of characters and symbols translates to “Match a sentence that begins with ‘Is’, ends with ‘?’ and has at least two words in between them”.

 

That regular expression also gives us a copy of the words that it found between the ‘Is’ and ‘?’, which we then slip into the output. That’s what the symbols $1 and $2 are doing.
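For the curious, here is the exact same pattern rewritten using Perl’s /x modifier, which lets me spread a regex out and comment every piece (this is just an annotated copy of the pattern above, not new test code):

my $pattern = qr/
   \A            # the match must start at the very beginning of the input
   Is[ ]         # the literal word "Is" followed by a space
   ([a-zA-Z]+)   # capture a word made of letters; this becomes $1
   [ ]           # another space
   (.+)          # capture everything else; this becomes $2
   \?            # a literal question mark
   \z            # the match must reach the very end of the input
/x;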

 

Don’t worry if that didn’t make sense. This is just a test. I’ll explain things more in depth when I start actually programming the chatbot. For now the important thing is that running this program produces this output:

 

DELPHI: Fate confirms that Perl is a good choice for this program

 

Test 1 is a success. We managed to write Perl code that matched user input and transformed it into an appropriate chatbot response.

 

Test 2: Can We Make A List Of Regular Expressions… In Perl?

 

Now we know that Perl can help us match user input to one pattern. But for our chatbot we’re going to need to try and match the user’s input against at least a dozen different patterns. Is there an easy way to do this or is our program going to turn into a giant pile of if and elsif? Time to find out:

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";
$testInput2 = "Why is Perl a good choice for this program?";

$inputPatterns[0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
$inputPatterns[1]=qr/\AWhy (.+)\?\z/;

if($testInput =~ $inputPatterns[0]){
   print "DELPHI: Fate confirms that $1 is $2\n";
}
else{
   print "Didn't work\n";
}

if($testInput2 =~ $inputPatterns[1]){
   print "DELPHI: Because I said so\n";
}

if($testInput =~ $inputPatterns[1]){
   print "This shouldn't match!\n";
}

if($testInput2 =~ $inputPatterns[0]){
   print "This shouldn't match either!\n";
}

 

Once again, don’t worry if you didn’t catch all that. In this test I basically just stored the regular expressions inside an array instead of writing them directly inside of the if statements. If this works then we can write our chatbot with a nice, clean pattern matching loop instead of endless if statements. But does it work?

 

DELPHI: Fate confirms that Perl is a good choice for this program
DELPHI: Because I said so

 

Success!
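And just to make the payoff concrete, here is roughly the kind of loop this array makes possible (a quick sketch of the idea, not part of the test itself):

foreach my $pattern (@inputPatterns){
   if($testInput =~ $pattern){
      print "DELPHI: Found a match!\n";
      last;    # stop at the first pattern that fits
   }
}

One loop, no matter how many patterns we dream up later.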

 

Test 3: Connecting Output Patterns To Input Patterns… In Perl!

 

Last test proved that we can move our regular expressions out of the if statements and into a nice, clean array. Can we do the same thing with our responses? Here goes nothing…

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";
$testInput2 = "Why is Perl a good choice for this program?";

$chatPatterns[0][0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
$chatPatterns[0][1]="DELPHI: Fate confirms that $1 is $2\n";

$chatPatterns[1][0]=qr/\AWhy (.+)\?\z/;
$chatPatterns[1][1]="DELPHI: Because I said so\n";

if($testInput =~ $chatPatterns[0][0]){
   print $chatPatterns[0][1];
}

if($testInput2 =~ $chatPatterns[1][0]){
   print $chatPatterns[1][1];
}

 

Which produces this output:

 

DELPHI: Fate confirms that is
DELPHI: Because I said so

 

Uh oh. Everything matched up properly but something went wrong with the response generation. I was actually expecting this. I want to build DELPHI’s responses using information from the user’s input, but the response array is being built before the user gets a chance to say anything.

 

So if I want to store response patterns in an array I’m going to need to add a little extra code in order to splice the user’s input into the response after it is pulled out of the array but before it gets printed to the screen. Hmm… let’s try this:

 

#! /usr/bin/perl

$testInput = "Is Perl a good choice for this program?";
$testInput2 = "Why is Perl a good choice for this program?";

$chatPatterns[0][0]=qr/\AIs ([a-zA-Z]+) (.+)\?\z/;
$chatPatterns[0][1]="DELPHI: Fate confirms that UIF0 is UIF1\n";

$chatPatterns[1][0]=qr/\AWhy (.+)\?\z/;
$chatPatterns[1][1]="DELPHI: Because I said so\n";

if(@UIF = ($testInput =~ $chatPatterns[0][0])){

   $response = $chatPatterns[0][1];
   for($i=0; $i<@UIF; $i++){
      $find = "UIF$i";
      $replace = $UIF[$i];
      $response =~ s/$find/$replace/g;
   }

print $response;
}

if(@UIF = ($testInput2 =~ $chatPatterns[1][0])){

   $response = $chatPatterns[1][1];
   for($i=0; $i<@UIF; $i++){
      $find = "UIF$i";
      $replace = $UIF[$i];
      $response =~ s/$find/$replace/g;
   }

print $response;
}

 

You’re still not allowed to panic, this is just a test. What I’ve basically done is change the code to capture a list of individual pieces from the original input (which I call User Input Fragments, or UIFs). When a match is found the program uses a search and replace regex to find every place the response template contains a UIF placeholder and then swaps in the matching piece of actual user input.

 

Don’t look at me like that. I said I’ll explain it better later. Just wait one or two more posts. For now the important thing is that running my new test code produces this beautiful output:

DELPHI: Fate confirms that Perl is a good choice for this program
DELPHI: Because I said so

 

Success! I can store responses in an array right alongside the input patterns they are related to. This means that I can teach the chatbot new conversation tactics by just adding new patterns to the master array. No need to write new code!
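For example, teaching DELPHI to handle “Do” questions might be as easy as one hypothetical new row (the exact patterns will get worked out later in this Let’s Program):

$chatPatterns[2][0]=qr/\ADo (.+)\?\z/;
$chatPatterns[2][1]="DELPHI: Fate indicates that UIF0\n";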

 

Conclusion

 

Our tests have all passed and the language of this Let’s Program is going to be Perl. With that final piece in place we can finally jump into some actual coding. Are you excited? I’m excited!

 

 

Please be excited.

 

 

* I once failed a mildly important college project because I decided it would be fun to code everything in a new language that I knew almost nothing about. By the time I realized I would have been better off sticking with a language I knew it was too late. Don’t let the same thing happen to you!

 

** Regular Expressions are a sort of miniature programming language that specialize in pattern matching. They are a powerful tool for all sorts of text analysis programs.

Let’s Program A Chatbot 2: Design Before You Code

My Obsession With Design Documents

Design documents are an important part of writing software. How important? So important that Jesus used the software design process in one of his parables!

 

28: For which of you, intending to build a program, sitteth not down first, and counteth the requirements, whether he have sufficient time and skill to finish it?

29: Lest haply, after he hath written much code, and is not able to finish it, all that behold it begin to mock him,

30: Saying, This man began to code, and was not able to finish.

Luke 14:28-30

 

OK, I may have paraphrased that a bit. But you get the idea. The first step in a successful project is sitting down and deciding exactly what you’re trying to accomplish and then figuring out whether or not it is actually accomplishable.

 

Careful planning also gives you a chance to notice problems while they are still theoretical and easy to fix. Realizing that you need to add sound effects to a program that is 99% complete requires you to rewrite and debug painfully large amounts of code. But if you realize that you need sound effects during the planning stage you can naturally add them into the code as you work. Much easier.

 

Example: The Chatbot That Probably Would Have Never Been Finished

Now it’s time for a real life example of how good design documents made my life easier.

 

When I first started this Let’s Program the one big question was what exactly my chatbot should chat about. Usually when I can’t figure out a topic for a practice project I default to table-top fantasy gaming on the basis that most computer geeks have played Dungeons and Dragons, played a Dungeons and Dragons inspired computer game or at the very least seen a fantasy movie where people hit monsters with swords.

 

So my first idea was to create a chatbot that could help Dungeon Masters design new adventures. A chatbot that could talk about plot hooks and dungeons and even automatically generate treasure hordes and room descriptions.

 

And I was so excited by this idea that I was very tempted to just jump straight into playing with code. But I resisted the urge and spent some time doing a nice requirements write-up for all you lovely people in the audience. And during that write-up I realized that I didn’t have nearly enough free-time to build this thing.

 

Writing a pattern-matching bot that can intelligently talk about fantasy adventures was going to be hard. Programming functions for generating treasures and rooms was going to be tedious. And linking input patterns to function calls instead of output patterns was going to involve advanced coding techniques that I felt would unfairly draw attention away from the simple “chatbot” focus I had planned for this Let’s Program.

 

So I’m saving that project for a future date.* Thanks to the power of design documents I’ve avoided an embarrassingly ambitious project and you’ve been spared the pain of watching me fail.

 

Design Document 2: The Fortune Teller Chatbot That I Can Build

I’ve already thrown away one project for being too complex for a good example of introductory chatbot design. What am I going to do instead?

 

Well, I’m going to write a fortune teller. Where by “fortune teller” I actually mean “random guessing machine”. The basic idea is to create a chatbot that can recognize when it’s being asked a question and then make a wild guess at what the answer should be. If the user doesn’t ask a question the chatbot will prompt them to do so.

 

I also plan to make the chatbot as goofy and sarcastic as possible. Not only because I enjoy funny computer programs but also because the humor will hopefully disguise the fact that the chatbot isn’t actually smart enough to know what’s going on. But I’ll talk about that more in the future when I get to the ELIZA effect.

 

I shall call the chatbot DELPHI because I was an extreme mythology geek in my youth.

 

UI Design

The user interface is one of the most important parts of a good design document. It should include mockups of every page on a website, every screen on an app and a clear explanation of what every single button should do. It should give you a good idea of everything the user will be able to do and everything they should expect to see.

 

A good UI design can take up a lot of space!

 

But this is a text based program so our UI Design will only take a couple paragraphs. What a relief!

 

Official UI Design: The user opens the program via the command line. The program will print a line prompting them to begin communication. The user is allowed to type text. After they hit enter the program will generate a new line of text that responds to the user’s input as relevantly as possible.

 

Computer output will be preceded by the string “DELPHI: ”.

 

User input will be prompted by the ‘>’ symbol.

 

Program Interface Mockup:

 

DELPHI: Welcome! What soul searching questions have you brought me today?

>Will Scott post his next Let’s Program soon?

DELPHI: The stars say it is very unlikely that Scott will post his next Let’s Program soon.

>Why not?

DELPHI: Because space aliens are interfering with the astrological radiation from Jupiter.

> You’re making that up.

DELPHI: I sense you’re not being honest with me. What questions do you really have?

>Goodbye

DELPHI: Until next time

 

Use Cases

Use Cases rank right up there with thorough UI design when it comes to valuable software development tools. For those few of you who aren’t familiar with the technique, the whole point of use cases is to come up with a big list of the ways you expect your users to use your program. Then you write down what the program should do in each of those situations.

 

So let’s consider a few of the major question types we expect to see.

 

Use Case 1: Yes/No Questions

The most obvious questions that a fortune teller will encounter are those that have a simple yes or no answer. Most of these inputs will start with the word “Will”, “Is” or “Do” and end with a question mark. Example:

Will I get a promotion?
Is my code bug free?
Do androids dream of electric sheep?

 

In these scenarios DELPHI should respond with a randomly chosen positive or negative answer that includes appropriate portions of the user’s question (there’s a quick sketch of this after the examples). Example:

The stars agree you will get a promotion.
Ether vibrations suggest it is impossible that your code is bug free.
Fate calculations show that androids dream of electric sheep.
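Peeking ahead a little, this use case practically writes its own matching rule. In regular expression notation (treat this as pseudocode for now, since we haven’t picked a language yet) the pattern might look something like:

\A(Will|Is|Do) (.+)\?\z

That reads as: the input must start with “Will”, “Is” or “Do”, everything up to the final question mark gets captured, and the captured text can then be recycled into the randomly chosen answer.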

 

Use Case 2: Why Why Why Why?

 

Another popular type of question is the “Why” question. As in “Why won’t my code compile?”, “Why is the sky blue?” or “Why won’t my toddler stop asking me why questions?”.

 

In these scenarios DELPHI should respond with a randomly chosen excuse. The list of possible explanations should be large enough that casual users don’t catch on that answers are being chosen randomly. Explanations might include:

Because of the alignment of the stars.
Because great Cthulhu is beginning to awaken.
I would tell you but it is a secret to everyone.
I used to know, but I forgot.

 

Use Case 3: A or B?

Users may ask DELPHI to choose between multiple options. This will be signified by the word “or” as seen in these sample inputs:

Should I buy a sports car or a motorcycle?
Do I want chocolate or strawberry ice cream?
Is that a moon or a space station?

 

In these cases DELPHI should generate a response that randomly agrees with the first or the second option. By actually using words like “first” and “second” (or “former” and “latter”) DELPHI will not need to include any information from the original question; there’s a rough sketch of this after the samples. Consider these sample responses:

I've got a good feeling about the first one.
My prognostication engine suggests the latter.
Definitely the former... assuming you trust Saturn.
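Here is a rough sketch of why this makes life easier (hypothetical code, so treat it as pseudocode until we officially choose a language):

if($input =~ / or /){
   my @replies = ("Definitely the former... assuming you trust Saturn.",
                  "My prognostication engine suggests the latter.");
   print $replies[ int(rand(@replies)) ], "\n";   # coin flip between the two
}

Because both replies work for any “A or B” question, DELPHI never has to pick apart what A and B actually were.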

 

Use Case 4: Goodbye

If the user types “Goodbye” the program will exit.

 

Default Use Case

It is very very likely that users will say things that do not fit any of our specific use cases. Especially if the user decides to not actually ask a question. In these scenarios DELPHI should generate a random response that urges the user to try again with a question. Possible responses might include:

Chatting is nice, but please ask me a question instead.
I'm bored. Go ahead, ask me why I'm bored.
Sorry, I didn't hear you. What was your question?

 

Conclusion

I feel like I have a really good grip on what DELPHI version 1.0 needs to do and I hope you do too. Which means the only thing standing between us and some actual programming is choosing an implementation language.

 

Three guesses what my next post is going to be about.

 

* Possibly after a zombie apocalypse has left me trapped in my own basement with nothing but months of bleak free time ahead of me.