STANFORD LINGUIST 138/238     -     SYMBSYS 138   -     Autumn 2004
Homework 1
Due: October 5 at the start of class

Read this entire page before starting!!

Implement an ELIZA-like program, using substitutions such as those described on pages 12-13. You may choose a different domain than a Rogerian psychologist if you wish, but keep in mind that you will need a domain in which your program can legitimately do a lot of simple repeating back.

You will be teaming up with a partner from the class, so make sure to pick a partner soon, and in any case by Thursday, Sep 30.

There are lots of examples of ELIZA and other chatterbots on the web. If you want, you may look at them for high-level ideas or for hints about fun domains, but you may not copy any code from the web.

The simplest (and original) architecture for ELIZA is a read-replace-print loop: read in a line from the user, run a series of regular expression substitutions on it, and print out the substituted line.
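As a rough illustration of that loop, here is a minimal Python sketch (any language would do). The rule tables SWAPS and RULES, their response strings, and the function names respond and main are hypothetical choices for this example, not part of the assignment. Each line first has its first-person words rewritten as second-person ones, and then the first response rule that matches the whole line produces the reply.

```python
import re

# Hypothetical substitution cascade, applied in order to each input line.
# Step 1: rewrite first-person words as second-person ones.
SWAPS = [
    (r"\bI'm\b", "you are"),
    (r"\bI am\b", "you are"),
    (r"\bI\b", "you"),
    (r"\bmy\b", "your"),
    (r"\bme\b", "you"),
]
# Step 2: response rules; the first rule that matches the whole line wins.
RULES = [
    (r".*\byou are (depressed|sad)\b.*", r"I am sorry to hear you are \1."),
    (r".*\ball\b.*", "In what way?"),
    (r".*\balways\b.*", "Can you think of a specific example?"),
]


def respond(line: str) -> str:
    """Read-replace step: run the substitution cascade over one line."""
    text = line.strip().rstrip(".!?")
    for pattern, replacement in SWAPS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    for pattern, replacement in RULES:
        if re.fullmatch(pattern, text, flags=re.IGNORECASE):
            # The pattern spans the whole line, so the reply replaces it.
            return re.sub(pattern, replacement, text, count=1, flags=re.IGNORECASE)
    return "Please go on."  # default when no rule fires


def main() -> None:
    # The read-replace-print loop itself; end of input exits.
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        print(respond(line))


if __name__ == "__main__":
    main()
```

For example, respond("Men are all alike.") returns "In what way?". Because each line is handled in isolation, nothing the user said earlier can influence the reply.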

What you will be graded on:

  1. Make sure you deal properly with greetings and farewells.
  2. Make sure your output is grammatical! For example, handle the following simple grammatical phenomena:
    1. Person deixis. Make sure that you call the user "you" and ELIZA (your program) "I". So if the user talks about "my mother", your program needs to respond by talking about "your mother". And if the user says "Eliza is...", you'll need to respond with "I am...".
    2. Agreement: make sure your verb uses are correct (don't say "I is" or "you am").
  3. Sentence tokenization. If the user's input is two sentences, make sure your patterns don't mix the two together. The simplest approach is to respond to the first sentence and ignore the second. This requires a simple method for breaking the user's input into sentences.
  4. Make sure you write some interesting patterns! Include at least two patterns which use a keyword in the input to change the whole input string to something different. And include at least two patterns which do some sort of complex reordering of the input.
  5. Global State: implement at least one extension to the read-replace-print loop, where you implement "conversational context" by remembering something from what the user typed earlier. This could be as simple as remembering the user's name, noticing that the user is repeating themselves, or keeping a "mad index" relating to how many times the user has insulted you.
  6. Exchange programs with your partner. Look at your partner's code, and then have a conversation with your partner's program. Use your knowledge of their code to try to cause 2 grammatical errors, confusions, or other kinds of problems in their program's output. Please try to choose problems that do not require huge architectural changes like solving the entire problem of human natural language understanding. Save this conversation (if necessary just by cutting and pasting).
  7. Modify your program to fix the 2 problems your partner found (unless they require massive architectural changes).
  8. Write 1 page (300 words) discussing conversational limitations caused by the fact that the structure of your program is a read-replace-print loop. What kinds of more complex conversational behavior would require keeping more knowledge of conversational context and/or having a more complex conversational architecture?
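Several of the graded items above (greetings and farewells, person deixis and agreement, sentence tokenization, keyword and reordering patterns, and global state) can live together in one small program. The Python sketch below is one possible approach under stated assumptions, not a model answer: the class Eliza, the helpers reflect and first_sentence, and every word list and response string are hypothetical. It only reflects first-person words into second person; mapping "you" back to "I" or "me" is harder, because the right choice depends on whether "you" is a subject or an object.

```python
import re

# Hypothetical first-person -> second-person table (person deixis plus
# verb agreement for the common "I am" / "I was" cases).
REFLECTIONS = {
    "i am": "you are", "i was": "you were", "i'm": "you're",
    "i": "you", "me": "you", "my": "your", "mine": "yours",
    "myself": "yourself",
}
# One combined regex, longest keys first, so "i am" beats "i" and each
# word is rewritten exactly once (no "my" -> "your" -> "my" ping-pong).
_REFLECT_RE = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(REFLECTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
# Hypothetical insult list used for the "mad index".
_INSULT_RE = re.compile(r"\b(stupid|dumb|useless|idiot)\b", re.IGNORECASE)


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones."""
    return _REFLECT_RE.sub(lambda m: REFLECTIONS[m.group(0).lower()], fragment)


def first_sentence(text: str) -> str:
    """Naive tokenizer: split after . ! or ? followed by whitespace.
    Abbreviations like "Dr. Smith" will be split incorrectly."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sentences[0].rstrip(".!?")


class Eliza:
    def __init__(self) -> None:
        self.name = None    # remembered from "my name is ..."
        self.mad_index = 0  # number of insults seen so far

    def respond(self, line: str) -> str:
        s = first_sentence(line)  # ignore any later sentences
        if re.fullmatch(r"(bye|goodbye|quit)\b.*", s, re.I):
            who = f", {self.name}" if self.name else ""
            return f"Goodbye{who}. It was nice talking to you."
        m = re.search(r"\bmy name is (\w+)", s, re.I)
        if m:
            self.name = m.group(1).capitalize()
            return f"Nice to meet you, {self.name}. What brings you here?"
        if re.fullmatch(r"(hi|hello|hey)\b.*", s, re.I):
            return "Hello. How are you feeling today?"
        if _INSULT_RE.search(s):
            self.mad_index += 1
            if self.mad_index >= 3:
                return (f"You have insulted me {self.mad_index} times now. "
                        "Let's talk about something else.")
            return "There is no need to be rude."
        # Keyword rule: one word replaces the whole input.
        if re.search(r"\b(mother|father|sister|brother)\b", s, re.I):
            return "Tell me more about your family."
        # Reordering rules: captured text is reflected and moved.
        m = re.fullmatch(r"Eliza is (.*)", s, re.I)
        if m:
            return f"Why do you think I am {reflect(m.group(1))}?"
        m = re.fullmatch(r".*\bI am (.*)", s, re.I)
        if m:
            return f"Why do you say you are {reflect(m.group(1))}?"
        m = re.fullmatch(r".*\bI (?:want|need) (.*)", s, re.I)
        if m:
            return f"What would it mean to you if you got {reflect(m.group(1))}?"
        return "Please go on."


def main() -> None:
    bot = Eliza()
    print("Hello. What would you like to talk about?")
    while True:
        try:
            reply = bot.respond(input("> "))
        except EOFError:
            break
        print(reply)
        if reply.startswith("Goodbye"):
            break


if __name__ == "__main__":
    main()
```

Here the remembered name and the mad_index counter persist from one call of respond to the next, which is the kind of state a pure read-replace-print loop does not keep.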

What to turn in:

  1. Your program (you may write in any programming language you want; Perl is probably simplest).
  2. A transcript of a conversation between your program and your partner.
  3. A description of the improvements that you made in your program after seeing how the conversation went with your partner.
  4. Your 1-page discussion of the limitations of the read-replace-print loop.

How to turn it in: