[an error occurred while processing this directive]

Stanford CS 106B: Grammar Solver

Assignment by Marty Stepp and Victoria Kirst. Based on a problem by Stuart Reges of U of Washington.

This problem focuses on recursion.

This is an individual assignment. Write your own solution and do not work in a pair/group on this program.

Files and Links:

icon
Project Starter ZIP

(open GrammarSolver.pro)
icon
Turn in
:
icon
Demo JAR
icon
Homework Survey

Problem Description:

For this part you will write a function for generating random sentences from a grammar.

Vector<string> grammarGenerate(istream& input, string symbol, int times)

Your function accepts a reference to an input file representing a language grammar, along with a symbol to randomly generate and the number of times to generate it. Your function should generate the given number of random expansions of the given symbol and return them as a Vector of strings.

A formal language is a set of words and/or symbols along with a set of rules, collectively called the syntax of the language, defining how those symbols may be used together. A grammar is a way of describing the syntax and symbols of a formal language. Even programming languages have grammars; here is a link to a formal grammar for the Java language.

Many language grammars are described in a format called Backus-Naur Form (BNF), which is what we'll use on this assignment. In BNF, some symbols in a grammar are called terminals because they represent fundamental words of the language. A terminal in English might be "boy" or "run" or "Mariana". Other symbols of the grammar are called non-terminals and represent high-level parts of the language syntax, such as a noun phrase or a sentence. Every non-terminal consists of one or more terminals; for example, the verb phrase "throw a ball" consists of three terminal words.

The BNF description of a language consists of a set of derivation rules, where each rule names a symbol and the legal transformations that can be performed between that symbol and other constructs in the language. For example, a BNF grammar for the English language might state that a sentence consists of a noun phrase and a verb phrase, and that a noun phrase can consist of an adjective followed by a noun or just a noun. Rules can be described recursively (in terms of themselves). For example, a noun phrase might consist of an adjective followed by another noun phrase, such as the phrase "big green tree" which consists of the adjective "big" followed by the noun phrase "green tree".

A BNF grammar is specified as an input file containing one or more rules, each on its own line, of the form:

non-terminal ::= rule|rule|rule|...|rule

A separator of ::= (colon colon equals) divides the non-terminal from its expansion rules. There will be exactly one such ::= separator per line. A | (pipe) separates each rule; if there is only one rule for a given non-terminal, there will be no pipe characters. Each rule will consist of one or more whitespace-separated tokens.

The following is a valid example BNF input file describing a small subset of the English language. Non-terminal names such as <s>, <np> and <tv> are short for linguistic elements such as sentences, noun phrases, and transitive verbs.

<s>::=<np> <vp>
<np>::=<dp> <adjp> <n>|<pn>
<dp>::=the|a
<adjp>::=<adj>|<adj> <adjp>
<adj>::=big|fat|green|wonderful|faulty|subliminal|pretentious
<n>::=dog|cat|man|university|father|mother|child|television
<pn>::=John|Jane|Sally|Spot|Fred|Elmo
<vp>::=<tv> <np>|<iv>
<tv>::=hit|honored|kissed|helped
<iv>::=died|collapsed|laughed|wept
Sample input file sentence.txt

This grammar's language can represent sentences such as "The fat university laughed" and "Elmo kissed a green pretentious television". This grammar cannot describe the sentence "Stuart kissed the teacher" because the words "Stuart" and "teacher" are not part of the grammar. It also cannot describe "fat John collapsed Spot" because there are no rules that permit an adjective before the proper noun "John", nor an object after intransitive verb "collapsed".

Though the non-terminals in the previous example language are surrounded by < >, this is not required. By definition any token that ever appears on the left side of the ::= of any line is considered a non-terminal, and any token that appears only on the right-hand side of ::= in any line(s) is considered a terminal. Do not assume that non-terminals will be surrounded by < > in your code. Each line's non-terminal will be a non-empty string that does not contain any whitespace. You may assume that individual tokens in a rule are separated by a single space, and that there will be no outer whitespace surrounding a given rule or token.

Your grammarGenerate function will perform two major tasks:

  1. read an input file with a grammar in Backus-Naur Form and turns its contents into a data structure; and
  2. randomly generate elements of the grammar (recursively)

You may want to separate these steps into one or more helper function(s), each of which performs one step. It is important to segregate the recursive part of the algorithm away from the non-recursive part.

You are given a client program that does the user interaction. The main function supplies you with an input file stream to read the BNF file. Your code must read in the file's contents and break each line into its symbols and rules so that it can generate random elements of the grammar as output. When you generate random elements, you store them into a Vector which is returned. The provided main program loops over the vector and prints the elements stored inside it.


Example Logs of Execution:

Your program should exactly reproduce the format and general behavior demonstrated in this log, although you may not exactly recreate this scenario because of the randomness that your code performs. (Don't forget to use the course File → Compare Output... feature in the console, or the course web site's Output Comparison Tool, to check output for various test cases.)

Grammar file name? sentence.txt
Symbol to generate (Enter to quit)? <dp>
How many to generate? 3

 1: the
 2: the
 3: a
Symbol to generate (Enter to quit)? <np>
How many to generate? 5

 1: a wonderful father
 2: the faulty man
 3: Spot
 4: the subliminal university
 5: Sally
Symbol to generate (Enter to quit)? <s>
How many to generate? 7

 1: a green green big dog honored Fred
 2: the big child collapsed
 3: a subliminal dog kissed the subliminal television
 4: Fred died
 5: the pretentious fat subliminal mother wept
 6: Elmo honored a faulty television
 7: Elmo honored Elmo

Expected output: Here are some additional expected output files to compare. It's hard to match the expected output exactly because it is random. But your function should return valid random results as per the grammar that was given to it. Your program's graphical Console window has a File → Compare Output feature for checking your output.

We have also written a Grammar Verifier web tool where you can paste in the randomly generated sentences and phrases from your program, and our page will do its best to validate that they are legal phrases that could have come from the original grammar file. This isn't a perfect test, but it is useful for finding some kinds of mistakes and bugs.

[an error occurred while processing this directive]

CS 106B/X Grammar Verifier

Paste your sentence.txt output line(s) below to see whether they match the format required by the grammar.

Symbol:
Your grammarGenerate output:
pass

Expression evaluator: Paste your expressions.txt E output below to evaluate it to see whether you have created a legal expression.


 

Now that you've seen an example of the program's behavior, let's dive into the implementation details of your algorithm.

Part 1: Reading the Input File

For this program you must store the contents of the grammar into a Map. As you know, maps keep track of key/value pairs, where each key is associated with a particular value. In our case, we want to store information about each non-terminal symbol. So the non-terminal symbols become keys and their rules become values. Other than the Map requirement, you are allowed to use whatever constructs you need from the Stanford C++ libraries. You don't need to use recursion on this part of the algorithm; just loop over the file as needed to process its contents.

One problem you will have to deal with early in this program is breaking strings into various parts. To make it easier for you, the Stanford library's "strlib.h" library provides a stringSplit function that you can use on this assignment:

Vector<string> stringSplit(string s, string delimiter)

The string split function breaks a large string into a Vector of smaller string tokens; it accepts a delimiter string parameter and looks for that delimiter as the divider between tokens. Here is an example call to this function:

string s = "example;one;two;;three";
Vector<string> v = stringSplit(s, ";");   // {"example", "one", "two", "", "three"}

The parts of a rule will be separated by whitespace, but once you've split the rule by spaces, the spaces will be gone. If you want spaces between words when generating strings to return, you must concatenate those yourself. If you find that a string has unwanted spaces around its edges, you can remove them by calling the trim function, also from "strlib.h":

string s2 = "  hello there  sir  ";
s2 = trim(s2);                            // "hello there  sir"

Part 2: Generating Random Expansions from the Grammar

As mentioned previously, producing random grammar expansions is a two-step process. The first step involves reading the input grammar file and turning it into an appropriate data structure (non-recursively). The second step involves recursively walking that data structure to generate elements by successively expanding them.

Generate elements of a grammar with a recursive algorithm. To generate a random occurrence of a symbol S:

For example, the grammar on the previous page could be used to randomly generate a <s> non-terminal for the sentence, "Fred honored the green wonderful child", as shown in the following diagram.

sentence diagram
Random expansion from sentence.txt grammar for symbol <s>

If the string passed to your function is a non-terminal in your grammar, use the grammar's rules to recursively expand that symbol fully into a sequence of terminals. For example, using the grammar on the previous pages, a call of grammarGenerate("<np>") might potentially return the string, "the green wonderful child".

Generating a non-terminal involves picking one of its rules at random and then generating each part of that rule, which might involve more non-terminals to recursively generate. For each of these you pick rules at random and generate each part, etc.

When you encounter a terminal, simply include it in your string. This becomes a base case. If the string passed to your function is not a non-terminal in your grammar, you should assume that it is a terminal symbol and simply return it. For example, a call of grammarGenerate("green") should just return "green". (Without any spaces around it.)

Special cases to handle: You should throw a string exception if the grammar contains more than one line for the same non-terminal. For example, if two lines both specified rules for symbol "<s>", this would be illegal and should result in the exception being thrown. You should throw a string exception if the symbol parameter passed to your function is empty, "".

Testing: The provided input files and main may not test all of the above cases; it is your job to come up with tests for them.

Your function may assume that the input file exists, is non-empty, and is in a proper valid format. If the number of times passed is 0 or less, return an empty vector.


Implementation Details:

The hardest part is the recursive generation, so make sure you have read the input file and built your data structure properly before tackling the recursive part. The directory crawler program from lecture is a good guide.

Loops are not forbidden in this part of the assignment. In fact, you should definitely use loops for some parts of the code where they are appropriate. The directory crawler example from class uses a for-each loop. This is perfectly acceptable; if you find that part of this problem is easily solved with a loop, please use one. In the directory crawler, the hard part was traversing all of the different sub-directories, and that's where we used recursion. For this program the hard part is following the grammar rules to generate all the parts of the grammar, so that is the place to use recursion. If your recursive function has a bug, try putting a cout statement at the start of your recursive function that prints its parameter values, to see the calls being made. Look up the randomInteger function from "random.h" to help you make random choices between rules.


Style Details:

(These are the same as in the Fractals problem.)

As in other assignments, you should follow our Style Guide for information about expected coding style. You are also expected to follow all of the general style constraints emphasized in the Homework 1 and 2 specs, such as the ones about good problem decomposition, parameters, using proper C++ idioms, and commenting. The following are additional points of emphasis and style contraints specific to this problem:

Recursion: Part of your grade will come from appropriately utilizing recursion to implement your algorithm as described previously. We will also grade on the elegance of your recursive algorithm; don't create special cases in your recursive code if they are not necessary. Avoid "arm's length" recursion, which is where the true base case is not found and unnecessary code/logic is stuck into the recursive case. Redundancy in recursive code is another major grading focus; avoid repeated logic as much as possible. As mentioned previously, it is fine (sometimes necessary) to use "helper" functions to assist you in implementing the recursive algorithms for any part of the assignment.

Variables: While not new to this assignment, we want to stress that you should not make any global or static variables (unless they are constants declared with the const keyword). Do not use globals as a way of getting around proper recursion and parameter-passing on this assignment.

Creative Aspect, mygrammar.txt:

Along with your code, submit a file mycolony.txt that contains a BNF grammar that can be used as input. For full credit, the file should be in valid BNF format, contain at least 5 non-terminals, and should be your own work (do more than just changing the terminal words in sentence.txt, for example). This is worth a small part of your grade.

Frequently Asked Questions (FAQ):

For each assignment problem, we receive various frequent student questions. The answers to some of those questions can be found by clicking the link below.

Q: I'd like to write a helper function and declare a prototype for it, but am I allowed to modify the .h file to do so?
A: Function prototypes don't have to go in a .h file. Declare them near the top of your .cpp file and it will work fine.
Q: What does it mean if my program "unexpectedly finishes"?
A: It probably means that you have infinite recursion, which usually comes when you have not set a proper base case, or when your recursive calls don't pass the right parameters between each other. Try running your program in Debug Mode (F5) and viewing the call stack trace when it crashes to see what line it is on when it crashed.
Q: The spec says that I need to throw an exception in certain cases. But it looks like the provided main program never goes into those cases; it checks for that case before ever calling my function. Doesn't that mean the exception is useless? Do I actually have to do the exception part?
A: The exception is not useless, and yes you do need to do it. Understand that your code might be used by other client code other than that which was provided. You have to ensure that your function is robust no matter what parameter values are passed to it. Please follow the spec and throw the exceptions in the cases specified.
Q: I am confused about grammars; what this assignment is about?
A: In terms of this assignment, a grammar is a set of rules that corresponds to a sybmol. For instance, the adj grammar relates to many different adjectives like purple, loud, sticky, and fake. To put it simply, one grammar has many rules. This relationship is also known as non-terminal to terminal. For this assignment, we want you to navigate through these grammars and create them. Notice that some grammars have grammars in their rule sets as well. Upon encountering this situation, remember that it is a grammar that needs to be generated as well.

Take a look at the decision tree in the assignment specifications, this should help give you a visualization of the problem at hand.
Q: What type of data should my Map store?
A: Unforutantely, this aspect of the assignment is up to you to figure out. Remember the functions you're going to have to call, along with the idea of a grammar and how it relates to its rules.
Q: I'm having trouble breaking apart the strings.
A: Use the provided stringSplit function. Print lots of debug messages to make sure that your split calls are doing what you think they are.
Q: How do I spot nonterminal symbols if they don't start/end with "<" and ">"?
A: The "<" and ">" symbols are not special cases you should be concerned with. Treat them as you would any other symbol, word, or character.
Q: Spaces are in the wrong places in my output. Why?
A: Try using other characters like "_", "~", or "!" instead of spaces to find out where there's extra whitespaces. Also remember that you can use the trim function to remove surrounding whitespace from strings.
Q: How do I debug my recursive function?
A: Try using print statements to find out which grammar symbols are being generated at certain points in your program. Try before and after your recursive step as well as in your base case.
Q: In my recursive case, why doesn't it join together the various pieces correctly?
A: Remember that setting a string = to another variable within a recursive step will not translate in all instances of that function. Be sure to actively add onto your string instead of reassigning it.
Q: My recursive function is really long and icky. How can I make it cleaner?
A: Remind yourself of the strategy in approaching recursive solutions. When are you stopping each recursive call? Make sure you're not making any needless or redundant checks inside of your code.
Q: When I run expression.txt, I get really long math expressions? Is that okay?
A: Yes, that's expected.
Q: What does it mean if my program "unexpectedly finishes"?
A: It might mean that you have infinite recursion, which usually comes when you have not set a proper base case, or when your recursive calls don't pass the right parameters between each other. Try running your program in Debug Mode (F5) and viewing the call stack trace when it crashes to see what line it is on when it crashed. You could also try printing a message at the start of every call to make sure that you don't have infinite recursion.

If you are certain that you do not have infinite recursion, it is possible that there are just too many calls on the call stack because you are filling a very large area of the screen. If you tried to fill the whole screen with the same color, it might crash with a "stack overflow" even if your algorithm is correct. This would not be your fault.

Q: Can I use one of the STL containers from the C++ standard library?
A: No.
Q: I already know a lot of C/C++ from my previous programming experience. Can I use advanced features, such as pointers, on this assignment?
A: No; you should limit yourself to using the material that was taught in class so far.
Q: Can I add any other files to the program? Can I add some classes to the program?
A: No; you should limit yourself to the files and functions in the spec.

Possible Extra Features:

Here are some ideas for extra features that you could add to your program:

Indicating that you have done extra features: If you complete any extra features, then in the comment heading on the top of your program, please list all extra features that you worked on and where in the code they can be found (what functions, lines, etc. so that the grader can look at their code easily).

Submitting a program with extra features: Since we use automated testing for part of our grading process, it is important that you submit a program that conforms to the preceding spec, even if you want to do extra features. If your feature(s) cause your program to change the output that it produces in such a way that it no longer matches the expected sample output test cases provided, you should submit two versions of your program file: a first one with the standard file name without any extra features added (or with all necessary features disabled or commented out), and a second one whose file name has the suffix -extra.cpp with the extra features enabled. Please distinguish them in by explaining which is which in the comment header. Our turnin system saves every submission you make, so if you make multiple submissions we will be able to view all of them; your previously submitted files will not be lost or overwritten.

[an error occurred while processing this directive]