[an error occurred while processing this directive]
Assignment by Marty Stepp and Victoria Kirst. Based on a problem by Stuart Reges of U of Washington.
This problem focuses on recursion.
This is an individual assignment. Write your own solution and do not work in a pair/group on this program.
For this part you will write a function for generating random sentences from a grammar.
Vector<string> grammarGenerate(istream& input, string symbol, int times)
Your function accepts a reference to an input file representing a language grammar, along with a symbol to randomly generate and the number of times to generate it.
Your function should generate the given number of random expansions of the given symbol and return them as a Vector of strings.
A formal language is a set of words and/or symbols along with a set of rules, collectively called the syntax of the language, defining how those symbols may be used together. A grammar is a way of describing the syntax and symbols of a formal language. Even programming languages have grammars; here is a link to a formal grammar for the Java language.
Many language grammars are described in a format called Backus-Naur Form (BNF), which is what we'll use on this assignment. In BNF, some symbols in a grammar are called terminals because they represent fundamental words of the language. A terminal in English might be "boy" or "run" or "Mariana". Other symbols of the grammar are called non-terminals and represent high-level parts of the language syntax, such as a noun phrase or a sentence. Every non-terminal consists of one or more terminals; for example, the verb phrase "throw a ball" consists of three terminal words.
The BNF description of a language consists of a set of derivation rules, where each rule names a symbol and the legal transformations that can be performed between that symbol and other constructs in the language. For example, a BNF grammar for the English language might state that a sentence consists of a noun phrase and a verb phrase, and that a noun phrase can consist of an adjective followed by a noun or just a noun. Rules can be described recursively (in terms of themselves). For example, a noun phrase might consist of an adjective followed by another noun phrase, such as the phrase "big green tree" which consists of the adjective "big" followed by the noun phrase "green tree".
A BNF grammar is specified as an input file containing one or more rules, each on its own line, of the form:
non-terminal ::= rule|rule|rule|...|rule
A separator of ::= (colon colon equals) divides the non-terminal from its expansion rules.
There will be exactly one such ::= separator per line.
A | (pipe) separates each rule; if there is only one rule for a given non-terminal, there will be no pipe characters.
Each rule will consist of one or more whitespace-separated tokens.
The following is a valid example BNF input file describing a small subset of the English language.
Non-terminal names such as <s>, <np> and <tv> are short for linguistic elements such as sentences, noun phrases, and transitive verbs.
<s>::=<np> <vp> <np>::=<dp> <adjp> <n>|<pn> <dp>::=the|a <adjp>::=<adj>|<adj> <adjp> <adj>::=big|fat|green|wonderful|faulty|subliminal|pretentious <n>::=dog|cat|man|university|father|mother|child|television <pn>::=John|Jane|Sally|Spot|Fred|Elmo <vp>::=<tv> <np>|<iv> <tv>::=hit|honored|kissed|helped <iv>::=died|collapsed|laughed|wept
This grammar's language can represent sentences such as "The fat university laughed" and "Elmo kissed a green pretentious television". This grammar cannot describe the sentence "Stuart kissed the teacher" because the words "Stuart" and "teacher" are not part of the grammar. It also cannot describe "fat John collapsed Spot" because there are no rules that permit an adjective before the proper noun "John", nor an object after intransitive verb "collapsed".
Though the non-terminals in the previous example language are surrounded by < >, this is not required.
By definition any token that ever appears on the left side of the ::= of any line is considered a non-terminal, and any token that appears only on the right-hand side of ::= in any line(s) is considered a terminal.
Do not assume that non-terminals will be surrounded by < > in your code.
Each line's non-terminal will be a non-empty string that does not contain any whitespace.
You may assume that individual tokens in a rule are separated by a single space, and that there will be no outer whitespace surrounding a given rule or token.
Your grammarGenerate function will perform two major tasks:
You may want to separate these steps into one or more helper function(s), each of which performs one step. It is important to segregate the recursive part of the algorithm away from the non-recursive part.
You are given a client program that does the user interaction.
The main function supplies you with an input file stream to read the BNF file.
Your code must read in the file's contents and break each line into its symbols and rules so that it can generate random elements of the grammar as output.
When you generate random elements, you store them into a Vector which is returned.
The provided main program loops over the vector and prints the elements stored inside it.
Your program should exactly reproduce the format and general behavior demonstrated in this log, although you may not exactly recreate this scenario because of the randomness that your code performs. (Don't forget to use the course File → Compare Output... feature in the console, or the course web site's Output Comparison Tool, to check output for various test cases.)
Grammar file name? sentence.txt Symbol to generate (Enter to quit)? <dp> How many to generate? 3 1: the 2: the 3: a
Symbol to generate (Enter to quit)? <np> How many to generate? 5 1: a wonderful father 2: the faulty man 3: Spot 4: the subliminal university 5: Sally
Symbol to generate (Enter to quit)? <s> How many to generate? 7 1: a green green big dog honored Fred 2: the big child collapsed 3: a subliminal dog kissed the subliminal television 4: Fred died 5: the pretentious fat subliminal mother wept 6: Elmo honored a faulty television 7: Elmo honored Elmo
Expected output: Here are some additional expected output files to compare. It's hard to match the expected output exactly because it is random. But your function should return valid random results as per the grammar that was given to it. Your program's graphical Console window has a File → Compare Output feature for checking your output.
We have also written a Grammar Verifier web tool where you can paste in the randomly generated sentences and phrases from your program, and our page will do its best to validate that they are legal phrases that could have come from the original grammar file. This isn't a perfect test, but it is useful for finding some kinds of mistakes and bugs.
Paste your sentence.txt output line(s) below to see whether they match the format required by the grammar.
Expression evaluator: Paste your expressions.txt E output below to evaluate it to see whether you have created a legal expression.