Java programming help
- A useful website that contains many Java-related links is the
class site for CS 108:
Object Oriented Programming. Check out the links available. You can
find help for both the novice and advanced Java programmer.
Grammar file
- The gfile in /afs/ir/class/cs224n/pp1 is in CNF. Making a parser
that deals with unary, etc. rules is extra credit.
Extra credit
Some ways to earn extra credit are:
- Make a generalized CKY parser that can handle empties and unaries.
- Provide a runtime graph showing that runtime goes up roughly cubicly
with sentence length
Printing Parse Trees
- For printing out the parse trees, you can either use the bracketing
scheme on p.98 of the textbook, or you can use an indented list
- If you want to use the bracketing scheme, please look at p.98 of the
textbook. This is in chapter 3. You can access this
online from any Stanford computer.
- Here is an example of the bracketing scheme following the example
in the book:
[S [NP [AT The]
[NNP children]]
[VP [VBD ate] [NP
[AT the] [NN cake]]]]
Parsing versus Recognizing: Recording/Printing Parse Trees
- The algorithm in the handwritten parsing handout is essentially a
CKY recognizer, rather than a parser. To check that parsing is working,
you want a parser that prints out parses. (It's not strictly necessary
for just counting the number of parses, but it's almost certainly wise
for debugging.)
- For a parser, there are then two ways to proceed.
- The one that I think is conceptually easiest is that each time you enter
a cell in the table, you record in a data structure what two things you
put together to make it. These can be pointers back to the items in the
chart that you combined.
- The other is not to do this and then to do a read-out of the parses from
the table (if I made an S from 1-5, let me consider again every
splitpoint, and the rules for an S, and see how I could have made it,
and then recurse on doing this). This may seem a stupid duplication of
work, but (i) it doesn't change the asymptotic time complexity, and (ii) it
is more economical of memory (it _does_ lower the aymptotic memory
bound). Nevertheless, for the assignment, I'd recommend the first strategy.
- What about the parenthetical note "without duplication!" in the last
line of the algorithm? If there's more than one way to represent a given
category in a given cell, don't I need to record that?
Yes, you need to record it, if adopting strategy one above. But,
to keep the parser cubic in time, you need to carefully differentiate
entering things in the parse triangle, and recording ways of making that
thing. To get a cubic time parser, it is crucial that
you only put a nonterminal, say, XP, in the table once, but if following
strategy one, you will want to record in a list the various ways of
making it.
Part 1.2 - "correct sentences"
- What do you mean by 'correct sentences'?
The basic idea behind this part of the assignment is to develop a
grammar/lexicon/corpus of sentences
that shows us that your parser works. It should correctly
parse grammatical sentences and deal with any ambiguities that exist
in the sentences. The 4 sentences listed in this section are just examples
of grammatical sentences, some of which have ambiguities, and therefore
would have multiple parses, and some which would have only one parse.
Your parser should also return 0 parses for any ungrammatical sentences in
your corpus of sentences (and make sure you put some ungrammatical
examples in the corpus!) One example of an ungrammatical sentence is the
example *Chris saw.
When I use the term corpus of sentences, I am referring to the
mysentences file.
- Do we have to handle the sentences shown in the example?
You do not need to have these exact sentences in your corpus of sentences.
These are just examples. Of course, there is no restriction against using
these sentences, either. Just make sure you have some of your own
sentences, too, and make sure that you explain why you chose the sentences
you did in your report. Basically answer the questions "What were you
testing for with these sentences? How does this show that your parser works?"