LINGUIST 138/238 - SYMBSYS 138 - Autumn 2004
Homework 3: Part of Speech Tagging
Due: October 19 at the start of class
Read this entire page before starting!!
Do exercises 8.1, 8.2, and 8.3 from the reading chapter ("Word Classes and Part-of-Speech Tagging") in Jurafsky and Martin. Exercise 8.3 requires a partner from the class, so make sure to pick a partner soon, and in any case by Thursday.
Implement the Most-Frequent-Tag algorithm for part-of-speech tagging as discussed in class on Tuesday. You should build your dictionary of possible tags, and your tag frequencies, from the file /afs/ir/class/linguist238/WWW/restricted/brown.train.txt. For any word that appears in your test set but is not in the dictionary (i.e., an unknown word), assign the tag NN.
Compute the accuracy of your Most-Frequent-Tag algorithm on the test set in /afs/ir/class/linguist238/WWW/restricted/brown.test.txt.
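The baseline and its accuracy computation can be sketched roughly as follows. This is only an illustration under stated assumptions: the format of the Brown files is not described on this page, so the sketch assumes whitespace-separated word/TAG tokens, and the function names (parse_tagged, train, accuracy) are made up for the example. It runs on a toy string rather than on the real files.

```python
from collections import Counter, defaultdict

def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into (word, tag) pairs.

    ASSUMPTION: tokens are whitespace-separated and look like word/TAG.
    Adjust this function if the real files use a different layout.
    """
    pairs = []
    for token in text.split():
        # Split on the LAST slash so words containing '/' stay intact.
        word, sep, tag = token.rpartition("/")
        if sep:  # ignore tokens that carry no tag
            pairs.append((word, tag))
    return pairs

def train(pairs):
    """Map each word to the tag it occurs with most often in training."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def accuracy(pairs, model, default="NN"):
    """Fraction of tokens whose predicted tag equals the gold tag.

    Words missing from the model (unknown words) get the default tag.
    """
    if not pairs:
        return 0.0
    correct = sum(1 for word, tag in pairs if model.get(word, default) == tag)
    return correct / len(pairs)

# Toy data standing in for the training and test files.
train_pairs = parse_tagged("the/AT race/NN the/AT race/NN to/TO race/VB")
test_pairs = parse_tagged("the/AT race/NN to/TO race/VB blorf/NN")

model = train(train_pairs)
print(model["race"])                 # NN (seen twice as NN, once as VB)
print(accuracy(test_pairs, model))   # 0.8 (4 of 5 tokens correct)
```

On the real data you would pass the contents of the training and test files to parse_tagged instead of the toy strings, assuming the format guess above holds.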
Have a look at some of the tags that you got wrong. Write me two rules (just descriptively, in English; you don't have to write any code) that would have improved your tagging if you had run them as post-processors to your Most-Frequent-Tag algorithm.
Improve the unknown-word tagging algorithm so that it does something smarter than assigning every unknown word the tag NN. Think about the examples we discussed in class Tuesday.
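One possible direction for smarter unknown-word tagging is to guess from the shape of the word. The sketch below is an assumption-laden illustration, not the method from class: the function name guess_unknown_tag, the particular suffix list, and the use of Brown-style tags (CD, NP, VBG, VBD, RB, NNS) are all choices made for the example, and the rules are deliberately crude.

```python
import re

def guess_unknown_tag(word, sentence_initial=False):
    """Guess a tag for a word never seen in training.

    ASSUMPTION: Brown-style tag names. The rules are simple heuristics
    tried in order; the first one that matches wins.
    """
    if not word:
        return "NN"
    # Numbers such as 42, 3.14 or 1,250 -> cardinal number.
    if re.fullmatch(r"\d+([.,]\d+)*", word):
        return "CD"
    # Capitalized word that does not start a sentence -> proper noun.
    if word[0].isupper() and not sentence_initial:
        return "NP"
    lower = word.lower()
    if lower.endswith("ing"):
        return "VBG"   # present participle / gerund
    if lower.endswith("ed"):
        return "VBD"   # past tense (could also be VBN)
    if lower.endswith("ly"):
        return "RB"    # adverb
    if lower.endswith("s") and not lower.endswith("ss"):
        return "NNS"   # plural noun
    return "NN"        # fall back to the original default

for w in ["1,250", "Stanford", "blorfing", "quickly", "blorfs", "glass"]:
    print(w, guess_unknown_tag(w))
```

Whether any of these rules actually helps is an empirical question: compare accuracy on the test set with and without each rule before keeping it.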
What to turn in:
How to turn it in:
lastname_firstname_hw#.pl (or .java, etc.)
lastname_firstname_hw#.doc (or .txt, etc.)