LINGUIST 138/238 - SYMBSYS 138 - Autumn 2004
Homework 7
Due: Friday Dec 3 before 10:00am
Read this entire page before starting!!
Reminder: The due date for this homework is Friday Dec 3 before 10:00am
This homework has two parts
First, you'll need to get a Google key. Go to api.google.com and sign up for a key. (It's under Create a Google Account.) PLEASE DO THIS BEFORE THANKSGIVING.
Now you need to make sure you can run the programs that search Google for you. We've copied some relevant scripts for doing this in Java and in Perl onto the AFS directory for the class. It looks like these will currently only work from the Elaine file server in Sweet Hall, so you'll need to ssh into elaine.stanford.edu to run them. (If you have SOAP or other web services running on other machines, feel free to try it from those.) We've given you sample scripts that take a Google query and return 10 documents. Both scripts require you to enter the KEY that you get from Google when you register.
For perl, you can use the program /afs/ir/class/linguist238/googly.pl, which we stole from the book Google Hacks. You first modify it by putting your key here in this line:
my $google_key='insert key here';
You then run it as follows:
/afs/ir/class/linguist238/googly.pl
It will print out the top 10 results from Google for your query, together with their titles and the "snippet", the short text that Google returns. Of course, you may want to modify this code as you work on the homework.
For Java, you can do the following from the command line (where KEY is your Google key, and Foo is your query string):
java -cp /afs/ir/class/linguist238/googleapi.jar com.google.soap.search.GoogleAPIDemo KEY search Foo
The sample Java code is in /afs/ir/class/linguist238/GoogleAPIDemo.java. More information on the Google Java class is in /afs/ir/class/linguist238/googleapi.
Here are the steps for building a simplified AskMSR-style question answering system.
Given a question, rewrite the question in various ways to generate a number of search strings that are likely to match an answer. For example, given:
Where is the Louvre located?
you might generate strings like the following:
the Louvre is located
the is Louvre located
the Louvre located is
by writing some simple rules. For example, one rule might be:
If the query begins with the word "where", remove that word and try moving the next word everywhere else in the query.
You could even (but don't have to) have more complicated rules, such as:
In addition, if the query begins with the word "where" and doesn't end in "located", add the word "located" (or, if it does end in "located", replace "located" with "near" or "in"),
producing more strings like the following:
the Louvre is in
the Louvre is near
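The two rewrite rules above can be sketched in Java roughly as follows. This is only one possible reading of the rules, not the course's reference solution: the class name QueryRewriter and all of its method names are made up for illustration, and the sketch assumes the second rule is applied to the question first, with the first rule then applied to each variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; none of these names come from the course scripts.
class QueryRewriter {

    // Strip trailing punctuation and split on whitespace.
    // Case is left alone so that "Louvre" keeps its capital letter.
    static List<String> tokenize(String question) {
        String cleaned = question.trim().replaceAll("[?.!]+$", "").trim();
        if (cleaned.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(cleaned.split("\\s+")));
    }

    // Rule 1: if the query begins with "where", remove that word and
    // move the next word to every other position in the query.
    static List<String> moveSecondWord(String question) {
        List<String> words = tokenize(question);
        List<String> out = new ArrayList<>();
        if (words.size() < 3 || !words.get(0).equalsIgnoreCase("where")) {
            return out;
        }
        String moved = words.get(1);
        List<String> rest = words.subList(2, words.size());
        // Start at 1: position 0 would just reproduce the original word order.
        for (int i = 1; i <= rest.size(); i++) {
            List<String> candidate = new ArrayList<>(rest);
            candidate.add(i, moved);
            out.add(String.join(" ", candidate));
        }
        return out;
    }

    // Rule 2 (assumed reading): for a "where" question, append "located" if it
    // is missing; otherwise swap the final "located" for "in" and "near".
    static List<String> locationVariants(String question) {
        List<String> words = tokenize(question);
        List<String> out = new ArrayList<>();
        if (words.isEmpty() || !words.get(0).equalsIgnoreCase("where")) {
            return out;
        }
        int last = words.size() - 1;
        if (words.get(last).equalsIgnoreCase("located")) {
            for (String prep : new String[] {"in", "near"}) {
                List<String> variant = new ArrayList<>(words);
                variant.set(last, prep);
                out.add(String.join(" ", variant));
            }
        } else {
            List<String> variant = new ArrayList<>(words);
            variant.add("located");
            out.add(String.join(" ", variant));
        }
        return out;
    }

    // Apply rule 1 to the question and to every rule-2 variant, dropping duplicates.
    static List<String> rewrite(String question) {
        Set<String> result = new LinkedHashSet<>(moveSecondWord(question));
        for (String variant : locationVariants(question)) {
            result.addAll(moveSecondWord(variant));
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        for (String s : rewrite("Where is the Louvre located?")) {
            System.out.println(s);
        }
    }
}
```

For the Louvre question, the printed strings include "the Louvre is located", "the Louvre is in" and "the Louvre is near", along with less plausible ones such as "the is Louvre located". Each string could then be passed as the query to the Google scripts described earlier.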