Lecture Materials

Questions & Answers


Q: I just want to make sure -- assignment 7 is the last mandatory assignment right?

A1:  Yep


Q: If I wanted to learn more about how the crawling is done, could you refer me to where I could learn that?

A1:  Stop by office hours or post on Ed and we can explain how it works!


Q: would mapping also tell you where within those documents the term comes up?

A1:  In more advanced indices, absolutely!
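As a rough illustration of what such a "more advanced" index might look like, here is a minimal sketch of a positional index. The function name `build_positional_index` and the tiny `docs` dictionary are made up for this example and are not part of the assignment; the idea is just that each term maps to the documents it appears in and to the word positions within each document.

```python
def build_positional_index(docs):
    """Map each term to {doc_name: [word positions where the term appears]}."""
    index = {}
    for doc_name, text in docs.items():
        words = text.lower().split()
        for position in range(len(words)):
            term = words[position]
            if term not in index:
                index[term] = {}
            if doc_name not in index[term]:
                index[term][doc_name] = []
            index[term][doc_name].append(position)
    return index


# Hypothetical two-document collection, for illustration only.
docs = {
    "a.txt": "do or do not",
    "b.txt": "there is no try",
}
positional_index = build_positional_index(docs)
print(positional_index["do"])   # {'a.txt': [0, 2]}
print(positional_index["try"])  # {'b.txt': [3]}
```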


Q: Do we convert the string ‘100,000’ to an integer?

A1:  Nope


Q: What about commas between words?

A1:  Those would be at the end of a particular term, so .strip would get rid of them


Q: So string.punctuation only strips the punctuations at the end?

A1:  .strip only removes characters from the start and end of a string, and string.punctuation is just a string containing the punctuation characters
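A small sketch of that behavior, using made-up sample terms: punctuation at either end is removed, while punctuation in the middle of a term (like the comma in 100,000) is left alone.

```python
import string

trailing = "wisdom,".strip(string.punctuation)
interior = "100,000".strip(string.punctuation)
possessive = "Gandhi's".strip(string.punctuation)

print(trailing)    # wisdom
print(interior)    # 100,000  (the comma is in the middle, so it stays)
print(possessive)  # Gandhi's (the apostrophe is in the middle, so it stays)
print(type(string.punctuation))  # <class 'str'>
```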


Q: does the program automatically ignore it or would we need to write that?

A1:  You need to ignore punctuation by stripping it from each term


Q: but if it is a separate element like $$ does it get ignored by the program

A1:  No, you’ll need to reason through how to do that


Q: if you apply lowercase to all of the terms and a particular term is a number, would that cause an error?

A1:  Nope, numbers just aren’t modified
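To see why no error occurs: terms read from a text file are strings even when they look like numbers, and .lower() leaves digits and punctuation unchanged. The sample terms below are invented for illustration.

```python
terms = ["Yoda", "100,000", "2020", "R2-D2"]
lowered = [term.lower() for term in terms]
print(lowered)  # ['yoda', '100,000', '2020', 'r2-d2']
```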


Q: So we would need to create a condition to check and see if the stripped string is empty before adding it to the dict?

A1:  That sounds like a good idea
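One way that check could look, as a sketch only: the helper name `add_terms`, the file name, and the sample text are assumptions for this example and not the assignment's starter code. Terms made up entirely of punctuation, such as $$ or --, become the empty string after stripping and are skipped.

```python
import string


def add_terms(index, doc_name, text):
    """Add each cleaned term in text to index, skipping terms that end up empty."""
    for raw in text.split():
        term = raw.strip(string.punctuation).lower()
        if term == "":
            continue  # e.g. "$$" or "--" strip down to nothing
        if term not in index:
            index[term] = []
        if doc_name not in index[term]:
            index[term].append(doc_name)


index = {}
add_terms(index, "doc1.txt", "Cost: $$ 100,000 -- wow!")
print(index)
# {'cost': ['doc1.txt'], '100,000': ['doc1.txt'], 'wow': ['doc1.txt']}
```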


Q: does --Yoda-- need to be stripped twice to become yoda? or does .strip do it on one iteration?

A1:  .strip will remove all the punctuation in one fell swoop
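A quick sketch showing that a single call is enough: .strip keeps removing characters from each end until it reaches one that is not in the given set. The second sample string is invented to show mixed punctuation.

```python
import string

cleaned = "--Yoda--".strip(string.punctuation)
print(cleaned)          # Yoda
print(cleaned.lower())  # yoda

# Different punctuation characters on each end are also removed in one call.
mixed = '("Yoda!")'.strip(string.punctuation)
print(mixed)            # Yoda
```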


Q: Is there a reason why it’s “Gandhi’s wisdom” and NOT ‘Gandhi’s Wisdom’?

A1:  That’s just what it says in the file in the first line


Q: What if we had Gandhi's Wisdom! Do we get rid of the '!' at the end when we store the title in the dictionary for titles?

A1:  Yes, strip punctuation just like you do for other terms


Q: How do you differentiate between the plural form of a word and a plural possessive when indexing, since the apostrophe at the end of the word will be removed? Ex: Computers vs computers’

A1:  Good question! You can’t do that in this kind of index


Q: Do large search engines just have massive indexes that they search over? Or are the indexes generated upon query?

A1:  Fantastic question! They have enormous indices they’re updating very frequently. Those indices are stored across hundreds or thousands of computers using something that’s kind of like a dictionary :)


Q: What was the reason that dictionaries are quicker to search over than, for example, lists?

A1:  hashing! Which you learn about in cs106b. But basically they have a really cool system to look up where in a dictionary an entry lives

A2:  Dictionaries rely on something called a hash function, which allows you to go directly from a key to where in your memory they would be. Stop by office hours to learn more about how this works! (or take 106b)
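To make the hashing idea a bit more concrete, here is a toy sketch of a hash table. It is not how Python's real dict is implemented (that is more sophisticated); the names `make_table`, `put`, `get`, and `NUM_BUCKETS` are made up for this example. The key point is that the hash of a key tells you which bucket to look in, so you never scan the whole collection the way you would with a list.

```python
NUM_BUCKETS = 8  # arbitrary small size for the toy example


def make_table():
    """Create an empty table: a list of buckets, each a list of [key, value] pairs."""
    return [[] for _ in range(NUM_BUCKETS)]


def put(table, key, value):
    """Store value under key, jumping straight to the bucket chosen by hash(key)."""
    bucket = table[hash(key) % NUM_BUCKETS]
    for pair in bucket:
        if pair[0] == key:
            pair[1] = value  # key already present: overwrite its value
            return
    bucket.append([key, value])


def get(table, key):
    """Return the value stored under key, or None if the key is absent."""
    bucket = table[hash(key) % NUM_BUCKETS]
    for stored_key, value in bucket:
        if stored_key == key:
            return value
    return None


table = make_table()
put(table, "yoda", ["doc1.txt", "doc7.txt"])
put(table, "wisdom", ["doc2.txt"])
print(get(table, "yoda"))    # ['doc1.txt', 'doc7.txt']
print(get(table, "gandhi"))  # None
```

Only the one bucket selected by the hash is searched, which is why lookups stay fast even when the table holds many entries.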


Q: do we currently have the skills to be able to create some kind of an interface and actually link to these articles?

A1:  You could make an interface using a canvas! The dataset we’re giving you is text files, not URLs, so linking will be a little tricky.

A2:  You could also display the contents of a file on the canvas by reading it!


Q: Does the “-s” part of the command line prompt call the search function of our code or the search function built-in to the command terminal?

A1:  The search function in the starter code


Q: When is the last day of CS106A classes?

A1:  Friday is our last concrete lecture and then on Monday, we’ll have a quick class to show the contest winners!


Q: What are the state of the art search algorithms doing now?

A1:  Using a lot of very cool machine learning and AI to rank articles!

A2:  The end of these slides actually has a few overviews of some of the techniques that most modern approaches are based on


Q: Are the number of shards per term determined by popularity of query of that term?

A1:  live answered


Q: Very briefly, how was python created? Was python created using python o__o

A1:  live answered


Q: Just like how Google grew to be such a giant today, how do you imagine search engines in 20-30 years?

A1:  live answered


Q: Mehran, what applications of google have you created or helped creating?

A1:  live answered