June 1st, 2020
Q: I just want to make sure -- assignment 7 is the last mandatory assignment right?
A1: Yep
Q: If I wanted to learn more about how the crawling is done, could you refer me to where I could learn that?
A1: Stop by office hours or post on Ed and we can explain how it works!
Q: would mapping also tell you where within those documents the term comes up?
A1: In more advanced indices, absolutely!
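To illustrate what a position-aware index could look like, here is a minimal sketch. The function name build_positional_index and the sample documents are made up for illustration; the assignment's index does not necessarily work this way.

```python
def build_positional_index(docs):
    """Map each term to {document name: [positions where the term occurs]}."""
    index = {}
    for name, text in docs.items():
        # enumerate gives the word position of each term within the document
        for position, term in enumerate(text.lower().split()):
            index.setdefault(term, {}).setdefault(name, []).append(position)
    return index


# Hypothetical sample documents, not part of the assignment dataset
docs = {
    "doc1.txt": "the force is strong",
    "doc2.txt": "use the force use it well",
}
index = build_positional_index(docs)
print(index["force"])  # {'doc1.txt': [1], 'doc2.txt': [2]}
print(index["use"])    # {'doc2.txt': [0, 3]}
```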
Q: Do we convert the string ‘100,000’ to an integer?
A1: Nope
Q: What about commas between words?
A1: Those would be at the end of a particular term, so .strip would get rid of them
Q: So string.punctuation only strips the punctuation at the end?
A1: .strip only removes characters from the start and end of a string, and string.punctuation is just a string containing the punctuation characters
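A short sketch of that behavior, using only the standard library. The variable names are illustrative.

```python
import string

# string.punctuation is a str of punctuation characters, not a list
kind = type(string.punctuation)

trailing = "hello,".strip(string.punctuation)
interior = "100,000".strip(string.punctuation)
both_ends = "(don't!)".strip(string.punctuation)

print(trailing)   # hello
print(interior)   # 100,000  (the comma is in the middle, so it stays)
print(both_ends)  # don't    (the apostrophe is in the middle, so it stays)
```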
Q: does the program automatically ignore it or would we need to write that?
A1: You need to ignore punctuation by stripping it from each term
Q: but if it is a separate element like $$ does it get ignored by the program
A1: No, you’ll need to reason through how to do that
Q: if you apply lowercase to all of the terms and a particular term is a number, would that cause an error?
A1: Nope, numbers just aren’t modified
Q: So we would need to create a condition to check and see if the stripped string is empty before adding it to the dict?
A1: That sounds like a good idea
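One way that check might look, as a rough sketch. The helper add_terms, the use of a set for filenames, and the sample text are assumptions for illustration, not the starter code's actual structure.

```python
import string


def add_terms(index, filename, text):
    """Add each cleaned-up term in text to index, skipping empty terms."""
    for raw in text.split():
        term = raw.strip(string.punctuation).lower()
        if term == "":
            # A token like "$$" or "!!!" is all punctuation, so nothing is left
            continue
        index.setdefault(term, set()).add(filename)


index = {}
add_terms(index, "doc1.txt", "Prices rose $$ in 2020 !!!")
print(sorted(index))  # ['2020', 'in', 'prices', 'rose']
```

Note that "2020" passes through .lower() unchanged, and the all-punctuation tokens never reach the dictionary.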
Q: does --Yoda-- need to be stripped twice to become yoda? or does .strip do it on one iteration?
A1: .strip will remove all the punctuation in one fell swoop
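For example, a single call is enough, because .strip keeps removing matching characters from both ends until it hits one that is not in the given set:

```python
import string

term = "--Yoda--".strip(string.punctuation).lower()
print(term)  # yoda
```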
Q: Is there a reason why it’s “Gandhi’s wisdom” and NOT ‘Gandhi’s Wisdom’?
A1: That’s just what it says in the file in the first line
Q: What if we had "Gandhi's Wisdom!"? Do we get rid of the '!' at the end when we store the title in the dictionary for titles?
A1: Yes, strip punctuation just like you do for other terms
Q: How do you differentiate between the plural form of a word and a plural possessive when indexing, since the apostrophe at the end of the word will be removed? Ex: Computers vs computers’
A1: Good question! You can’t do that in this kind of index
Q: Do large search engines just have massive indexes that they search over? Or are the indexes generated upon query?
A1: Fantastic question! They have enormous indices they’re updating very frequently. Those indices are stored across hundreds or thousands of computers using something that’s kind of like a dictionary :)
Q: What was the reason that dictionaries are quicker to search over than, for example, lists?
A1: Hashing! You'll learn about it in CS106B. But basically dictionaries have a really cool system to look up where an entry lives
A2: Dictionaries rely on something called a hash function, which allows you to go directly from a key to where in memory its value is stored. Stop by office hours to learn more about how this works! (or take CS106B)
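To give a feel for the idea, here is a toy hash table sketch. ToyTable and bucket_for are hypothetical names, and Python's real dict is implemented differently in its details; the point is only that the key's hash picks where to look, so there is no scan over every entry the way there would be in a list.

```python
def bucket_for(key, num_buckets):
    """Use the key's hash to jump straight to one bucket."""
    return hash(key) % num_buckets


class ToyTable:
    """A toy dictionary: a list of buckets, each holding (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[bucket_for(key, len(self.buckets))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # Only the one bucket chosen by the hash is searched
        bucket = self.buckets[bucket_for(key, len(self.buckets))]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyTable()
table.put("yoda", ["doc1.txt"])
table.put("force", ["doc1.txt", "doc2.txt"])
print(table.get("force"))  # ['doc1.txt', 'doc2.txt']
```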
Q: do we currently have the skills to be able to create some kind of an interface and actually link to these articles?
A1: You could display the contents of the file on a canvas by reading it!
A2: You could make an interface using a canvas! The dataset we’re giving you is text files, not URLs, so linking will be a little tricky.
Q: Does the “-s” part of the command line prompt call the search function of our code or the search function built into the command terminal?
A1: The search function in the starter code
Q: When is the last day of CS106A classes?
A1: Friday is our last concrete lecture and then on Monday, we’ll have a quick class to show the contest winners!
Q: What are the state of the art search algorithms doing now?
A1: Using a lot of very cool machine learning and AI to rank articles!
A2: The end of these slides actually has a few overviews of some of the techniques that most modern approaches are based on
Q: Is the number of shards per term determined by how popular queries for that term are?
A1: live answered
Q: Very briefly, how was Python created? Was Python created using Python o__o
A1: live answered
Q: Just like how Google grew to be such a giant today, how do you imagine search engines in 20-30 years?
A1: live answered
Q: Mehran, what applications of Google have you created or helped create?
A1: live answered