CS 101
Servers and Backend
Announcements
- Paper clarifications
- OH switch-up: Monday's OH will be with me, 1:30-3:00 in Gates 167
- Everyone taking an alternate exam should have received an email - email Tyler if you didn't
Plan for Today
- Recall: requests go to a server, which returns a response
- Today: how do servers figure out what information to return?
Distributed Systems: MapReduce
- Problem: adding more memory or more CPU power to a single computer is expensive
- Idea: link a bunch of cheap computers together into a "giant" computer
- Have each computer solve a tiny part of the problem
- Word Count example
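The word count example can be sketched on one machine as three phases: map (each computer counts words in its own chunk of text), shuffle (group the counts by word), and reduce (add up each word's counts). This is a minimal simulation, assuming each string in `docs` stands in for the chunk one computer would receive; the function names are hypothetical, and a real MapReduce framework would run the map and reduce calls on many different machines in parallel.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one chunk of text."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(word, counts):
    """Reduce: combine all the counts for a single word."""
    return word, sum(counts)

def word_count(documents):
    # On a cluster, each document would be mapped by a different computer;
    # here the "computers" run one after another.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

docs = ["the cat sat", "the dog sat", "The end"]
print(word_count(docs))  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

The key design point is that no single map or reduce call needs to see all of the data, which is what lets the work spread across cheap computers.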
Information Storage: Databases
- (Usually) stored across many computers (distributed system)
- Added benefit: the data can be spread across different geographic locations
- Like a giant Excel sheet, but with millions/billions/trillions of rows
- Usually can't "see" all the data at once - select a few columns at a time, or filter for rows with certain features
- Example: I want to send an email to all users in North America who last logged in between 5 and 7 days ago and who have an outstanding friend request
- Basic Idea: companies store a lot of information; responding to a request means searching that saved information
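The North America email example above can be written as a database query that filters rows and picks one column. This sketch uses Python's built-in `sqlite3` with a tiny in-memory database; the table layout, column names, and sample users are all hypothetical, "outstanding friend request" is assumed to mean a pending request sent *to* that user, and "today" is fixed so the result is reproducible.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one row per user, one row per friend request.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, region TEXT, last_login TEXT)")
conn.execute("CREATE TABLE friend_requests (from_id INTEGER, to_id INTEGER, status TEXT)")

today = date(2024, 1, 10)  # fixed "today" so the example is reproducible

def days_ago(n):
    # ISO dates (YYYY-MM-DD) compare correctly as strings.
    return (today - timedelta(days=n)).isoformat()

conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "ana@example.com", "North America", days_ago(6)),
    (2, "ben@example.com", "North America", days_ago(2)),
    (3, "chloe@example.com", "Europe", days_ago(6)),
    (4, "dev@example.com", "North America", days_ago(5)),
])
conn.executemany("INSERT INTO friend_requests VALUES (?, ?, ?)", [
    (2, 1, "pending"),   # Ben -> Ana, still waiting
    (1, 3, "pending"),   # Ana -> Chloe, but Chloe is in Europe
    (3, 4, "accepted"),  # Dev has no outstanding request
])

# Only the email column is selected; WHERE filters the rows.
query = """
SELECT DISTINCT u.email
FROM users AS u
JOIN friend_requests AS f ON f.to_id = u.id
WHERE u.region = 'North America'
  AND u.last_login BETWEEN ? AND ?
  AND f.status = 'pending'
ORDER BY u.email
"""
emails = [row[0] for row in conn.execute(query, (days_ago(7), days_ago(5)))]
print(emails)  # ['ana@example.com']
```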
Google: Getting Information
- Indexes the internet
- "Spiders" "crawl" the internet (Google calls them "Googlebots")
- Start on a page, index that page, follow all outbound links
- Store all the information in a database
- Records which words each page contains
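A crawler plus a word index can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory "web" (the `WEB` dict stands in for downloading real pages) and a breadth-first visiting order; it builds an inverted index mapping each word to the pages that contain it, then answers a query by finding pages that contain every search term. Real crawlers also handle politeness, revisits, and billions of pages.

```python
from collections import deque, defaultdict

# Hypothetical mini-web: URL -> (page text, outbound links).
WEB = {
    "a.com": ("cheap computers linked together", ["b.com", "c.com"]),
    "b.com": ("databases store information", ["c.com"]),
    "c.com": ("computers store databases", ["a.com", "d.com"]),
    "d.com": ("graphs of friendships", []),
}

def crawl(start):
    """Breadth-first crawl: index a page, then queue its outbound links."""
    index = defaultdict(set)  # word -> set of URLs containing that word
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]  # stands in for downloading the page
        for word in text.lower().split():
            index[word].add(url)
        for link in links:
            if link not in seen:  # never crawl the same page twice
                seen.add(link)
                queue.append(link)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

index = crawl("a.com")
print(sorted(search(index, "computers")))        # ['a.com', 'c.com']
print(sorted(search(index, "store databases")))  # ['b.com', 'c.com']
```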
Google: Evaluating Relevance
- Request includes search terms
- Need to derive meaning from order of terms (Natural Language Processing)
- Search all the indexed websites
- Look for the search terms and their synonyms; matches in a page's title count for more
- PageRank: a measure of how "important" a website is
- Similar to academic papers: being cited by many papers is good, and being cited by other important papers is even better
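The "cited by important pages" idea can be computed by repeatedly passing each page's score along its outbound links until the scores settle (power iteration). This sketch uses the textbook "random surfer" formulation with the commonly used damping factor 0.85, and assumes a page with no outbound links shares its score with every page; the link graph is made up, and Google's real ranking combines PageRank with many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page gets a small baseline (the surfer jumping at random).
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outs in links.items():
            targets = outs if outs else pages  # dangling page: share with everyone
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share  # a link passes on part of the linker's importance
        rank = new_rank
    return rank

# Hypothetical link graph: "c" is linked to by three pages.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Note that "a" ends up ranked well even though only one page links to it, because that one page is the important "c".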
Google Search: Recap
Facebook: Storing Friends
- Social Network: people are connected to each other through friendships
- Called a graph in CS
- nodes = people
- edges = friendships
- Other uses of graphs: the Internet, road networks, disease outbreaks, company hierarchies
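One common way to store a graph like this is an adjacency list: for each person, keep the set of their friends. The sketch below is a hypothetical minimal class, not how Facebook actually stores its graph; since friendship is mutual, each edge is stored in both directions, and a person's degree is just the size of their friend set.

```python
from collections import defaultdict

class FriendGraph:
    """Undirected graph: nodes are people, edges are friendships."""

    def __init__(self):
        self.friends = defaultdict(set)  # person -> set of that person's friends

    def add_friendship(self, a, b):
        # Friendship is mutual, so record the edge in both directions.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def are_friends(self, a, b):
        return b in self.friends.get(a, ())

    def degree(self, person):
        """Number of friends (edges touching this node)."""
        return len(self.friends.get(person, ()))

g = FriendGraph()
for a, b in [("Ana", "Ben"), ("Ana", "Cy"), ("Ben", "Cy"), ("Cy", "Dee")]:
    g.add_friendship(a, b)

print(g.are_friends("Ben", "Ana"))  # True
print(g.degree("Cy"))               # 3
```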
Facebook as a Network
- Friendship Paradox: on average, your friends have more friends than you do
- Triadic Closure: two people with a mutual friend are likely to become friends; this is the idea behind People You May Know
- Degrees of Separation and (Kevin) Bacon number
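All three ideas on this slide can be computed directly from an adjacency list. The sketch below uses a small hypothetical friendship graph: the friendship paradox compares the average number of friends per person with the average friend count of "a friend" (counting once per friendship); the suggestion function is a simple triadic-closure heuristic that ranks non-friends by mutual friends (an assumption, not Facebook's actual algorithm); and degrees of separation is a breadth-first search, the same computation as a Bacon number.

```python
from collections import Counter, deque

# Hypothetical undirected friendship graph: person -> set of friends.
FRIENDS = {
    "Ana": {"Ben", "Cy"},
    "Ben": {"Ana", "Cy", "Eve"},
    "Cy":  {"Ana", "Ben", "Dee"},
    "Dee": {"Cy", "Eve"},
    "Eve": {"Ben", "Dee"},
}

def friendship_paradox():
    """Return (average friends per person, average friends of a friend)."""
    degrees = [len(fs) for fs in FRIENDS.values()]
    avg_you = sum(degrees) / len(degrees)
    # Look at every (person, friend) pair and record the friend's friend count.
    friend_counts = [len(FRIENDS[f]) for fs in FRIENDS.values() for f in fs]
    avg_friend = sum(friend_counts) / len(friend_counts)
    return avg_you, avg_friend

def people_you_may_know(person):
    """Triadic closure: suggest non-friends, ranked by number of mutual friends."""
    mutual = Counter()
    for friend in FRIENDS[person]:
        for fof in FRIENDS[friend]:
            if fof != person and fof not in FRIENDS[person]:
                mutual[fof] += 1
    # Most mutual friends first; break ties alphabetically.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))

def degrees_of_separation(start, goal):
    """Breadth-first search: fewest friendship hops from start to goal."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return dist[person]
        for friend in FRIENDS[person]:
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return None  # the two people are not connected

print(friendship_paradox())              # (2.4, 2.5)
print(people_you_may_know("Dee"))        # [('Ben', 2), ('Ana', 1)]
print(degrees_of_separation("Ana", "Dee"))  # 2
```

Popular people show up in many friend lists, which is why the second average in the friendship paradox is pulled upward.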
Storing Other Information
- Stores likes, comments, posts, live videos, messages, etc.
- Big idea: give IDs to users and each type of interaction
- Tables in a database for each of these, linked by IDs
- News Feed algorithm: get the content from each of your friends, then attempt to rank it by relevance, popularity, and recency
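The News Feed steps above (collect friends' posts by ID, then rank them) can be sketched with a toy scoring function. Everything here is hypothetical: the tables, the use of past interactions as "relevance", log of likes as "popularity", and 1/(1 + age) as "recency" are illustrative assumptions, and multiplying them together is just one simple choice; Facebook's real ranking is far more complex and not described in this form.

```python
import math
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    author_id: int
    likes: int
    hours_old: float

# Hypothetical tables, linked by user and post IDs.
FRIENDS = {1: {2, 3}}  # user_id -> set of friend user_ids
POSTS = [
    Post(101, 2, likes=50, hours_old=30),
    Post(102, 3, likes=5, hours_old=1),
    Post(103, 4, likes=900, hours_old=2),  # user 4 is not user 1's friend
    Post(104, 2, likes=10, hours_old=3),
]
INTERACTIONS = {(1, 2): 8, (1, 3): 1}  # (viewer, author) -> past likes/comments

def score(viewer_id, post):
    """Toy ranking: relevance x popularity x recency (made-up formula)."""
    relevance = 1 + INTERACTIONS.get((viewer_id, post.author_id), 0)
    popularity = math.log1p(post.likes)  # diminishing returns on likes
    recency = 1 / (1 + post.hours_old)   # newer posts score higher
    return relevance * popularity * recency

def news_feed(viewer_id):
    """Collect posts from the viewer's friends, highest score first."""
    friend_ids = FRIENDS.get(viewer_id, set())
    friend_posts = [p for p in POSTS if p.author_id in friend_ids]
    ranked = sorted(friend_posts, key=lambda p: score(viewer_id, p), reverse=True)
    return [p.post_id for p in ranked]

print(news_feed(1))  # [104, 102, 101]
```

Post 103 is the most liked but never appears, because the feed only starts from friends' content.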
Recap
- Companies decide how to store information
- Most requests involve many computers
- How companies prioritize information and interpret requests has a huge impact on society