CS 101

Servers and Backend

Announcements

  • Paper clarifications
  • OH switch-up: Monday's OH will be with me, 1:30-3:00 in Gates 167
  • Everyone taking an alternate exam should have gotten an email - email Tyler if you didn't

Plan for Today

  • Recall: requests go to a server, which returns a response
  • Today: how do servers figure out what information to return?
    • Google
    • Facebook

Distributed Systems: MapReduce

  • Problem: adding more memory or CPU to a single computer is expensive
  • Idea: link a bunch of cheap computers together into a "giant" computer
  • Have each computer solve a tiny part of the problem
  • Word Count example
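The Word Count example can be sketched in a few lines. This is a minimal single-machine illustration, assuming the text has already been split into chunks; the function names (`map_phase`, `shuffle`, `reduce_phase`) are made up for this sketch, and in a real cluster each chunk would be handled by a different computer.

```python
from collections import defaultdict

def map_phase(chunk):
    """One 'computer' turns its chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the counts for the same word together."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Add up the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(chunks):
    all_pairs = []
    for chunk in chunks:  # on a real cluster, these run in parallel
        all_pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(all_pairs))

chunks = ["the cat sat", "the dog sat", "the end"]
counts = word_count(chunks)
print(counts)
```

Each chunk is a tiny part of the problem; no single step ever needs to hold the whole text at once.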

Information Storage: Databases

  • (Usually) stored across many computers (distributed system)
    • Added benefit: the data can be spread across different physical locations
  • Like a giant Excel sheet, but with millions/billions/trillions of rows
  • Usually can't "see" all the data at once - select certain columns at a time, or filter down to rows with certain features
    • Example: I want to send an email to all users in North America who last logged in between 5 and 7 days ago and who have an outstanding friend request
  • Basic Idea: companies store a lot of information, then responses involve searching the saved information based on the request
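The email example above could look roughly like this as a database query. This is a sketch using Python's built-in `sqlite3` module on one machine; the table name, column names, and sample rows are all invented for illustration, and a real company's database would be distributed and far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email TEXT,
        region TEXT,
        days_since_login INTEGER,
        pending_friend_requests INTEGER
    )
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)", [
    (1, "ana@example.com", "North America", 6, 1),
    (2, "bo@example.com", "North America", 2, 1),
    (3, "cy@example.com", "Europe", 6, 3),
    (4, "di@example.com", "North America", 5, 0),
    (5, "ed@example.com", "North America", 7, 2),
])

# Choose one column (email) and keep only the rows that match all filters.
rows = conn.execute("""
    SELECT email FROM users
    WHERE region = 'North America'
      AND days_since_login BETWEEN 5 AND 7
      AND pending_friend_requests > 0
    ORDER BY user_id
""").fetchall()
emails = [row[0] for row in rows]
print(emails)
```

The query never shows the whole table: it picks a column and filters rows, which is the "choose certain columns, filter rows" idea from the slide.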

Google: Getting Information

  • Indexes the internet
  • "Spiders" "crawl" the internet (Google calls them "Googlebots")
  • Start on a page, index that page, follow all outbound links
  • Store all the information in a database
    • Contains info about the words pages contain
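The crawl-and-index steps above can be sketched on a toy "internet". Everything here is an assumption for illustration: `TOY_WEB` is a made-up dictionary standing in for real pages, and `crawl` is a simplified breadth-first crawler, not how Googlebot is actually implemented.

```python
from collections import defaultdict, deque

# Hypothetical miniature web: URL -> (page text, outbound links)
TOY_WEB = {
    "a.example": ("cats and dogs", ["b.example", "c.example"]),
    "b.example": ("dogs chase cats", ["a.example"]),
    "c.example": ("birds sing", ["d.example"]),
    "d.example": ("cats sleep", []),
    "e.example": ("unreachable page", []),  # nothing links here
}

def crawl(start, web):
    """Start on a page, index it, then follow all outbound links."""
    index = defaultdict(set)  # word -> set of URLs containing that word
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        for word in text.split():
            index[word].add(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example", TOY_WEB)
print(sorted(index["cats"]))
```

The resulting `index` maps each word to the pages that contain it, which is the kind of information the database needs to answer a search. Pages that nothing links to (like `e.example`) are never found by this crawler.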

Google: Evaluating Relevance

[Figure. Source: Wikipedia]
  • Request includes search terms
  • Need to derive meaning from order of terms (Natural Language Processing)
  • Search all the indexed websites
  • Look for terms and their synonyms; matches in the page title count for more
  • PageRank: a measure of how "important" a website is
    • Sort of like how academic papers work: being cited by lots of papers is better, and being cited by other important papers is better
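The "cited by important pages" idea can be sketched as a simplified PageRank-style calculation. This is a textbook-style approximation, not Google's production ranking; the damping value of 0.85, the iteration count, and the five-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outbound in links.items():
            if outbound:
                # A page splits its importance among the pages it links to.
                share = damping * rank[page] / len(outbound)
                for target in outbound:
                    new_rank[target] += share
            else:
                # A page with no links spreads its importance evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

links = {
    "A": ["C", "E"],
    "B": ["C"],
    "C": ["D"],
    "D": ["C"],
    "E": ["C"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, D and E each have exactly one inbound link, but D's link comes from the highly ranked page C while E's comes from the low-ranked page A, so D ends up ranked well above E.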

Google Search: Recap

Facebook: Storing Friends

[Figure. Source: Wikipedia]
  • Social Network: people are connected to each other through friendships
  • Called a graph in CS
    • nodes = people
    • edges = friendships
  • Other uses of graphs: the Internet, road networks, disease outbreaks, company hierarchies
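One common way to store a graph in code is to keep, for each node, the set of nodes it is connected to. The sketch below assumes that approach; the class name `FriendGraph` and the people in it are made up, and nothing here reflects how Facebook actually stores friendships.

```python
from collections import defaultdict

class FriendGraph:
    """Nodes are people; edges are friendships."""

    def __init__(self):
        self.friends = defaultdict(set)  # person -> set of their friends

    def add_friendship(self, a, b):
        # Friendship goes both ways, so record the edge in both directions.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def are_friends(self, a, b):
        return b in self.friends.get(a, set())

    def friend_count(self, person):
        return len(self.friends.get(person, set()))

graph = FriendGraph()
graph.add_friendship("Ava", "Ben")
graph.add_friendship("Ava", "Cal")
print(graph.are_friends("Ben", "Ava"), graph.friend_count("Ava"))
```

The same structure works for the other examples on the slide: swap people for web pages, intersections, or employees, and friendships for links, roads, or reporting lines.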

Facebook as a Network

[Figure. Source: Facebook]
  • Friendship Paradox: your friends have more friends on average than you do
  • Triadic Closure: two people who share a friend tend to become friends themselves; this is how People You May Know works
  • Degrees of Separation and (Kevin) Bacon number
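All three ideas above can be computed on a small friendship graph. This is an illustrative sketch on a made-up five-person network: the function names are hypothetical, and `suggest_friends` is a bare-bones mutual-friend count, not Facebook's actual People You May Know algorithm.

```python
from collections import Counter, deque

friends = {
    "Ava": {"Ben", "Cal", "Dee"},
    "Ben": {"Ava", "Cal"},
    "Cal": {"Ava", "Ben"},
    "Dee": {"Ava", "Eli"},
    "Eli": {"Dee"},
}

def average_friend_count(friends):
    """How many friends does a person have, on average?"""
    return sum(len(fs) for fs in friends.values()) / len(friends)

def average_friends_of_friends(friends):
    """How many friends do a person's friends have, on average?"""
    per_person = [sum(len(friends[f]) for f in fs) / len(fs)
                  for fs in friends.values() if fs]
    return sum(per_person) / len(per_person)

def suggest_friends(friends, person):
    """Triadic closure: suggest friends-of-friends, most mutual friends first."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))

def degrees_of_separation(friends, start, goal):
    """Breadth-first search: fewest friendship hops from start to goal."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return distance[current]
        for neighbor in friends[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return None  # no chain of friendships connects them

print(average_friend_count(friends), average_friends_of_friends(friends))
print(suggest_friends(friends, "Dee"))
print(degrees_of_separation(friends, "Eli", "Ben"))
```

In this network the average person has 2.0 friends, while the average person's friends have 2.2, because well-connected people like Ava show up in many friend lists. A Bacon number is the same breadth-first search with Kevin Bacon as the starting node and "appeared in a film together" as the edge.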

Storing Other Information

  • Stores likes, comments, posts, live videos, messages, etc.
  • Big idea: give IDs to users and each type of interaction
  • Tables in a database for each of these, linked by IDs
  • News Feed algorithm: get the content from each of your friends, then attempt to rank it by relevance, popularity, and recency
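The ID-linking idea and a toy feed ranking can be sketched together. Everything below is an assumption for illustration: the "tables" are plain Python collections, and the `score` formula (likes divided by age) is invented to stand in for popularity and recency only. It ignores relevance entirely and is not Facebook's real algorithm.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    author_id: int  # links this post to a row in the users table
    text: str
    likes: int
    hours_old: float

# Three toy "tables", linked to each other by IDs.
users = {1: "Ava", 2: "Ben", 3: "Cal"}
friendships = [(1, 2), (1, 3)]  # pairs of user IDs
posts = [
    Post(100, 2, "Old but popular", likes=50, hours_old=48),
    Post(101, 3, "Fresh post", likes=5, hours_old=1),
    Post(102, 2, "Fresh and popular", likes=40, hours_old=2),
    Post(103, 1, "My own post", likes=9, hours_old=1),
]

def friend_ids(user_id):
    """Look up a user's friends in the friendships table."""
    ids = set()
    for a, b in friendships:
        if a == user_id:
            ids.add(b)
        elif b == user_id:
            ids.add(a)
    return ids

def score(post):
    # Made-up formula: more likes raise the score, more age lowers it.
    return post.likes / (post.hours_old + 1)

def news_feed(user_id):
    """Collect posts written by the user's friends, highest score first."""
    allowed = friend_ids(user_id)
    candidates = [post for post in posts if post.author_id in allowed]
    return sorted(candidates, key=score, reverse=True)

feed = [post.post_id for post in news_feed(1)]
print(feed)
```

Because posts store only an `author_id` rather than a full copy of the author's profile, changing a user's name means updating one row in one table.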

Recap

  • Companies decide how to store information
  • Many computers involved with most requests
  • How companies prioritize information and interpret requests has a huge impact on society