MS&E334: Topics in Social Data (Fall 2017)

Johan Ugander, Assistant Professor, MS&E
Email: jugander [at] stanford
Office location: Huang 357
Office Hours: by appointment

Lecture hours: Tu/Th, 3:00pm-4:20pm
Lecture room: Building 380, Room 381U (first floor)

Note: This page is still being updated. The course material for 2017 is being selected and differs slightly from previous years. The general structure of the course is fixed, but the content of some lectures may change. This message will be removed when the course content is finalized.

Course Description

This course provides an in-depth survey of methods research for the analysis of large-scale social and behavioral data. There will be a particular focus on recent developments in discrete choice theory and preference learning. Connections will be made to graph-theoretic investigations common in the study of social networks. Topics will include random utility models, item-response theory, ranking and learning to rank, centrality and ranking on graphs, and random graphs. The course is intended for Ph.D. students, but master's students with an interest in research topics are welcome. Recommended: 221, 226, CS161, or equivalents.

Most important links:

Lecture material

The literature below lays the foundation for the lecture material, though only a handful of papers will be discussed in depth. If you have a focused interest in specific papers, feel free to come discuss them with me during office hours. The reference list will almost certainly be expanded in response to class discussions as the course progresses.

Week 1

Lecture 1: Course overview (9/26)

An introduction to the course and high-level tour of content and goals.

Lecture 2: Graphs and graph properties (9/28)

A review of graph definitions and properties. Graphical degree sequences. Combinatorial constraints on graphs.
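
As a small illustration of graphical degree sequences, the sketch below applies the Erdős-Gallai test, one standard characterization of which integer sequences are degree sequences of simple graphs. The function name `is_graphical` is hypothetical and is not taken from the course material.

```python
def is_graphical(degrees):
    """Erdős-Gallai test: True if `degrees` is the degree sequence of some simple graph."""
    d = sorted(degrees, reverse=True)
    # Degrees must be non-negative and sum to an even number (each edge adds 2).
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    n = len(d)
    for k in range(1, n + 1):
        # The k largest degrees can be absorbed by at most k(k-1) endpoints
        # among themselves plus min(d_i, k) endpoints from each remaining vertex.
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, `[3, 3, 3, 3]` (the complete graph on four vertices) passes, while `[3, 3, 1, 1]` fails even though its sum is even.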

General reference:

Combinatorial constraints:

Week 2

Lectures 3 & 4: Random graph models (10/3, 10/5)

A broad tour of random graph models. Configuration models (uniform distributions over specific spaces of graphs), Preferential Attachment models, power law degree sequences, stochastic block models, ERGMs.
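
To make the configuration model concrete, here is a minimal sketch of stub matching, assuming the degree sequence is given as a list indexed by node. The name `configuration_model` is chosen for illustration. Note that this procedure is uniform over pairings of stubs, so it can produce self-loops and multi-edges; sampling uniformly from simple graphs requires additional care.

```python
import random


def configuration_model(degrees, seed=None):
    """Pair up half-edges ("stubs") uniformly at random; returns a list of edges."""
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sum must be even")
    rng = random.Random(seed)
    # Node v contributes degrees[v] stubs.
    stubs = [node for node, deg in enumerate(degrees) for _ in range(deg)]
    # Shuffling and pairing consecutive stubs gives a uniform random matching of stubs.
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))
```

Counting a self-loop twice, every node ends up with exactly its prescribed degree, whatever the random seed.
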
Configuration models:

Power Law literature:

Other growth models:

SBMs:

Planted partition model:

ERGMs:

Even more models:

Week 3

Lectures 5 & 6: Graph centrality and ranking (10/10, 10/12)

Katz, Bonacich, Eigenvector, PageRank, Betweenness, Harmonic centrality. Personalized variations.
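
As one concrete instance of these centrality measures, the sketch below computes PageRank by power iteration. It assumes the graph is given as a dict `out_links` mapping every node to a list of its out-neighbors (every target must also appear as a key), and it spreads the rank of dangling nodes uniformly, which is one common convention among several. The function name and parameters are illustrative.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on a dict {node: [out-neighbors]}."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

A personalized variant would replace the uniform teleportation term `(1 - damping) / n` with a node-specific distribution.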

Foundational papers:

More recent perspectives:

Centrality, personalized:

Week 4

Lectures 7 & 8: Ranking from comparisons and choice modelling (10/17, 10/19)

Thurstone and Bradley-Terry-Luce models; Random Utility Models; Elo ratings; Item-response theory; Markov chain models.
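
For the Bradley-Terry-Luce model, where item i beats item j with probability p_i / (p_i + p_j), a minimal fitting sketch is shown below. It uses the classical minorization-maximization update and assumes a matrix `wins` in which `wins[i][j]` counts how often i beat j, with every item involved in at least one comparison. The name `fit_bradley_terry` is hypothetical.

```python
def fit_bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths (normalized to sum to 1) from pairwise win counts."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair (i, j) contributes its number of comparisons over p_i + p_j.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p
```

With two items and a 3-to-1 record, the fitted strengths are 0.75 and 0.25, matching the empirical win rate. A maximum-likelihood estimate exists only under connectivity conditions on the comparison data, which this sketch does not check.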

Markov chain models:

Example applications:

Other methods that seek status embeddings:

Week 5

Lecture 9: Ranking and permutation data (10/24)
The Mallows model, Plackett-Luce, Rank Aggregation, Self-organizing lists
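
As a small illustration of the Mallows model, which assigns a ranking a probability proportional to exp(-theta * d), where d is its Kendall tau distance to a central ranking, the sketch below computes exact probabilities by brute-force enumeration. This is only feasible for a handful of items, and the function names are illustrative.

```python
import math
from itertools import permutations


def kendall_tau(a, b):
    """Number of item pairs that rankings a and b order differently."""
    pos = {item: i for i, item in enumerate(b)}
    mapped = [pos[item] for item in a]
    return sum(
        1
        for i in range(len(mapped))
        for j in range(i + 1, len(mapped))
        if mapped[i] > mapped[j]
    )


def mallows_pmf(center, theta):
    """Exact Mallows probabilities over all permutations of `center` (small n only)."""
    weights = {
        perm: math.exp(-theta * kendall_tau(perm, center))
        for perm in permutations(center)
    }
    z = sum(weights.values())
    return {perm: w / z for perm, w in weights.items()}
```

With `theta = 0` the distribution is uniform over rankings; as `theta` grows, mass concentrates on the central ranking.
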
"Lecture 10": No class, Johan @ MIT (10/26)

Week 6

Lecture 11: Models of social processes: influence and contagion (10/31)
Lecture 12: Influence maximization; complex contagion; Homophily and Influence (11/2)

Week 7

Lecture 13: Causal Inference of Peer Effects (11/7)

Lecture 14: Causal Inference under Interference (11/9)

Week 8

Lecture 15: The friendship paradox (11/14)
Friendship paradox literature:
Applications of the friendship paradox:
Lecture 16: The small-world phenomenon (11/16)
Distance distributions:

Break - Week of Thanksgiving

Week 9: Dissecting Papers (11/28, 11/30)

During Week 9 the course will take on an active discursive style, aiming to synthesize what we've discussed as we dissect the methodologies of recent, complex applied papers. We will take a survey during Week 8 to determine the papers we want to discuss. In recent years the following papers have been discussed. We will only do two papers.

Week 10

In-class presentations of student projects.

Tools and Data

Here are some libraries that might be useful for the problem sets and projects:

Some data sources: