MS&E334: The Structure of Social Data (Fall 2016)
Johan Ugander, Assistant Professor, MS&E
Email: jugander [at] stanford
Office location: Huang 357
Office Hours: TBD
Lecture hours: Tu/Th, 3:00pm-4:20pm
Lecture room: 160-317
Social networks have a rich history of study across many disciplines. Recent opportunities afforded by large-sale online instrumentation and experimentation have begun to provide a rich view of their structure and role in diverse social and economic domains. This course provides a survey of recent research in the study of social networks and large-scale social and behavioral data. Topics will include network models based on random graphs and their properties; centrality and ranking on graphs; ranking from comparisons; social influence and homophily; experimentation and causal inference on networks; heavy-tailed statistical distributions.
Most important links:
The literature below lays the foundation for the lecture material, though not all papers will be discussed in depth. If you have a focused interests in specific papers, feel free to come discuss them with me during office hours. The reference list will almost certainly be expanded in response to class discussions as the course progresses.
Lecture 1: Course overview (9/27)
An introduction to the course and high-level tour of content and goals.
Lecture 2: Graphs and graph properties (9/29)
A review of graph definitions and properties. Graphical degree sequences.
- M.E.J. Newman (2003) "The structure and function of complex networks." SIAM Review 45, 167-256.
- S.L. Hakimi (1962) "On realizability of a set of integers as degrees of the vertices of a linear graph. I", J. SIAM, 10:3, p. 496.
- Havel-Hakimi game: [link]
Lecture 3: Configuration Models (10/4)
Configuration models are uniform distributions over specific spaces of graphs. We discuss the Simple Configuration Model and non-simple Configuration Models, and how to uniformly sample from different graph spaces.
- J. Blitzstein, P. Diaconis (2011) "A sequential importance sampling algorithm for generating random graphs with prescribed degrees", Internet Mathematics.
- B. Fosdick, D. Larremore, J. Nishimura, J. Ugander (2016)
"Configuring Random Graph Models with Fixed Degree Sequences"
Lecture 4: Preferential Attachment, Growth, and Power Laws (10/6)
Power Law literature:
- M. Faloutsos, P. Faloutsos, C. Faloutsos (1999). "On power-law relationships of the internet topology", SIGCOMM.
- M. Mitzenmacher (2004), "A Brief History of Generative Models for Power Law and Lognormal Distributions", Internet Mathematics, vol 1, No. 2, pp. 226-251. [link]
- M.E.J. Newman (2005), "Power laws, Pareto distributions and Zipf's law", Contemporary Physics. [link]
- A. Clauset, C. Shalizi, M.E.J. Newman (2009), "Power-law distributions in empirical data", SIAM Review. [link]
- A.-L. Barabasi, R. Albert (1999). "Emergence of scaling in random networks", Science.
- D. Callaway, J. Hopcroft, J. Kleinberg, M.E.J. Newman, S. Strogatz (2001). “Are randomly grown graphs really random?” Physical Review E. [link]
- D. Liben-Nowell, C. Knipe, C. Coalson (2013). “Indifferent Attachment: The Role of Degree in Ranking Friends”, ASONAM. [link]
- J. Bagrow, D. Brockmann (2013) "Natural Emergence of Clusters and Bursts in Network Evolution", Phys Rev X. [link]
Not discussed: Benford's Law, another "law" that in fact has a rigorous (if and only if) characterization that is directly related to scale-free distributions.
- S. Newcomb. (1881), “Note on the frequency of use of the different digits in natural numbers.” Amer. J Math. 4 39-40.
- F. Benford (1938), “The law of anomalous numbers.” Proceedings of the American Philosophical Society 78 551-572, 1938.
- T.P. Hill (1995), “A statistical derivation of the significant-digit law.” Statistical Science, 10:4, p. 354-363.
- Mark Nigrini (2012), “Benford's Law: Applications for forensic accounting, auditing, and fraud detection”, John Wiley & Sons (Book).
- Radiolab episode: Numbers (2009) [link]
Lecture 5: More graph models (10/11)
Finishing preferential attachment; Stochastic Block Models; ERGMs. Other models.
- P. W. Holland, K. B. Laskey, S. Leinhardt (1983) "Stochastic Blockmodels: First Steps," Social Networks.
- E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing (2008) "Mixed membership stochastic blockmodels," JMLR.
- B. Karrer, M.E.J. Newman (2011) "Stochastic blockmodels and community structure in networks," Physical Review E.
Planted partition model:
- A. Condon, D. Karp (2001) "Algorithms for graph partitioning on the planted partition model", Random Structure and Algorithms. (Extended abstract in RANDOM'99, August, 1999).
- F. McSherry (2001) "Spectral Partitioning of Random Graphs", FOCS.
- G. Robins, P. Pattison, Y. Kalish, D. Lusher (2007). "An introduction to exponential random graph models for social networks". Social Networks 29: 173--191.
- S. Bhamidi, G. Bresler, A. Sly (2008) "Mixing time of exponential random graphs", FOCS.
- S. Chatterjee, P. Diaconis (2013) "Estimating and understanding exponential random graph models", Annals of Statistics.
- W. Aiello, F. Chung, L. Lu (2000) "A random graph model for massive graphs", STOC/PNAS.
- P. D. Hoff, A. E. Raftery, M. S. Handcock. (2002) "Latent Space Approaches to Social Network Analysis." JASA.
- S. Lattanzi, D. Sivakumar (2009) "Affiliation Networks." STOC.
- J. Pfeiffer III, S. Moreno, T. La Fond, J. Neville, B. Gallagher. (2014) "Attributed Graph Models: Modeling network structure with correlated attributes." WWW.
"Lecture 6": No class, Johan @ MIT (10/13)
Lecture 7 & 8 : Graph centrality and ranking (10/18, 10/20)
Katz, Bonacich, Eigenvector,PageRank, Betweenness, Harmonic centrality. Personalized variations.
- L. Katz (1953) "A new status index derived from sociometric analysis." Psychometrika, 18(1), 39-43.
- P. Bonacich (1987) "Power and centrality: A family of measures." American Journal of Sociology, p. 1170-1182.
- L. Page, S. Brin, R. Motwani, T. Winograd (1999) "The PageRank citation ranking: Bringing order to the web", Stanford InfoLab.
- A. Ng, A. Zheng, M. Jordan (2001) "Link Analysis, Eigenvectors and Stability", IJCAI. [link]
More recent perspectives:
- P. Boldi, S. Vigna (2014), "Axioms for Centrality", Internet Mathematics 10, p. 222-262.
- S. Vigna (2015) "A Weighted Correlation Index for Rankings with Ties", Proceedings of WWW.
- T. Martin, X. Zhang, M.E.J. Newman (2014) "Localization and centrality in networks", Phys Rev E.
- D. Gleich (2015) "PageRank Beyond the Web", SIAM Review 57:3, pp. 321-363. [link]
- G. Jeh, J. Widom (2003) "Scaling personalized web search", Proceedings of WWW.
- K. Kloster, D.F. Gleich (2014) "Heat kernel based community detection", Proceedings of KDD.
- I. Kloumann, J. Ugander, J. Kleinberg (2016) "Block Models and Personalized PageRank", arxiv [link]
Lecture 9: Comparisons and ranking from comparisons (10/25)
Thurstone and Bradley-Terry models; Elo ratings.
- L. L. Thurstone (1927) “A law of comparative judgment,” Psychological Review.
- M. Glickman (1999) “Parameter estimation in large dynamic paired comparison experiments,” J Royal Statistical Society C.
- R. Herbrich, T. Minka, T. Graepel (2006) "Trueskill™: A Bayesian skill rating system", NIPS.
- D. Hunter (2004) "MM Algorithms for Generalized Bradley-Terry Models", Annals of Statistics. [link]
- K. Tsukida, M. Gupta (2011) "How to Analyze Paired Comparison Data", University of Washington Tech Report # UW-EE-2011-0004. [link]
- S. Negahban, S. Oh, D. Shah (2012) "Rank Centrality: Ranking from Pair-wise Comparisons," arXiv. [link]
- S. Ragain, J. Ugander (2016) "Pairwise Choice Markov Chains" [link]
- Y. Sismanis (2010) "How I won the `Chess Ratings - Elo vs the Rest of the World' Competition," arXiv. [link]
- D. Shahaf, E. Horvitz, R. Mankoff (2015) “Inside Jokes: Identifying Humorous Cartoon Captions” KDD. [link]
Other methods that seek status embeddings:
- B. Ball, M. E. J. Newman (2013) "Friendship networks and social status", Network Science. [link]
Lecture 10: The friendship paradox (10/27)
Friendship paradox literature:
- S. L. Feld (1991). “Why your friends have more friends than you do.” American Journal of Sociology, p. 1464-1477.
- J. Ugander, B. Karrer, L. Backstrom, C. Marlow (2011) “The Anatomy of the Facebook Social Graph,” arXiv.
- F. Kooti, N. O. Hodas, K. Lerman (2014) “Network Weirdness: Exploring the Origins of Network Paradoxes”, Proceedings of ICWSM.
- S. Lattanzi, Y. Singer (2015). “The Power of Random Neighbors in Social Networks” Proceedings of WSDM.
Applications of the friendship paradox:
- R. Pastor-Satorras, A. Vespignani (2002) “Immunization of complex networks.” Phys Rev E; 65: 036104.
- N. A. Christakis, J. H. Fowler (2010) “Social network sensors for early detection of contagious outbreaks", PLOS One.
- D. A. Kim et al. (2015) “Social network targeting to maximise population behaviour change: a cluster randomised controlled trial”, The Lancet.
Lecture 11: Models of social processes: influence and contagion (11/1)
- M. Granovetter (1978) "Threshold Models of Collective Behavior", AJS.
- D. McAdam (1986) "Recruitment to high-risk activism: The case of freedom summer", AJS.
- J. Kleinberg (2007) "Cascading Behavior in Networks: Algorithmic and Economic Issues" in Algorithmic Game Theory (book chapter) [link]
- S. Aral, D. Walker (2012) "Identifying influential and susceptible members of social networks", Science.
- B. State, L. Adamic (2015) "The Diffusion of Support in an Online Social Movement: Evidence from the Adoption of Equal-Sign Profile Pictures", CSCW.
Lecture 12: Influence maximization; complex contagion; Homophily and Influence (11/3)
- D. Kempe, J. Kleinberg, E. Tardos (2003) "Maximizing the spread of influence through a social network", Proceedings of KDD. [2003 KDD version, 2015 journal version]
- D. Centola, M. Macy (2007) "Complex contagions and the weakness of long ties", AJS.
- D. Centola (2010) "The spread of behavior in an online social network experiment", Science.
- D. Centola (2011) "An experimental study of homophily in the adoption of health behavior", Science.
- J. Ugander, L. Backstrom, C. Marlow, J. Kleinberg. (2012) "Structural Diversity in Social Contagion", PNAS.
- S. Aral, L. Muchnik, A. Sundararajan (2009) "Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks", PNAS.
Lecture 13: Causal Inference (11/8)
- C. Manski (1993) "Identification of endogenous social effects: The reflection problem", The Review of Economic Studies.
- C. Shalizi, A. Thomas (2011) "Homophily and contagion are generically confounded in observational social network studies", Sociological methods & research.
- E. Bakshy, D. Eckles, R. Yan, I. Rosenn. (2012) "Social Influence in Social Advertising: Evidence from Field Experiments," Proceedings of EC.
- E. Bakshy, I. Rosenn, C. Marlow, L. Adamic (2012) "The Role of Social Networks in Information Diffusion," Proceedings of WWW.
- J. Zhang, D. Brackbill, S. Yang, D. Centola (2015) "Efficacy and causal mechanism of an online social media intervention to increase physical activity: Results of a randomized controlled trial", Preventitive Medicine Reports. [paper, data]
Lecture 14: Causal Inference under Interference (11/10)
- P. Aronow, C Samii (2011) "Estimating average causal effects under interference between units", arXiv.
- C. Manski (2013) "Identification of treatment response with social interactions", The Econometrics Journal.
- J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg. (2013) "Graph Cluster Randomization: Network Exposure to Multiple Universes", Proceedings of KDD.
- D. Eckles, B. Karrer, J. Ugander (2014) "Design and analysis of experiments in networks: Reducing bias from interference", arXiv.
- H. Gui, Y. Xu, A. Bhasin, J. Han (2015) "Network A/B Testing: From Sampling to Estimation", Proceedings of WWW.
- Y. Xu, N. Chen, A. Fernandez, O. Sinno, A. Bhasin (2015) "From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks", Proceedings of KDD.
- D. Walker, L. Muchnik (2015) "Design of Randomized Experiments in Networks", Proceedings of IEEE. [link]
Lecture 15: Graph clustering and community detection
Lecture 16: The small-world phenomena (smallness and navigability)
- S. Milgram (1967) “The small world problem,” Psychology Today.
- J. Travers, S. Milgram (1969) “An Experimental Study of the Small World Problem,” Sociometry.
- D. Watts, S. Strogatz (1998) "Collective dynamics of 'small-world' networks", Nature.
- J. Kleinberg (2000) "The small-world phenomenon: An algorithmic perspective", STOC.
- D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, A. Tomkins (2005) "Geographic routing in social networks", PNAS.
- L Backstrom, E Sun, C Marlow (2010) "Find me if you can: improving geographical prediction with social and spatial proximity", ICWSM.
- Z Yang, W Chen (2015) "A Game Theoretic Model for the Formation of Navigable Small-World Networks", WWW.
- J. Leskovec, E. Horvitz (2008) "Planetary-Scale Views on an Instant-Messaging Network", Proceedings of WWW.
- L. Backstrom, P. Boldi, M. Rosa, J. Ugander, S. Vigna (2012) "Four Degrees of Separation", WebSci.
- A. Jacobs, S.F. Way, J. Ugander, A. Clauset (2015) "Assembling thefacebook: Using Heterogeneity to Understand Online Social Network Assembly", WebSci.
Break - Week of Thanksgiving
Week 9: Dissecting Papers (11/29, 12/1)
During Week 9 the course will take on an active discursive style, aiming to synthesize what we've discussed as we dissect the methodologies of recent, complex applied papers. We will take a survey during Week 8 to determine the papers we want to discuss. In 2015 the following papers were discussed, though we will only do two papers this year:
- R. M. Bond, C. J. Fariss, J. J. Jones, A. D. I. Kramer, C. Marlow, J. E. Settle, J. H. Fowler (2012), "A 61-million-person experiment in social influence and political mobilization", Nature. [paper, supplemental information]
- L. Beaman, A. BenYishay, J. Magruder, A. M. Mobarak (2015) "Can Network Theory-based Targeting Increaes Technology Adaption?", Working paper.
- D. A. Kim, A. R. Hwong, D. Stafford, D. A. Hughes, A J. O'Malley, J. H. Fowler, N. A. Christakis (2015). "Social network targeting to maximise population behaviour change: a cluster randomised controlled trial", Lancet. [link]
In-class presentations of student projects.
Tools and Data
Here are some libraries that might be useful for the problem sets and projects:
Some data sources: