R for Social Network Analysis

R is an open-source statistical programming platform and a useful tool for social network analysis (sna), with many advantages over traditional sna software packages. With a little coding and patience, one can produce analyses and visualizations that better suit the problem at hand, all within a single platform, rather than learning one GUI application after another.

To begin using R for sna, install R and the following packages: igraph, sna, network, and statnet. Each package's documentation gives a good introduction to its capabilities, and the built-in R help for each package describes every function in detail.
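A minimal setup sketch, assuming internet access and the cloud.r-project.org CRAN mirror (any CRAN mirror should work). It installs only the packages that are missing and then attaches igraph. Note that sna also defines functions named degree() and betweenness(), so when both packages are loaded it is safest to call them as igraph::degree() or sna::degree().

```r
# Packages used for social network analysis in R
pkgs <- c("igraph", "sna", "network", "statnet")

# Check which packages are already available without attaching them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Install only the missing ones (mirror URL is an assumption; any CRAN mirror works)
if (any(!installed)) {
  install.packages(pkgs[!installed], repos = "https://cloud.r-project.org")
}

# Attach igraph; use pkg::fun() if sna is attached too, since names overlap
library(igraph)
```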

Much of the material here is covered in more depth in "Social Network Analysis Labs in R and SoNIA," on which I collaborated with Dan McFarland, Sean Westwood, and Mike Nowak.

This material is designed to be friendly to readers with no experience in social network analysis, but it is not a thorough introduction. For an excellent one, see Robert Hanneman and Mark Riddle's online book Introduction to Social Network Methods.

Social Networks

The term 'social network' is increasingly used in the mainstream, where it is inextricably tied to notions of influence. Mark Granovetter's articles "The Strength of Weak Ties" (Granovetter 1973) and "Threshold Models of Collective Behavior" (Granovetter 1978) were probably the first to spark public interest in social networks and the spread of ideas, but Malcolm Gladwell's (2000) best-selling The Tipping Point is surely responsible for the most recent wave of fascination with social networks and the spread of social phenomena. Gladwell writes that change occurs when sociological phenomena (ideas, products, behaviors) reach critical mass; in other words, these phenomena spread through society like diseases. The idea has proven so attractive that people now say a popular YouTube clip "went viral." In Gladwell's "framework," the success or failure of any social epidemic depends on the configuration of the network of social ties, which are analogous to disease vectors. He argues that a relatively small number of people, whom he calls "connectors, mavens, and salesmen," hold the keys to spreading a good idea to enough people that it 'sticks.' The implication is that with the right combination of these few people on your side, you wield major social influence.

Often, people assume that the number of connections a person has marks him or her as one of Gladwell's precious few connectors, mavens, or salesmen. For some people, the social networking sites LinkedIn and Facebook, aside from keeping track of their social network and meeting new people, are a sort of contest to see who can get the most connections. The assumption is that the people with the most connections are the most valuable.

Is this how the world works? Physicist and network sociologist Duncan Watts says no. Watts examined email networks empirically and found that hubs were not responsible for the vast majority of the spread. Some might question the external validity (generalizability) of his work, but his claim that randomness plays a larger part in network phenomena would explain why advertising executives cannot simply recruit these network hubs and run successful campaigns. On the other hand, physicist Albert-Laszlo Barabasi (Barabasi 2003) writes that Gladwell noticed something that is in fact accurate, and that is not limited to human behavior. To understand what he means, we need to think about networks more abstractly: people are nodes, and their ties to others are links, or edges. In the graph, or network diagram, below, each numbered circle is a node and each arrow between nodes is an edge.

Nodes and Edges
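To make the terminology concrete, here is a minimal igraph sketch that builds a small directed network from an edge list. The five people and their six ties are hypothetical, chosen only for illustration. Each row of the matrix is one edge, read as "from, to," and plot() draws the nodes as numbered circles joined by arrows.

```r
library(igraph)

# Hypothetical edge list: each row is one directed tie (from, to)
edges <- matrix(c(1, 2,
                  1, 3,
                  2, 3,
                  3, 4,
                  4, 5,
                  5, 3), ncol = 2, byrow = TRUE)

# Nodes are numbered 1..5; each row becomes an arrow (edge)
g <- graph_from_edgelist(edges, directed = TRUE)

vcount(g)  # number of nodes
ecount(g)  # number of edges

# Draw the network diagram with smaller arrowheads
plot(g, edge.arrow.size = 0.5)
```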

According to Barabasi, Gladwell's rare connectors, or nodes with an anomalously large number of connections, are present in diverse complex systems. Networks from biology, computer science, and ecology share this topology: many nodes with few ties, and a small number of connector nodes with a vast number of ties. There is a large literature on contagion in networks, but my purpose here is to motivate the use of R for sna, so I give only two popular examples. Note that both Barabasi and Watts have devised network models that can be used to simulate networks in R with the igraph package.
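A minimal sketch of the two models in igraph: sample_pa() generates a Barabasi-Albert preferential-attachment network, and sample_smallworld() generates a Watts-Strogatz small-world network. The network size (1,000 nodes), the two ties added per new node, and the 5% rewiring probability are arbitrary choices for illustration. Both networks end up with an average degree of about 4, but only preferential attachment produces the few highly connected hubs Barabasi describes.

```r
library(igraph)
set.seed(42)  # make the random networks reproducible

n <- 1000

# Barabasi-Albert: each new node attaches to 2 existing nodes,
# preferring nodes that already have many ties
ba <- sample_pa(n, m = 2, directed = FALSE)

# Watts-Strogatz: a ring lattice where each node is tied to 2 neighbors
# on each side, with 5% of edges randomly rewired
ws <- sample_smallworld(dim = 1, size = n, nei = 2, p = 0.05)

# Average degree is similar (about 4) in both networks...
mean(degree(ba)); mean(degree(ws))

# ...but only the preferential-attachment network has "connector" hubs
max(degree(ba)); max(degree(ws))
```

Plotting hist(degree(ba)) next to hist(degree(ws)) makes the difference visible: a long right tail for the first, a tight cluster around 4 for the second.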

Social Network Analysis

Social network analysis is the formal study of systems of people, with an emphasis on their relationships. It evolved from a combination of mathematics and the social sciences, including graph theory (see Euler 1736 and the Königsberg bridge problem), psychology (see Heider and balance theory), and sociology and anthropology (see Granovetter and weak ties). Further information is available in Analytictech's Notes on the History of Social Network Analysis. Note that the number of connections a node has, whether you call them ties, links, arcs, or edges, is formally called its degree. This should not be confused with the pop-culture notion of 'degrees of separation' in social networks, which is formally called path distance or geodesic distance.
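The difference between degree and geodesic distance is easy to see in igraph. In this sketch the five-person friendship network is hypothetical: degree() counts each node's ties, and distances() returns the matrix of shortest-path lengths between every pair of nodes.

```r
library(igraph)

# Hypothetical undirected friendships: A-B, B-C, C-D, B-E
g <- graph_from_literal(A - B, B - C, C - D, B - E)

# Degree: number of ties per node (A=1, B=3, C=2, D=1, E=1)
degree(g)

# Geodesic distance: length of the shortest path between two nodes
# A -> B -> C -> D, so A and D are 3 "degrees of separation" apart
distances(g)["A", "D"]
```

Note that B has the highest degree, yet D is still three steps away from A: the two measures answer different questions.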

One more motivating anecdote about the power of sna, and then on to using R for it: Google's workhorse search-ranking algorithm, PageRank, is a variant of an sna concept, Bonacich power centrality. Bonacich (1987) hypothesized that a person's power in society depends on the power of his or her social contacts, and formalized this mathematically:

$$c_i = \beta \left( c_1 R_{i1} + c_2 R_{i2} + \cdots + c_n R_{in} \right) = \beta \sum_{j=1}^{n} R_{ij}\, c_j$$

where $c_i$ is the power (centrality) of the person in question, $i$; $\beta$ sets the magnitude of the effect; and $R_{ij}$ is the strength of the relationship between $i$ and each of the other people, $j$, under consideration (see Jeroen Bruggeman's explanation for more detail). In matrix form this reads $\mathbf{c} = \beta R \mathbf{c}$, so when $\beta$ equals the reciprocal of the largest eigenvalue of $R$, $\mathbf{c}$ is the leading eigenvector of $R$: the formula becomes eigenvector centrality, of which PageRank is a variant. Page et al. (1999) do not cite Bonacich, but it appears that a social network analyst was the first to think up the concept.
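A base-R sketch of the eigenvector case, using a hypothetical five-person relationship matrix R: the leading eigenvector of R satisfies c = βRc with β = 1/λ, where λ is the largest eigenvalue of R. igraph's eigen_centrality() should return the same scores (scaled so the maximum is 1), and page_rank() computes the damped PageRank variant. For the general Bonacich measure with other values of β, igraph offers power_centrality() (its exponent argument plays the role of β) and sna offers bonpow().

```r
library(igraph)

# Hypothetical symmetric relationship matrix: R[i, j] = 1 if i and j are tied
R <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 1, 0,
              1, 1, 0, 1, 0,
              0, 1, 1, 0, 1,
              0, 0, 0, 1, 0), nrow = 5, byrow = TRUE)

# Leading eigenvalue and eigenvector (eigen() sorts values in decreasing order)
e <- eigen(R, symmetric = TRUE)
lambda_max <- e$values[1]
c_eig <- abs(e$vectors[, 1])   # an eigenvector's sign is arbitrary
c_eig <- c_eig / max(c_eig)    # scale so the most central node scores 1

# Bonacich's identity c_i = beta * sum_j R_ij c_j holds with beta = 1 / lambda_max
all.equal(c_eig, as.vector((1 / lambda_max) * R %*% c_eig))  # TRUE

# igraph's eigenvector centrality on the same network, for comparison
g <- graph_from_adjacency_matrix(R, mode = "undirected")
all.equal(c_eig, eigen_centrality(g)$vector, tolerance = 1e-6)

# PageRank: a damped variant of the same idea (scores sum to 1)
page_rank(g)$vector
```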

Works cited:

Barabasi, Albert-Laszlo. 2003. Linked. Plume. pp. 56-57.

Gladwell, Malcolm. 2000. The Tipping Point. Little, Brown.

Granovetter, Mark. 1973. "The Strength of Weak Ties." American Journal of Sociology 78(6): 1360-1380.

Granovetter, Mark. 1978. "Threshold Models of Collective Behavior." American Journal of Sociology 83(6): 1420-1443.

Leisch, Friedrich. 2003. "Sweave and Beyond: Computations on Text Documents." Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003).

Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. "The PageRank Citation Ranking: Bringing Order to the Web."