Finding Insular Friend Groups in Social Networks

Experience the magic of graph spectra: use the eigenvectors of a (tiny) subset of the Facebook graph to find large, insular groups of friends. (The code and data are in the Github repo here)

Introduction

Here I will play with a part of the Facebook friend graph to demonstrate how results from spectral graph theory can be used on a real dataset to infer groups of insular friends and connectedness of the graph. The data come from the Stanford Network Analysis Project (SNAP). The csv data file hosted on GitHub is part of the ego-Facebook dataset on the SNAP website. The data file has 61796 rows, and 1495 unique persons. Facebook friendships are represented naturally by a node for each person, and an edge if and only if two people are friends on Facebook. In the dataset each row represents a friendship (edge) between two people (nodes) as identified by unique identifiers. Therefore we have a graph \(G\) where the nodes/vertices \(V\) are the people and the edges \(E\) represent the friendships.

Graph Representation

This graph can actually be represented as an “adjacency” matrix \(A\) where \[A_{ij} = 1 \text{ if } (i,j) \in E \text{ else 0}\] We can also store the degree of each node in the graph as a matrix \(D\) where the diagonal entry \((i,i)\) is the degree of node \(i\). The Laplacian matrix associated with the graph \(G\) then is defined to be \[L_G = D-A\] and is at the heart of the graph spectral theory and the theory of spectral embeddings which is often used for visializing graphs. Mathematically it can be shown that for any vector \(v\), if one were to interpret the elements of vector \(v\) as assignments to each of the vertices of the graph, then \(v^\intercal L_G v\) represents the sum of squares of all the lengths of edges of the graph. \(L_G\) is symmetric real-valued and therefore all eigenvectors are orthogonal (from the most essential theorem in linear algebra – Spectral Theorem). It can also shown to be positive semi-definite, i.e. eigenvalues of \(L_G \geq 0\).

Using Graph Laplacian to find Number of Connected Components of Graph

The eigenvalues of \(L_G\) contain information about the structure (or, more precisely, information about the extent to which the graph has structure). First up, there is a famous theorem that shows that the multiplicity of the zero\(th\) eigenvalue reveals the number of connected components of the graph. Since there are 6 zero eigenvalues, there must be as many connected components! Neat and insightful!

Sizes of Connected Components

Using spectral graph theory results, we can also find out the number of people in the largest and second largest connected components say. Now, if we look at any of the eigenvectors that have corresponding eigenvalues of 0, then I can make use of the following results from graph theory:

  • Intra-component values are same: All members of a connected component should have the same value in this eigenvector
  • Inter-component values must be distinct: No two connected components can have same values in the eigenvector.

So what this is saying is that each unique value in such an eigenvector corresponds to a connected component and multiplicity of that value corresponds to number of members of that component. Simple and easy! I can then sort the results of the multiplicities (such as in the snippet below) for unique values in eigenvector. The size of the largest connected component is 1484 and that of the second largest connected component is 3. (Note that I use small enough epsilon to take care of the numerical issues with the linear algebra packages.)

from collections import defaultdict

compSizes = []
EPSILON = 1e-12 
for i in range(1495):
    eigval = eigvals[i]
    eigvec = eigvecs[:, i]
    eigvec = np.round(eigvec, 6)
    if eigval <= EPSILON:
        counts = defaultdict(int)
        for val in eigvec:
            counts[val] += 1
        print(sorted(counts.values()))

Spectral Clustering: How to quantify Connectedness?

Say we want to now find insular friend groups in the network. By insular groups, I mean groups that have only limited number of friendships (edges) outside their groups. So really, this is a strcutural question about the vertices and its edges of the graph. Let’s see how connectedness in a graph can be conceptualized.

The conductance of a set of nodes in a graph is a natural measure of how tightly knit/insular that set is, with a lower conductance indicating a more tightly knit set. Given a graph \(G= (V, E)\) with adjacency matrix \(A\), and a subset of the nodes \(S⊂V\), the conductance is defined as: \[cond(S)=\frac{\sum_{i \in S, j \in V\setminus S}A_{i,j}}{\min{(A(S), A(V\setminus S))}}\]

where \(A(S)\) is the sum of degrees of vertices in set \(S\).

In the context of a friendship graph such as ours, the conductance of a set of individuals corresponds to the ratio of the number of friendships between the outside world and that set, to the total number of friendships involving that set. If \(cond(S) = 0\), then that set is disconnected from the rest of the graph, and if \(cond(S) = 1\), then there are no internal friendships among members of that set. So insular groups correspond to low conductance.

Finding insular groups

This idea of conductance is what helps identify “clusters” or insular groups. Say our “insular” groups have limits on minimum and maximum membership – say 150 (min) and maximum 750 (total vertices/2) members. To ensure they are all fairly disjoint, let’s limit the conductance to 0.1 for all these insular groups.

But how do we find these insular groups now? Once again, spectral theory comes to rescue. We know that any eignvector of the 0 eigenvalue will have as many unique entries as there are the number of connected components. But how about an eigenvector whose eigenvalue is not 0 but very close to 0? Spectral theory tells me that I can find insular groups by investigating the eigenvector entries of the eigenvectors (in terms of how many unique or near unique values there are – this number corresponds to number of low conductance groups) whose eigenvalues are small yet non-zero. The reason we want to look at the smallest eigenvalues is that the smallest eigenvectors offer ways of assigning grouping weights to nodes of the friendship graph such that the neighbors are as close together as possible. Closely packed neighbors imply small conductance.

Based on some hit-and-trial, I found three insular groups \(S_i\) as follows

  • S1 : S1 was found in the vector corresponding to 7th smallest eigenvalue. The set comprises all nodes whose corresponding vector entry is .042±.005
  • S2 : S2 was found in the vector corresponding to 10th smallest eigenvalue. The set comprises all nodes whose corresponding vector entry is .014±.003
  • S3 : S3 was found in the vector corresponding to 30th smallest eigenvalue. The set comprises all nodes whose corresponding vector entry is −.027±.002

Conclusions

Spectral graph theory and linear algebra come together to help analyse graph’s structure and connectedness. The mathematical discussions here are far from complete – they’re not even the tip of the iceberg. A good starting point for more reading will be Dan Spielman’s Spectral Graph Theory notes. Hopefully from reading about this miniproject, you will have a sense of how and when graph spectra can become useful. It’s another good tool to have in your repertoire! There are more network related open data sets here if you would like to explore yourself and apply some of the ideas I shared!

Explore all projects

Updated: