library(adaptiveGPCA)
library(ggplot2)
library(phyloseq)
library(phyloseqGraphTest)
#data(AntibioticPhyloseq)
theme_set(theme_bw())

What graphs can we build from microbiome data?

A few examples:

  • Co-occurrence graphs with nodes at samples

  • Co-occurrence graphs with nodes at taxa

  • Minimum spanning trees between samples

  • Nearest neighbor graphs between samples (knn)

Building co-occurrence graphs with phyloseq

There are special packages for the graph data structures, the main ones are igraph, ggnetwork and more recently ggraph.

library(phyloseq)
library(igraph)
library(ggnetwork)

How do we create graphs?

How do we use these graphs?

Testing a relationship between a factor covariate and a graph

Example: Bacteria sharing between mice

We laod the data and in one phyloseq command we can make a co-occurrence graph:

Between samples

ps1  = readRDS("../../data/psmice.rds")
ij4 <- make_network(ps1, type="samples", distance="jaccard",          max.dist = 0.4, 
     keep.isolates=FALSE)

Question 1: What class is the ij object and what are its components?

Plot the network created by ij

plot_network(ij4, ps1, color="family_relationship", line_weight=0.6)
## Warning: attributes are not identical across measure variables; they will be
## dropped

What do you notice about this graph?

Make the graph several times, what do you notice?

Exercise: Change the threshold so that it is more stringent, say max.dist=0.3, how does the graph change?

set.seed(1275641)
ij3 <- make_network(ps1, type="samples", distance="jaccard",          max.dist = 0.3, 
     keep.isolates=FALSE)
plot_network(ij3, ps1, color="family_relationship", line_weight=0.9)
## Warning: attributes are not identical across measure variables; they will be
## dropped

Making graphs by hand without the phyloseq helpers

sampledata = data.frame(sample_data(ps1))
d1 = as.matrix(phyloseq::distance(ps1, method="jaccard"))
gr = graph.adjacency(d1, mode = "undirected", weighted = TRUE)
net = igraph::mst(gr)
V(net)$id = sampledata[names(V(net)), "host_subject_id"]
V(net)$litter = sampledata[names(V(net)), "family_relationship"]

We make a ggnetwork object from the resulting igraph generated minimum spanning tree and then plot it.


gnet=ggnetwork(net)
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following object is masked from 'package:base':
## 
##     order
## Loading required package: network
## network: Classes for Relational Data
## Version 1.15 created on 2019-04-01.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
##  For citation information, type citation("network").
##  Type help("network-package") to get started.
## 
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges,
##     delete.vertices, get.edge.attribute, get.edges,
##     get.vertex.attribute, is.bipartite, is.directed,
##     list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute
## sna: Tools for Social Network Analysis
## Version 2.4 created on 2016-07-23.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## 
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census
ggplot(gnet, aes(x = x, y = y, xend = xend, yend = yend))+
  geom_edges(color = "darkgray") +
  geom_nodes(aes(color = id, shape = litter)) + theme_blank()+
  theme(legend.position="bottom")
MST plot

Figure 1: MST plot


Testing relation between graph and factors

Compute the null distribution and p-value for the test, this is implemented in the phyloseqGraphTest package:

library("phyloseqGraphTest")
gt = graph_perm_test(ps1, "host_subject_id", distance="jaccard",
                     type="mst",  nperm=1000)
gt$pval
## [1] 0.000999001

Histogram of the null distribution generatedby permutation using:

plot_permutations(gt)

Different choices for the skeleton graph

It is not necessary to use an MST for the skeleton graph that defines the edges.

Graphs made by linking nearest neighbors [@schilling1986multivariate]

Distance thresholding work also works well (sometimes called geometric graphs).

phyloseq has functionality for creating graphs based on thresholding a distance matrix through the function make_network.

Create a network by creating a edge between samples whose Jaccard dissimilarity is less than a threshold, which we set below via the parameter max.dist .

This is a co-occurrence network.

net = make_network(ps1, max.dist = 0.35)
sampledata = data.frame(sample_data(ps1))
V(net)$id = sampledata[names(V(net)), "host_subject_id"]
V(net)$litter = sampledata[names(V(net)), "family_relationship"]
netg = ggnetwork(net)

ggplot(netg, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "darkgray") +
  geom_nodes(aes(color = id, shape = litter)) + theme_blank()+
    theme(legend.position="bottom")

Sometimes it will preferable to adjust the permutation distribution to account for known structure between the covariates.

Friedman–Rafsy test with nested covariates

individual mice (the host_subject_id variable).

a litter (the family_relationship variable) effect ?

plot_test_network(gtnn1)

We permute the family_relationship labels but keep the host_subject_id structure intact.

gt = graph_perm_test(ps1, "family_relationship",
          grouping = "host_subject_id",
          distance = "jaccard", type = "mst", nperm= 1000)
gt$pval
## [1] 0.002997003

plot_permutations(gt)

gtnn1 = graph_perm_test(ps1, "family_relationship",
                      grouping = "host_subject_id",
                      distance = "jaccard", type = "knn", knn = 1)
gtnn1$pval
## [1] 0.002

The dual graph

  • Between samples through their shared taxa.
  • Between taxa: do some of the taxa co-occur more often than one would expect ?

Microbial ‘communities’ as they assemble in the microbiome.

Transpose the data.

Note: always prefer Jaccard to correlation networks.

Summary of this lecture

To summarize what you’ve learned in this lecture :

  • We have defined simple graphs and annotated ones, we called networks.

  • We have learnt how to build and plot Markov chain graphs, phylogenetic trees and minimum spanning trees.

  • How to incorporate a known ‘skeleton’ graph into a differential expression analysis and compute perturbation hotspots in the network.

  • Seen how evolutionary models defined along rooted binary trees serve as the basis for phylogenetic tree estimation.

  • Seen how to incorporate a graph into a differential abundance test using their known phylogenies.

  • Create co-occurrence networks and test the effect of factor covariates using permutation tests.

Useful Packages for Incorporating Graphs into an Analysis

Important examples of graphs and useful R packages

ggnetwork and igraph

Combining graphs with statistical data

structSSI and phyloseq. bioNet

A full list packages that deal with graphs and networks is available here: http://www.bioconductor.org/packages/release/BiocViews.html#___GraphAndNetwork


References