library(adaptiveGPCA)
library(ggplot2)
library(phyloseq)
library(phyloseqGraphTest)
#data(AntibioticPhyloseq)
theme_set(theme_bw())
A few examples:
Co-occurrence graphs with nodes at samples
Co-occurrence graphs with nodes at taxa
Minimum spanning trees between samples
Nearest neighbor graphs between samples (knn)
There are special packages for the graph data structures, the main ones are igraph, ggnetwork and more recently ggraph.
library(phyloseq)
library(igraph)
library(ggnetwork)
We laod the data and in one phyloseq command we can make a co-occurrence graph:
ps1 = readRDS("../../data/psmice.rds")
ij4 <- make_network(ps1, type="samples", distance="jaccard", max.dist = 0.4,
keep.isolates=FALSE)
Question 1: What class is the ij object and what are its components?
Plot the network created by ij
plot_network(ij4, ps1, color="family_relationship", line_weight=0.6)
## Warning: attributes are not identical across measure variables; they will be
## dropped
What do you notice about this graph?
Make the graph several times, what do you notice?
Exercise: Change the threshold so that it is more stringent, say max.dist=0.3, how does the graph change?
set.seed(1275641)
ij3 <- make_network(ps1, type="samples", distance="jaccard", max.dist = 0.3,
keep.isolates=FALSE)
plot_network(ij3, ps1, color="family_relationship", line_weight=0.9)
## Warning: attributes are not identical across measure variables; they will be
## dropped
sampledata = data.frame(sample_data(ps1))
d1 = as.matrix(phyloseq::distance(ps1, method="jaccard"))
gr = graph.adjacency(d1, mode = "undirected", weighted = TRUE)
net = igraph::mst(gr)
V(net)$id = sampledata[names(V(net)), "host_subject_id"]
V(net)$litter = sampledata[names(V(net)), "family_relationship"]
We make a ggnetwork object from the resulting igraph generated minimum spanning tree and then plot it.
gnet=ggnetwork(net)
## Loading required package: sna
## Loading required package: statnet.common
##
## Attaching package: 'statnet.common'
## The following object is masked from 'package:base':
##
## order
## Loading required package: network
## network: Classes for Relational Data
## Version 1.15 created on 2019-04-01.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## Mark S. Handcock, University of California -- Los Angeles
## David R. Hunter, Penn State University
## Martina Morris, University of Washington
## Skye Bender-deMoll, University of Washington
## For citation information, type citation("network").
## Type help("network-package") to get started.
##
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
##
## %c%, %s%, add.edges, add.vertices, delete.edges,
## delete.vertices, get.edge.attribute, get.edges,
## get.vertex.attribute, is.bipartite, is.directed,
## list.edge.attributes, list.vertex.attributes,
## set.edge.attribute, set.vertex.attribute
## sna: Tools for Social Network Analysis
## Version 2.4 created on 2016-07-23.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## For citation information, type citation("sna").
## Type help(package="sna") to get started.
##
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
##
## betweenness, bonpow, closeness, components, degree, dyad.census,
## evcent, hierarchy, is.connected, neighborhood, triad.census
ggplot(gnet, aes(x = x, y = y, xend = xend, yend = yend))+
geom_edges(color = "darkgray") +
geom_nodes(aes(color = id, shape = litter)) + theme_blank()+
theme(legend.position="bottom")
Figure 1: MST plot
Compute the null distribution and p-value for the test, this is
implemented in the phyloseqGraphTest package:
library("phyloseqGraphTest")
gt = graph_perm_test(ps1, "host_subject_id", distance="jaccard",
type="mst", nperm=1000)
gt$pval
## [1] 0.000999001
Histogram of the null distribution generatedby permutation using:
plot_permutations(gt)
It is not necessary to use an MST for the skeleton graph that defines the edges.
Graphs made by linking nearest neighbors [@schilling1986multivariate]
Distance thresholding work also works well (sometimes called geometric graphs).
phyloseq has functionality for creating graphs based on
thresholding a distance matrix through the function
make_network.
Create a network by creating a edge
between samples whose Jaccard dissimilarity is less than a threshold,
which we set below via the parameter max.dist .
This is a co-occurrence network.
net = make_network(ps1, max.dist = 0.35)
sampledata = data.frame(sample_data(ps1))
V(net)$id = sampledata[names(V(net)), "host_subject_id"]
V(net)$litter = sampledata[names(V(net)), "family_relationship"]
netg = ggnetwork(net)
ggplot(netg, aes(x = x, y = y, xend = xend, yend = yend)) +
geom_edges(color = "darkgray") +
geom_nodes(aes(color = id, shape = litter)) + theme_blank()+
theme(legend.position="bottom")
Sometimes it will preferable to adjust the permutation distribution to account for known structure between the covariates.
individual mice (the host_subject_id variable).
a litter (the family_relationship variable) effect ?
plot_test_network(gtnn1)
We permute the family_relationship labels but keep the
host_subject_id structure intact.
gt = graph_perm_test(ps1, "family_relationship",
grouping = "host_subject_id",
distance = "jaccard", type = "mst", nperm= 1000)
gt$pval
## [1] 0.002997003
plot_permutations(gt)
gtnn1 = graph_perm_test(ps1, "family_relationship",
grouping = "host_subject_id",
distance = "jaccard", type = "knn", knn = 1)
gtnn1$pval
## [1] 0.002
Microbial ‘communities’ as they assemble in the microbiome.
Transpose the data.
Note: always prefer Jaccard to correlation networks.
To summarize what you’ve learned in this lecture :
We have defined simple graphs and annotated ones, we called networks.
We have learnt how to build and plot Markov chain graphs, phylogenetic trees and minimum spanning trees.
How to incorporate a known ‘skeleton’ graph into a differential expression analysis and compute perturbation hotspots in the network.
Seen how evolutionary models defined along rooted binary trees serve as the basis for phylogenetic tree estimation.
Seen how to incorporate a graph into a differential abundance test using their known phylogenies.
Create co-occurrence networks and test the effect of factor covariates using permutation tests.
structSSI and phyloseq. bioNet
A full list packages that deal with graphs and networks is available here: http://www.bioconductor.org/packages/release/BiocViews.html#___GraphAndNetwork