library(phyloseq)
library(ggplot2)
data(GlobalPatterns)
GP <- prune_taxa(taxa_sums(GlobalPatterns) > 0, GlobalPatterns)
GP.chl <- subset_taxa(GP, Phylum=="Chlamydiae")
plot_tree(GP.chl, color="SampleType", shape="Family", label.tips="Genus", size="Abundance") + ggtitle("tree annotation using phyloseq")
This is a bit of a mess, let’s try to reduce the bootstrap numbers to two simple decimals.
phy_tree(GP.chl)
##
## Phylogenetic tree with 21 tips and 20 internal nodes.
##
## Tip labels:
## 100535, 2936, 24341, 579085, 547579, 136933, ...
## Node labels:
## 0.930.581, 0.962.436, 1.000.13747, 0.960.438, 1.000.13748, 0.590.42, ...
##
## Rooted; includes branch lengths.
str(phy_tree(GP.chl))
## List of 5
## $ edge : int [1:40, 1:2] 22 23 24 24 23 25 26 26 25 27 ...
## $ Nnode : int 20
## $ tip.label : chr [1:21] "100535" "2936" "24341" "579085" ...
## $ edge.length: num [1:40] 0.0085 0.0353 0.0037 0.0354 0.0188 ...
## $ node.label : chr [1:20] "0.930.581" "0.962.436" "1.000.13747" "0.960.438" ...
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
phy_tree(GP.chl)$node.label = substr(phy_tree(GP.chl)$node.label, 1, 4)
plot_tree(GP.chl, nodelabf=nodeplotboot(), color="SampleType", ladderize="left")
plot_tree(GP.chl, nodelabf=nodeplotboot(), ladderize="left", color="SampleType", shape="Family")
A simple dataset containing tree and OTU-table components.
The esophagus is a small and relatively simple dataset by moderns standards. It only contains 3 samples, no sample-data, and a modest quantity of total sequencing per sample that is a relic of an earlier time when resources for this sort of investigation were sparse and sequencing was expensive. Nevertheless, it makes for a nice sized dataset to start displaying trees. (For more details about the dataset and its origin, try entering ?esophagus into the command-line once you have loaded the phyloseq package)
The default tree without any additional parameters is plot with black points that indicate in how many different samples each OTU was found. In this case, the term “OTU” is used quite loosely (it is a loose term, after all) to mean entries in the taxononmic data you are plotting; and in the specific case of trees, it means the tips, even if you have already agglomerated the data such that each tip is equivalent to a rank of class or phylum.
plot_tree(esophagus, title=“Default tree.”)
Making a radial tree is easy with ggplot2, simply recognizing that our vertically-oriented tree is a cartesian mapping of the data to a graphic – and that a radial tree is the same mapping, but with polar coordinates instead.
data(esophagus)
plot_tree(esophagus, color="Sample", ladderize="left") + coord_polar(theta="y")
plot_tree(esophagus, "treeonly", title="method = \"treeonly\"")
A tree with annotation of abundances as size of glyphs and color as source.
plot_tree(esophagus, size="abundance", color="samples")
More examples can be found here
library(adaptiveGPCA)
library(ggplot2)
library(phyloseq)
data(AntibioticPhyloseq)
theme_set(theme_bw())
Defthelsen and Relman (2011) did a longitudinal analysis of 3 patients who were given two courses of antibiotics.
Measurements of about 2500 different bacterial OTUs from stool samples of three patients (D, E, F)
Each patient sampled \(\sim\) 50 times during the course of treatment with ciprofloxacin (an antibiotic).
Times categorized as Pre Cp, 1st Cp, 1st WPC (week post cipro), Interim, 2nd Cp, 2nd WPC, and Post Cp.
Using the tree completely to define the distances between taxa.
DPCOA
pp = processPhyloseq(AntibioticPhyloseq)
Pavoine, Dufour and Chessel (2004), Purdom (2010) and Fukuyama et al. (2011).
Suppose we have n species in p locations and a (Euclidean) matrix \(\Delta\) giving the squares of the pairwise distances between the species on the tree. Then we can
- Use the distances between species to find an embedding in
\(n -1\) -dimensional space such that the euclidean distances between the species is the same as the distances between the species defined in \(\Delta\).
- Place each of the p locations at the barycenter of its species profile. The euclidean distances between the locations will be the same as the square root of the Rao dissimilarity between them.
- Use PCA to find a lower-dimensional representation of the locations.
Give the species and communities coordinates such that the inertia decomposes the same way the diversity does.
out.agpca = adaptivegpca(pp$X, pp$Q, k = 2)
out.agpca
## An object of class adaptivegpca
## -------------------------------
## Number of axes: 2
## Value of r chosen: 0.462
## Fraction of variance explained
## by first 2 axes:
## 0.195 0.156
plot(out.agpca)
out.ff = gpcaFullFamily(pp$X, pp$Q, k = 2)
out.agpca = visualizeFullFamily(out.ff,
sample_data = sample_data(AntibioticPhyloseq),
sample_mapping = aes(x = Axis1, y = Axis2, color = type),
var_data = tax_table(AntibioticPhyloseq),
var_mapping = aes(x = Axis1, y = Axis2, color = Phylum))