library(adaptiveGPCA)
library(ggplot2)
library(phyloseq)
data(AntibioticPhyloseq)
theme_set(theme_bw())
Dethlefsen and Relman (2011) performed a longitudinal analysis of three patients who were given two courses of antibiotics.
The data are measurements of about 2500 different bacterial OTUs from stool samples of the three patients (D, E, F).
Each patient was sampled \(\sim 50\) times during the course of treatment with ciprofloxacin (an antibiotic).
Sampling times are categorized as Pre Cp, 1st Cp, 1st WPC (week post cipro), Interim, 2nd Cp, 2nd WPC, and Post Cp.
The tree is used in full to define the distances between taxa.
DPCoA (double principal coordinates analysis)
pp = processPhyloseq(AntibioticPhyloseq)
Pavoine, Dufour and Chessel (2004), Purdom (2010) and Fukuyama et al. (2011).
Suppose we have \(n\) species in \(p\) locations and a (Euclidean) matrix \(\Delta\) giving the squares of the pairwise distances between the species on the tree. Then we can:
- Use the distances between species to find an embedding in \((n - 1)\)-dimensional space such that the Euclidean distances between the species are the same as the distances between the species defined by \(\Delta\).
- Place each of the \(p\) locations at the barycenter of its species profile. The Euclidean distance between two locations is then the square root of the Rao dissimilarity between them.
- Use PCA to find a lower-dimensional representation of the locations.
This gives the species and the communities coordinates such that the inertia decomposes in the same way as the diversity does.
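The three steps above can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the procedure used by any package: the matrices `Z`, `X`, `W`, `Y` and `L` are hypothetical names, the abundances are random, and the final PCA is an ordinary unweighted `prcomp()`, whereas DPCoA proper weights the locations. The last lines check the identity behind the barycenter step: the squared distance between two locations equals a quadratic form in \(\Delta\), which corresponds to the Rao dissimilarity up to the scaling convention used.

```r
# Toy DPCoA sketch in base R; every number here is simulated.
set.seed(1)
n <- 5   # number of species
p <- 4   # number of locations

# Hypothetical species positions, used only to build a Euclidean Delta.
Z <- matrix(rnorm(n * (n - 1)), nrow = n)
Delta <- as.matrix(dist(Z))^2            # squared pairwise species distances

# Step 1: embed the species in (n - 1) dimensions.
# cmdscale() expects unsquared distances, hence sqrt(Delta).
Y <- cmdscale(as.dist(sqrt(Delta)), k = n - 1)

# Step 2: place each location at the barycenter of its species profile.
X <- matrix(rpois(p * n, lambda = 10) + 1, nrow = p)  # toy abundance table
W <- X / rowSums(X)                                   # each row sums to 1
L <- W %*% Y                                          # p x (n - 1) coordinates

# Step 3: ordinary (unweighted) PCA of the locations, keeping two axes.
pca <- prcomp(L, center = TRUE, scale. = FALSE)
location_scores <- pca$x[, 1:2]
# Project the species onto the same two axes.
species_scores <- scale(Y, center = pca$center, scale = FALSE) %*%
  pca$rotation[, 1:2]

# Check: squared distance between locations 1 and 2 versus the
# quadratic form -1/2 * (w1 - w2)' Delta (w1 - w2).
d12 <- W[1, ] - W[2, ]
quad_form <- -0.5 * drop(t(d12) %*% Delta %*% d12)
sq_dist <- sum((L[1, ] - L[2, ])^2)
all.equal(sq_dist, quad_form)
```

Plotting `location_scores` and `species_scores` side by side gives the two views (communities and species) that the ordination is meant to provide.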
out.agpca = adaptivegpca(pp$X, pp$Q, k = 2)
out.agpca
## An object of class adaptivegpca
## -------------------------------
## Number of axes: 2
## Value of r chosen: 0.462
## Fraction of variance explained
## by first 2 axes:
## 0.195 0.156
plot(out.agpca)
out.ff = gpcaFullFamily(pp$X, pp$Q, k = 2)
out.agpca = visualizeFullFamily(out.ff,
    sample_data = sample_data(AntibioticPhyloseq),
    sample_mapping = aes(x = Axis1, y = Axis2, color = type),
    var_data = tax_table(AntibioticPhyloseq),
    var_mapping = aes(x = Axis1, y = Axis2, color = Phylum))