Tools for phyloseq manipulation and analysis

There is a variety of R packages and tools that extend the phyloseq capabilities. This Lab explores some additional tools, from other developers. Prepare a reproducible report where you analyse the following.

Installation

Launch R/RStudio and install the latest development version of microbiome R package from github (see installation instructions).

To initiate reproducible documentation, do the following in RStudio:

Open a new Rmarkdown (.html) file
Convert that .html file with the ‘knit’ button
Modify the file and knit again to make your own reproducible report

Example data

Example data set will be the HITChip Atlas, which is available via the microbiome R package in phyloseq format. This data set from Lahti et al. Nat. Comm. 5:4344, 2014 comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Note that this data is based on phylogenetic 16S microarrays, not amplicon sequencing. However, similar patterns of microbiome variation are captured by this platform. This is interesting for comparison purposes, and we use it to demonstrate some further tools for microbiome analysis. This data set is readily available through the microbiome R package as a phyloseq object.

Load the data in R with:

# Load libraries
library(knitr)
library(RColorBrewer)

# Install the latest package version
library("devtools")
install_github("microbiome/microbiome")

# Data citation doi: 10.1038/ncomms5344
library(microbiome)
data(atlas1006) 
print(atlas1006)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1151 samples ]
## sample_data() Sample Data:       [ 1151 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

# Rename the data
pseq <- atlas1006

Alpha diversity

Estimate taxonomic diversity for each sample (function microbiome::alpha). This provides the indices from phyloseq and vegan R packages, and some additional indices. See microbiome tutorial for examples.

If diversity estimation is slow, you can pick only a few indices, or pick a subset of samples.

tab <- alpha(pseq, index = "all")
kable(head(tab))

	observed	chao1	diversity_inverse_simpson	diversity_gini_simpson	diversity_shannon	diversity_fisher	diversity_coverage	evenness_camargo	evenness_pielou	evenness_simpson	evenness_evar	evenness_bulla	dominance_dbp	dominance_dmn	dominance_absolute	dominance_relative	dominance_simpson	dominance_core_abundance	dominance_gini	rarity_log_modulo_skewness	rarity_low_abundance	rarity_rare_abundance
Sample-1	99	108.3750	12.983930	0.9229817	3.187815	16.07126	5	0.3675967	0.6937392	0.1311508	0.1669647	0.3447496	0.1759515	0.3379428	1336	0.1759515	0.0770183	0.9998683	0.8488881	2.059488	0.0264718	0.0001317
Sample-2	98	108.5625	16.595578	0.9397430	3.394462	15.04043	7	0.3437543	0.7403459	0.1693426	0.1483521	0.3958182	0.1716594	0.2821246	1742	0.1716594	0.0602570	1.0000000	0.8188881	2.055369	0.0198069	0.0000000
Sample-3	99	110.2500	8.703493	0.8851036	2.864855	16.26890	4	0.3242763	0.6234560	0.0879141	0.1906035	0.2680793	0.2793437	0.3985416	1992	0.2793437	0.1148964	0.9998598	0.8806975	2.053348	0.0412284	0.0000000
Sample-4	100	111.2500	10.709023	0.9066208	3.056922	15.21763	4	0.3924486	0.6638021	0.1070902	0.1559785	0.3336218	0.1957623	0.3813911	2125	0.1957623	0.0933792	0.9999079	0.8604110	2.057767	0.0247812	0.0000921
Sample-5	98	112.0625	12.248008	0.9183541	3.073742	14.59865	4	0.4112500	0.6703956	0.1249797	0.1429785	0.3194244	0.1686667	0.3334167	2024	0.1686667	0.0816459	0.9999167	0.8672179	2.057402	0.0234167	0.0000000
Sample-6	99	113.0625	9.665190	0.8965359	2.941993	15.94374	3	0.2819936	0.6402429	0.0976282	0.1824172	0.3030473	0.2273187	0.3802123	1799	0.2273187	0.1034641	1.0000000	0.8735717	2.060100	0.0380339	0.0000000

Task: Compare the results between alternative diversity indices (visually or statistically).

Question: What is the difference between species richness and evenness?

Visualize diversities as a function of age

Draw regression curve with smoothed error bars with Visually-Weighted Regression by Solomon M. Hsiang. The R version is modified from Felix Schonbrodt.

# Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- tab$diversity_shannon

# Generate smoothed regression plot
p <- plot_regression(diversity ~ age, meta(pseq)) +
       labs(x = "Age", y = "Diversity")
print(p)

Core microbiota

Core microbiota refers to the set of species shared by (almost) all individuals. You can read more about the core microbiota definition in Salonen et al. 2012.

Prevalence

Question: What is the most prevalent genus in the example data. How prevalent it is in the sample collection (tip: use the microbiome::prevalence function).

Relative population frequencies; at 1% compositional abundance threshold:

# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")

# Calculate prevalences for all taxonomic groups
kable(head(prevalence(pseq.rel, detection = 1/100, sort = TRUE)))

	x
Faecalibacterium prausnitzii et rel.	0.9522155
Ruminococcus obeum et rel.	0.9139878
Oscillospira guillermondii et rel.	0.8801043
Clostridium symbiosum et rel.	0.8714162
Subdoligranulum variable at rel.	0.8357950
Clostridium orbiscindens et rel.	0.8314509

Task: Pick up the core microbiota including taxa that exceed 0.1% relative abundance in over 50% of the samples (prevalence). Tip: check the ‘core’ function of the microbiome R package.

A full phyloseq object with just the core taxa is obtained as follows:

# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)

Question: How many core taxa there are in the example data? If there are multiple samples per individual, how does this affect the analysis?

If you only need the names of the core taxa, do as follows. This returns the taxa that exceed the given prevalence and detection thresholds.

core.members <- core_members(pseq.rel, detection = 0, prevalence = 50/100)

Visualizing the core

Core line plots

Determine core microbiota across various abundance/prevalence thresholds with the blanket analysis (Salonen et al. CMI, 2012) based on various signal and prevalences.

This visualization method has been used for instance in Intestinal microbiome landscaping: Insight in community assemblage and implications for microbial modulation strategies. Shetty et al. FEMS Microbiology Reviews fuw045, 2017.

Note that you can order the taxa on the heatmap with the taxa.order argument.

# Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- round(10^seq(log10(1e-3), log10(.2), length = 10), 3)

# Also define gray color palette
gray <- gray(seq(0,1,length=5))
# Only include core taxa
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)
p <- plot_core(pseq.core, plot.type = "heatmap",
           colours = rev(brewer.pal(5, "Spectral")),           
               prevalences = prevalences,
           detections = detections
           ) +
    labs(x = "Detection Threshold") 
print(p)

Landscape visualization

Let us investigate the population variation in the human gut microbiota (the landscape model). Note that DNA extraction method has a remarkable effect on sample grouping.

# For this example, choose only those samples that have DNA extraction information 
ps <- subset_samples(pseq.core, !is.na(DNA_extraction_method))

# Illustrate sample similarities with NMDS
# NMDS is an alternative ordination technique
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")

Community typing with Dirichlet Multinomial Mixtures

Test community typing with the DMM model. You may need to install the DirichletMultinomial R/Bioconductor package.

Extra task Visualize the landscape using PCA or another ordination method (PCoA, NMDS..). Check other labs from this course for instructions. Indicate the inferred community types with colors.

Other tools

Experiment with the other available tools in the microbiome tutorial

Toolkit Lab: Pune 2019

18 December 2019