Tools for phyloseq manipulation and analysis

There is a variety of R packages and tools that extend the phyloseq capabilities. This Lab explores some additional tools, from other developers. Prepare a reproducible report where you analyse the following.

Installation

Launch R/RStudio and install the latest development version of microbiome R package from github (see installation instructions).

To initiate reproducible documentation, do the following in RStudio:

  1. Open a new Rmarkdown (.html) file
  2. Convert that .html file with the ‘knit’ button
  3. Modify the file and knit again to make your own reproducible report

Example data

Example data set will be the HITChip Atlas, which is available via the microbiome R package in phyloseq format. This data set from Lahti et al. Nat. Comm. 5:4344, 2014 comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Note that this data is based on phylogenetic 16S microarrays, not amplicon sequencing. However, similar patterns of microbiome variation are captured by this platform. This is interesting for comparison purposes, and we use it to demonstrate some further tools for microbiome analysis. This data set is readily available through the microbiome R package as a phyloseq object.

Load the data in R with:

# Load libraries
library(knitr)
library(RColorBrewer)

# Install the latest package version
library("devtools")
install_github("microbiome/microbiome")

# Data citation doi: 10.1038/ncomms5344
library(microbiome)
data(atlas1006) 
print(atlas1006)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1151 samples ]
## sample_data() Sample Data:       [ 1151 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]
# Rename the data
pseq <- atlas1006

Alpha diversity

Estimate taxonomic diversity for each sample (function microbiome::alpha). This provides the indices from phyloseq and vegan R packages, and some additional indices. See microbiome tutorial for examples.

If diversity estimation is slow, you can pick only a few indices, or pick a subset of samples.

tab <- alpha(pseq, index = "all")
kable(head(tab))
observed chao1 diversity_inverse_simpson diversity_gini_simpson diversity_shannon diversity_fisher diversity_coverage evenness_camargo evenness_pielou evenness_simpson evenness_evar evenness_bulla dominance_dbp dominance_dmn dominance_absolute dominance_relative dominance_simpson dominance_core_abundance dominance_gini rarity_log_modulo_skewness rarity_low_abundance rarity_rare_abundance
Sample-1 99 108.3750 12.983930 0.9229817 3.187815 16.07126 5 0.3675967 0.6937392 0.1311508 0.1669647 0.3447496 0.1759515 0.3379428 1336 0.1759515 0.0770183 0.9998683 0.8488881 2.059488 0.0264718 0.0001317
Sample-2 98 108.5625 16.595578 0.9397430 3.394462 15.04043 7 0.3437543 0.7403459 0.1693426 0.1483521 0.3958182 0.1716594 0.2821246 1742 0.1716594 0.0602570 1.0000000 0.8188881 2.055369 0.0198069 0.0000000
Sample-3 99 110.2500 8.703493 0.8851036 2.864855 16.26890 4 0.3242763 0.6234560 0.0879141 0.1906035 0.2680793 0.2793437 0.3985416 1992 0.2793437 0.1148964 0.9998598 0.8806975 2.053348 0.0412284 0.0000000
Sample-4 100 111.2500 10.709023 0.9066208 3.056922 15.21763 4 0.3924486 0.6638021 0.1070902 0.1559785 0.3336218 0.1957623 0.3813911 2125 0.1957623 0.0933792 0.9999079 0.8604110 2.057767 0.0247812 0.0000921
Sample-5 98 112.0625 12.248008 0.9183541 3.073742 14.59865 4 0.4112500 0.6703956 0.1249797 0.1429785 0.3194244 0.1686667 0.3334167 2024 0.1686667 0.0816459 0.9999167 0.8672179 2.057402 0.0234167 0.0000000
Sample-6 99 113.0625 9.665190 0.8965359 2.941993 15.94374 3 0.2819936 0.6402429 0.0976282 0.1824172 0.3030473 0.2273187 0.3802123 1799 0.2273187 0.1034641 1.0000000 0.8735717 2.060100 0.0380339 0.0000000

Task: Compare the results between alternative diversity indices (visually or statistically).

Question: What is the difference between species richness and evenness?

Visualize diversities as a function of age

Draw regression curve with smoothed error bars with Visually-Weighted Regression by Solomon M. Hsiang. The R version is modified from Felix Schonbrodt.

# Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- tab$diversity_shannon

# Generate smoothed regression plot
p <- plot_regression(diversity ~ age, meta(pseq)) +
       labs(x = "Age", y = "Diversity")
print(p)

Core microbiota

Core microbiota refers to the set of species shared by (almost) all individuals. You can read more about the core microbiota definition in Salonen et al. 2012.

Prevalence

Question: What is the most prevalent genus in the example data. How prevalent it is in the sample collection (tip: use the microbiome::prevalence function).

Relative population frequencies; at 1% compositional abundance threshold:

# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")

# Calculate prevalences for all taxonomic groups
kable(head(prevalence(pseq.rel, detection = 1/100, sort = TRUE)))
x
Faecalibacterium prausnitzii et rel. 0.9522155
Ruminococcus obeum et rel. 0.9139878
Oscillospira guillermondii et rel. 0.8801043
Clostridium symbiosum et rel. 0.8714162
Subdoligranulum variable at rel. 0.8357950
Clostridium orbiscindens et rel. 0.8314509

Task: Pick up the core microbiota including taxa that exceed 0.1% relative abundance in over 50% of the samples (prevalence). Tip: check the ‘core’ function of the microbiome R package.

A full phyloseq object with just the core taxa is obtained as follows:

# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)

Question: How many core taxa there are in the example data? If there are multiple samples per individual, how does this affect the analysis?

If you only need the names of the core taxa, do as follows. This returns the taxa that exceed the given prevalence and detection thresholds.

core.members <- core_members(pseq.rel, detection = 0, prevalence = 50/100)

Visualizing the core

Core line plots

Determine core microbiota across various abundance/prevalence thresholds with the blanket analysis (Salonen et al. CMI, 2012) based on various signal and prevalences.

This visualization method has been used for instance in Intestinal microbiome landscaping: Insight in community assemblage and implications for microbial modulation strategies. Shetty et al. FEMS Microbiology Reviews fuw045, 2017.

Note that you can order the taxa on the heatmap with the taxa.order argument.

# Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- round(10^seq(log10(1e-3), log10(.2), length = 10), 3)

# Also define gray color palette
gray <- gray(seq(0,1,length=5))
# Only include core taxa
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)
p <- plot_core(pseq.core, plot.type = "heatmap",
           colours = rev(brewer.pal(5, "Spectral")),           
               prevalences = prevalences,
           detections = detections
           ) +
    labs(x = "Detection Threshold") 
print(p)    

Landscape visualization

Let us investigate the population variation in the human gut microbiota (the landscape model). Note that DNA extraction method has a remarkable effect on sample grouping.

# For this example, choose only those samples that have DNA extraction information 
ps <- subset_samples(pseq.core, !is.na(DNA_extraction_method))

# Illustrate sample similarities with NMDS
# NMDS is an alternative ordination technique
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")

Community typing with Dirichlet Multinomial Mixtures

Test community typing with the DMM model. You may need to install the DirichletMultinomial R/Bioconductor package.

Extra task Visualize the landscape using PCA or another ordination method (PCoA, NMDS..). Check other labs from this course for instructions. Indicate the inferred community types with colors.

Other tools

Experiment with the other available tools in the microbiome tutorial

See also List of R tools for microbiome analysis.