EPI2MEviz

EPI2MEviz is a companion tool for the ONT 16S EPI2ME analysis. This R-based Shiny app automatically plots rarefaction curves to determine whether enough data was collected, calculates relative abundance to describe community composition, and runs a principal coordinate analysis on Bray-Curtis dissimilarities to show which samples are similar to each other.

At a glance

For a detailed explanation of the pipeline, see these slides.

How it works

Bacterial community sequencing analysis can be summarized in 5 major steps:

1) Quality control and filtering: when we try to understand a bacterial community using DNA sequences, we want to make sure those sequences are of high quality. Quality control and filtering ensure that the DNA we will classify is not damaged or misread. We also remove any sequences that are too long or too short for the piece of DNA we amplified.

2) Identifying the bacteria with DNA: once we have high-quality DNA, we compare it to a database of known sequences using the Basic Local Alignment Search Tool (BLAST) algorithm. Think of the song identification app Shazam: play it part of a song and it tells you the song name and artist. Here, instead of playing a song, we use a DNA sequence to find the organism it came from.

3) Assessing the number of species: as we identify the DNA sequences, we need to know whether we have captured the majority of the diversity within the sample. To do this, we generate rarefaction curves. Rarefaction curves show how many species are found as a function of the amount of DNA sequenced. If the number of species keeps rising as we sequence more DNA and does not plateau, we can infer that we need to continue sequencing the sample. If the number of species begins to plateau as we collect more DNA sequences, we can infer that we have captured the bulk of the diversity within the sample.

4) Calculating relative abundance: after we have identified the bacteria in a sample and confirmed that we have sequenced enough, we count the number of DNA sequences per organism identified and compare that to the total number of sequences generated from the run. This tells us which genera are present in the samples we sequenced and how much of the DNA in our sequencing run each one accounts for.

5) Comparing sample similarity: the final step in the process is to determine which samples are similar. We do this by taking into account both the number of species identified within each sample and the number of DNA sequences per unique species. We then compare the samples using a Bray-Curtis dissimilarity matrix in conjunction with a principal coordinate analysis (PCoA) to see which samples naturally cluster together based on their composition. This analysis becomes even more powerful if you include additional data collected with your sample (such as temperature, time of year, pH, or turbidity).
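Steps 3 to 5 can be illustrated with a small sketch. The Python below is not the EPI2MEviz code (the app itself is written in R); it is a minimal illustration under stated assumptions: each sample is represented as a plain list with one genus label per classified read, and the `samples` data, function names, and parameters are all hypothetical. The PCoA itself is left out; the sketch stops at the Bray-Curtis dissimilarity that a PCoA would take as input.

```python
import random
from collections import Counter

# Hypothetical toy data: one genus label per classified read, per sample.
samples = {
    "pond": ["Pseudomonas"] * 6 + ["Bacillus"] * 3 + ["Vibrio"] * 1,
    "stream": ["Pseudomonas"] * 2 + ["Bacillus"] * 2 + ["Flavobacterium"] * 6,
}


def rarefaction_curve(reads, step=2, trials=200, seed=0):
    """Step 3: mean number of distinct taxa in random subsamples of growing size."""
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    curve = []
    for depth in range(step, len(reads) + 1, step):
        # Draw `depth` reads without replacement and count the taxa observed.
        richness = [len(set(rng.sample(reads, depth))) for _ in range(trials)]
        curve.append((depth, sum(richness) / trials))
    return curve


def relative_abundance(reads):
    """Step 4: fraction of a sample's reads assigned to each taxon."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(reads_a, reads_b):
    """Step 5: Bray-Curtis dissimilarity (0 = same composition, 1 = no shared taxa)."""
    a, b = Counter(reads_a), Counter(reads_b)
    # Sum of the smaller count for every taxon present in both samples.
    shared = sum(min(a[t], b[t]) for t in a.keys() & b.keys())
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


if __name__ == "__main__":
    for name, reads in samples.items():
        print(name, rarefaction_curve(reads))
        print(name, relative_abundance(reads))
    print(bray_curtis(samples["pond"], samples["stream"]))
```

In this toy data the two samples share 4 of their 20 reads by the minimum-count rule, so the dissimilarity is 1 - 8/20 = 0.6. A curve whose mean richness stops rising as `depth` grows corresponds to the plateau described in step 3.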

TOOL ACCESS

Getting started with EPI2MEviz

The EPI2MEviz pipeline runs on any Windows, Mac, or Linux device and is distributed as a Docker image; as long as you have Docker installed, you can run the pipeline on any such computer. At this time, EPI2MEviz is not compatible with Chromebooks or other ChromeOS-based devices.

In-depth instructions for running EPI2MEviz can be found here. To download EPI2MEviz, install Docker and then run the following command in your terminal:

docker pull ethill/EPI2MEviz:beta

Public repositories for EPI2MEviz:

The source code for EPI2MEviz is hosted on GitHub, while the current build is hosted on Docker Hub.

Periodically run "docker pull ethill/EPI2MEviz:beta" to update to the latest version.