Ulana

Introducing the Unicellular Long-read Assembly aNd Annotation (Ulana) bioinformatics pipeline. Ulana was developed in-house to provide a fast, convenient, and user-friendly option for assembling and annotating bacterial genomes.

Genome assembly is the process of taking the small fragments of DNA produced by sequencing and figuring out their correct order and arrangement to reconstruct the complete genome of an organism. It's like piecing together a jigsaw puzzle, but with DNA instead of pictures.

At a glance

Overview of the Ulana pipeline. An in-depth description can be found here.

How it works

Genome assembly can be summarized in five major steps:

1) Quality control and filtering: when we put together a big puzzle, we want to make sure that none of the pieces we use are broken or damaged. Quality control and filtering in genome assembly is like checking the puzzle pieces to make sure they are in good shape and fit well together. We sort through the raw DNA sequences and remove any that have low quality scores or are too short to be informative.

2) Generating a draft assembly: once the low-quality sequences have been removed, the draft assembly process is like putting the puzzle together. Imagine the genome as the finished puzzle; the DNA sequences are the pieces, and we need to fit them together to complete it. This process typically iterates until the genomic fragments (known as contigs) cannot be connected any further.

3) Polishing the draft assembly: now that we have a draft genome assembly, we have to make sure it is as accurate as possible. Think of it like having someone double-check the puzzle to make sure everything is in the right place. To do this, we use a technique known as "polishing" (specifically, read mapping) to compare the original DNA reads that were used for assembly with the assembled genome. We check whether the reads match up well with the genome. If we find any mismatches or differences, we adjust the assembly to correct them.

4) Assembly quality control: before we start further analysis, we must look for any remaining errors or uncertainties in the genome. It's like making sure all the puzzle pieces are in the right place and fit together nicely. We need to check whether there are any missing pieces, extra pieces, or incorrect connections in the genome.

5) Post hoc analysis: finally, we can analyze the genome itself, similar to exploring different parts of the puzzle to learn more about the picture. We can compare the assembled genome to reference genomes or databases to check for consistency and accuracy and to identify the species. Furthermore, we can examine specific regions of the genome in more detail to understand their functions or look for interesting patterns.
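To make step 1 more concrete, here is a minimal sketch of a read filter written as a shell function around awk. It is an illustration only and is not taken from Ulana: the function name `filter_fastq`, its arguments, and the choice of a simple arithmetic mean of the quality scores are all assumptions, and it assumes four-line FASTQ records with Phred+33 quality encoding.

```shell
# Hypothetical helper (not part of Ulana): keep only FASTQ reads that are
# long enough and have a high enough mean Phred quality score.
# Arguments: FASTQ_FILE MIN_LENGTH MIN_MEAN_QUALITY
# Assumes four-line FASTQ records with Phred+33 quality encoding.
filter_fastq() {
    awk -v min_len="$2" -v min_q="$3" '
        BEGIN {
            # Lookup table from printable ASCII characters to their codes.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 1 { header = $0 }   # read identifier line
        NR % 4 == 2 { bases  = $0 }   # nucleotide sequence
        NR % 4 == 3 { plus   = $0 }   # separator line
        NR % 4 == 0 {                 # quality line: decide keep or drop
            n = length(bases)
            sum = 0
            for (i = 1; i <= n; i++)
                sum += ord[substr($0, i, 1)] - 33
            # Keep the read only if it passes both thresholds.
            if (n > 0 && n >= min_len && sum / n >= min_q)
                print header "\n" bases "\n" plus "\n" $0
        }
    ' "$1"
}
```

A call such as `filter_fastq reads.fastq 1000 10 > filtered.fastq` would write the surviving reads to a new file; the file names and thresholds here are placeholders, not values used by Ulana.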

Tool Access

Getting started with Ulana

The Ulana pipeline can run on any Windows, Mac, or Linux-based device and is distributed as a Docker image; as long as you have Docker installed, you can execute the pipeline from any computer. Unfortunately, at this time, Ulana is not compatible with Chromebooks or other ChromeOS-based devices.

In-depth instructions for running Ulana can be found here. To download Ulana, first download and install Docker, then run the following command in your terminal:

docker pull ethill/ulana:dev
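As a small convenience, the download step can be wrapped in a shell function that first checks whether the `docker` command is available. This is only a sketch: the function name `pull_ulana` and the error message are hypothetical, and the only detail taken from this page is the image name `ethill/ulana:dev`.

```shell
# Hypothetical helper (not part of Ulana): pull the Ulana image, but only
# after checking that the docker command is available on this machine.
pull_ulana() {
    # Image name and tag are taken from the command shown on this page.
    image="ethill/ulana:dev"
    if ! command -v docker >/dev/null 2>&1; then
        echo "Docker was not found; install Docker first." >&2
        return 1
    fi
    docker pull "$image"
}
```

Running the same function again later pulls the image again and so picks up any newer build published under the same tag.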

Public repositories for Ulana:

The source code for Ulana is hosted on GitHub, while the current build is hosted on Docker Hub.

Periodically run "docker pull ethill/ulana:dev" to update to the latest version.