Nabil-Fareed Alikhan

Bioinformatics · Microbial Genomics · Software Development

Practical: Read Classification with Kraken2

Posted on November 11, 2024

Part 4 of 10 in the series: Genomic Quality Control

One way to understand what is in our data is to classify the reads against a database of known sequences using Kraken2. This gives a quick overview of the sample's contents, which we can then visualise in other tools such as Krona or Pavian.

Kraken 2 is a bioinformatics tool and software platform designed for the taxonomic classification of DNA sequences in metagenomic data. Metagenomics involves the study of genetic material collected from environmental samples, such as soil, water, or clinical specimens, to understand the microbial diversity present in these samples. Kraken 2 is a popular tool in this field, as it allows researchers to assign taxonomic labels to the sequences, helping them identify the microorganisms present in the samples.

For this exercise, we have three samples that were each processed in three different labs, for a total of nine sequenced samples. Usually, you know which organisms have been sent to you, but in this case I will let you figure that out from the data provided.

🙏 Thanks

Many thanks to Andrea Telatin and Thanh Le Viet, who provided these sequence data.

Your tasks are:

💡 Tip

Remember that there are three original isolates (Sample-1, Sample-3, Sample-8) that have been processed by three different groups (Lab-1, Lab-2, Lab-3). This means that we expect "Lab-1-Sample-1", "Lab-2-Sample-1" and "Lab-3-Sample-1" to be the same.
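As a minimal sketch of how the nine datasets relate to each other, the snippet below groups sample names by their original isolate. It assumes every name follows the "Lab-<n>-Sample-<m>" pattern shown in the tip; the function name `group_by_isolate` is only illustrative.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the tip above: "Lab-<n>-Sample-<m>".
NAME_PATTERN = re.compile(r"^(Lab-\d+)-(Sample-\d+)$")


def group_by_isolate(names):
    """Map each original isolate to the list of labs that processed it."""
    groups = defaultdict(list)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected sample name: {name}")
        lab, isolate = match.groups()
        groups[isolate].append(lab)
    return dict(groups)


# The nine expected names: three labs x three isolates.
names = [f"Lab-{lab}-Sample-{s}" for lab in (1, 2, 3) for s in (1, 3, 8)]
for isolate, labs in sorted(group_by_isolate(names).items()):
    print(isolate, labs)
```

Datasets that share an isolate should give broadly the same classification, so any replicate that differs from its two partners deserves a closer look.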

Then use this information to answer the following questions:

The rest of this page explains how to answer these questions. The answers to these questions are here.

💡 Tip

We previously discussed the requirements regarding yield in "A framework for QC". In this case, we would like at least 20X coverage.
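As a rough sketch of the arithmetic behind this threshold, average coverage can be estimated as the total number of sequenced bases divided by the expected genome size. The read counts and genome size below are hypothetical values chosen for illustration, not figures from these samples.

```python
def estimated_coverage(total_bases, genome_size):
    """Return the average depth of coverage: sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def passes_yield(total_bases, genome_size, minimum=20.0):
    """Check the estimated coverage against a minimum (20X in this exercise)."""
    return estimated_coverage(total_bases, genome_size) >= minimum


# Hypothetical example: 1,000,000 read pairs of 2 x 150 bp, 5 Mbp genome.
total_bases = 1_000_000 * 2 * 150
genome_size = 5_000_000
print(estimated_coverage(total_bases, genome_size))  # 60.0
print(passes_yield(total_bases, genome_size))        # True
```

Note that this estimate counts every read, so a contaminated sample can appear to pass while the organism of interest is actually below the threshold.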

Help! I'm stuck

If you are having problems getting Kraken2 to run, here are the report output files. These files have enough information to answer the questions above. You can use these files in Pavian as well.

There are some Krona plots available here.

Running Kraken2 on these samples

I will use https://usegalaxy.eu/ as an easy way to run Kraken2, and it will allow you to follow along. You may be able to do this on the command line later using some instructions here.

📝 Note

It's more important to understand the output, so if you are short on time, please skip to the following exercises exploring the results.

Log in to Galaxy and upload the data

Using the data linked above, upload the sequenced reads to Galaxy - be sure to create these in a List of Pairs collection.


The collection should look like this, a list of nine pairs, and each pair has a forward and reverse.


Running Kraken2

You should be able to find Kraken2 with the search bar on the left. Set the input type to Paired Collection and select the collection of data you uploaded.


The database I selected was "Preprint refseq indexes PlusPF". If you use a different database, you will get slightly different results.

⚠️ Warning

Remember to select "Print a report" under the Create a report dropdown.


If this is all in order, click Run tool. It may take some time to run, so here are the report output files I prepared earlier that you can use for the next step.
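For readers who later want to reproduce the Galaxy run on the command line, here is a hedged sketch that builds the equivalent `kraken2` call for one paired-end sample. The flags are standard Kraken2 options. The database path, FASTQ file names and output file names are hypothetical placeholders, and the command is only printed, not executed.

```python
import shlex


def kraken2_command(db, forward, reverse, sample, threads=4):
    """Build (but do not run) a kraken2 command for one paired-end sample."""
    return [
        "kraken2",
        "--db", db,                              # path to a Kraken2 database
        "--threads", str(threads),
        "--paired",                              # forward and reverse reads
        "--gzip-compressed",                     # inputs are .fastq.gz files
        "--report", f"{sample}.report.txt",      # same role as "Print a report"
        "--output", f"{sample}.kraken.txt",      # per-read classifications
        forward,
        reverse,
    ]


# Hypothetical paths and file names, for illustration only.
cmd = kraken2_command(
    db="/path/to/pluspf",
    forward="Lab-1-Sample-1_R1.fastq.gz",
    reverse="Lab-1-Sample-1_R2.fastq.gz",
    sample="Lab-1-Sample-1",
)
print(shlex.join(cmd))
```

If Kraken2 and a database are installed locally, the list could be passed to `subprocess.run(cmd, check=True)`. The `--report` file is the one used in the rest of this page.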

Exploring the results in the Kraken2 report

Open the Kraken reports you created in Galaxy, or use the prepared reports here. The report files are text files and should open in any text editor. Each report will look something like this.


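There are six tab-separated columns in each report file: (1) the percentage of reads covered by the clade rooted at this taxon, (2) the number of reads covered by the clade rooted at this taxon, (3) the number of reads assigned directly to this taxon, (4) a rank code (U for unclassified, R for root, D for domain, K for kingdom, P for phylum, C for class, O for order, F for family, G for genus, S for species), (5) the NCBI taxonomy ID, and (6) the scientific name, indented to show its position in the taxonomy.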
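To make the report layout concrete, here is a small sketch that parses report lines and picks out the most abundant species. It assumes the standard six-column, tab-separated Kraken2 report with names indented by two spaces per level. The example lines are made-up toy data, not results from this exercise, and the names `ReportRow`, `parse_report_line` and `top_species` are illustrative.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float     # column 1: % of reads in the clade rooted at this taxon
    clade_reads: int   # column 2: reads in the clade rooted at this taxon
    direct_reads: int  # column 3: reads assigned directly to this taxon
    rank: str          # column 4: rank code, e.g. U, R, D, P, C, O, F, G, S
    taxid: int         # column 5: NCBI taxonomy ID
    name: str          # column 6: scientific name, indentation removed
    depth: int         # indentation level of the name (2 spaces per level)


def parse_report_line(line):
    """Parse one line of a six-column Kraken2 report."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"Expected 6 columns, found {len(fields)}")
    percent, clade, direct, rank, taxid, raw_name = fields
    name = raw_name.lstrip(" ")
    depth = (len(raw_name) - len(name)) // 2
    return ReportRow(float(percent), int(clade), int(direct),
                     rank, int(taxid), name, depth)


def top_species(lines, n=3):
    """Return the n species-level rows with the most clade reads."""
    rows = [parse_report_line(line) for line in lines if line.strip()]
    species = [row for row in rows if row.rank == "S"]
    return sorted(species, key=lambda r: r.clade_reads, reverse=True)[:n]


# Made-up toy report (intermediate ranks omitted for brevity).
EXAMPLE = [
    "  2.00\t20\t20\tU\t0\tunclassified",
    " 98.00\t980\t10\tR\t1\troot",
    " 97.00\t970\t20\tD\t2\t  Bacteria",
    " 60.00\t600\t600\tS\t562\t    Escherichia coli",
    " 35.00\t350\t350\tS\t1280\t    Staphylococcus aureus",
]

for row in top_species(EXAMPLE):
    print(f"{row.percent:6.2f}%  {row.name}")
```

With a real report you would read the lines from the file, for example `top_species(open("sample.report.txt"))`, where the file name is a placeholder. A sample with two species at comparable percentages, as in this toy data, is the kind of pattern that suggests a mixed or contaminated sample.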

Exploring the results with Pavian

Pavian is available on a separate website: https://fbreitwieser.shinyapps.io/pavian/. To use it, download the Kraken reports you created in Galaxy, or use the prepared reports here. Extract the report files from the zip file, and upload them into Pavian.

Pavian input

Exploring the results with Krona

There are two steps to take the Kraken2 report and create the visualisation with Krona. First, convert the reports with the "Krakentools Convert kraken report file" tool, as shown below.

Kraken2 to Krona input

You must then use the output of this step in Krona, and set the input type to be Tabular.

Kraken2 to Krona input
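To give an idea of what the conversion step produces, the sketch below turns a six-column Kraken2 report into Krona's tabular text format, where each row is a count followed by the tab-separated lineage. This is a simplified illustration and not the KrakenTools implementation: it assumes names are indented by two spaces per level, keeps the root and plain taxon names, and uses made-up toy data.

```python
def report_to_krona_text(lines):
    """Return Krona text rows: direct read count, then the lineage (tab-separated)."""
    lineage = []  # names from the root down to the current taxon
    rows = []
    for line in lines:
        if not line.strip():
            continue
        _pct, _clade, direct, rank, _taxid, raw_name = line.rstrip("\n").split("\t")
        count = int(direct)
        name = raw_name.lstrip(" ")
        depth = (len(raw_name) - len(name)) // 2
        if rank == "U":
            # Unclassified reads sit outside the taxonomy tree.
            if count > 0:
                rows.append(f"{count}\tUnclassified")
            continue
        # Drop everything at or below this depth, then add the current taxon.
        del lineage[depth:]
        lineage.append(name)
        if count > 0:
            rows.append("\t".join([str(count)] + lineage))
    return rows


# Made-up toy report (intermediate ranks omitted for brevity).
EXAMPLE = [
    "  2.00\t20\t20\tU\t0\tunclassified",
    " 98.00\t980\t10\tR\t1\troot",
    " 97.00\t970\t20\tD\t2\t  Bacteria",
    " 60.00\t600\t600\tS\t562\t    Escherichia coli",
    " 35.00\t350\t350\tS\t1280\t    Staphylococcus aureus",
]

for row in report_to_krona_text(EXAMPLE):
    print(row)
```

Only the reads assigned directly to each taxon are written, because Krona sums the counts up the hierarchy itself. Writing clade totals would count the same reads more than once.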

The output will be an HTML file with the results of all the samples. You can open this directly in Galaxy using the "eye" icon.

Kraken2 to Krona input

If you have difficulty running Krona, here are the precalculated results.

This content was prepared as part of the GenEpi-BioTrain programme funded by ECDC. The GenEpi-BioTrain programme is an interdisciplinary course in genomics and epidemiology, which was held at Institut Pasteur between May 27th and June 7th, 2024.

Series

Genomic Quality Control

A comprehensive guide to quality control in bacterial genomics, from sequencing to assembly

1. Why is Quality Control Important in Genomics?
2. A Framework for Quality Control in Genomics
3. Quality Control Criteria for Sequenced Reads
4. Practical: Read Classification with Kraken2
5. Practical: Read Classification on Command Line
6. Practical: Quality Control for Short Reads
7. Quality Control Criteria for Genome Assemblies
8. Practical: Genome Assembly Quality Control Exercise
9. Glossary for Genomic Quality Control
10. Further Reading and Additional Resources