Nabil-Fareed Alikhan

Bioinformatics · Microbial Genomics · Software Development

Episode 104: The Kraken software suite

📅6 April 2023
⏱️00:21:50
🎙️Microbial Bioinformatics

👥Guests

Jennifer Lu
Staff Scientist, Johns Hopkins University Center for Computational Biology
Natalia Rincon
PhD Student, Biomedical Engineering, Johns Hopkins University

This episode explores the Kraken software suite, a family of taxonomic sequence classifiers used for metagenomic analysis and pathogen detection. Jennifer Lu and Natalia Rincon of Johns Hopkins University discuss how Kraken has evolved across its versions and where it fits in bioinformatics research.

Kraken Versions and Features

The discussion began with an overview of the original Kraken, which classifies reads by exact matching of k-mers against a reference database. Its design builds on Jellyfish, a k-mer counting tool, and uses a k-mer size of 31. KrakenUniq is a variant that adds a column reporting unique k-mer counts, letting users see how many distinct k-mers from each taxon are covered by the reads. This provides an additional way to verify that a microbial identification is genuine rather than driven by a few repeatedly matched k-mers.
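To make the exact-matching idea concrete, here is a minimal Python sketch. It assumes a precomputed dictionary mapping each k-mer to a taxon ID (the lowest common ancestor of the genomes containing it) and a parent map for the taxonomy. Following the scheme Kraken 1 uses, each read goes to the hit taxon whose root-to-taxon path carries the most k-mer hits. The helper `distinct_kmers_per_taxon` illustrates the unique k-mer count with exact Python sets; KrakenUniq itself estimates these counts with HyperLogLog sketches. All function names are hypothetical, and the sketch ignores reverse complements, ambiguous bases and Kraken's tie-breaking rules.

```python
from collections import Counter, defaultdict

K = 31  # k-mer length used by the original Kraken

def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def path_to_root(taxon, parent):
    """List taxon and all its ancestors; the root maps to itself in `parent`."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def classify_read(read, kmer_to_taxon, parent, k=K):
    """Return the taxon for a read, or None if no k-mer matches exactly."""
    hits = Counter(kmer_to_taxon[km] for km in kmers(read, k) if km in kmer_to_taxon)
    if not hits:
        return None
    # Score each hit taxon by the hits along its path to the root and keep
    # the highest-scoring one (ties resolved arbitrarily here, unlike Kraken).
    return max(hits, key=lambda t: sum(hits[a] for a in path_to_root(t, parent)))

def distinct_kmers_per_taxon(reads, kmer_to_taxon, k=K):
    """Count distinct matched k-mers per taxon across all reads (exact sets)."""
    seen = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            if km in kmer_to_taxon:
                seen[kmer_to_taxon[km]].add(km)
    return {taxon: len(found) for taxon, found in seen.items()}
```

A taxon supported by many reads that all hit the same handful of k-mers will show a low distinct count, which is the signal the unique k-mer column is meant to expose.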

Kraken 2

Kraken 2 was created to handle larger reference databases efficiently. Rather than storing every k-mer, it stores minimizers (shorter representative substrings of each k-mer) in a compact, probabilistic hash table, which substantially reduces memory use. This makes it practical to classify against much larger databases, which is particularly useful in microbiome research and pathogen detection.
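The minimizer idea can be shown in a few lines of Python. This sketch assumes the simplest definition, the lexicographically smallest length-l substring of each k-mer. Kraken 2's real implementation differs (it uses a scrambled ordering, canonical minimizers and spaced seeds), and its manual lists default lengths of k = 35 and l = 31. The function names are hypothetical.

```python
def minimizer(kmer, l):
    """Lexicographically smallest length-l substring of kmer."""
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))

def read_minimizers(seq, k, l):
    """Minimizer of each successive k-mer in seq, consecutive repeats collapsed."""
    out = []
    for i in range(len(seq) - k + 1):
        m = minimizer(seq[i:i + k], l)
        if not out or out[-1] != m:  # neighbouring k-mers often share one
            out.append(m)
    return out

# Toy parameters so the output is easy to check by hand.
print(read_minimizers("GATTACA", k=5, l=3))  # ['ATT', 'ACA']
```

Three 5-mers collapse to two minimizers here; on real genomes neighbouring k-mers share minimizers far more often, which is why a minimizer-keyed table needs fewer entries than a k-mer-keyed one.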

Applications in Microbiome Analysis

Kraken is widely used in microbiome analysis, notably for pathogen detection. However, the accuracy of its results depends heavily on which genomes are present in its database, hence the emphasis on bacterial and viral reference data. For infectious pathogen detection, KrakenUniq is combined with Bracken to estimate the abundance of each species present.
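Abundance re-estimation addresses reads that a classifier can only place at genus level or above. Bracken itself uses probabilities derived from the database to decide how such reads should be shared among species; the proportional split below is a deliberately simpler stand-in, assuming genus-level reads are divided in proportion to each species' directly assigned reads. The function name and the species and genus labels are hypothetical.

```python
def redistribute_to_species(direct_species_reads, genus_reads, species_to_genus):
    """Split reads assigned only at genus level among that genus's species.

    direct_species_reads: species -> reads classified directly at species level.
    genus_reads: genus -> reads classified at genus level but not to a species.
    species_to_genus: species -> its genus.
    """
    estimates = {sp: float(n) for sp, n in direct_species_reads.items()}
    for genus, extra in genus_reads.items():
        members = [sp for sp, g in species_to_genus.items()
                   if g == genus and direct_species_reads.get(sp, 0) > 0]
        total = sum(direct_species_reads[sp] for sp in members)
        for sp in members:  # no members -> genus reads stay unassigned
            estimates[sp] += extra * direct_species_reads[sp] / total
    return estimates

print(redistribute_to_species({"A": 30, "B": 10}, {"G": 40}, {"A": "G", "B": "G"}))
# {'A': 60.0, 'B': 20.0}
```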

Importance of Genomic Data

The developers highlighted the importance of knowing which genomes the database actually contains. Because results can only be as accurate as this reference data, users should make sure their database is comprehensive and up to date.

Wider Usage in Bioinformatics

Kraken is also used beyond metagenomics. For instance, sequencing data from a single isolate can be classified as if it were a metagenome, as a quality-control check. If the reads are assigned to conflicting taxa, Kraken's results reveal likely contamination, making it a valuable step in sample analysis.
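For the single-isolate quality-control use case, a sketch like the following could scan a Kraken 2 report for unexpected species. It assumes the default six-column, tab-separated report layout (percentage of reads, clade reads, directly assigned reads, rank code, taxonomy ID, indented name). The function names and the 1% threshold are illustrative assumptions, not recommended settings.

```python
def species_fractions(report_text):
    """Map species name -> percentage of reads from a Kraken 2-style report."""
    fractions = {}
    for line in report_text.splitlines():
        if not line.strip():
            continue
        pct, _clade, _direct, rank, _taxid, name = line.split("\t")
        if rank == "S":  # keep species-level rows only
            fractions[name.strip()] = float(pct)
    return fractions

def flag_contaminants(report_text, expected_species, min_pct=1.0):
    """Species other than the expected one that reach min_pct percent of reads."""
    return {sp: pct for sp, pct in species_fractions(report_text).items()
            if sp != expected_species and pct >= min_pct}
```

Called as `flag_contaminants(report, "Escherichia coli")` on an isolate expected to be E. coli, an empty result suggests no other species above the threshold, while any returned entries point to possible contamination worth inspecting.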

Contamination Detection

The team also described how they use Kraken for contamination work. They screen pathogen genomes for contamination by classifying them against databases of bacterial, human, vertebrate and plant genomes. For example, they have found sequences from hosts such as chicken and cow contaminating eukaryotic pathogen genomes.
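Screening an assembly for host contamination can be sketched by reading Kraken 2's per-sequence output (one tab-separated line per sequence: C or U, sequence ID, taxonomy ID, length, k-mer mappings) and walking each assigned taxon up the taxonomy to check whether it falls under a host. The host list uses the NCBI taxonomy IDs for human (9606), chicken (9031) and cow (9913); the function name and the parent-map input are assumptions for illustration.

```python
HOST_TAXIDS = {9606: "Homo sapiens", 9031: "Gallus gallus", 9913: "Bos taurus"}

def host_contigs(kraken_output, parent, host_taxids=HOST_TAXIDS):
    """Return (sequence ID, host name) for sequences classified within a host taxon.

    parent: taxonomy ID -> parent ID, with the root mapping to itself.
    """
    flagged = []
    for line in kraken_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip blank lines and unclassified sequences
        seq_id, taxid = fields[1], int(fields[2])
        # Walk up the lineage until a host taxon or the root is reached.
        while True:
            if taxid in host_taxids:
                flagged.append((seq_id, host_taxids[taxid]))
                break
            if parent.get(taxid, taxid) == taxid:
                break  # reached the root or a taxon missing from `parent`
            taxid = parent[taxid]
    return flagged
```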

Future Developments

Looking ahead, the Kraken team recognizes the growing need to reduce database sizes as more genomes become available, and is exploring indexing and sketching techniques to address this.
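The episode does not say which sketching methods are under consideration. As one well-known example of the general idea, a bottom-k MinHash sketch (the approach popularized by Mash) summarizes a genome by keeping only its smallest k-mer hashes, so genomes can be compared through small fixed-size sketches instead of full k-mer sets. The code below is a generic illustration, not the Kraken team's method; function names and default sizes are hypothetical.

```python
import hashlib

def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def bottom_k_sketch(seq, k=21, size=1000):
    """Keep only the `size` smallest distinct k-mer hashes as a genome summary."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return set(sorted(hashes)[:size])

def jaccard_estimate(a, b, size=1000):
    """Estimate k-mer Jaccard similarity from two bottom-k sketches."""
    union_bottom = set(sorted(a | b)[:size])
    shared = union_bottom & a & b
    return len(shared) / len(union_bottom) if union_bottom else 0.0
```

The sketch size is fixed regardless of genome length, which is what makes this family of techniques attractive when the number of reference genomes keeps growing.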

Conclusion

Kraken remains an indispensable tool for metagenomic analysis and pathogen detection. As it continues to evolve, the Kraken team advises users to prioritize accurate, comprehensive reference data for effective pathogen identification and classification.


Extra notes

This discussion highlights how Kraken and its associated tools are pivotal in microbial bioinformatics for metagenomic and pathogen detection analyses, while also outlining ongoing challenges in data management and accuracy enhancements.