Released on April 6, 2023
In a recent episode of the MicroBinfie podcast, Dr. Jennifer Lu and Natalia Rincon from Johns Hopkins University's Center for Computational Biology joined us to discuss Kraken, a taxonomic classification program, and its associated software suite. As key members of the Kraken development team, they explained how Kraken and the surrounding tools have evolved and how they work.
The discussion began with an overview of the original Kraken, which classifies reads by exact k-mer matching. Its database build relies on Jellyfish, a k-mer counting tool, and it uses a default k-mer length of 31. KrakenUniq is a variant that adds a column of unique k-mer counts, telling users how many distinct k-mers from each taxon are covered by the reads. This gives an additional way to verify a microbial identification: a taxon whose reads all hit the same few k-mers is less convincing than one whose reads cover many distinct k-mers.
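To make exact k-mer matching and KrakenUniq-style unique k-mer counting concrete, here is a minimal sketch in Python. It is a toy, not Kraken's implementation: it assumes a flat set of reference taxa with no taxonomy tree, so k-mers shared between taxa are simply skipped (Kraken 1 labels them with the lowest common ancestor instead), and a read goes to the taxon with the most k-mer hits rather than through Kraken's root-to-leaf path scoring. The function names are hypothetical, and distinct k-mers are counted with exact sets, whereas KrakenUniq estimates them with HyperLogLog.

```python
from collections import Counter, defaultdict

K = 31  # Kraken 1's default k-mer length


def kmers(seq: str, k: int = K):
    """Yield every overlapping k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_table(references: dict, k: int = K) -> dict:
    """Map each k-mer to the set of reference taxa that contain it."""
    table = defaultdict(set)
    for taxon, genome in references.items():
        for kmer in kmers(genome, k):
            table[kmer].add(taxon)
    return dict(table)


def unique_hit(kmer: str, table: dict):
    """Return the single taxon owning this k-mer, or None if absent or shared.

    Kraken 1 would label shared k-mers with the taxa's lowest common ancestor;
    this flat toy has no taxonomy, so shared k-mers are skipped.
    """
    taxa = table.get(kmer)
    if taxa is not None and len(taxa) == 1:
        return next(iter(taxa))
    return None


def classify(read: str, table: dict, k: int = K):
    """Exact-match every k-mer of the read; return (best taxon or None, hit counts)."""
    hits = Counter()
    for kmer in kmers(read, k):
        taxon = unique_hit(kmer, table)
        if taxon is not None:
            hits[taxon] += 1
    best = hits.most_common(1)[0][0] if hits else None
    return best, hits


def distinct_kmers_per_taxon(reads, table: dict, k: int = K) -> dict:
    """KrakenUniq-style count of distinct k-mers per taxon covered by all reads.

    Exact sets are used here; KrakenUniq estimates these counts with HyperLogLog.
    """
    seen = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            taxon = unique_hit(kmer, table)
            if taxon is not None:
                seen[taxon].add(kmer)
    return {taxon: len(found) for taxon, found in seen.items()}
```

Feeding the same read in twice doubles its read count but leaves the distinct k-mer count unchanged, which is exactly the signal that separates a genuine hit spread across a genome from many reads piling onto one small region.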
Kraken 2 was created to handle larger databases efficiently. Instead of storing every k-mer, it stores minimizers, shorter substrings that stand in for each k-mer, in a compact probabilistic hash table, which greatly reduces memory use. This makes it practical to classify against much larger reference collections, which is particularly useful in microbiome research and pathogen detection.
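A minimizer is the "smallest" l-mer inside a k-mer under some ordering, and neighbouring k-mers in a sequence tend to share one, so storing minimizers needs far fewer keys than storing every k-mer. The sketch below is illustrative only: it uses plain lexicographic order on the forward strand, while Kraken 2 orders l-mers by a hash and works with canonical forms. Kraken 2's documented defaults are k = 35 and l = 31; the tiny values here are chosen just so the example is readable, and the probabilistic hash table itself is not modelled.

```python
def minimizer(kmer: str, l: int) -> str:
    """Return the lexicographically smallest l-mer inside a k-mer.

    Kraken 2 orders l-mers by a hash and uses canonical (strand-independent)
    forms; plain lexicographic order is used here only for readability.
    """
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))


def minimizers(seq: str, k: int, l: int) -> list:
    """Minimizer of each overlapping k-mer in seq, in order."""
    return [minimizer(seq[i:i + k], l) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    seq = "TTTTAAAATTTT"
    mins = minimizers(seq, k=6, l=3)
    # Neighbouring k-mers usually share a minimizer, so far fewer distinct
    # keys need to be stored than there are k-mers.
    print(len(mins), "k-mers ->", len(set(mins)), "distinct minimizers")
```

Running it prints `7 k-mers -> 3 distinct minimizers`: the seven 6-mers collapse onto just `TAA`, `AAA`, and `AAT`.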
Kraken is recognized for its utility in microbiome analysis, notably in pathogen detection. However, the accuracy of its results depends heavily on which genomes are present in its database, which is why the team emphasizes bacterial and viral reference data. For infectious pathogen detection, KrakenUniq is paired with Bracken to estimate the abundance of the species present.
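Abundance estimation matters because many reads cannot be placed at the species level and stop at a higher node such as the genus. The sketch below shows the basic idea of pushing those reads back down to species, using a deliberately simplified proportional rule. It is not Bracken's model: Bracken derives the redistribution probabilities from the database for a given read length, whereas this toy splits genus-level reads in proportion to each species' own read count. The function names and species labels are hypothetical.

```python
def redistribute_genus_reads(genus_reads: int, species_reads: dict) -> dict:
    """Split reads that stopped at the genus level among that genus's species.

    Simplified proportional rule: each species receives a share of the
    genus-level reads proportional to its own species-level read count.
    Bracken's real model derives these shares from the database instead.
    """
    total = sum(species_reads.values())
    if total == 0:
        # No species-level evidence to split on; leave genus reads unassigned.
        return {sp: float(n) for sp, n in species_reads.items()}
    return {sp: n + genus_reads * n / total for sp, n in species_reads.items()}


def relative_abundance(counts: dict) -> dict:
    """Convert read counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical counts for one genus with two species.
    est = redistribute_genus_reads(100, {"species_A": 300, "species_B": 100})
    print(est)
    print(relative_abundance(est))
```

With these made-up numbers the 100 genus-level reads split 75/25, giving 375 and 125 reads, or relative abundances of 0.75 and 0.25.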
The developers highlighted the importance of knowing which genomes the database actually contains. Because the results depend on this data, users need to make sure their database is comprehensive and up to date.
Kraken is widely used in bioinformatics beyond metagenomics. For instance, the reads from a single isolate genome can be classified as if they were a metagenome, as a quality-control check. If those reads are assigned to conflicting taxa, that points to contamination, which makes Kraken a valuable step in sample analysis.
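As a rough illustration of this isolate QC idea, the sketch below tallies Kraken 2's standard per-read output (tab-separated: `C`/`U` flag, read ID, taxid, read length, k-mer LCA list) and reports the fraction of classified reads falling outside an expected set of taxids. The function names, the expected-taxid set, and the 5% threshold are all hypothetical choices. A real check would also walk the taxonomy tree, since reads classified only to the genus or a higher rank would otherwise count as off-target here.

```python
from collections import Counter


def taxon_counts(lines) -> Counter:
    """Count classified reads per taxid from Kraken 2 standard per-read output.

    Each line is tab-separated: C/U flag, read ID, taxid, length, k-mer LCA list.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C":
            counts[int(fields[2])] += 1
    return counts


def off_target_fraction(counts: Counter, expected_taxids: set) -> float:
    """Fraction of classified reads assigned outside the expected taxa."""
    classified = sum(counts.values())
    if classified == 0:
        return 0.0
    off_target = sum(n for taxid, n in counts.items() if taxid not in expected_taxids)
    return off_target / classified


if __name__ == "__main__":
    # Hypothetical output for an E. coli (taxid 562) isolate with one human (9606) read.
    demo = [
        "C\tread1\t562\t150\t562:116",
        "C\tread2\t562\t150\t562:116",
        "C\tread3\t9606\t150\t9606:116",
        "U\tread4\t0\t150\t0:116",
    ]
    frac = off_target_fraction(taxon_counts(demo), expected_taxids={562})
    CONTAMINATION_THRESHOLD = 0.05  # hypothetical cut-off, tune per project
    if frac > CONTAMINATION_THRESHOLD:
        print(f"{frac:.2f} of classified reads are off-target: possible contamination")
```

Unclassified (`U`) reads are excluded from the denominator here, so the fraction describes only what Kraken could place.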
The team also described how they use Kraken for contamination work. They screen pathogen genomes by classifying them against databases of bacterial, human, vertebrate, and plant genomes. For example, they have found sequences from hosts such as chicken and cow contaminating eukaryotic pathogen genomes.
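The episode does not spell out the exact screening procedure, so the following is only a hypothetical illustration of one way to localize host contamination in an assembly: cut each contig into fixed-size windows and flag windows whose k-mers mostly match a host reference. Here a plain k-mer set stands in for a Kraken database that includes host genomes such as chicken or cow; the function names, window size, and threshold are all assumptions.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers of a reference sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_host_windows(contig: str, host_kmers: set, k: int, window: int,
                      min_fraction: float) -> list:
    """Return (start, end, fraction) for windows whose k-mers largely match the host.

    K-mers spanning two windows are ignored for simplicity.
    """
    flagged = []
    for start in range(0, len(contig), window):
        chunk = contig[start:start + window]
        chunk_kmers = [chunk[i:i + k] for i in range(len(chunk) - k + 1)]
        if not chunk_kmers:
            continue  # trailing window shorter than k
        fraction = sum(km in host_kmers for km in chunk_kmers) / len(chunk_kmers)
        if fraction >= min_fraction:
            flagged.append((start, start + len(chunk), fraction))
    return flagged
```

Reporting coordinates rather than a single yes/no verdict lets the flagged stretch be trimmed from the assembly while the rest of the contig is kept.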
Looking ahead, the Kraken team intends to:
They recognize the growing need to reduce database sizes as more genomes become available and are exploring indexing and sketching techniques to address this.
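The episode does not say which sketching techniques the team is exploring. As a generic illustration of what "sketching" means, the snippet below builds a bottom-s MinHash sketch, one widely used technique (popularized in genomics by tools such as Mash): each sequence is reduced to the s smallest hashes of its k-mers, and two sketches are enough to estimate the Jaccard similarity of the full k-mer sets. The function names and parameters are illustrative only, and BLAKE2b is an arbitrary choice of hash.

```python
import hashlib


def kmer_hashes(seq: str, k: int) -> set:
    """64-bit hashes of every distinct k-mer in seq."""
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }


def bottom_sketch(seq: str, k: int, s: int) -> list:
    """Bottom-s MinHash sketch: keep only the s smallest k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sketch_a: list, sketch_b: list, s: int) -> float:
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:s]  # s smallest hashes of the union
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)
```

Whatever the genome length, each sketch holds at most s integers, which is the kind of size reduction that makes sketching attractive as reference collections grow; the price is that similarity becomes an estimate rather than an exact count.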
Kraken remains an indispensable tool for metagenomic analysis and pathogen detection. As it continues to evolve, the Kraken team advises users to prioritize accurate, comprehensive reference data for effective pathogen identification and classification.
The podcast discusses the Kraken software suite, a set of taxonomic classification tools used for metagenomic analysis.
Kraken Versions and Tools:
Additional Tools:
Challenges and Methodologies:
Applications:
Future Directions:
This discussion highlights how Kraken and its associated tools have become central to metagenomic and pathogen detection analyses in microbial bioinformatics, while also outlining ongoing challenges in managing ever-larger databases and improving accuracy.