Released on March 23, 2023
In Focus: Dr. Jennifer Lu and Natalia Rincon from the Johns Hopkins University Center for Computational Biology
The MicroBinfie podcast recently welcomed Dr. Jennifer Lu and Natalia Rincon to discuss Kraken, a cutting-edge taxonomic classification tool. Developed between 2013 and 2014, Kraken assigns sequencing reads to specific species, genera, or broader bacterial groups. Its ability to classify millions or even billions of reads efficiently sets it apart from other methods such as MetaPhlAn, MegaBLAST, and QIIME. Known for its accuracy and user-friendliness, Kraken has become a preferred tool in metagenomic analysis.
Following Kraken's initial success, Florian Breitwieser developed KrakenUniq, an iteration that provides more comprehensive information than the standard version. Another important addition to the Kraken suite is Bracken, created by Dr. Jennifer Lu, which estimates abundance. Natalia Rincon's contributions focus on the newest iterations, which analyze diversity metrics.
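To make "diversity metrics" concrete, here is a minimal sketch of one common alpha-diversity measure, the Shannon index, computed from per-taxon abundance counts. The function name and the input counts are illustrative assumptions, not taken from the Kraken suite, and the episode summary does not say which metrics the tools report.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    `counts` is a list of per-taxon abundances (hypothetical input, for
    example estimated read counts per species).
    """
    total = sum(counts)
    if total == 0:
        return 0.0  # no reads, so no diversity to measure
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up abundances: an even community and a skewed one.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]
print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

An even community scores higher than a skewed one with the same number of taxa, which is the behavior such a metric is meant to capture.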
The Kraken family continues to evolve around exact k-mer matching, assigning taxonomy IDs to reads and generating two main outputs: a detailed text file with one line per read, and a Kraken report that breaks down the reads assigned to each taxonomy ID. Regardless of how many reads it receives, each taxon in a Kraken report can be meaningful, especially for downstream analysis.
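As a rough illustration of working with the report output, the sketch below parses the standard six-column, tab-separated Kraken report layout (percentage of reads in the clade, reads in the clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name). The sample text and its read counts are made up for the example, and the `ReportRow` and `parse_report` names are hypothetical helpers rather than part of Kraken.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in the clade rooted at this taxon
    clade_reads: int    # reads assigned to this taxon or any descendant
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. U, R, G, S
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name (indentation stripped)


def parse_report(text):
    """Parse report text into rows, skipping blank or malformed lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue
        rows.append(ReportRow(
            float(fields[0]), int(fields[1]), int(fields[2]),
            fields[3].strip(), int(fields[4]), fields[5].strip(),
        ))
    return rows


# Made-up sample report; the counts are illustrative only.
SAMPLE = (
    " 10.00\t10\t10\tU\t0\tunclassified\n"
    " 90.00\t90\t0\tR\t1\troot\n"
    " 60.00\t60\t5\tG\t561\t      Escherichia\n"
    " 55.00\t55\t55\tS\t562\t        Escherichia coli\n"
)

rows = parse_report(SAMPLE)
species = [r for r in rows if r.rank == "S"]
for r in species:
    print(r.taxid, r.name, r.clade_reads)
```

Filtering on the rank code, as in the last lines, is one simple way to pull species-level rows for downstream analysis.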
The name "Kraken" is derived from a mythological creature and ties back to "Jellyfish," a k-mer counting tool used in building Kraken databases. The original concept of Kraken was developed by Derrick Wood.
Kraken was originally developed for Illumina reads, and its accuracy with Nanopore reads may suffer because of their higher error rates. Nonetheless, the Kraken database is built for exact k-mer matching and fits all genomic information into a compact space. Recognized for high accuracy, speed, and simplicity, Kraken and its related tools are widely used in taxonomic classification.
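The idea of exact k-mer matching, and why sequencing errors hurt it, can be shown with a toy sketch. This is a deliberately simplified illustration and not Kraken's actual algorithm: it uses a plain dictionary and a majority vote, whereas Kraken relies on a compact database and a taxonomy tree. The reference sequences, taxon labels, and function names are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to its taxon; k-mers shared between taxa become 'ambiguous'."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            if km in index and index[km] != taxon:
                index[km] = "ambiguous"
            else:
                index[km] = taxon
    return index


def matching_kmers(read, index, k):
    """Count how many k-mers of the read have an exact match in the index."""
    return sum(1 for km in kmers(read, k) if km in index)


def classify(read, index, k):
    """Assign the read to the taxon with the most unambiguous exact k-mer hits."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    hits.pop("ambiguous", None)
    if not hits:
        return "unclassified"
    return hits.most_common(1)[0][0]


# Hypothetical toy references; real databases hold whole genomes and use larger k.
K = 5
REFERENCES = {"taxon_A": "ACGTACGTTAGC", "taxon_B": "TTGACCGGTAAC"}
index = build_index(REFERENCES, K)

clean_read = "CGTACGTTAG"   # exact substring of taxon_A
noisy_read = "CGTACATTAG"   # same read with one substituted base

print(classify(clean_read, index, K), matching_kmers(clean_read, index, K))
print(classify(noisy_read, index, K), matching_kmers(noisy_read, index, K))
```

A single substituted base changes every k-mer that overlaps it, so the noisy read keeps far fewer exact matches than the clean one. That is the intuition behind the concern about reads with higher error rates.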
Kraken reports include an additional column that counts the number of unique k-mers, which helps validate results. Kraken databases also feature vector sequence information, with vector sequences assigned taxonomy IDs labeled as "synthetic sequences."
The Kraken software employs a mix of Perl and C++, with Perl handling input processes and C++ managing memory-intensive tasks such as building sequences, compacting data, and writing bytes.
Dr. Jennifer Lu appreciates the simplicity and accuracy of Kraken's classification algorithm, while Natalia Rincon expresses pride in being part of the Kraken community, working closely with collaborators to enhance compatibility with new Nanopore chemistries.
For more detailed insights, listeners can tune into the episode with Dr. Lu and Natalia Rincon.