Multi-locus sequence typing (MLST) is a molecular typing method used to characterize bacterial strains, including E. coli. It involves sequencing fragments of multiple housekeeping genes and assigning an allele number to each distinct sequence found at each gene. The combination of allele numbers for an isolate is its allelic profile, and each distinct profile is designated a sequence type (ST). Comparing STs makes it possible to infer the genetic relatedness and population structure of E. coli isolates. MLST has been instrumental in understanding the epidemiology, transmission dynamics, and evolution of E. coli strains, aiding outbreak investigations and surveillance efforts. It provides a standardized and reproducible approach for comparing and sharing data across laboratories and has been widely adopted in E. coli research. There are at least three MLST schemes for E. coli, from Wirth et al., Jaureguy et al. and Qi et al. Commonly encountered designations such as ST131, ST95 or ST10 are defined in the Wirth et al. scheme.
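To make the allele and ST bookkeeping concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the three loci (`geneA`, `geneB`, `geneC`), the short sequences, the allele numbers and the ST numbers are hypothetical and do not come from any real E. coli scheme, which would typically use seven loci and a curated database.

```python
from typing import Dict, Optional, Tuple

# Hypothetical allele database: locus -> {sequence: allele number}.
# Real schemes store fragments of several hundred bases for each locus.
ALLELES: Dict[str, Dict[str, int]] = {
    "geneA": {"ATGCCA": 1, "ATGCCG": 2},
    "geneB": {"GGTACC": 1, "GGTACT": 2},
    "geneC": {"TTAGCA": 1, "TTAGCG": 2},
}

# Hypothetical profile table: allelic profile -> sequence type (ST).
PROFILES: Dict[Tuple[Optional[int], ...], int] = {
    (1, 1, 1): 1,
    (2, 1, 1): 2,
    (2, 2, 2): 3,
}

LOCI = tuple(ALLELES)  # fixed locus order used for every profile


def call_alleles(sequences: Dict[str, str]) -> Tuple[Optional[int], ...]:
    """Return the allelic profile; None marks a sequence not in the database."""
    return tuple(ALLELES[locus].get(sequences[locus]) for locus in LOCI)


def sequence_type(profile: Tuple[Optional[int], ...]) -> Optional[int]:
    """Look up the ST for a profile; None means no ST is defined for it."""
    return PROFILES.get(profile)


def loci_differing(p: Tuple[Optional[int], ...], q: Tuple[Optional[int], ...]) -> int:
    """Count the loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


if __name__ == "__main__":
    isolate = {"geneA": "ATGCCG", "geneB": "GGTACC", "geneC": "TTAGCA"}
    profile = call_alleles(isolate)
    print(profile, sequence_type(profile))
    print(loci_differing(profile, (1, 1, 1)))
```

Counting differing loci is only the crudest measure of relatedness, but it is the comparison that allelic profiles make possible without looking at the underlying sequences.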
Below is a reading list of key publications to help understand MLST (in general) and how it applies to E. coli. A BibTeX file cataloguing all the publications below is available here. The short summaries of each paper were generated by ChatGPT4.
Typing methods based on whole genome sequencing data
Uelze L, Grützke J, Borowiak M, Hammerl JA, Juraschek K, Deneke C, et al. One Health Outlook. 2020;2: 3. https://doi.org/10.1186/s42522-020-0010-1
This paper highlights the importance of whole genome sequencing (WGS) in investigating foodborne pathogens. WGS enables detailed genetic analysis, aiding in disease outbreak investigations and risk characterization models. Bioinformatics tools are essential for analyzing WGS data, but standardization of typing tools is needed for data comparison across laboratories and establishing a global surveillance system.
Overview of molecular typing methods for outbreak detection and epidemiological surveillance
Sabat AJ, Budimir A, Nashev D, Sá-Leão R, van Dijl JM, Laurent F, et al. Eurosurveillance. 2013;18. https://doi.org/10.2807/ese.18.04.20380-en
This paper discusses the significance of typing methods for differentiating bacterial isolates within a species. It highlights the limitations of traditional methods and the benefits of molecular approaches in improving surveillance and outbreak detection. The text also explores the feasibility of using whole genome sequencing technology and reviews various typing methods for epidemiological purposes.
Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms
Maiden MCJ, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, et al. Proc Natl Acad Sci USA. 1998;95: 3140–3145. https://doi.org/10.1073/pnas.95.6.3140
This paper introduces multilocus sequence typing (MLST) as a portable and effective method for characterizing pathogenic microorganisms. The study specifically focuses on Neisseria meningitidis, determining allele sequences of housekeeping genes and constructing dendrograms to identify clonal groupings. The results demonstrate that MLST, utilizing a subset of six gene fragments, reliably identifies major meningococcal lineages associated with invasive disease. The paper highlights the advantage of MLST's portability and proposes its application to various bacterial species for global epidemiology through a shared database on the internet.
Sex and virulence in Escherichia coli: an evolutionary perspective
Wirth T, Falush D, Lan R, Colles F, Mensa P, Wieler LH, et al. Mol Microbiol. 2006;60: 1136–1151. https://doi.org/10.1111/j.1365-2958.2006.05172.x
This paper describes one of the MLST schemes for E. coli. It also explores the evolutionary pathways of pathogenic E. coli by analyzing a global collection of isolates using multilocus sequence typing. It reveals that specific pathogen types have independently emerged in different lineages, with accelerated rates of evolution and frequent genomic alterations. The evolution of virulence is linked to increased rates of homologous recombination, highlighting the role of bacterial sex, and suggests episodic selection for strains capable of evading the host immune response.
Population structure and evolutionary dynamics of pathogenic bacteria
Smith JM, Feil EJ, Smith NH. Bioessays. 2000;22: 1115–1122. https://doi.org/10.1002/1521-1878(200012)22:12<1115::AID-BIES9>3.0.CO;2-R
This paper discusses the significance of recombination in bacterial populations using multilocus sequence typing (MLST). It confirms the existence of clones and high rates of recombination in several bacterial pathogens, highlighting the implications for population structure, virulence, antibiotic resistance, and genetically modified organisms.
Comparative analysis of core genome MLST and SNP typing within a European Salmonella serovar Enteritidis outbreak
Pearce ME, Alikhan N-F, Dallman TJ, Zhou Z, Grant K, Maiden MCJ. International Journal of Food Microbiology. 2018;274: 1–11. https://doi.org/10.1016/j.ijfoodmicro.2018.02.023
This paper evaluates a core genome multilocus sequence typing (cgMLST) scheme for Salmonella enterica isolates in a European outbreak. The scheme provides high-resolution typing, congruent with SNP-based and epidemiological analyses. It confirms the genetic diversity predating the outbreak, demonstrates scalability, and enables comparative analysis of Salmonella outbreaks across laboratories and jurisdictions.
PubMLST database - Multi-Locus Sequence Typing
https://pubmlst.org/multilocus-sequence-typing
PubMLST database, which hosts many MLST schemes. Multilocus sequence typing (MLST) is a method for characterizing bacterial isolates based on the sequences of multiple housekeeping genes. Each isolate is assigned an allelic profile, or sequence type (ST), defined by its alleles at seven loci. MLST allows for unambiguous and direct comparison of isolates using DNA sequencing, enabling precise characterization and comparison of billions of distinct genotypes. MLST's advantages include unambiguous sequence data, easy comparison via centralized databases, and the ability to characterize isolates from clinical material even without culturing them.
Microbinfie podcast - Early days of MLST
https://soundcloud.com/microbinfie/early-days-of-mlst
Microbinfie podcast episode. Ed Feil, a professor of bacterial evolution, and Natacha Couto, a data scientist, discuss multi-locus sequence typing (MLST) in bacterial population genetics. MLST assigns strain identities based on partial gene sequences and enables comparison of epidemiological databases. While MLST has limitations, it remains widely used, and the eBURST program offers improved visualization. The concept of clonality in bacterial species is explored, and the enduring legacy of MLST's nomenclature for lineages or clones is acknowledged. The discussion emphasizes the need for continued research in bacterial population genetics.
Phylogenetic and genomic diversity of human bacteremic Escherichia coli strains
Jaureguy F, Landraud L, Passet V, Diancourt L, Frapy E, Guigon G, et al. BMC Genomics. 2008;9: 560. https://doi.org/10.1186/1471-2164-9-560
This paper describes one of the MLST schemes for E. coli. It also investigates the clonal diversity of bacteremic E. coli strains and their association with genomic content and clinical features. The study reveals that bacteremic E. coli isolates are highly diverse and distributed across different phylogenetic lineages. Certain clonal complexes are associated with urinary origin, but no specific complexes are linked to severe sepsis or unfavorable outcomes. Comparative genomic hybridization analysis identifies genomic characteristics associated with different clonal complexes.
EcMLST: an Online Database for Multi Locus Sequence Typing of Pathogenic Escherichia coli
http://shigatox.net/ecmlst/cgi-bin/index
This resource describes one of the MLST schemes for E. coli. EcMLST is an online database and typing system for multilocus sequence typing (MLST) of pathogenic Escherichia coli. It provides a portable and accurate method for characterizing E. coli isolates, allowing researchers and public health laboratories to access nucleotide sequence data and allelic profiles for epidemiology and evolutionary studies.
The population genetics of commensal Escherichia coli
Tenaillon O, Skurnik D, Picard B, Denamur E. Nat Rev Microbiol. 2010;8: 207–217. https://doi.org/10.1038/nrmicro2298
This paper discusses the ecological and evolutionary factors shaping the population structure of commensal and pathogenic Escherichia coli. It explores the clonal nature of E. coli, the role of whole-genome sequencing in understanding its phylogenetic history, and the relationships between commensalism, virulence, and antibiotic resistance. The paper also highlights the potential of next-generation sequencing and metagenomics for further research.
A comprehensive and high-quality collection of Escherichia coli genomes and their genes
Horesh G, Blackwell GA, Tonkin-Hill G, Corander J, Heinz E, Thomson NR. Microbial Genomics. 2021;7. https://doi.org/10.1099/mgen.0.000499
This paper discusses the compilation and curation of a comprehensive dataset of over 10,000 Escherichia coli and Shigella genomes. It highlights the need for a better understanding of the genetic diversity of E. coli and its implications for studying biological differences and gene distribution within the population.
The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity
Zhou Z, Alikhan N-F, Mohamed K, Fan Y, the Agama Study Group, Achtman M. Genome Res. 2020;30: 138–152. https://doi.org/10.1101/gr.251678.119
This paper introduces EnteroBase, a software environment that utilizes genomics data to identify population structures within bacterial genera. It showcases its capabilities through case studies involving Salmonella, Yersinia, and Escherichia, demonstrating its ability to analyze transmission patterns, track microevolution, and provide a global overview of genomic diversity. See Case study 3, which explores the genetic diversity and population structure of Escherichia coli using core genome multilocus sequence typing (cgMLST) and single nucleotide polymorphism (SNP) analysis. It demonstrates the presence of distinct populations and clustering patterns within E. coli and other Escherichia species, providing valuable insights into their genetic relationships and diversity.
Rapid and Simple Determination of the Escherichia coli Phylogenetic Group
Clermont O, Bonacorsi S, Bingen E. Appl Environ Microbiol. 2000;66: 4555–4558. https://doi.org/10.1128/AEM.66.10.4555-4558.2000
This paper presents a fast and simple technique for determining the phylogenetic groups of Escherichia coli using triplex PCR. The method, tested on 230 strains, shows high correlation with complex and time-consuming reference methods, offering a more efficient approach for phylogenetic analysis of E. coli.
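The triplex PCR in this paper is usually read as a small decision tree over three markers: the genes chuA and yjaA and the DNA fragment TspE4.C2. A minimal sketch of that tree follows; the function name and boolean inputs are illustrative choices, not part of any published tool, and the sketch ignores gel interpretation and any ambiguous results.

```python
def clermont_2000_phylogroup(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Assign phylogroup A, B1, B2 or D from presence/absence of three PCR markers.

    Follows the dichotomous tree of the triplex method:
    chuA present -> B2 (yjaA present) or D (yjaA absent);
    chuA absent  -> B1 (TspE4.C2 present) or A (TspE4.C2 absent).
    """
    if chuA:
        return "B2" if yjaA else "D"
    return "B1" if tspE4C2 else "A"


if __name__ == "__main__":
    # Print the group for every combination of marker results.
    for chuA in (True, False):
        for yjaA in (True, False):
            for tsp in (True, False):
                print(chuA, yjaA, tsp, clermont_2000_phylogroup(chuA, yjaA, tsp))
```

Note that yjaA is only consulted when chuA is present, and TspE4.C2 only when chuA is absent, which is why three markers resolve just four groups in this original scheme.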
The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups
Clermont O, Christenson JK, Denamur E, Gordon DM. Environmental Microbiology Reports. 2013;5: 58–65. https://doi.org/10.1111/1758-2229.12019
This paper describes a new quadruplex PCR method for assigning Escherichia coli isolates to one of eight phylo-groups and for identifying isolates that belong to the cryptic Escherichia clades. The method is validated and applied to human faecal isolates, revealing the prevalence of newly described phylo-groups and clades in the E. coli population.
Easy phylotyping of Escherichia coli via the EzClermont web app and command-line tool
Waters NR, Abram F, Brennan F, Holmes A, Pritchard L. Access Microbiology. 2020;2. https://doi.org/10.1099/acmi.0.000143
This paper introduces EzClermont, an in silico tool for phylotyping Escherichia coli based on the Clermont PCR method. The tool enables easy application of the phylotyping scheme to whole-genome assemblies and is evaluated against phylogenomic classifications, providing a web app and command-line tool for classification.
ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping
Beghain J, Bridier-Nahmias A, Le Nagard H, Denamur E, Clermont O. Microbial Genomics. 2018;4. https://doi.org/10.1099/mgen.0.000192
This paper introduces the ClermonTyping method and its web-interface, the ClermonTyper, which enables the identification of Escherichia species, E. coli phylogroups, and cryptic Escherichia clades from whole genome sequences. The in silico approach demonstrates high concordance with in vitro PCR assays, providing a valuable resource for strain characterization in epidemiological studies.
mlst (Torstyverse)
https://github.com/tseemann/mlst
Scan contig files against traditional PubMLST typing schemes
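As a sketch of how the tool's results might be consumed downstream, the Python below parses one line of tab-separated output. It assumes the layout of filename, scheme name, ST, then one `locus(allele)` field per locus, with `-` for an unassigned ST; check that against the output of the version you run. The sample line is hand-written for illustration with placeholder numbers, not real output, and `parse_mlst_line` and `MlstCall` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple, Optional


class MlstCall(NamedTuple):
    filename: str
    scheme: str
    st: Optional[str]        # None when the ST column is "-"
    alleles: Dict[str, str]  # locus -> allele call, kept as text


# Matches fields of the assumed form "locus(allele)", e.g. "adk(1)".
_ALLELE_FIELD = re.compile(r"^(?P<locus>[^()]+)\((?P<allele>[^()]*)\)$")


def parse_mlst_line(line: str) -> MlstCall:
    """Parse one tab-separated result line into an MlstCall."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
    filename, scheme, st = fields[:3]
    alleles: Dict[str, str] = {}
    for field in fields[3:]:
        match = _ALLELE_FIELD.match(field)
        if match is None:
            raise ValueError(f"unrecognised allele field: {field!r}")
        # Keep the call as text so non-numeric annotations survive unchanged.
        alleles[match["locus"]] = match["allele"]
    return MlstCall(filename, scheme, None if st == "-" else st, alleles)


if __name__ == "__main__":
    # Hand-written sample line with placeholder values (not a real result).
    sample = "isolate1.fasta\texample_scheme\t-\tadk(1)\tfumC(2)\tgyrB(~3)"
    print(parse_mlst_line(sample))
```

Allele calls are kept as strings on purpose, so that any non-numeric annotation the tool attaches to a call is preserved rather than silently dropped.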
Questions or comments? @ me on Twitter @happy_khan
The banner image is an AI-generated picture (Midjourney) created with the prompt: 'Visualisation of ancient blue bacteria, micro macro, colour'. You can share and adapt this image under a CC BY-SA 4.0 licence.