
In support of rMLST

Posted on October 1, 2022
The banner image is an AI-generated picture (Midjourney) with the prompt 'bacterial populations'. You can share and adapt this image following a CC BY-SA 4.0 licence.

There is a follow-up post where I dive into the usage of rMLST and apply it further to Acinetobacter.

I am a big fan of ribosomal MLST (rMLST) from Jolley et al. 2012. We used it extensively in calibrating cgMLST for Salmonella and my then colleague, Zhemin Zhou, used it for designing subsequent EnteroBase cgMLST schemes.

rMLST is a genotyping scheme that uses the genes for the bacterial ribosomal protein subunits (rps genes). This is as close as you can get to a universal MLST scheme. Like 16S, rMLST works on a diverse range of bacteria, but it provides deeper resolution than 16S in most species. See Figures 1 and 2 in Jolley et al. 2012.

In Salmonella, rMLST was consistent with MLST. We performed a head-to-head comparison of eBGs (Salmonella eBurst groups based on 7-gene MLST) with reBGs (Salmonella eBurst groups based on rMLST) and found the clustering from both methods consistent (adjusted Rand index of 0.992). There were only six cluster groups that had major conflicts, which, on closer inspection, seemed to consist of genomes affected by homologous recombination (Alikhan et al. 2018).
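To make the comparison concrete, here is a minimal sketch of how an adjusted Rand index between two clusterings can be computed. The function name and the toy labels are made up for illustration; this is not the code or the data behind the 0.992 figure. It only assumes that each genome carries one label from each scheme, listed in the same order.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same genomes.

    labels_a[i] and labels_b[i] are the cluster labels of genome i under
    the two schemes (for example an eBG and an reBG). The label names
    themselves do not matter, only which genomes are grouped together.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same genomes")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one genome: the clusterings trivially agree

    # Pairs of genomes grouped together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. everything in a single cluster
    return (together_both - expected) / (maximum - expected)


# Toy labels (hypothetical): one genome is placed differently by the two schemes.
mlst_groups = ["eBG1", "eBG1", "eBG1", "eBG2", "eBG2", "eBG2"]
rmlst_groups = ["reBG1", "reBG1", "reBG1", "reBG2", "reBG2", "reBG1"]
print(adjusted_rand_index(mlst_groups, rmlst_groups))
```

A value of 1 means the two schemes group the genomes identically, and values near 0 mean the agreement is no better than chance.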

By the way,

eBG (eBurst group) is a group of related STs based on single-linkage clustering, in which the distance between nodes is only one allele. EnteroBase automatically adds new related STs to the most similar eBG unless they are one or two alleles distant from a second eBG.

eBGs are a simple way of describing natural genetic populations for Salmonella, which you can read about in Achtman et al. 2012.
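The single-linkage idea behind an eBG can be sketched in a few lines. This is an illustration only: the profiles are invented, the function names are hypothetical, and it does not implement EnteroBase's rule for assigning new STs that sit close to a second eBG.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def single_linkage_groups(profiles, max_distance=1):
    """Group STs by single linkage.

    Two STs end up in the same group if they are joined by a chain of STs
    in which each neighbouring pair differs by at most max_distance alleles.
    profiles maps an ST name to a tuple of allele numbers.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the group representative, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sorted(profiles), 2):
        if allele_distance(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in sorted(profiles):
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Invented 7-locus profiles. ST1 and ST3 differ at two loci, but ST2 links them.
toy_profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (5, 5, 5, 5, 5, 5, 5),
}
print(single_linkage_groups(toy_profiles))
```

The chaining through ST2 is the defining feature of single linkage: members of a group need not all be within one allele of each other.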

In every subsequent comparison I have done, rMLST cluster groups have been mostly consistent with core genome SNP phylogenies. I find rMLST an incredibly useful starting point for exploring a species (or even an entire genus).

At a recent meeting (IMMEM13), I met many people who were looking to apply different approaches to capture population structure for species without an MLST scheme (or with a defective one). Some wanted to use cgMLST or other complicated methods. I told them they could use any of those methods, but that these were perhaps excessive for their purposes. Constructing and validating a cgMLST scheme can be a laborious process. Yet when I finally suggested rMLST as a ready and able alternative, I was met with blank faces.

I have already described my case for why rMLST is a sensible approach, but I thought I would demonstrate rMLST on three genera without an MLST scheme (or with a defective one). These were genera mentioned to me at IMMEM: Legionella, Acinetobacter and Serratia. I may embarrass myself in the process, as I have not worked with these particular genera before. Acinetobacter has two MLST schemes, by the way, and only one of them has major issues, as shown in Gaiarsa et al. 2019.

I should also mention here that the rMLST database is not readily available in MLST tools like Torsten's mlst, but it is accessible through the PubMLST website for academic, non-commercial use only. It requires (free) registration and you need to request access to the rMLST database specifically, but the process is fast and painless.

Serratia with rMLST

Above is a minimum spanning tree (GrapeTree) of rMLST profiles for the genus Serratia. Each node represents an 'reBG', which is single-linkage clustering applied to rMLST STs. Nodes are collapsed when the distance between profiles is two alleles or fewer. Edges between nodes are shown if the distance is less than 5 alleles. Nodes are colour coded with species assignments provided by rMLST. Profiles here are a subset of all rMLST profiles available on PubMLST. The tree was calculated using MSTree.py from GrapeTree.
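For readers who have not met minimum spanning trees of allelic profiles before, here is a rough sketch of the idea. It uses plain Kruskal's algorithm on invented six-locus profiles and is not the algorithm implemented in GrapeTree's MSTree.py; real rMLST profiles have many more loci and can contain missing alleles, which this toy ignores. All function names are hypothetical.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over all pairwise allele distances.

    profiles maps a profile name to a tuple of allele numbers.
    Returns a list of (name_a, name_b, distance) edges.
    """
    names = sorted(profiles)
    candidate_edges = sorted(
        (allele_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for distance, a, b in candidate_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, distance))
    return tree


def drawable_edges(tree, max_distance=5):
    """Keep only edges shorter than max_distance alleles, as in the figure."""
    return [edge for edge in tree if edge[2] < max_distance]


# Invented profiles: three close relatives and one distant outlier.
toy_profiles = {
    "rST1": (1, 1, 1, 1, 1, 1),
    "rST2": (1, 1, 1, 1, 1, 2),
    "rST3": (1, 1, 1, 2, 2, 2),
    "rST4": (9, 9, 9, 9, 9, 9),
}
tree = minimum_spanning_tree(toy_profiles)
print(drawable_edges(tree))
```

Hiding the long edges is what makes distant groups appear as separate islands in the figure, even though the underlying tree still connects every profile.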

The species assignments come from rMLST itself in this case, so the fact that they correspond so well with the rMLST clusters is not that impressive. What is important is that there are a good number of rSTs defined in each species, e.g. 172 in S. marcescens. There is nothing worse than a genotyping scheme that simply lumps everything into a single sequence type. The fact that each species is clearly separated is also a good indication that rMLST will be consistent with a robust phylogenetic analysis. It is curious that S. marcescens is split into at least three subgroups, but this is something for a Serratia expert to comment on.

I repeat the same approach with Acinetobacter below. The process and visualisation are exactly the same.

Acinetobacter with rMLST

This shows promising signs similar to Serratia. Perhaps more so, since it looks similar to what I would find in Salmonella.

Finally, let us look at Legionella. The process and visualisation are exactly the same.

Legionella with rMLST

Ah! Now I am out of my depth. This is something I cannot interpret. It is odd that the species L. pneumophila is so dispersed. I cannot say what this means. A Legionella expert might know.

The figures above (at least the first two) suggest that rMLST would be a ready and able genotyping scheme for these genera. To demonstrate this definitively, we would need to:

  • Work with an expert for each of the genera. We need help assessing whether this makes sense.
  • Take a representative set of genomes from each genus.
  • Construct a core genome SNP phylogeny (and assume this as a gold standard).
  • Compare tree topology with groupings from rMLST STs.
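One simple way to approach the last step is to ask whether each rMLST group forms a single clade in the core genome tree. The sketch below is an assumption-laden illustration rather than an established workflow: the tree is a hand-written, rooted, nested tuple instead of a parsed Newick file, the genome and group names are invented, and a real analysis would also need to consider rooting and branch support.

```python
def clade_leaf_sets(tree):
    """Collect the leaf set of every clade in a rooted tree.

    The tree is written as nested tuples with genome names as leaves,
    e.g. (("g1", "g2"), "g3").
    """
    clades = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.add(leaves)
        return leaves

    walk(tree)
    return clades


def monophyletic_groups(tree, group_of):
    """Report, for each group, whether its members form exactly one clade.

    group_of maps a genome name to its group label (for example an reBG).
    """
    clades = clade_leaf_sets(tree)
    members = {}
    for genome, group in group_of.items():
        members.setdefault(group, set()).add(genome)
    return {group: frozenset(genomes) in clades for group, genomes in members.items()}


# Invented example: a five-genome tree and hypothetical reBG assignments.
toy_tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
toy_groups = {"g1": "reBG1", "g2": "reBG1", "g3": "reBG2", "g4": "reBG3", "g5": "reBG3"}
print(monophyletic_groups(toy_tree, toy_groups))
```

Groups that fail this check are the ones worth showing to a genus expert, since they may reflect recombination, a misplaced genome, or a real limitation of the scheme.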

None of that is really that hard. Would you like to get involved?


Questions or comments? @ me on Twitter @happy_khan
