Hello, and thank you for listening to the MicroBinfie podcast. Here, we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There is so much information we all know from working in the field, but nobody writes it down. There is no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil-Fareed Alikhan and Dr. Andrew Page. I am Dr. Lee Katz. Both Andrew and Nabil work at the Quadram Institute in Norwich, UK, where they work on microbes in food and the impact on human health. I work at the Centers for Disease Control and Prevention and am an adjunct member at the University of Georgia in the U.S.
Hello and welcome to the MicroBinfie podcast. Today we're joined by two special guests, Professor Ed Feil and Dr. Natacha Couto. Dr. Ed Feil is a professor of bacterial evolution at the University of Bath. His interests include genomic evolution of pathogenic bacteria of both humans and animals. His early work was mostly on Staphylococcus aureus, whereas more recent work includes gram-negatives such as Klebsiella pneumoniae, particularly from an AMR and One Health perspective. But he's also worked on Borrelia, Burkholderia, Wolbachia, Melissococcus, Renibacterium, Vibrio, Streptococcus, Neisseria, E. coli, Mycobacterium, Bartonella, amongst many, many others. And he also has an interest in both bees and aquaculture. Dr. Natacha Couto is a data scientist at the Centre for Genomic Pathogen Surveillance at the University of Oxford. Natacha is a veterinary doctor and got her PhD in 2016. Her research focuses on the molecular epidemiology, population genomics, and ecology of a broad range of bacterial and viral pathogens of both animals and humans. She uses next generation sequencing and bioinformatics to understand transmission of bacterial and viral pathogens, and the emergence and spread of AMR between humans and animals.
She's worked on a range of organisms as well, including MRSA, Staph, E. coli, Klebsiella, the enterococci, and Mycobacterium, including abscessus and TB, and flu, and also pigs. So welcome to you both. It's great to have you on the show today. And we're talking about MLST, multi-locus sequence typing. If you look at the original paper in PNAS from 1998, you will find the third author is this Edward Feil fellow. So I decided we'll have Ed on to tell us all of the dark secrets of MLST, what really happened 25 years ago. And Natacha also, as a user of MLST who mainly works on it from the genomic side looking in. So it should be a fun episode.
So let's get started. I'll give an easy thing that people might not understand. Most schemes, for instance the Neisseria one, have seven genes, right? Or 10 genes or whatever. What's the decision making for that number?
OK, so for Neisseria meningitidis, if you look at that original 1998 paper, there are actually 11 genes used in that paper. There were six of what we'd now consider to be classic MLST genes, which are the housekeeping metabolic genes. And there were five more variable ones, outer membrane proteins and so on. So there was half of one and half of the other. So, to go back a step, before we had MLST, we had a thing called MLEE, which was a gel-based technique whereby you basically mush up the cells, so you have a cell extract with enzymes still working, and you run that extract through a starch gel and you stain it with various chromogenic compounds that change color to show you how far the proteins got, so you could measure protein mobility. And you had various what were called allozymes, OK? That was really the first time we could assay molecular variation in populations. So that's a technique that goes back to the late 60s.
And it was an absolute nightmare to do. There was an MLEE scheme for Neisseria meningitidis, but there was one person in the world that could really do it. And she's still there. She's still in Oslo. That was Dominique Caugant. So the way it worked was that before MLST came, everybody basically had to send their strains off to Dominique in Oslo and she would run them on the starch gel, because it wasn't a portable technique. The only way you could score an allele was to run it side by side with other strains for which you knew what the alleles were, so you could compare them directly on the same gel. OK, and I think there were about 15 loci used in that scheme. So when MLST came along, that was quite an established scheme, so you had the various different lineages in the different serogroups A, B and C. And when MLST came along, the idea was that it was necessary to recover the same lineages that had been talked about before and defined by MLEE, OK? So those five more variable loci, the non-MLST genes, were thrown out because actually they weren't bringing anything to the party. It was found that just with those six genes, you could more or less recapitulate the MLEE groups, but not quite. So later on, there was a seventh, which was put in just to separate one group that couldn't be distinguished on the basis of those six genes. So it's calibrated entirely to fit with the existing MLEE groups and to need no more expense and no more resources than that. That was the minimum number of genes that you could have to basically get what you got from the MLEE. So that's why seven genes were chosen. But it worked as a reasonable sort of sweet spot, as it turned out, for most of the species. So it's just the tradeoff of expense versus resolution.
We should probably step back.
I forgot to ask you, what is multi-locus sequence typing? For anyone out there who might not know.
Having explained multi-locus enzyme electrophoresis: essentially it was a typing method where you could define strains on the basis of not complete sequences but partial sequences, up to about 500 base pairs, which was defined by the sequencing, because that's how much clean sequence you could get with sequencing runs going in opposite directions along the two strands. So it was a way by which you could index nucleotide variation in seven genes, a small number of boring housekeeping genes, which were assumed to represent the underlying evolutionary relatedness, the phylogeny, of the population. So they were picked to be boring, they were picked to be under purifying selection, to minimise any confounding effect of diversifying selection or recombination or those sorts of things. And that was kind of a novel idea, and the novelty is basically demonstrated by the fact that in the first scheme, Mark Achtman insisted on using the more commonly used, highly diverse genes, which were subsequently thrown out. The other really novel thing about it was that it used a new fancy shiny thing called the Internet. Because sequence data is digital, it was the first time that anybody had really put epidemiological databases up in a way that anybody anywhere could actually type, follow this methodology, be that a hospital in Australia or America or wherever, and immediately be able to compare it with that database to see where that fitted in with the whole big picture. And that was absolutely revolutionary because this was, you know, the late 90s, the early days of the Internet, and people just hadn't really caught on to how it could be used for epidemiology before.
OK, and I should mention that you have this set of loci, some number, and you take the sequence of that fragment and you're measuring the differences, the number of differences at those loci, between your strains. And if the sequence is identical, then you give it the same allele number, and it's not counted as a difference. If it differs by even one base, it counts as a difference. Is that correct, Ed?
Yeah, that's correct. Yes. So I didn't explain that. So you end up with an allelic profile, an allele number for each of the gene loci that you've sequenced, so like a telephone number. And then that unique telephone number itself gets a number, which is the sequence type, the ST. And there was a reason why it was nonparametric, in the sense that it didn't matter whether an allele differed at one base or 100 bases. I mean, it made the whole thing easier to analyse when you just have a string of integers. But actually, there was also a recognition even very early on that you could have recombination affecting an allele, which may introduce one base, it may introduce 10 bases, it may introduce 20 bases. So in other words, the number of bases by which two alleles of a given locus differed didn't necessarily tell you anything about the number of evolutionary events that had passed to explain those differences. So it was just: either they're the same or they're different. It wasn't "these alleles are really different from each other" or "these alleles are quite different". It was just different or the same.
Yep. And I will point out that you must recover the allele sequence for every locus to have a valid sequence type. A few people have come to me with profiles with missing alleles, saying, what do I do with this? It's like, you can't do anything. It just didn't work.
No, that's right.
All right.
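The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real MLST database is implemented: the `TypingScheme` class, the locus names `geneA`, `geneB`, `geneC`, and the numbering of alleles and STs in order of first appearance are all assumptions made for the example. It shows the three rules from the conversation: identical sequences share an allele number, any difference (one base or many) gives a new allele number, and an isolate with a missing locus gets no sequence type.

```python
from typing import Dict, List, Optional, Tuple


class TypingScheme:
    """Toy MLST-style database (hypothetical, for illustration only)."""

    def __init__(self, loci: List[str]) -> None:
        self.loci = list(loci)
        # Per locus: sequence -> allele number, numbered in order of discovery.
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in self.loci}
        # Allelic profile -> sequence type (ST), numbered in order of discovery.
        self.sequence_types: Dict[Tuple[int, ...], int] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        """Identical sequence -> same number; any difference -> a new number."""
        known = self.alleles[locus]
        sequence = sequence.upper()
        if sequence not in known:
            known[sequence] = len(known) + 1
        return known[sequence]

    def type_isolate(
        self, fragments: Dict[str, str]
    ) -> Optional[Tuple[Tuple[int, ...], int]]:
        """Return (allelic profile, ST), or None if any locus is missing."""
        if any(not fragments.get(locus) for locus in self.loci):
            return None  # an incomplete profile cannot be given an ST
        profile = tuple(
            self.allele_number(locus, fragments[locus]) for locus in self.loci
        )
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return profile, self.sequence_types[profile]


scheme = TypingScheme(["geneA", "geneB", "geneC"])
first = scheme.type_isolate({"geneA": "ACGT", "geneB": "AAAA", "geneC": "CCCC"})
# One base different at geneA: a new allele, and therefore a new ST.
one_base = scheme.type_isolate({"geneA": "ACGA", "geneB": "AAAA", "geneC": "CCCC"})
# Every base different at geneA: still just "a different allele".
many_bases = scheme.type_isolate({"geneA": "TTTT", "geneB": "AAAA", "geneC": "CCCC"})
# Same sequences as the first isolate: same profile, same ST.
repeat = scheme.type_isolate({"geneA": "ACGT", "geneB": "AAAA", "geneC": "CCCC"})
# geneC was not recovered: no ST can be assigned.
incomplete = scheme.type_isolate({"geneA": "ACGT", "geneB": "AAAA"})
```

Note that `one_base` and `many_bases` are treated identically relative to `first`: each differs at exactly one locus, which is the "different or the same" scoring Ed describes.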
Natacha, do you have anything to add on understanding MLST, particularly for people who come from a genomics background? What's your take on it?
I mean, I started my PhD in 2011 and I didn't have access to whole genome sequencing. And so for me, MLST was very useful because it was a way that I could communicate with other microbiologists or epidemiologists, and I could link my data into what's out there on the internet, like Ed was saying. And I had to review all these older techniques that were used for epidemiology. And I could clearly see what the benefit was of using MLST, and how you could put a name on something or put a number on something, and you could speak to other people about, you know, E. coli ST131, and they would know what that was and how pathogenic or dangerous a certain lineage could be. So for me it was very useful when I started with my master's and then with my PhD, and it still is a very useful typing technique to be able to speak to others about a certain lineage and how dangerous or not it can be.
So yeah, I'm curious about that from the epi perspective, because when I started, I remember going to conferences and seeing a lot of people say MLST was no good because it didn't have the discriminatory power of PFGE. And PFGE was the way to go, particularly in an epi sense. What is PFGE? Why would you choose one over the other? And, dare I say, which is better?
So PFGE is pulsed-field gel electrophoresis, which is another band-based method. So that immediately puts it in the same category, in a sense, as the original MLEE, where you had to compare samples directly on a gel. It was very much the standard. You also have to have quite a dedicated lab to do it.
It got quite sophisticated in how far it got digitalised. There was pretty good software that enabled you to scan these gels and define your strains on the basis of the mobility patterns you saw. But it was still not digital like sequence data is digital. It took quite a long time to fizzle out because there was a lot of investment, particularly in the States, I think, in PulseNet, which was all based on pulsed-field. And, you know, once you've put that investment in and people know what they're talking about and everything seems to work, then it's very hard to turn that ship around. Whether it's better than MLST, whether it gave higher resolution, I think actually depends on the species. Because it's a restriction fragment digest, right? So you're just picking up variations in the presence or absence of particular restriction sites in the genome. So in a sense it has the big disadvantage that you can't go in and look at what's causing the variation very easily. You don't know exactly what's going on in terms of the genetics that cause the variation you're seeing. But at the same time, it was a genome-wide technique. So it did detect changes in the accessory genome. You know, we didn't really know that there was such a thing as an accessory genome in the 90s, but it was reaching parts of the genome that MLST wasn't. So whether it's better or not kind of depended on the species. If you've got quite a stable genome, then MLST was fine. If you've got a lot of variation in the accessory genome, then PFGE was picking that up and MLST wasn't.
So it's swings and roundabouts to a degree there, but really, in the end, the advantage of MLST was that sequence data is digital, and it's much easier to store and much easier to compare. And that actually quite quickly became much easier to do than pulsed-field, outside those big dedicated labs that were just like big factories for doing it.
Natacha, what about you? Did you encounter PFGE early on?
Yeah, I did a lot of PFGE. As I said, I didn't have access to whole genome sequencing at the time. And what we had available was PFGE. We would start off with MLST, of course, and then we would carry on with PFGE. I didn't mention this, but during my PhD I was trying to find strains that were similar between animals and humans, all sorts of animals, I have to say, not only pigs. And so I needed that resolution that PFGE could give me and that MLST could not, because I could find, for example, ST398 in humans and in animals, but I really wanted to be sure that I had enough resolution to say, okay, there was probably a transmission event here, or there wasn't a transmission event, there was dissemination of this certain strain or there was not. And I mean, I was not working at a human hospital at the time, but the idea that I have is that when it came to hospital outbreaks, they would be using PFGE like we now use SNP analysis, to define which strains belonged to a cluster or to an outbreak and which strains did not. I think at least for MRSA, that was the level of resolution that was needed at the hospital level, although, like Ed said, you really needed very dedicated people to do this.
And of course there were also other options like MLVA or, what was the name, rep-PCR, I think it was, for E. coli. So yeah, you had other options, but at least when I was doing it, I was using PFGE to have better resolution than MLST for MRSA.
I've been really enjoying the conversation because, I don't know if you know this, but I started off in meningitis, and so my whole thesis was on Neisseria and everything. And so a bunch of your stories were just giving me flashbacks. I was actually in the CDC meningitis reference lab for a year. So I remember names like Dominique. I remember looking at the software that Keith Jolley came by to set up, START, and everybody doing MLST. So it was very interesting. And I don't know if I have a specific question, but it's just been really good hearing it all. I noticed that you don't really have Neisseria in your biography. Have you kind of moved on from that?
Oh, well, yeah, I have. So my PhD was on Neisseria, meningitidis and gonorrhoeae. My PhD was to sequence adk and recA in meningococcus and gonococcus, each in seven strains. So there were [inaudible] gene sequences, and that was my PhD, because you had to clone it and everything, but they were the sequences that went into the final MLST schemes. So those two loci were my contribution, adk and recA.
If I remember, that paper had like 107 strains.
It had 107 strains. Yeah. So I didn't do all those. It's because I'd sequenced those, because, you know, before genome sequences, you had to clone the gene to get the sequence. And that took a year anyway.
A lot of work. So yeah, thanks for the walk down memory lane.
Yeah. Now people complain if the Illumina run takes more than three days.
I actually have a question for Ed, if that's okay. I was going to mention eBURST because you were the developer of eBURST, right? So why was there a need to do eBURST?
Well, actually in a sense, this comes back to what we were talking about on the other podcast, when it comes to figures. The standard way of visualising the MLST data originally was using UPGMA dendrograms. And I distinctly remember my supervisor, Brian Spratt, kept a magnifying glass in his office drawer so he could look at the dendrograms that came out. Even with like a hundred strains, they were really, really hard to actually read and look at. And he literally had the magnifying glass in his desk so he could read all the labels and so on. So I just thought there'd be a better way of drawing it. And Brian had set me this project which actually came from a paper from Guttman and Dykhuizen on E. coli, where they showed that if you took really, really similar sequences in E. coli, you could directly just score mutation events and recombination events, because you haven't got to dissect out all the subsequent events. So this is a trick which has stayed with me to this day: if you look at really, really closely related stuff, everything's a lot easier. So he said, look at the MLST data and see if you can figure out whether alleles are differing by recombination or mutation according to how different they are, and try and do something quantitative with that. And that came from that paper. And then in the process of doing that, it was really obvious for the Staph aureus data and for the Neisseria meningitidis data that actually you had this model where you had a central genotype and then you had spokes coming out from that central genotype which differed by one locus. So I invented the SLV during that time. I said, well, that's what's happening.
So why don't we just draw it like that, rather than this funny dendrogram where you've actually got the founder, the ancestor, at the same level as all the descendants? Let's just put that in the middle and have all the descendants around it, and not worry particularly at that point about how to connect all those different groups up. Let's just go with that model and draw it. It's a really simple idea, but I guess it got people thinking about different ways of visualizing the data, thinking in circles, getting away from that classic dendrogram tree shape, which was not only almost impossible to look at but actually quite misleading in many ways, because you didn't have that model of clonal expansion implicitly in it. So that's what kind of led me to eBURST. And the actual nuts and bolts of it are absolutely incredibly simple. I mean, there's nothing clever there at all.
Oh, no, eBURST is great. I wouldn't put yourself down on that. So if you look at the original paper, which I think is in J. Bacteriol. 2004, and you look at the first figure, that is precisely what Ed is showing. There's this head-to-head with a fairly simple dendrogram, but it's actually quite ugly to look at. And then you have-
It's really ugly, and it doesn't tell you about how the thing's evolved, right? It hasn't got that clonal radiation thing going on.
And then Ed's very cleverly just slapped the actual eBURST figure equivalent in the middle of it just to show like, hey, see this trash? Forget it, this is the new hotness. And then you look at the other couple of figures, and there are these beautiful constellations of clonal complexes. And I think that's the magic, right? As you were saying, one loves a good figure. So even with big data sets, they look quite gorgeous. So is it you to blame for the use of minimum spanning trees?
Well, I think so. So I did actually contact BioNumerics about this idea at some point.
I didn't hear anything from them, but minimum spanning trees appeared in the next version of their software. And essentially, I mean, there are some differences, but the nuts and bolts are basically identifying the center of gravity of these clonal complexes on the basis that they define the maximum number of near neighbors. That was the real trick. So I was always a little bit frustrated that they called it a minimum spanning tree, because there are lots of different, like hundreds of thousands of different, minimal ways by which you can connect all these genotypes. It's not until you actually apply this model of having a central founding genotype, defined in terms of the maximum number of neighbors, that you can pick one solution that kind of makes sense. You still have to be a bit arbitrary in the details where there might be ties in how you rank things and stuff. But the basic nuts and bolts were the same in the minimum spanning tree, which João and his colleagues in Lisbon recognized very early, and they came up with PHYLOViZ, which is brilliant, which was definitely an improvement on the original eBURST.
What would you have called minimum spanning trees if you could have named them instead, since you had an issue with them?
So, well, I don't know, really. I never really thought about it. I mean, it was a minimal spanning tree, but it was misleading in that it was one possible minimal spanning tree out of lots of different solutions, right? And so, oh, I don't know. eBURST? Call it a phylogram. Yeah, man. But I've moved on. It's all right.
Well, people still use things like minimum spanning trees to this day, even for cgMLST data.
Yeah, yeah, no, absolutely. And I do like to think at some level that I got people thinking in terms of circles rather than dendrograms, which is a nice contribution, I think.
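The founder idea described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the actual eBURST or PHYLOViZ implementation: the function names are hypothetical, groups are formed simply by chaining single-locus variants (SLVs), the founder is taken to be the ST with the most SLVs, and ties are broken arbitrarily in favour of the lowest ST number. The example profiles are made up.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def loci_differing(p: Profile, q: Profile) -> int:
    """Count loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(p, q))


def slv_neighbours(profiles: Dict[int, Profile]) -> Dict[int, Set[int]]:
    """Map each ST to the set of STs that differ from it at exactly one locus."""
    neighbours: Dict[int, Set[int]] = {st: set() for st in profiles}
    for a, b in combinations(profiles, 2):
        if loci_differing(profiles[a], profiles[b]) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def clonal_complexes(neighbours: Dict[int, Set[int]]) -> List[Set[int]]:
    """Group STs that are connected through chains of SLV links."""
    seen: Set[int] = set()
    groups: List[Set[int]] = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        group: Set[int] = set()
        stack = [start]
        while stack:
            st = stack.pop()
            if st in group:
                continue
            group.add(st)
            stack.extend(neighbours[st] - group)
        seen |= group
        groups.append(group)
    return groups


def predicted_founder(group: Set[int], neighbours: Dict[int, Set[int]]) -> int:
    """Pick the ST with the most SLVs; ties go to the lowest ST number."""
    return max(sorted(group), key=lambda st: len(neighbours[st]))


# Made-up seven-locus profiles: ST1 sits in the middle with three spokes,
# ST5 hangs off ST2, and ST9 is unrelated to everything else.
profiles = {
    1: (1, 1, 1, 1, 1, 1, 1),
    2: (2, 1, 1, 1, 1, 1, 1),
    3: (1, 3, 1, 1, 1, 1, 1),
    4: (1, 1, 4, 1, 1, 1, 1),
    5: (2, 1, 1, 1, 1, 1, 5),
    9: (7, 7, 7, 7, 7, 7, 7),
}
links = slv_neighbours(profiles)
complexes = clonal_complexes(links)
founder = predicted_founder(complexes[0], links)
```

Drawing `founder` in the centre with its SLVs as spokes gives the radial picture discussed in the conversation, as opposed to a dendrogram that places the founder at the same level as its descendants.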
I was wondering, as a user of MLST, and I've recently been involved in getting huge datasets and doing MLST, so getting the typing for numerous different species: do you think the quality of the MLST schemes that derived from that original paper was as good as the ones that you helped develop? Because I kind of have the idea that, at least for some species, like Pseudomonas and Acinetobacter, maybe because sequencing was not so available at the time, they maybe didn't select the best genes.
I think that's completely fair. There's a good example. There's a gene in the pneumococcus scheme which is just absolutely, I think there may even be two copies of it in some genomes or something. There's something really fundamentally wrong with it. And again, those genes were picked before there was a genome sequence available. So they were picked on the basis of them being most likely to be, as I said, under boring purifying selection, central metabolism genes, nothing particularly interesting going on in terms of their evolution, but we didn't know how well they were gonna pan out. And I'm sure that in almost all of the schemes, some choices were better than others. I'm sure you could have optimized them, but we were working in the dark a little bit.
I wanna ask a question that follows on from that, but it's a bit more fuzzy. So I think it was you, in a review with Martin Maiden that I was reading recently, and it talks about clonality in bacterial species. And when you think about it, MLST is assuming that the population will fall into these discrete clones, that there are genes that actually will be boring and will reflect that. But I do know that for certain organisms, so E. coli, if you look at the original MLST scheme in that paper, they show that it kind of works, but it doesn't.
Like there's a point where it starts to break down, in certain parts of E. coli, where that assumption doesn't seem to hold, at least in the data presented in that paper. And I'm curious, how do you feel about clonality in a bacterial species today? Is that a given? Do you expect that to happen? Or is it because we started looking at pathogenic clones first, and maybe fooled ourselves a bit that that was-
Yeah, well, I think this was a realization that came straight away, actually. I mean, there was a big debate around the mid-90s about how much recombination is going on in bacterial populations, because with all the MLEE work from the big American labs, Bob Selander and Howard Ochman and that crowd, they were always saying, bacteria don't recombine. It's all clonal. Everything falls into these nice little groups. That was the dogma. When sequence data came along, even before MLST, that dogma began to be chipped away at a little bit. And there was quite a heated debate about where bacterial populations sat between the two extremes. On one hand, you had clonality, no recombination, everything fitting into these nice groups. And on the other hand, you had panmixia, which isn't a word you hear so much these days, but that was the idea that everything's so mixed up that there are absolutely no lineages at all. So this came to a head with the Maynard Smith paper, "How clonal are bacteria?", where they basically took the data that was available at that point, and it was MLEE data, because this was before MLST, and categorised different types of populations. So, from panmixia, which was gonorrhoea, the gonococcus, which in that paper was described as having no clonal lineages at all, everything was just completely equally distant from everything else, and the alleles were in linkage equilibrium, so everything was just completely randomly shuffled, to clonal species.
And then, in the middle somewhere, you had Neisseria meningitidis, which was described as having this epidemic structure, which basically referred to the fact that you had a soup of recombining things, where alleles are just flowing backwards and forwards, but on top of that, you had these croutons of clonal complexes, which were the virulent clonal complexes, which we were mostly focused on. So that was the first realisation that you could actually have a bit of both: you could have clonality, and you could have a lot of recombination going on at the same time. And then you had to explain that paradox. Is it all just oversampling particular clonal lineages? Partly. Is it because, for those clonal lineages, the particular combinations of alleles are adaptive, and there is selection keeping those things together? Probably partly as well. Or they could just be hitchhiking on one particularly favourable gene. So there were nuances entered into that whole population structure from very, very early on, because that early work was all about answering the question of how much recombination is going on. So there is confusion: when people say it's a clonal population, they can infer that it's not recombining very much, and that may be the case, but it's not necessarily the case. We've known since those days of ways by which you could have clones and still have quite a recombining population. So, no, I mean, I think the closing words of my PhD thesis were something along the lines of, there's no pattern in bacterial population structures. So I haven't been surprised at anything that's come since then, because I had no expectations of what anything would look like.
Staph aureus is a beautifully well-behaved, lovely organism. Something like Klebsiella or some of the Vibrios are just all over the place, you can't tell what's what. Something like Burkholderia, you have not so much sequence variation, but you have a lot of allele shuffling. And there's everything in between. So, yeah, and we've known that since day one, really.
Natacha, what about you? How do you feel about clonality? I'm also curious if you feel that it's useful to teach it that way, to talk about clones all the time. And I'm also curious, based on the organisms that you've both worked on, which would you say are more clonal, which are panmictic, which are somewhere in between, just as a schoolyard question for people to remember. So it's interesting to know what to expect.
Yeah, I mean, I realised that MLST was not a good choice for me when I started working with Staph pseudintermedius. First of all, because the MLST scheme that had been developed for this particular pathogen initially only contained five genes. And then it was later changed to seven genes. When I was looking into the resistant population, there was a population structure, yes. So I was mostly seeing certain clones. ST71, for example, was the main MRSP clone at the time when I was doing my PhD. But when I was looking into the susceptible population, I could rarely see the same ST twice. I mean, any strain that I was typing by MLST, I would find a new ST. So basically, there was no population structure, they were all different. Maybe, like we talked about before, the MLST scheme was not the best. And definitely in the beginning it was not, with just five genes.
So that was when I was first confronted with this idea of population structure and clonality, something that I had heard a lot about for Staph aureus. And for me, I think MRSA is the best and the classical example of, like Ed said, a well-behaved pathogen that expands basically through vertical transmission. And there's this population structure and these clones that are important. I mean, I wasn't there in the beginning like Ed was. But for me, when I was reading these papers and hearing about clones, I think it made sense at the time, and it still makes sense today, because you can define these clones based on the fact that they are more frequent in the resistant population. For example, if you take a sample, and you plate those strains, and then you sequence them, you realize that there are a few lineages that appear more often than others. So there's definitely a higher risk of these certain lineages acquiring certain mobile genetic elements, plasmids or integrons or whatever. And then other lineages are more prone to acquire virulence. We don't really know why this happens in most cases, but I think it's definitely useful to talk about clones, and they exist, because we've seen a certain pattern in the past, and we see a certain pattern still today.
Something Natacha touched on earlier, when she said, you know, E. coli ST131, everyone knows what that is now. And that, I think, is the legacy of MLST, for all its flaws. There are not many examples I know of where, in the genomics era, key conclusions from the MLST data have been proved completely wrong, which is pretty amazing when you think, you know, the genomes only had seven genes in those days.
So obviously they've been refined and tweaked. But I think the lasting legacy, the gift that MLST leaves us, is the nomenclature for those key lineages, clones, whatever you want to call them. And I guess that will be set in stone forever now, because those lineages are real; we know they're real. You can say, well, let's split ST131 up into two different things, or let's combine it with something else, but the lumpers and the splitters thing is always going to happen. So, good for MLST.
Well, it's not dead. I don't think it's finished.
No, no, no. Well, I mean, the database is certainly still on.
Yeah. Do people actually just sequence seven genes these days and not the whole genome?
I think so. I think people will still do it in reference labs out there.
Some people do not.
Okay. I thought it's actually easier to do whole genomes these days.
For some of us, yes. But for some people it's still difficult to get access to whole genome platforms. So MLST is the standard, or MLST is what the provincial labs are capable of doing, and that's what they stick with. I remember Keith telling me that people still submit Sanger traces to PubMLST.
Right, right.
So it's still there. And the fact is, if proprietary companies just disappear tomorrow, you can always go back to it. It's free and open science. To do it, the primer sequences are in the paper. Go for it.
Yeah, that's true. So I think it's not quite finished yet, but for the most part, most of the major labs have moved to straight genomics now.
There was a strange time right at the beginning of the genomics era, when the next generation platforms arrived and people were actually sequencing whole genomes. I saw a couple of papers where people would sequence the whole genome but just report the MLST. It's like, what about the other 2000 genes? Anyway.
No, I still see that. And I see people using genomic data just to report the serology.
Right.
Things like that. It's like, okay, it's a bit overkill, but all right, fine, whatever works. I mean, you put any organism in and it's the same pipeline, so I guess it's more efficient in the end.
All right. Well, that's all the time we have for today. I want to thank both of our guests, Ed and Natasha, for joining us again, and we'll see you next time on the MicroBinFeed podcast.
Thank you so much for listening to us at home. If you like this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. Follow us on Twitter at MicroBinFeed. If you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Quadram Institute.