Hello, and thank you for listening to the MicroBinfie podcast. Here, we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There's so much information we all know from working in the field, but nobody writes it down. There is no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil-Fareed Alikhan and Dr. Andrew Page. I am Dr. Lee Katz. Both Andrew and Nabil work at the Quadram Institute in Norwich, UK, where they work on microbes in food and the impact on human health. I work at Centers for Disease Control and Prevention and am an adjunct member at the University of Georgia in the U.S. Hello and welcome to the MicroBinfie podcast. Today we are here in the same room for the second time ever for the podcast, Andrew, Nabil, and myself. We want to go over a few things for fun, because this is the 100th episode. We were thinking about just kind of recapping where we started, where we come from really quick, and then maybe look into the future, maybe do some predictions. Do you want to do some predictions? Sure, why not? Sure. Let's be dangerous. So Nabil and I have flown especially to Atlanta just for this episode to be in Lee's basement. And to do a few other things. But, you know, it's just for this episode. So it's a nice honor to be here. Thank you, Lee. Yeah, thank you for having us. Thanks for coming. It's really an honor for me to be able to show you the basement and kind of put this microphone here on a chair in the middle of us. Don't give away all the secrets of the podcast, you know, there has to be some mystery there. Like we didn't know that you had a beautiful window and lovely view, you know, on the other side of the microphone. So where did we start this whole thing? Why did we do this? And I know that we've done this before in a previous episode. So let's just kind of be brief about it, I guess.
So I guess we had a seminar series on YouTube and we were inviting speakers to come and give slides for a small little lab group, like it was a virtual lab group internationally. And then that was quite difficult to organize, whatever. And then I think we just decided, hey, or you decided, hey, I want to try this podcasting thing, you know. And that was a bit easier to organize because it's just amongst three of us and we knew what we were doing. We could arrange times between ourselves and, you know, just have a bit of fun. Yeah, I think for me, it's what you say in the intro. Every episode, you say the same thing and that's it. That's the origin story. It's just the fact that the field changes really quickly and there's a lot of things that just go unsaid. And so the fastest way we could come up with to capture it and get it out there for people to learn from was some sort of podcast format. I think we thought about different ways of doing it. I wanted to do like a zine, which was like, yeah, OK, that's great, but none of us can actually draw. So what are we going to put in a zine? Oh, you wanted to draw for it. Well, that's what a zine usually is. Not a magazine, a zine. I'll show you what one of them looks like later. If I had understood you, would we be in a different place? You might have. We might have been in a different place because I can't draw. So that was me. Yeah, so that was the motivation for me, where we got started from: capturing all the quiet conversations that just don't get put into papers. Yeah, I think we went through a few different formats, like should we do a blog, like a GitHub site? I think one of the first suggestions was podcast. But I think that you guys basically convinced me and turned me around to it. So podcast has been a great format. Yeah. And then we recorded a load of episodes before we even started. And in fact, we recorded our very first one. And it was just like a throwaway.
We're like, will we even release this to the world? Or will we just chuck it, you know, after you've listened to it? Oh, yeah, yeah, yeah. I think that we said a few provocative things. And originally you wanted to just be in the sound booth. And after a few provocative things, I think that we just got you out right away. You're like, no, no, no, I'm going to be talking. I have an opinion. So that was back in 2019. I think it was September 2019. That's when we launched. Yeah. Well, I think we did loads for a few months before that. Oh, OK. We had about 10 ready to go before we launched. You're right about that. So we queued up a bunch of them. And I would say relatively quickly, we came upon 2020 and our collective focus turned over to SARS-CoV-2 and the podcast changed. Well, initially, first of all, I was resistant to doing any SARS-CoV-2 on the podcast because, you know, we were doing it every day, all day. Myself and Nabil were redeployed onto that. And it was just insanity at that point. You know, very long hours, as everything was just scaling up. But then eventually, I think I gave in. Well, the problem was that we wanted to keep doing the podcast. And it's like we didn't have anything else to talk about. It's like, well, in terms of the field, that's all we've been doing. And that's a tip for people who want to start their own creative outlet. It's really hard to write something, produce something when it's something outside of your regular day to day, because you have to then go and specially research and prepare it. And so, yeah, we got stuck doing a lot of COVID episodes. We did at least 20, didn't we? Yeah. Yeah. There's a whole bunch of them, enough to make a playlist probably. I think there is a playlist on the SoundCloud of all the COVID episodes.
And then we had all the interviews and all the normal kind of stuff intermingled as well, you know, so we didn't do just viruses. You know, we did give a little bit to the bacteria people as well. True. So after a while, that cooled down and we started doing, I would say, more intermingled stuff like what you're saying. We were able to interview the bacteria people again and get a little bit more into it. But I don't know, for a couple of years, we were definitely in the weeds with that. When did you feel more comfortable kind of relaxing? Would you say like 2021? I think, yeah, about a year into it, it was a bit easier. And also we got into the routine of it and, you know, editing the shows and just teeing them up and just knowing instinctively when to split an episode when you're talking to someone. And yeah, we're very good now at scheduling stuff, particularly you, you know, teeing them up. So actually, we've recorded, I think, 108 episodes and we left space for the 100th episode, you know, for while we would be here. So we are actually quite organized, unexpectedly. Yeah, that's actually pretty nice. Just teeing up like four episodes at a time, six episodes, and then editing them. You've been very helpful, both of you, actually. But especially recently, Andrew, very helpful in editing them. Thank you so much. Now, there's a caveat there, because I recently have switched to playing back podcast episodes at about 1.5 speed with Descript, and you get through them quicker. But yeah, at the back of my mind, I'm thinking, well, are there some bad edits in there, you know, that I've missed? Hopefully I've gotten them all. Well, when I edit, I'm so slow. I think I take about three hours per episode or something. And I think I'm somewhere in the middle. Well, it's been fun. So here we are a year later, 2022 passed by. We started getting more into the routine and everything. We're in 2023 now. What's today's date? January 19th. And here we are at 100 episodes.
You said we have 118 queued up, right? We've 108 recorded, you know, and they're ready to go. Yeah, very nice. So we talked about the past. Now let's get a little bit into the future, I guess. And one special thing we can do in this podcast is kind of predict what might happen in the next five years. Yeah. Yeah. What's going to happen next? It's the start of a new year. We've done this milestone. What may we be talking about in the next 100 episodes and beyond? You know, that's sort of what I pitched as a discussion point. And hopefully we get something right. Yeah. Yeah. You guys had a few different ideas for what might come to be in five years. Do you want to go first, Nabil? I'll start off with something easy. Genomics is a thing. We're going to use more genomics. Yeah, great. We all have jobs. We all have. Yeah, well, you know. So that's an easy one. I don't think that's controversial. So are you saying it continues, or are you saying the prediction is that genomics is more pervasive? I think genomics is becoming trivial. Commonplace. I've been surprised during the pandemic and this year, going to different meetings, seeing that there's more and more off-the-shelf commercial products where you just put in reads and you get all of the basic bioinformatics output that would have taken us some time to do a few years ago. It was like an academic process to do. Now it's like snap, snap, all automated pipelines, all the software's containerized, like all of that control that wasn't there before. You don't have to install things yourself, kind of thing. Like we've shifted to making a lot of these processes trivial. So that's happened very, very quickly. I think COVID has accelerated that because there's now a lot of interest in genomics. So for me, is it going to be more pervasive? I don't know. Do you think it's going to be in every doctor's surgery? That's the thing. So that's my thing.
I don't think it is. I don't think it is going to be. I think you can find some of these field bioinformatics papers where, you know, with nanopore-sized sequencers or very cheap machines, you'd have one in every lab, one in every GP surgery. I don't think that that's going to happen in the next five years. One in every hospital, maybe? Every rural hospital? Not rural. So we're going to argue this. So my position is, I think the technology is there, but we haven't stood up the expertise to be able to deploy this into a frontline, like a rural or very much smaller lab. Samples are still going to be collected and sent to some sort of regional lab. Well, you do have regional teaching hospitals, like in rich countries, that now do have sequencing as a pretty standard thing. Yeah. Well, what we don't then have is the trickle down, because you need so many people and so many bits and pieces, you know, to make it work. Yeah, exactly. So I see that. I see that you can send the MinION on the laptop, or you can send the MiSeq, to anywhere in the world. Hopefully the reagents come along with it. But transporting the binfie experience and the lab experience in particular, that's not quite there yet for me. I don't see that in five years we're going to have that, but you guys will disagree with me. I mean, what I'd love to see is, you know, you go to your doctor and before they prescribe antibiotics, they'll take out their phone with a little sequencer attached. They'll do a little test. 10 minutes later, it'd say, oh yeah, well, you know, you have TB and all these drugs won't work. But this drug over here will, you know, and maybe we'll start with that.
And maybe from now on, you know, or from then on, all antibiotics would be prescribed based on what will actually work rather than what they kind of think maybe might work. Yeah. It's funny you bring up TB. So this is the thing. So something like TB, I think that process may happen and that will become trivial, common, but will we do that for every pathogen? Probably not. TB is special because TB is a slow grower, difficult, so your classic microbiology struggles. So then the genomics really is a cost-saving, an effort-saving measure there. But where it's simpler, you know, basically, if it ain't broke, why fix it? Like if you've already got existing diagnostic tests, you're just going to keep using them. But I'm a cynic. But then, you know, if it's not going to change treatment decisions, do we even need to do the test in the first place? You know? Oh, well, that's a sad lesson I've learned from COVID. That, yeah, learning, you know, oh, we can tell you what the variant is, or we can tell you what the lineage is. Like, yeah, but the public health advice doesn't change. So who cares? You know, I don't know, maybe I've had a bad experience. A very negative future prediction. That for me is the thing, like, what's the intervention on the back of that? I guess you can see it being useful in certain settings, like say care homes, or in hospitals where there's outbreaks, or you think there's outbreaks, and you want to see, is this just random, you know, like community introductions? Is it something in the water? Is it, you know, patient to patient? You know, where you can actually make a change and have a positive outcome. I can see that in large hospitals. Yeah, yeah, definitely in large hospitals, where you have wards with lots of people. If it's a small rural hospital, there's only one person with gono.
It's only one person with gono. Like, that's it. Oh, but then, well, gono, now that's a problem. What if you do too much sequencing, and then you figure it out? This person over here gave gono to this person here. Then you're into all sorts of, you know, challenging things. Oh, yeah, I don't want to get into all that. That's more your area anyway, as a Neisseria man, right? Oh, yeah. Anyway, do we want to say more on diagnostics? I mean, five years seems like a long time. So I feel like there could be some kind of innovations, like miniaturization of some machine or stabilization of reagents, and maybe more labs could possibly have machines laying around. So maybe regional labs, maybe more than regional labs. On the research side, though, I would hope the big innovations will be on the read length and read quality side. We can already see some of these things come along, you know, but if you get really long, you know, 100k reads that are nearly perfect quality, you know, Q30, that changes a lot of what we do, you know, and it opens up a lot more doors for what we can possibly do. If we can get single-read Q30 out of the box, where we're basically not doing de novo assembly, we save a stupid amount of compute. Yeah, stupid amount of compute. Like, just processing oodles of short reads is a major computational bottleneck. I mean, if you take that out, you can do your MLST or in silico serotype, you can even call some SNPs, do a neighbor joining tree, whatever. You can do it on your freaking phone. But you can do very low abundance pathogens as well. Yeah. You know, so you don't need to culture. Yeah, it could play right into metagenomics. Yeah, it simplifies it so much. Longer reads would just make life so much easier. That would be incredible. That's a good prediction for five years.
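To put rough numbers on the Q30 point above, here is a small worked sketch in Python. It is not from the episode. It uses the standard Phred definition, Q = -10 * log10(p), and assumes every base in the read has the same quality, which real reads do not. The function names are made up for illustration.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(read_length: int, q: float) -> float:
    """Expected miscalled bases in a read, assuming uniform quality q."""
    return read_length * phred_to_error_prob(q)


if __name__ == "__main__":
    # Compare a few quality levels for a hypothetical 100 kb read.
    for q in (20, 30, 40):
        errors = expected_errors(100_000, q)
        print(f"Q{q}: about {errors:.0f} expected errors per 100 kb read")
```

Under that simplifying assumption, a 100 kb read at Q30 still carries on the order of a hundred expected errors, so "nearly perfect" is a fair description rather than "perfect".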
Yeah, so no more culture. Everything's metagenomics with perfect reads, perfectly long massive reads. Yeah, that's quite easy. So I will say, let's still do some culture once in a while. Sure. Let's not throw the baby out with the... Microbiologists still have a job. Classic microbiology, I think, is still going to be really, really important. Yeah, definitely. I don't want to replace that stuff. Let's just keep our predictions then with the bioinformatics, without regard to how it affects everybody. Sorry. I guess things do need to change because people are sequencing more and more. So in five years' time, you know, we're gonna have probably triple or quadruple the amount of sequence data for bacteria. And our algorithms will need to change as well, because you can't just keep going daily to NCBI and downloading all the reads again and again and again. You know, there's a limit to what you could actually do and what is useful. So I think we may need different bioinformatics methods and different strategies to actually utilize publicly available data to contextualize what we do. I totally agree. I think a lot of people come to me and outline a workflow of the study, or you read it in a paper, and it's this all-against-all exhaustive searching, comparing genomes. And it's not tractable at scale. You have to be a little bit more clever about it. I think that's interesting. So algorithms could change in the future, the way that we deal with these could change in the future, because if everything is single read, you no longer have to worry too much about a consensus. Well, we've seen that with the COVID data. We've seen that with, like, let's take phylogenetics. I mean, they were generating million trees, which is absurd, right? Yeah, like, like 2018. That was insane. That was insane.
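As a quick illustration of why the all-against-all comparison mentioned above stops being tractable, this Python sketch counts the unordered genome pairs such a workflow would have to compare. It is not from the episode, and the collection sizes are arbitrary examples, not figures the hosts gave.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Number of unordered pairs in an all-against-all comparison."""
    return n_genomes * (n_genomes - 1) // 2


if __name__ == "__main__":
    # Hypothetical collection sizes, chosen only to show the growth.
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9,} genomes -> {pairwise_comparisons(n):,} pairs")
```

The count grows with the square of the collection size, so tripling or quadrupling the data multiplies the work by roughly nine or sixteen. That is one way to read the appeal of incremental approaches, which only compare a new genome against what is already there.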
I mean, okay, it's a very small genome, not much variation, whatever, but anyway, that's absurd. Aligning those sequences, that's absurd, you know. And some of the work on the incremental trees, where you're building a tree and then adding incrementally to it, that was an unsolved problem that we've seen come through with COVID research. So hopefully that proliferates into other organisms. Those methods may not strictly be out-of-the-box transferable, may need some tweaks. But I think I'm excited about that. I'm excited to see the scaling up that we did for SARS-CoV-2 transfer to other organisms. Yeah, so speaking of that, there are probably innovations that draw from our experience with SARS-CoV-2 that probably can be expanded to other things in the future, too, huh? Yeah. And I think mathematics is a big issue, because how do people analyze data and then interpret data and all that? You know, fine, there's a lot of out-of-the-box pipelines you can install and run, but then how do you interpret the data? We can't just employ an army of bioinformaticians everywhere to analyze all of this data and interpret it, you know, because it does require interpretation and, you know, someone there with their big brain to say, well, this doesn't look real. You know, is this really an outbreak, or is this contamination because we've used two different labs to do this sequencing? You know, this kind of stuff. So, well, we need more bioinformaticians, so we need a lot more training, but also we need some better ways of analyzing data. Interesting. Without a human. Yeah, it's just not feasible to sift through all of that anymore. Trust GPT-3. Yeah, yeah. Do we want to mention that one? We were joking earlier. That was great. I actually want to try this out. So the activity would be to start off a genome, to give it, like, the first, what would you say, a hundred or
a thousand base pairs of a read that's going through the nanopore and say, ChatGPT, this is the first part of my genome, tell me the rest of it. Yeah, so I think we were having a discussion about another thing, coming back to what we were saying earlier about increasing read length and read quality on different sequencing platforms. And we were saying, well, you know, why even bother having these new machines do it when we can just take the first 150 base pairs that you get out of the short read, shove that in ChatGPT and say, oh, just tell me the next couple of kb, or just give me the rest of the whole freaking genome based off that. And it should just impute it for you. My favorite would be everything would just be K-12. Yeah, everything's fine. Or SARS-CoV-2, everything's... So yeah, what if it predicts like a new lineage based off of what you do? Yeah, some sort of Deltacron nonsense. I'm very curious if someone can tell us whether SARS-CoV-2 is the most sequenced thingy. Oh yeah, it is, definitely. Versus, say, PhiX or anything else? Oh, that's... I don't know if you can determine that, because I think in terms of data and in terms of number of samples, right? Because PhiX, obviously, it's usually the same thing being sequenced over and over again. Yeah, same strain, versus SARS-CoV-2, which is obviously different strains. It's tens of millions now. I don't know what the exact number is. Too many. Too many. Anyway, back to predictions, I guess. Anything else on, so like, sequencing and COVID? Yeah, so one more outcome, I would say, is something I'm very excited about personally, being at CDC. We have the wastewater system, and they have to sequence wastewater to see what pathogens have made their way through bodies into the sewage and see what's going around in the community. But that's kind of hampered by, you know, just having short reads, or, you know, it takes a while to sequence. I would predict that sequencing gets better. If we get Q30, you know, maybe
though that would make it easier to do all the wastewater. My understanding is that the DNA in the wastewater gets very degraded, and so you get very short fragments, and the challenge with wastewater sequencing is that it's very short pieces that you have to work with. You don't get out big, huge, long reads. But maybe improved methods, you know, might change that. I didn't know that, but that makes sense. But also, you know, even if it's a majority that get fragmented, maybe, for example, with, what do you call it, with the nanopore, when you can reject reads that don't look great? Adaptive sequencing. Adaptive sequencing. Maybe with adaptive sequencing you reject the reads that are too short and you go on to long reads, or you select for... Well, what we do actually at the moment is we put all the fragments into a SageELF, which divides up the fragments into different lengths, and then we can just chop out which fraction we want. And so we could chop out all the really short bits to enrich for longer. Okay, yeah, selecting based on size before sequencing is... Yeah, size selection. Size selection is really, really key. How well you can do that would definitely help. We all know what wastewater surveillance is trying to do, but I'll point out one thing that's really important that people might not realize: wastewater surveillance is super useful because it's intrinsically a pooled, anonymized sample. You know, you're taking it from sewage which is served by whatever, thousands of homes, or it's, you know, a district's worth of things. So it cannot be brought back to a single person, and that's really, really important. So that makes it much, much easier to get as a sample. The other fun part is there are other ways of looking at wastewater and then predicting, you know, the number of people you're actually surveying at that time, so you can sort
of calibrate the prevalence of what you're actually seeing in the sequencing. But that's like a super important reason why we want to do it. It's also that, you know, one man's trash is another man's sequencing project, because, yeah, it's just stuff that's being produced anyway that you're pulling off. You know, you're not pestering medics who are very, very busy to then go and do other sampling for you. I wonder, with wastewater, would it be possible in the future to look at, like, the AMR genes circulating within a region and then decide, well, actually, these antibiotics probably are not going to work on this population, and maybe we should switch to these other ones which are more likely to work? You could, yeah, you could say that. I would like to do that. Yeah, it'd be very nice. It would help out with recommendations from CDC or elsewhere, but it might be different recommendations for each city. Yeah, which would be kind of scary and hard to keep track of. Actually, I think on that, though, I would caution against using just the straight-up measurements of antimicrobial resistance genes you detect. It's not clear to me how you differentiate that sort of measurement, you know, what's anthropogenic versus what's just circulating. Because, you know, if we go out and pick up some dirt, we're going to find super antibiotic-resistant microbes in the soil. Like, they're just there. So how do you differentiate those from what's, you know, antibiotic stewardship, what's being done in that space? That's sort of for me a danger point, where we're just looking at AMR genes directly like that. It's dangerous. Because also, I'm not an AMR person, I wish that we got somebody for that today, but I'm recalling that some genes just don't confer resistance, based on which species. Oh man, and that's another fun point as well. Yeah. So, I mean, we're not disagreeing
with you. I mean, I totally think that's a useful thing, but there's a lot of good science to be done around that as well, which will keep us very busy. I guess another prediction for the next five years would be that the amount of storage space we need... we're gonna have to tackle that in some way, because we can't just keep buying more and more hard disks. We're gonna have to have more efficient methods for storing sequence data, and potentially we might have to throw away data. What do you think? Would you throw away something and re-sequence? Sure, why not? That would kill me, but maybe my feelings would change in the next five years. Well, let me put it this way: you throw away your trashy Illumina GA2 stuff that you did eight years ago, right? Like 80 base... Yes, 36 base pairs, anything, or even older stuff. And then when you re-sequence it, you're gonna get some nice tasty, you know, 300 base... But what if it gives different results? Oh, well, then you're up the creek, mate. So I guess you have to compare before you delete. Yeah, compare before you delete. So, you know, hash the consensus and see if it's the same. We had actually, I mean, I can look at this with the Haiti cholera genome that was sequenced so many different times in so many different ways. We sequenced that with 36 base pairs, single end, on a whole HiSeq lane. Wow. And that's on NCBI. And then later on, when we got paired-end reads on the MiSeq, we sequenced that again, and we sequenced it again maybe a couple of times. And then, you know, all the long-read companies weighed in on that also, so we have PacBio out there and whatever else. Sounds like CT18 at the Sanger Institute. They sequenced that as a technology validation sample, you know, for everything: one for every platform, every iteration, every change in chemistry. CT18 is, uh,
Yeah, it's Typhi. Is it Typhi? Salmonella Typhi, yes. Sorry, yeah, if you're playing at home. Yeah, it was the original Salmonella Typhi they sequenced. So the first, so OG Typhi. Yeah, but what we found actually was that the isolate had been passaged so many times that it lost a 20k region, which is a problem. Yes. Interesting. So I like this idea. This is a good tangent, but still, going back to it, it could be kind of fun to know that your consensus sequence on better chemistry takes less space, you can get rid of the old 36-base pair HiSeq reads, toss that out. That would feel good to free up that space. I'm just making the point that if Andrew's saying, like, to save storage space you throw the sequencing data away to re-sequence it later, you're hopefully going to hit a better platform, you might get more bang for your buck, kind of thing. That's an incentive just so that you don't, like, clutch onto it. No, no, no, let it go, you'll get something better back next time. But we're nearing a point where, like, hard disk prices come down about 20% a year per terabyte or gigabyte, but the price reduction in sequencing per base is going down much faster, so like, it's faster than Moore's law. So we are going to hit a point where it's going to be cheaper to re-sequence than it will be to store, and that's going to come quite soon. Wow, I like that prediction. I would love to see, I think we should make a plot of it. Oh, we should plot that. We need a data scientist. Get in touch, we need a data scientist to help us plot it. We'll put it in a review and we can call it, you know, Page's law of diminishing returns or something, I don't know. Page's law. Page's law of diminishing returns. No, no, no, I think we can have the MicroBinfie law. Personalized medicine coming out of genomics, I mean, do you want to add anything to that?
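The "cheaper to re-sequence than to store" prediction above can be sketched as a simple crossover calculation in Python. This is a minimal sketch, not anything the hosts computed. The 20% yearly drop in storage price comes from the conversation. The starting costs and the 40% yearly drop in sequencing price are placeholder assumptions, and the model simply compares the cost of keeping a dataset for one more year against the cost of re-sequencing it once.

```python
from typing import Optional


def crossover_year(
    storage_cost: float,
    reseq_cost: float,
    storage_decline: float = 0.20,  # from the episode: about 20% per year
    reseq_decline: float = 0.40,    # assumed placeholder, not from the episode
    max_years: int = 50,
) -> Optional[int]:
    """First whole year in which re-sequencing is cheaper than storing.

    Both costs shrink by a fixed fraction each year. Returns None if no
    crossover happens within max_years.
    """
    for year in range(max_years + 1):
        store = storage_cost * (1 - storage_decline) ** year
        reseq = reseq_cost * (1 - reseq_decline) ** year
        if reseq < store:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical starting point: re-sequencing costs 5x a year of storage.
    print(crossover_year(storage_cost=1.0, reseq_cost=5.0))
```

With these made-up inputs the crossover lands in year 6. The answer is very sensitive to the assumed decline rates, and if sequencing prices fall no faster than storage prices the function returns None, meaning the crossover never arrives. Real price data would be needed before plotting anything.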
Yeah, I mean, could you just imagine you walk into your doctor's surgery again and they, well, I don't know, you're riddled with pathogens, right? And then they can give you exactly the right care you need to get you well and safe, you know, much quicker. I'll go one step further and say not only the pathogens that they sequence, but actually your microbiome. Based off the community that you're carrying, they can make different recommendations, or they could say something about, you know, different prognosis, like based off that. Oh, and then maybe they've got a fridge of yogurts and they'll just pull out one and say, here you go, this one will make you skinny. Yeah. Okay. Sounds like we're starting to get into the homeopathy. I like this future. With science. I like this future. So that ties in with how microbiomes can be tied in with your body type. Yeah, why not? Because if the sequencing is so easy and cheap, why not just have that on file at the same time? Yeah. Maybe when they sequence you, they find, oh, you know, you're missing this particular lineage of this particular bug, which we know does this thing. Maybe, I don't know, helps you digest gluten or something like that. And yeah, they just give you some supplement, and maybe it's not yogurt, maybe it's, you know, suppositories and whatnot, depending on where it needs to get to. But, you know, it could be quite routine. Or athletes, you know, if you're like a runner, are we going to have a doping scandal, except it's basically around fecal? You get Lance Armstrong's sample and then you win the Tour de France. You send up your poo and then they dope your microbiome. You dope your microbiome. Microbiome doping. That's one of our predictions. In the next five years, there's going to be a case in the tabloids about microbiome doping. You're going to fail your fecal test.
The amount of money people spend on trainers, which give you that one percent edge, you know, for a marathon, say. Like, people will pay good money, you know, to get that tiny little edge, you know, even if it means posting off your poo. That sounds like a spinoff right there. I think we should stop talking about it and actually just get some seed funding around this. Oh, yeah. So if anyone wants to come in on our business idea, we will take a nice share and you can do a lot of hard work. Definitely. But beyond that, I think we should see more personalized medicine based on microbiome. I think for kids, for infants, it might be very, very useful looking at that. And kids and the elderly would probably benefit from that. So kids, I agree with you, but then also, I don't know, it's sort of beside the point, but it's also realistic that you have a lot of scared parents who may not want to, or may want to. I'm not a father, so I don't know about the social implications of it. But I mean, in terms of the science, I see that that could be a thing. Will we tell our kids, go out and eat more poo and eat more soil? And, you know, no, why bother? What we're going to do is we're going to sequence the soil. We're going to sequence the poo and we're going to get out the organisms that are beneficial. And we're going to say, eat this highly processed, awful product instead. I just had a thought there. Right at the moment, if you walk out to your backyard and you do some sequencing, long read sequencing, you're going to find novel species, probably novel genera. Right. No matter where you go, you'll find them. But in five years' time, will it be slim pickings? Will people have gone out and basically sequenced most of the diversity out there? Huh. And then when we've run out of names, you know, we've named everything and now it's like, oh, well, you know... Well, it's difficult to predict how much we don't know. So.
So it could be like an activity like in a high school to take the science class out to the side and do some sequencing on the soil sample and then submit it to NCBI. Yeah, because they've found a new species each and now they all name after themselves. I think they do this for phages already. Oh, there is a pro. I don't. Oh, man, I'm going to get into trouble. Some phage people are going to jump on me about it. There is a program where, yeah, you go out and isolate new phages. If you find a new one, you can name it. And so some some viruses out there have very stupid names because a person just picked, you know, Jean-Luc Picard, these random names. But, yeah, you might see that more widely going out and just naming, looking at the diversity of life. So cool. What else is coming up? I mean, I think an obvious thing is data dashboards. That's the new hotness. Everyone loves data dashboards. Yeah, that's that's the new read mapper. When we were doing things, everyone had to have like their own read mapper. I think now it's going to be everyone, every bioinformatician needs to have their own data dashboard they've developed. I'm just wondering, there's been no real, you know, read mappers have kind of plateaued and it's like, you know, no one has really advanced on Heng Li's, you know, minimap2. People just said, yeah, that's grand. I'm sure there's others coming out. Same with assemblers, you know, there was a period where there was an assembler every week. But now people are like, yeah, we're happy with Shovill or SPAdes, SKESA or whatever, you know, it's we need more disruptive science. I don't know. There's no more disruptive science ever. I read an article on that. We'll put a link to that article in the show notes. But yeah, no data dashboards. I mean, I am thinking we're going to see a lot of data dashboards. We're going to see a lot of R Shiny stuff coming out. We're going to see a lot more. 
So do you do you predict that it's going to be like every time like you see you look over someone's shoulder, like that person has their own custom dashboard for themselves or that they like or are you kind of also saying like someone makes their own dashboard and they're going to there's going to be like a new one in JOSS every week? No, no, I mean, I just mean, yeah, I mean, the former more. I'm just saying I'm just thinking because we're operating at scale. You are going to have more data aggregation and then visualization of that. These dashboards that just show these trends over time, breakdowns of sequence or whatever you want. That's going to be something that people are going to be more interested in because you have enough data to do it. You didn't. The old days is like you had a genome. OK, there's nothing to aggregate there. Yes, exactly. I wonder, will we still be doing short read sequencing in five years' time? No, no, we're going to. No, no. OK, we are going to be we're going we are going to be doing short read sequencing, but we're going to do what we've always done and shift the goalposts. So when I started short read sequencing was SOLiD. That was 30 base pairs. Oh, no. Right. Like that was short, you know, 35 base pairs. So that was 30. First Solexa was 35. Right. First iteration. That was short. And then long was the 454. I don't know if you can remember the read length of 454 at that time when the because I'm going to mix the numbers up early on. It was like 200. Yeah. Yeah. And then you had and then you had, you know, your Sanger was obviously your 800-ish point. Then that's your short, medium, long. And so what have we done? We've just increased the numbers. And now short is 300. 300 is not short. 300 is pretty good. This sounds right. I agree. So putting it to 300. So short read. So I agree. We're going to change. We're not going to see short read. We're not going to see 150 base pairs. 
But we're going to see short read becoming 2 kb. That's the short read, you know, or maybe eight. Maybe that's ambitious, but it's fine. I could see that because, you know, so, especially with government, it's like you've bought the machine, you're not going to just replace it every year just because your venture capitalists tell you to or something. You have the machine. I can totally see that the reagents change and help you get longer and longer reads on that. The goalposts change, like you said. Yeah, you're going to edge forward a bit more, but your short read is going to become long. But then what's your long read going to become? The whole chromosome. Nice. Right? Yeah. I want that, yeah. The whole chromosome or half a chromosome should just fall out, right? I mean, that's the dream. That's the goal. What have you been doing? Yeah. Well, you know what? We kind of started with that idea of Q30 chromosome long reads and we went full circle. Maybe it's Q50, you know? We can push the bar. Q50, yeah. Well, like if you look at PacBio and their kind of chemistry where they go right around a consensus, you know, and you might go around the same fragment 10 times and it builds consensus on the single read. You know, maybe wind up somewhere else. Yeah. Yeah, exactly. So not only a five megabase read, but you read around the chromosome five times for a 25 megabase read. Yeah. And you get a Q50. That's nice. Yeah. I mean, that basically removes your genome assembly step. Yeah. So you sequence it and the genome just falls out. I love that. Then you got to think, but then the best part of that, though, is that people can't hide behind, oh, well, I was assembling genomes. They actually have to do some biology then. They have to actually think about what's encoded on that genome. What about the structure of that genome? What about, you know, gene loss, gene gain? And they actually have to go back to thinking about some genetics. 
Actually, I'll put a prediction out there that I'm not the originator of this prediction, but when I was reading, I found this paper several months ago, a brief history of bioinformatics. And they made this point in the paper where it was people who do computation in chemistry, they're chemists. People who do computation in physics, they're physicists. People who do computation in biology, they're bioinformaticians. And the prediction was this is going to slowly merge together. And so if you're saying people have to actually focus on the biology, maybe the prediction in five years is that bioinformatics finally merges into biology well enough. I think so, because, I mean, seriously, like, what, circa 2015, how many seminars you went to and the whole thing was just the mechanics of picking out the best, the best algorithms. How do you polish it? How do you clean it up? It was completely devoid of, I mean, the organism you're worrying about GC/AT content, you're sort of worrying about like codon bias for that. But the actual content of that genome was irrelevant. And if you take that step out, and the thing just falls out of the sequencer anyway, then what are you going to have in the seminars? You're going to have to talk about something else. Yeah. All right. We're on opposite sides. We're in different areas here. So you're saying, Andrew, no way, and Nabil, you're agreeing? Absolutely. There we go. So we can't be wrong, though. We take both positions. Yay, we predicted it. How are we going for time? Let's call it. And I have a quiz for you. Oh, OK. Oh, no. Oh, no. I have a quiz for you. So originally, I was thinking of bringing in a mystery person. You have to guess who the mystery person is. I won't do that. No. But then, you know, you guys get me to mispronounce your city all the time on every single podcast episode, which I now know is pronounced Norwich. Well, thank you. Maybe we'll keep that up on the intro anyway, just for fun. Yeah. 
But now that you are in Atlanta, why don't I give you a quiz on how to pronounce the cities in Georgia? Oh, no. Oh, well, that's going to be hard. Oh, so he's got he's definitely got the home side advantage here. We're in a lot of trouble now. Give it to us. All right. Let's get this over with. So I just said a city name that started with an A that we are in that you're visiting. How does a native pronounce it? Atlanta. Atlanta. There you go. So you have to swallow that letter, Atlanta. Atlanta. Atlanta. Atlanta. Where is this place? It's a Dekab. Oh, that's the Dekab is the county. It's pronounced correctly. Dekab. Dekab. Dekab. Mm hmm. OK, that doesn't seem too bad. Obviously, we can say Decatur. Yeah, we ignore what it actually is written as and say Decatur. That's right. So Decatur. All right. So we had that on a previous episode going on about Decatur. We did. I don't know if you have this kind of onion in the grocery stores. You know that if you get a sparkling wine, then you get champagne if it's right from the right region. Yes. Do you know that we have a certain region in Georgia where you get onions? Really? Yes. Really? You have a specific onion? And if you don't get it from this region, then it's called a yellow onion. OK. OK. If you do get it from this region, I'll see if I can write down the same physical onion or same species or whatever. It probably is. Right. People say it tastes sweeter. So you don't know this word yet, do you? Maybe you don't have it. Maybe not. Oh. What the hell? Oh, I know this word. It's from a cartoon. I don't. Vidalia. You don't have this. Vidalia. Vidalia. Vidalia. Vidalia. OK. And this is the name of a city where they grow this onion. Oh. OK. So the way you pronounce it. You're going to say Vidalia. Yeah. Yeah. Oh, really? Vidalia. Vidalia. Yeah. For English, for British speakers, it's Vidalia. So and this is just like champagne. If you get this onion from this region, then you can call it that. OK. 
And it's the name of the town. That's wild. All right. I'm going to find at least one more off of this YouTube video. I'm curious now. I want to taste this protected onion. Yeah. Bring one if you're allowed to. Bring one to bed. Because we're looking straight at a picture of it. It looks just like an onion. It is an onion. I mean, like an unremarkable onion. OK. I'm going to give you one more town that was famous for a gold rush in the 1800s. I've been there several times. What is that word? Oh, God. So like Roald Dahl. Dahl-one-gah. Dahl-one-gah? Well, maybe you're misreading an L for an I there. So here it is on the board here. No, it's a Dahl. Is that an L or an I? That's an L. So it's like Dahl. Roald Dahl. Roald Dahl. Dahl-one-gah. Dale-a-nay-gah. Dahl-one-gah. Crap. Dale-a-nay-gah. Wow. Dale-a-nay-gah. It's spelled D-A-H-L-O-N-E-G-A. Yep. What did you say? Dale-a-nay-gah. Dale-a-nay-gah. Maybe a more MicroBinfeed-type quiz could be pronouncing serovar names for Salmonella. You don't have any more strange local words for us to pronounce? I think I'll stop there. It doesn't have to be towns. It can just be, like, words. All right. All right. Do you say y'all? They do say y'all. We do say y'all. I say y'all, actually. I've adopted it. I like y'all. Because English doesn't have a collective. I meant you all. We'd say yous or yousers. Yous. Yous guys. Very Dublin. I've been saying yousers. Sounds like a part of a turkey. That's a gizzard. That's what it sounds like. What about, I don't know, what's this stuff around here that we can't pronounce? Well, you got DeKalb down. That's the big one. What's the name of the... Get a Google Maps. I know. What's the name of the street? What's that one? Ponce... Ponce... Ponce... Ponce... Ponce de Leon. Ponce de Leon. Ponce de Leon. And then there's one more that we had fun with before today. On the way over to CDC, we were passing by the Emory Campus. The Emory Campus has about four or five buildings names. 
Okay, so can I spell it out? No, you spell out the word. So that I feel you're saying what I think you're saying. R-O-L-N-I-N-S. R-O... R-O-L... L. Double L. Double L. I-N-S. I-N-S. Well, first of all, it's not R, it's OR. Really? Well, as in Ireland would say OR. It's okay to say R. Or. Or. Or. So that's, yeah. So everyone listening in, just think how you'd pronounce that word. It's Romeo, Oscar, Lima, Lima, India, November, Sierra. Very good. Try that out in your head. What does that pronounce? They don't say Sierra anymore, they say sugar. Oh, really? They say sugar? Yeah. I'd say, well, Rollins. Correct. Let's see. I'd say Rollins. I'd say, I mean, I normally say Rollins. Yeah. All right. I can, I can cap it there. I just knew I was going to have a little bit of fun, but I don't know. I don't need to put you too much more on the spot. Well, thank you very much for hosting us for the 100th episode. It's been great. You know, all the episodes have been great, you know. And this is a very special one, you know. We've gotten this far. I never expected us to get to 100. Never. I thought, I definitely thought it was gonna fizzle out. It's been pretty awesome with you guys. Yeah, no, I thought we'd do like 10 and then we'd never, we'd be like, oh, okay, yeah, no, that was good. Let's do it again. Yeah, yeah, okay. I'm free there, so I'm not free. Okay. And we'd never get the calendars to sync up again. And it was just like, oh, okay, yeah. It's been too long. I'll forget about it. But no, we kept going. I don't know. And we haven't run out of ideas yet either. No. It's weird. We have a whole spreadsheet of ideas. There's more ideas. It's actually, yeah. There's just more and more to talk about. There's a few subjects that have been back in my mind we still haven't got to yet. Oh yeah. We should do more metagenomics, I think, and that kind of area. Absolutely, metagenomics. The MLST and SNP showdown has to happen still. 
Yep, yep. Been meaning to put those head to head. Tons of stuff coming down, coming. Yeah, looking forward to it. And I hope everyone listening has enjoyed the journey with us and is looking forward to new content as well. Well, this has been the MicroBinfeed podcast coming from Lee's basement. Party on, Wayne. Party on, Garth. It took me a second. Glad you got it. Thank you so much for listening to us at home. If you like this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. Follow us on Twitter at MicroBinfeed. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Quadram Institute. I'm going to get into trouble about the phage comment. Oh, sorry. Oh, they're going to get me. They're going to at us. They're going to at us. They're going to at me. If you do know the program, just let me know.