Hello, and thank you for listening to the MicroBinfie podcast. Here, we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There is so much information we all know from working in the field, but nobody really writes it down. There's no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil-Fareed Alikhan and Professor Andrew Page. Nabil is the Head of Informatics at the Quadram Institute in Norwich, UK, and Andrew is the Director of Technical Innovation for Theiagen in Cambridge, UK. I am Dr. Lee Katz, and I am a Senior Bioinformatician at the Centers for Disease Control and Prevention in Atlanta, United States.
Welcome back. We're here at day three of GMI, where PHA4GE is slowly taking over everything, which is great. Being in PHA4GE and all that. So, I'm here with Lee again, and Ruth. So, Ruth, do you want to introduce yourself? Sure. I'm Ruth Timme. I'm with the U.S. Food and Drug Administration, and I run the GenomeTrakr program, FDA's genomic epidemiology program for foodborne pathogens. Awesome. So, you're the guys who deposit, like, hundreds of thousands of Salmonella in the public domain? Hundreds of thousands is probably an overestimate, but we deposit a lot. Yeah, more than a lot. Yeah, at one point, I think I was the biggest submitter of Listeria, and then I was quickly overtaken by Ruth as a submitter of pathogens to NCBI. So, what do you actually do day-to-day? That's a hard question. Well, so, FDA funds a couple dozen laboratories to sequence data, and so there's a lot of back and forth working with them, making sure the data is submitted in a standardized way, and if it's not, you need to update it. So, there's a lot of work like that. And then I also do a lot of interagency work with the U.S. CDC and USDA.
And then on the international side, I work with PHA4GE to try to make sure that the standards we're putting in place are adopted internationally. And you've just been working on the DOM. What is the DOM? So, the pathogen data object model is this idea that whole genome sequence data for any kind of pathogen should be submitted and stored in the same structured format in the INSDC. And you would think this is a very simple idea, but you would be surprised at how many different ways you can submit a pathogen genome to the INSDC and store metadata in all different locations. So, it's very confusing for submitters and for people querying to figure out where that information is. But is this yet another standard, or is it like a standard to rule them all? Well, it would just be one standard within INSDC. Once you set it there, then the third-party apps can just plug into it. You don't have to worry that your Salmonella standard is going to be different from a virus standard. It's all the same. That's really cool, actually. Long overdue. Yeah, we should have had that years ago. But it is doing some good work. So, what was there before the DOM? We were submitting stuff before that. How does it make things better? How were we doing it before the DOM? So, I thought that this had already been set, because we had set it for foodborne pathogens a decade ago, and we all plugged into it. We have third-party applications plugged into it. Makes submission very easy. Makes retrieval very easy. I thought that's how the world worked. And then I started seeing the COVID genomes come in as GenBank records only, with people merging metadata onto the flat file and then building pipelines to do that and query it. It's like, oh, no. Then I realized, actually, in the virus world, that's not a standard. Yeah, I mean, viruses do everything differently, don't they? Even naming conventions and everything are totally crazy.
So, if you query viruses in the INSDC, I don't know, 50% of them don't contain a BioSample. Wow. Actually, yesterday, I was accused of not uploading data to INSDC when I had, but it's actually because it's so complicated when you deal with, say, multi-country, multi-institution organizations, and these standards don't necessarily reflect everything. You know, one lab gives it to another lab, gives it to another lab, gives it to us to sequence, and we upload, and then it's a huge, big, roundabout way of doing everything. You got a gotcha question yesterday about that. Yeah, yeah, and I did look, and actually, it was all in there. We had actually submitted our COVID raw data with metadata. With a BioSample? With a BioSample, of course. Oh, absolutely. And with the consensus genomes as well. You know, a lot of people don't do that, but for some reason, it's not showing up as it should. You know what genome, I think, still doesn't have a BioSample? Am I wrong about this? Wuhan-1. Doesn't have a BioSample. Yeah, it's just a GenBank record. I think it's frustrating, because when the first one comes in like that, it's really easy just to, you know, copy that. Yeah, anyway, hopefully for the next pandemic, we will have learned some lessons. One thing I wanted to get into, just really quick, is, Andrew, you keep saying that everything is PHA4GE here. What are we doing at GMI? And I thought maybe Ruth had a good answer to that. Like, where is the divide? I'm not sure. I think my emerging thoughts are that GMI is a good place to discuss the standards that are being set in other organizations. GMI has a strong intergovernmental presence. It has had good collaborations with WHO, FAO, and ministries of health in other countries. And, you know, one role, and I'm not sure what the future of GMI is, right? But one role could be that we look at standards set by groups like PHA4GE and then figure out how to implement them in the real world. Yeah.
Which I think is a much harder part than just creating the standards. Yeah, I feel like what I'm seeing, I'm agreeing with you. What I'm feeling is that PHA4GE has a lot of hands on the technical stuff that nobody wanted to pay attention to before the virus world just now. And that GMI is the political part of this whole thing, where they have the context: how to implement it, how to get uptake from different governments. I think that's a really good distinction. Maybe. Maybe GMI can just be like a subgroup of PHA4GE, where, you know, it's the political subgroup. I've thought about that. I'm not sure how that would work. It'd be a disaster, wouldn't it? No comment from government people here. But it is actually quite nice that there's so much overlap between the two groups. And, you know, you've got PHA4GE, which is actually getting stuff done and producing papers, producing, you know, actual functional code, producing specifications, you know, things that are useful to the world. And then GMI, obviously, is bringing people together to discuss those. And, you know, it's a presentation platform. Awesome. Well, thank you very much, Ruth, for talking to us. And I'm sure we will catch up with you again. Thanks for having me. Thanks.
Thank you so much for listening to us at home. If you like this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. Follow us on Twitter at @microbinfie. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Quadram Institute.