Hello, and thank you for listening to the MicroBinfie podcast. Here, we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There's so much information we all know from working in the field, but nobody writes it down. There is no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil-Fareed Alikhan and Dr. Andrew Page. I am Dr. Lee Katz. Both Andrew and Nabil work at the Quadram Institute in Norwich, UK, where they work on microbes in food and the impact on human health. I work at the Centers for Disease Control and Prevention and am an adjunct member at the University of Georgia in the US. This episode, we're doing another software deep dive. This is where we interview the author of a bioinformatics software package. We talk about some of the obscure and interesting details of popular programs that do not make it into the paper. We have Robert Petit, who is talking to us today about Bactopia. Robert received his master's in bioinformatics from the Georgia Institute of Technology in Atlanta, Georgia, USA, and his PhD from Emory University, also in Atlanta. During his graduate studies, he worked mostly on Staphylococcus aureus genomics, but he was also involved in sequencing the first whale shark genome and developing a typing scheme for identifying Bacillus anthracis, the causative agent of anthrax, from metagenomic sequences. A major component of Robert's work was the development of Staphopia, a bioinformatics workflow specifically designed for the analysis of Staphylococcus aureus genomes. This work ultimately laid the groundwork for Bactopia, which we'll be discussing today. Currently, Robert is working with the Wyoming Public Health Laboratory in the US to help build their bioinformatics infrastructure to complement their sequencing efforts in response to SARS-CoV-2.
In the show notes, we'll have some links for the docs, the repo, and the publication. So Robert, hello. You and I have had some intersection in our history before. I was also at Georgia Tech at the same time as you, and I don't know if you knew this, but I was an undergrad at Emory, so we kind of flip-flopped. I was an undergrad over there, and then you went to graduate school over there. So let's just start it off. What is Bactopia? So first, before I get into Bactopia, a quick thank you to the 70, 80-plus tool developers whose tools Bactopia uses. Bactopia includes a bunch of tools, and basically, without these people developing these tools, there's no way Bactopia exists, so thank you very much. So what is Bactopia? Bactopia is an all-in-one workflow for the complete analysis of bacterial genomes. My punchline is: get through the analysis steps quickly so you can get to the fun part, the results, much quicker. So what I remember is that you guys started off with Staphopia, and maybe that'll be a good intro into what this is all about. Would you say so? Yeah, definitely. So Bactopia kind of goes back to 2010, when I was a master's student. When you get into the master's program, you've got to find a lab to work with, and I sent all the emails out and couldn't get a lab. And so at that point, I'm like, oh, crap, what do I do? So I even started looking into ecology programs, because my undergrad was in ecology, and I loved field ecology. But King Jordan at Georgia Tech suggested, hey, there's this guy, Tim Read, over at Emory University who's doing bacterial genomics stuff, and he said, you should send him an email. And I sent him an email, and the rest is history. And so now I'm in bacterial genomics, and I'm quite happy with it. Otherwise, I'd probably be in a field taking samples of animals, and I'd probably be happy with that, too. You know, that's kind of fun, too.
But so Tim had this grand idea: hey, you know, there are all these public Staph aureus genome samples, and when I say all these public Staph aureus genome samples, I'm talking, like, hundreds of genomes available on the Sequence Read Archive and the European Nucleotide Archive. And it's like, hey, we could download these and process them and include them in our own analyses, but we'd have to develop a workflow for it. And so that led to Staphopedia, which was the predecessor to Staphopia. And so we developed this PHP LAMP infrastructure, where you would submit your FASTQs to our website, and it would go on the back end and launch a bunch of shell scripts and spit out some results. That LAMP — that's Linux, Apache... MySQL and PHP. Yep. Yeah. So it was back in the heyday before all the static site generators and all that. I put together Staphopedia. I still have the source code somewhere on Dropbox. It's one of those historical things where I'm like, I gotta keep this. At that point, we were versioning with SVN, and I think it took about a year before we even started versioning. We got away from this whole web platform — submit these large FASTQ files to this random website that would process your genomes — just because at that point it would take a few hours to submit your one genome, and then hours to get your results. And so at that point there was a name change from Staphopedia to Staphopia, compliments to Tim Read on the name. And then we shifted from this Bash shell script type thing to an even bigger Perl script that was managing the analyses. And I think at that point we just had the basics: take in the FASTQs, do some quality metrics. Is it a good enough FASTQ? Will it pass? Is it actually a FASTQ? MLST, annotate the genomes, and then do some BLAST stuff.
And so eventually I segued from Perl and learned Python, so Staphopia followed and turned into another monolithic workflow in another language. So it turned into a complete Python script, but I thought I'd get fancy and do some Python modules and create a Python library. So all the steps were in different little files, and that worked great for a little while. Then eventually I rewrote it again, except this time using Ruffus, which was a workflow manager written in Python. And so I think at this point we're somewhere around 2015-ish. So within those five years, Staphopia was rewritten multiple times, only for Staphopia to get rewritten one more time. And the final rewrite was in Nextflow, probably 2016, '17-ish. So I mean, we're looking at six-plus years of development in Staphopia before it got to its final form. And bioinformatics, specifically bacterial genomics, in 2010 was very different from what it looked like in 2016. But Staphopia contained a lot of the philosophy of 2010, because that's kind of where it started. And so it didn't really have the nuts and bolts of 2016, where we were starting to get into containerization and all that, and some of the newer workflow programs. We put out Staphopia, and the first question I was always asked was: that's cool, but can I use it for my bug? And it's like, I think so. It should just be, you know, modify this, modify that. And most people were like, okay, so no, I can't use it for my bug. And so we were eventually asked, hey, we have some Haemophilus influenzae samples that we want to process — do you think you could send them through Staphopia? And so we're like, okay, we'll do it. We did some ad hoc changes — you know, take out these steps that are Staph-specific, or change this reference genome — so it was very manual. We were just manipulating it because we knew the backend.
Whereas if you were to hand it to somebody else, they would have been like, it's going to take me ages to figure out where all the steps are. And so Bactopia's first citation is actually disguised as Staphopia. So we had this very early ad hoc conversion of Staphopia to Bactopia, and it worked. But at that point it was kind of, let's take this to the next level and get this caught up with the times. Staphopia was still kind of 2010-ish, 2012-ish. So we had the opportunity to completely rewrite again — because job security, I guess, just keep rewriting. But no, it made sense, because we were rewriting along with the progression, the evolution of the community and the field. By the way, this is really awesome, this kind of history, because we don't think older than Git anymore. Like, you started with SVN and Dropbox. Oh, it reminds me — at one point in Staphopia, similar to how Torsten bundles the binaries with his tools, Staphopia had all the binaries you needed. And I'm sure I violated numerous licenses, but you know, we had to get these programs out to people. There was about a gigabyte tarball you could download that had all the reference data and all the binaries to help you run it on your system. And it was hosted on my Dropbox for a while. I never got a notice from Dropbox saying, hey, too many people are downloading — Staphopia has stayed pretty small. But yeah, it's one of those ad hoc things that you did back in 2012, 2013 to make it easier. Let's hope we're past the statute of limitations there. I mean, it was a different time.
People always tell me about how they used to get GenBank sent to them on floppy disks — all of GenBank — and you're just like, yeah, you're going to package RefSeq, you're going to package the database with the binaries with your software. And now, think of the size. Bactopia does allow you to download and stage datasets, but if you want to download a Kraken database, that's like 90 gigs. If you try and build that into Bactopia, that's 90 gigs for just that one step, and then there's everything else on top of it. So it's definitely changed, and the way you write the tools has to change. I mean, this is just a really fun view into history. I feel like maybe in bioinformatics our generation time is just so fast — maybe I'm a couple of generations down into it now. Because you're saying floppy disks and GenBank, which I have also heard of, but that was not my time. I guess I would be more the CD-ROM generation, because we had sequences from 454 delivered to us on CD-ROMs. I have a funny feeling my doctoral lab had TIGRFAMs on a CD, like that was sent out. I can't remember — maybe it was something someone made, or maybe that was something that TIGR actually sent out. TIGR had it available for download, but I don't know if they actually sent it out. We did download it in grad school. And so, another generation — I feel like people are making hand-woven pipelines, and I'm part of that, and using things like SVN and a whole bunch of other things. Was it CVS? CVS, that was the predecessor to SVN, wasn't it? Yeah. I feel like I used Mercurial at some point. Yeah. I used Mercurial, and it was a lot like Git, and I was using it for a while. And then everyone was just like, we're using Git.
So that's why I switched over. But actually, I feel like Mercurial was more advanced than SVN. So it's funny seeing the culmination of all this stuff, and it's like, why don't we have a good pipeliner right now? I'm so guilty of this — I've made so many pipelines, just hand-weaving them. And why aren't I using something like Nextflow or Bpipe or something? I think you've been giving a really good history of how you got to Bactopia, so I just wanted to say I appreciate that. I think for me, the whole history, that sort of rabbit hole of going through all these different pieces of software and rewriting it over and over again, reminds me of this famous quote about writing, which goes: every writer has only one story to tell, and he has to find a way of telling it until the meaning becomes clearer, until the story becomes at once more narrow and larger, more and more precise, more and more reverberating. You could print that text and stick it on the wall, because that's exactly what's going on here — this refinement, and in doing so, understanding more deeply the kind of problem that we face. But let's drop into that. You mentioned a little bit of what Staphopia was doing, but let's focus on Bactopia, which is the current iteration. What does it actually provide to the user? Like, the analysis — because I've used the software, and I like the software and its analytical outputs. It's running a fairly straightforward set of tools. So it's a culmination of about 10 years of those rewrites, to where I'm at the point where I could just fire this off, and it gives me all the results I'm most likely going to need if somebody says, hey, I have this genome, this bacterial genome.
Can you process it? And I can process it, and here are most of the results you're going to use. And so I think what Bactopia does is simplify that process of going from raw sequence to a slew of results that you may or may not need — but at least they're there when you do need them, and there's no, like, I need to rerun this specific analysis. And the target audience, I think, would be those who may be novices in bioinformatics. Sequencing is so easy to get now that it's, you know, I got these hundred sequences, now what do I do? You could throw them in the pipeline and get all the results, instead of starting from the beginning and trying to figure out how do I write a workflow, or which tools do I need to include, which analysis steps — can I even do this? Do I need to hire a bioinformatician to run this, or completely rewrite a workflow? You know, there are all kinds of nuances. And I just wanted to create something that, one, I'm going to use on a daily basis, but two, something that hopefully others can pick up and get to a result, and get out of the weeds of gritty bioinformatics workflow development, command line and all that. Now, I do encourage people to learn the command line and understand what Bactopia is doing, not just run it through, get some stuff, and do stuff with it. And I think that goes into the documentation for Bactopia, where I've tried to highlight what each step is doing and how it's doing it, and which tools are involved in each step. That way they can go back and say, all right, this step ran this program, and now I can take this program and go see what it's doing and learn more about it if I want to.
And so it's kind of like this hybrid: get your stuff faster, but also a potential training introduction into bioinformatics — or specifically, let's just make the assumption that anything I talk about is bacterial genomics. The whale shark was my introduction and my exit. I love being able to process thousands of samples on a small infrastructure and not require terabytes of memory. I mean, I can still eat up terabytes of memory doing some bacterial stuff, but just to assemble the whale shark took quite a bit, and it was just like, we need more resources. So for Bactopia, what would be a typical use case that someone has used the platform for, and what are the kinds of outputs that they would get — tangible outputs that would be useful for them as a user? A typical use case is: I have this bacterial genome — does it have resistance to certain antibiotics? So Bactopia would take in your raw FASTQ files, do quality control to filter out bad reads, create an assembly, and then from that assembly it can annotate the genome and make predictions about antimicrobial resistance. And then you may be interested in what multilocus sequence type it is. Is it clustered, or is it specific to a sequence type known to be associated with certain virulence? And so you would get those outputs as well. But that is getting into the fact that Bactopia has a general workflow, where you've provided no datasets — no species-specific or general datasets — and another workflow where, if those datasets are provided, they'll supplement the initial Bactopia run. And so those datasets can be Mash and Sourmash sketches of RefSeq and GenBank. So basically you can query your sequence against all of RefSeq using Mash, or against GenBank using Sourmash, where you can get an idea — it's a preliminary, quick way to say, I sequenced this, do these kind of align with that?
So if you sequenced Staph aureus and it comes up E. coli, then there's something for you to figure out. Then you can also include reference genomes to call variants against. So if you have your reference of interest, you can say, hey, I want to know what SNPs and indels are in there. Include all your genes, proteins, primers that you want to BLAST against. But I think in most cases, most people are going to run this bactopia datasets command, build a species-specific dataset, and then run their organism through with that. So does this do any phylogenetics? It can. This is getting into another subset of Bactopia. There's bactopia datasets, which tries to pull in public datasets to supplement your analysis. There's Bactopia itself, which is this isolate-based sequence analysis — something you would just run on all isolates or all samples. And then there are Bactopia tools, which are separate workflows for comparative genomics. And so those Bactopia tools are where you would run your phylogenetic analyses and all that. I chose to keep them separate mostly because that gives you checkpoints. If I want to run a thousand samples, I don't necessarily want to run all thousand samples and then build trees, just because of the time it takes to build those — versus run all thousand samples, then figure out which ones I want to include and not include. And so I can exclude samples based on low quality, low coverage. And Bactopia does that in the main workflow, where before it processes genomes it'll say, hey, this is the minimum requirement to go further in the pipeline. And that's just to prevent downstream failures — you tried to assemble this genome with 2x coverage and it just failed — so why not catch it at the beginning? And users can control how much coverage and all that.
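To make that two-step pattern concrete, a hypothetical invocation might look like the following sketch. The flag names (`--species`, `--fastqs`, `--datasets`, `--coverage`) follow v1-era usage and are assumptions, not a definitive interface — check `bactopia --help` and the docs for your version. The `bactopia` calls are guarded so the script is a no-op where Bactopia is not installed.

```shell
# Illustrative only: build species datasets, then run the main per-isolate
# workflow against them. Flag names are assumptions; verify locally.

# A tab-separated file of samples to process (sample name, R1, R2)
printf 'sample\tr1\tr2\n' > fastqs.txt
printf 'sample01\tsample01_R1.fastq.gz\tsample01_R2.fastq.gz\n' >> fastqs.txt

if command -v bactopia >/dev/null 2>&1; then
    # Step 1: pull species-specific datasets (MLST schemes, minmer
    # sketches, reference genomes, ...)
    bactopia datasets --species "Staphylococcus aureus" --outdir datasets/

    # Step 2: the main workflow -- QC, assembly, annotation, AMR
    # prediction -- with a minimum-coverage gate so hopeless samples
    # fail early rather than downstream
    bactopia --fastqs fastqs.txt \
             --datasets datasets/ --species "Staphylococcus aureus" \
             --coverage 100 --outdir results/
fi
```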
But yeah, I like to keep it all separate, and it goes into the resource requirements of running isolates one at a time versus running a complete phylogenetic analysis, where I'm going to have to ask for a much bigger machine to run a pan-genome and phylogenetic analysis of a thousand samples. Whereas I can process all thousand samples on a small desktop — it might take a while, but on a cluster it would go much faster. And again, it allows me to keep Bactopia somewhat species-agnostic and include Bactopia tools, which can become very species-specific. That would be, like, run Kleborate on your Klebsiella samples, or AgrVATE on your Staph aureus samples, stuff like that. And so I like the separation, and I don't always know what comparative genomics tools I'm going to want to run. I like being able to process all my genomes first, then say, all right, these are what I'm going to do. Yeah, I mean, I really like the workflow, and what I think Bactopia does — since it is this thing where you've been wrangling with this process of microbial genomics analysis that we all go through — these breakpoints are exactly where I would also have breakpoints myself if I was doing it by hand, or if I was doing it with a student. Like, if I had a doctoral student and I was saying, okay, here's the sequencing data, I want you to get to this point and then we'll have a meeting — you know, assuming they know how to run the software. It is very much: run MLST, do the assemblies, do some QC stuff on it, run Kraken, give me the species breakdown for the reads, whatever. And then stop, take stock of that, check if these are okay. And then move to the next stage, which is the species-specific stuff, and then check if that's okay.
And then start thinking about more computationally intensive analyses, like the trees. I mean, I think I'd want a quick and dirty tree up front, but not something heavy-handed, and that quick and dirty tree I would probably just throw away anyway. You would go back, and as you find samples that are no good, you would throw them out, and you'd have a better tree at the end. So this matches my workflow, which is why I was so excited to read the paper and have a look at the software as well — it's my brain externalized: what would I tell someone to do? So I think for anyone listening out there, if you're thinking, I'm not going to use this software because whatever — just as a concept, the workflow, the way it's laid out, is fantastic. It gives you a good way to think about all of the different tools, how all of these things interact, and what actually needs to be run before the other. You want to run your MLST and know which ST the strains are in before you launch this massive thousand-taxon tree. Because if one of those samples is not what you think it is, and a little bit too divergent, your tree is going to look like nonsense. You're not going to get any sites, because that outlier is going to have no core SNPs compared to everything else. That's going to mess your tree up. That's kind of what happened in our paper for Bactopia — we looked at all of Lactobacillus, the genus. And so I think it's worth mentioning that Bactopia is set up to pull public data from GenBank, RefSeq, SRA, or ENA, just because me and Tim have always had this "there's all this public data, why don't we use it" type thing.
So when we developed Bactopia, I purposely made SRA accessions one of the inputs you can provide, and it'll go download them. And now Nextflow includes that as a default channel — you can do that with Nextflow now — but this was before that. But going back to Lactobacillus, exactly to your point, Nabil: the genomes we processed were not all Lactobacillus. They were labeled Lactobacillus, but there was a yeast sample in there, there was some Streptococcus — I don't know, it was only a handful, but we caught it quickly. We built this quick and dirty tree, and there was this whole section where you could see, these are something else. But what we ended up doing is we built this tree, and then we wanted to focus on a specific Lactobacillus species. So we used FastANI to get the ANI of a reference genome against all the samples, and then we said, within this span of ANI values, we're going to pull those samples out and only include those samples in our actual pan-genome and core-genome tree. And so that was the type of step where you're like, we need to figure out if we want to include all 2,000 samples, or just these samples that match our criteria. Yeah. Is that FastANI calculation, and then the subsequent tree — a neighbor-joining tree, I suppose — a part of Bactopia? Yeah, that would be a Bactopia tool. So Bactopia outputs come in a structured format that we can programmatically access. Basically, I just put stuff in nice spots so that we can find it easily. And so Bactopia tools understand this structure, and then you can give them a whole folder of Bactopia results.
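The accession-driven entry point Robert describes might look roughly like this hypothetical sketch. The `--accessions` flag is an assumption based on v1-era usage, and the accession IDs are placeholders — substitute real SRA/ENA experiment accessions and check `bactopia --help` on your install. The call is guarded so the script is a no-op without Bactopia.

```shell
# Illustrative only: drive Bactopia from public accessions rather than
# local FASTQs. Accession IDs below are placeholders, not real samples.

# A plain-text file of SRA/ENA experiment accessions, one per line
cat > accessions.txt <<'EOF'
SRX0000001
SRX0000002
EOF

if command -v bactopia >/dev/null 2>&1; then
    # Bactopia downloads the FASTQs itself, then processes each accession
    # through the same per-isolate workflow as local reads
    bactopia --accessions accessions.txt --outdir results/
fi
```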
And then there are these text files that say, I want to include just these samples, or, as the alternate, I want to exclude these samples which failed to meet some criteria. That could be low quality, or, in our case, we only wanted to include this specific set of Lactobacillus, or we want to exclude all genomes that had low coverage or too many contigs — something that would really mess up the downstream analysis. Yeah. I think one other thing that people come to me with these days is the next question: how do you subsample? And I'm not sure if Bactopia has a solution for this — I'm not demanding it, I'm just saying this might be a future thing to think about. You run 67,000 Staph aureus genomes, and then you want to make a tree that's meaningful. You don't need 67,000 tips to say something — you need, like, a thousand-ish. How do you go from 67,000 to 1,000? That's honestly something I would love to know the answer to. So for Staph aureus, we created this non-redundant dataset — we call it the NRD set — which was basically, we picked a high-quality genome from each sequence type. And I think for Staph aureus it went from, at that time, 40,000 genomes to about 400-ish that each represented a unique sequence type. And then you could use that to subsample your genomes based on where they fell on the small tree. But I think ideally it would be something similar to BIGSI's approach, where you just put in some sequence and it gives you a bunch of public samples that meet your threshold of similarity. Because for me at least, I'll always want to use public data, just because, one, have we seen this before? Is there something in the past in these public databases that could help me in the current analysis? How do we do that?
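The include/exclude pattern described above could be sketched like this. The tool name (`roary`) and the `bactopia tools` subcommand syntax are illustrative of v1-era usage — the exact subcommands and flags vary by version, so treat this as a shape, not an interface, and check the Bactopia tools docs. The call is guarded so the script is a no-op without Bactopia.

```shell
# Illustrative only: run a comparative-genomics step over a folder of
# prior Bactopia results, skipping named samples. Names are placeholders.

# Samples to drop: failed QC, too many contigs, wrong species, etc.
cat > exclude.txt <<'EOF'
sample17
sample42
EOF

if command -v bactopia >/dev/null 2>&1; then
    # A pan-genome (and downstream tree) over everything in results/,
    # minus the samples listed in exclude.txt
    bactopia tools roary --bactopia results/ --exclude exclude.txt
fi
```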
I hope someone comes up with a super clever approach that says, all right, here are 200,000 public samples, and here are the 500 that are most similar or meet your threshold. I've seen a few attempts at this — it's kind of like fingerprinting with some degree of granularity, and you can pick which level of relatedness you want. I mean, the obvious one is you can do cgMLST, which is quite explicit, but then there are other, kind of rough, Mash-esque measurements that you can use. And a lot of it is still quite immature at the moment. I think people are just reaching the same point we are with this, where we're thinking, we now have too many public genomes to sift through, and we need an input that allows us to dig down to what we want. And I think there will be a solution eventually. I was just curious if there was something in Bactopia, or if you'd run into an easy, cheap way to do it. I mean, my cheap way is to use rMLST and just pick one genome per rMLST type, because rMLST is basically genotyping ribosomal proteins, so it works on any organism, and you can use it cookie-cutter on anything. That's the cheap trick we use in papers and things like that. So. Those would have been genomes you'd already processed, though? Well, we had the assemblies, and we had already done the rMLST typing from them. Yeah. So you do have to look at everything, have all the data somewhere, and then pick out of it. It's kind of the other way around, because Bactopia is saying, here's a workflow that works on your strains and looks outwards, and this would be like, no, I understand the population structure of the species, and now I'm going to drill down.
So I think just trying to fit that into Bactopia as a thing would be difficult, but it would make sense to have it call out to this magic thing that has this index, to ask, what other ones could I use? Yeah. See, I think you mentioned it earlier too, where it becomes species-specific population structures, and there's no way we could be experts in all population structures. This kind of segues into the curated Bactopia datasets we started. We have familiarity with Staph aureus, and so Tim has a set of reference genomes that he uses for his Staph aureus analyses. The potential is that there could be experts in their field, in their bacterial species, who contribute to these curated datasets, to say, hey, you're going to run this bacteria — here's a set of data that an expert in the field says you'll want to include. Because when I don't know what references might be important to a certain species, the expert will say, you'll want this reference, this is the one that we always use, and stuff like that. So yeah, it'd be interesting to see where that goes. For Staph aureus there's still a lot more to do, but we at least have a working proof of concept for it. So Robert, everyone has their own favorite analysis. If you were going to put out a call to the community — I'm sure you already have this — what's the mechanism for me to put in my favorite tool for my favorite bug? So on GitHub, create an issue. There's a button for feature requests — say, hey, I want this in Bactopia. And there are quite a few of those. I use Bactopia on a day-to-day basis, so it contains stuff that I'm currently analyzing, and I would love for it to turn into this thing that everybody else is using, where they say, hey, for this bug we use this, can you include that? And I'm happy to do that.
The only requirement is it has to be on Bioconda, just because I think that's a good starting point. The installation has already been figured out, Biocontainers has a Docker image, and the Galaxy group builds the Singularity images, so it's all already laid in place. If it's not on Bioconda, I'm more than happy to ask how difficult it would be and make an attempt to put it on Bioconda. And that has led to a few tools used in Bactopia being added to Bioconda. Now, when I first started Bactopia, Nextflow was on DSL1, and now there's DSL2, which allows you to create some really fun workflows — basically, it becomes way more modular. This is kind of a fun story. I'm sitting at my desk one day — if you submit a GitHub issue, I usually try to respond fairly quickly, just because Bactopia is like my fourth kid. I want to take care of it. And if you're creating a GitHub issue, most times you ran into a problem. And I have a correction: this predates three of your kids, right? No. One of your kids? Well, Staphopia does. Bactopia predates one. Yeah. So I guess, yeah, Bactopia would be the third kid. But if somebody's using your tool and they've run into an issue, in all likelihood you may have lost them, so I try to be pretty quick and as helpful as I can with the GitHub issues. There are some with, like, hundreds of comments, where I'm just trying to help you get up and running. So I'm sitting there — I have the whole Slack integration, so when a GitHub issue gets created I get a ping notification on Slack — and it's like, oh, that's an issue, let me check it out. And this one was actually a pull request. I'm like, oh, that's cool, someone submitted a pull request.
And there's this guy named Davi Marcon, and he was doing an internship with Abhinav Sharma. He submitted a pull request that just had the line that tells Nextflow to use DSL2. And I'm like, wait, what's happening? I've been wanting to do this. And then over the next few months I'm cyberstalking their fork of Bactopia, because I'm like, oh man, they're making pretty good progress. And the benefit of DSL2 is that now we can create workflows, subworkflows, and modules, and completely reuse code. So Staphopia has been eaten by Bactopia: there's a workflow where you run Bactopia and then you run the Staphopia-specific analyses that were included in Staphopia. And so now you can execute Staphopia in version 2, and this really allows me to include new workflows easily. I've benefited a lot in this DSL2 work from the nf-core group. I'm piggybacking off their work, because they do some really fun stuff with Nextflow. And so there's going to be this integration of the nf-core modules into Bactopia and all that. And for tools that I'm using one-off, we're trying to push them into the nf-core modules, so that other people who may not want to use Bactopia can still use the nf-core modules to rapidly pull together these one-off workflows and glue them together like I've done in Bactopia. So yeah, I'm super excited and super thankful for Davi and Abhinav's push on this DSL2 work. I think it's going to be pretty fun. Well, thanks for a great discussion. This was another one of our software deep dives. There are always some interesting facts about how these different tools came into being. Today, we were talking about Bactopia with developer Robert Petit. You can check out the software on GitHub and Conda. And that's all the time we have for this episode. See you next time.
Thank you so much for listening to us at home. If you like this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. Follow us on Twitter @microbinfie. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Quadram Institute.