Hello, and thank you for listening to the MicroBinfie podcast. Here we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There's so much information we all know from working in the field, but nobody writes it down. There is no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil-Fareed Alikhan and Dr. Andrew Page. I am Dr. Lee Katz. Both Andrew and Nabil work in the Quadram Institute in Norwich, UK, where they work on microbes in food and the impact on human health. I work at the Centers for Disease Control and Prevention and am an adjunct member at the University of Georgia in the U.S.

Hello and welcome to the MicroBinfie podcast. Nabil and Andrew are your hosts today, and this is part three of our extended holiday special on bacterial taxonomy. Professors Ian Sutcliffe, Phil Hugenholtz, and Mark Pallen continue with us, and we're closing off talking about nomenclature and the recent renaming of phyla. So my opening question to anyone is, what is the difference between nomenclature and taxonomy? Let's give it to Mark.

Taxonomy is the scientific classification, and you know, how do you decide
what's in which group and how those groups fit within other groups and so forth.
Nomenclature is just how do we stick a label on the things that we have
discovered or that we've circumscribed. And traditionally, taxonomic
nomenclature has relied on Latin as the kind of lingua franca that was the
language of the science at the time of Linnaeus. Linnaeus used Latin and Greek
roots. So the names that we have used for bacteria and for archaea are based on
Latin and Greek roots primarily. That's one of the rules in the code. These
names have to be presented as Latin. They're a long way from classical Latin; they're what we call neo-Latin, but they're made up so that they look like Latin words. That's one of the rules. In fact, if you look at the rules of nomenclature otherwise, they're very lax. They go so far as to say that you can use completely arbitrary coinages. You can do whatever you like as long as you make them Latin. Now, the thing is, if you use classical Latin and Greek roots, you have to have a certain degree of understanding of how those languages work to do it correctly. And one of the big bugbears is that in Latin, adjectives agree with the noun in terms of gender. So if the noun is masculine, the adjective has to have a masculine form. For example, in Staphylococcus aureus, aureus has to agree with the ending -coccus because they're both masculine. So we couldn't call it Staphylococcus aurea or aureum because that would be bad Latin. But the problem is that most
people now are not learning Latin at school. They don't have any familiarity with it and they just don't appreciate these issues. And so there is a group of nomenclature experts who say, well, we know how these rules work and you must apply them. And people, when they want to name a new species, have to make sure that they comply with that. It turns out, of course, that even the taxonomists of yesteryear didn't know their Latin and Greek well. There are still some validly named species that break the rules, you know, that have a genus name in the neuter and then an adjective in the feminine form, and they haven't been corrected. And one can argue, is this important or not? You can argue that if someone presented a paper written in English but used slang, or didn't bother with spelling or grammatical conventions, we'd all say that was wrong. So if we're going to do that, why don't we say that it's wrong to use bad Latin, malformed Latin or whatever? The
other argument is that these are just arbitrary labels. And one of the things that the code makes clear is that they don't have to actually mean what they say; they don't have to mean anything. So we still call Haemophilus influenzae Haemophilus influenzae, even though we now recognize it has nothing to do with influenza. And the code makes clear that you can't just go and rename something because your knowledge of its phenotypic properties increases, and you can't broaden or narrow the way in which the name is applied because of that. We've done it: we've called something a chicken gut microbe. If it turns out that chicken gut microbe actually occurs in pigs as well, we don't go back and rename it. We just keep the original name. And so when you look at the fundamentals, really, that's all there is to it. They don't have to be Latin and you can draw them from wherever you like. There is a whole incrustation of recommendations that say, well, when it's a Greek root, you put an o between it and the next root, and when it's a Latin one, you put an i, and all this sort of stuff. But these are just recommendations and they can be ignored. In many ways, they're just fussy, and they are actually intimidating and off-putting to people. And I think it's time that we actually swept this away. I'll speak a bit in a moment about an even more radical plan to sweep it away. At the most basic level, we can just say, let's stop being quite so fussy about the way in which names are formulated and described.

Well, actually, now I want to hear your radical plan for sweeping it all away.

But no, I was just going to add that I'm speaking very personally, so
not speaking as chair of the ICSP, but for myself, I'm pretty relaxed about this idea of Latin being what one might call approximate Latin rather than perfect Latin. At the moment we have a small group of people who serve the bacteriological and archaeal communities very well by checking the formation of names. But the reality is that it's a dwindling band of people, by their own admission, and in probably a decade's time there'll be very few people who can recognise whether a name is malformed or not. And actually, I think by then we will just get on with it. If the name sounds vaguely like a Latin binomial, we'll accept it as a valid name. It won't happen quickly, but it's worth pointing out that the code itself, the code of nomenclature that the ICSP looks after, is a living document. It evolves over time, very, very slowly; parts of it go back over 150 years to the de Candolle Laws of 1867. We are currently on the 2008 revision, but the ICSP presently has a public debate underway through Slack, which will end at the end of December and will lead to the publication of a 2022 revision of the code. Some of the things that change are minutiae that people who don't have a grammarian's understanding of Latin won't be able to follow. You know, it is a code that evolves over time. And casting my eye more than a decade into the future, I can certainly see it evolving to a point where the requirements for absolutely perfect Latin perhaps get relaxed.

Well, those requirements aren't even in the code. It just says the words have to be Latin. It doesn't say they have to be perfect Latin. And as I say, many of these are so-called recommendations, incrustations upon the core of the code, which is a series of rules. We can talk about this in several ways. One point that's worth making is
that principle one of the code says aim at stability in names, and I propose that we talk about that in a short while. Before that, let me just talk about some ideas about how we can make names as we go forward. When Aharon Oren and I were working on the chicken gut, I said to him, look, we want to name 600 species. How are we going to come up with 600 names quickly? Normally, as Ian's pointed out, it's one name, one paper. People have loads of time to think about that name and polish it off. But we need 600. So I said, well, what we can do is use a combinatorial approach. The first root will be the host or the context, like chicken or bird. The second one will be the gut or feces, the sample that we're dealing with. And for the third one, we can just use a lot of generic words that mean microbe: microbium, biome, soma, plasma, words that have no specific meaning. Obviously, we can't use things like coccus, because that implies we know the morphology down the microscope. But there are many of these terms that can be used. And if you do that in a combinatorial way, using 10 roots at each of the first, second, and third positions, then from just 30 roots you end up with a thousand names. We applied that approach, and it was productive in that setting. I then went on with Aharon and we wrote a paper where we suggested a million new names using the same kind of approach. But it soon became clear that
there were two problems with that. One is that even if you use this combinatorial approach to generate many, many names, it turns out that the number of new species out there in particular ecological contexts is much, much higher than we can make names for. So if we just wanted to make names for gut microbes, with our most creative thinking we could perhaps come up with a few hundred or a few thousand. But we know that if we look at all the vertebrate gut microbiota out there, there are going to be tens of thousands or hundreds of thousands of species to name. So that descriptive approach, trying to describe things and create a name, didn't scale very well. The other problem is that if you start taking all the different roots and concatenating them, you end up with some very, very long names. And one of the key points is that these names are just handles, and we want them to be handy and easy to use. We don't want to have a name like electrio-intestinal microbial, you know, a long word like that. We want them to be short and punchy so that we can remember them. So that was where we got to a year or so ago. But looking at GTDB, Phil set us a problem: in GTDB, they named all of these uncultured species, but they named them with placeholder names that were more like telephone numbers or postcodes. They were useful placeholders, but totally hard to remember or say, you know, SP00066918914.

That's a good species, that one.

We'd have genus names like E2. And I said, no, what we've got to do is rename all of that stuff. Like we did with the chicken gut, where we did 600 Linnaean binomials, we've got to give properly formed Latin names to everything in GTDB. And from what I remember, about a third of GTDB species are named, and two thirds have yet to be named.

So can I just play devil's
advocate for one second? Several people have said to me, and I've seen it on Twitter too, of course, that they kind of like the placeholder names, because then they know that's an uncultured species or uncultured lineage, as opposed to one with a Latin name. So what are your thoughts on that?

Well, one of the things that we haven't mentioned is that the current code has this problem with the uncultured, but it does have a get-out. It says, if it's uncultured, you can stick Candidatus in front of it. You still give it a perfectly formed Latin name and stick Candidatus in front. In GTDB, you decided not to do that. NCBI still does. There are arguments for and against it, as to whether it's cumbersome and all that sort of stuff. But there is already a way of flagging things to say, well, this is Candidatus. But in a sense, where do you draw the line? Some of those things that haven't been cultured have been very well characterized. People have reconstructed their metabolism and done studies on them. Mycobacterium leprae really shouldn't have a name, even though it's on the Approved Lists, because it's never been grown in axenic culture. There's another organism called Mycobacterium lepromatosis, which has a very similar pathology and ecology, and that doesn't have a valid name because it can't be grown. So there's this fundamental issue: why should we flag things just on the operational criterion of whether someone has been able to grow it and deposit it in culture collections? That seems to be a bit mistaken. I know I've
argued that, well, we can muddle through with Candidatus, but fundamentally, I don't think that that is the way forward.

The issue that I have with the use of Candidatus status is that, for clarification, Candidatus names lack priority, which means that they can effectively be overwritten by anyone who says, well, those people called it Candidatus XY, but I want to name it AB. Without getting too detailed about it, that lack of priority is a fundamental flaw. In reality, it very, very seldom happens; there have been one or two cases among thousands of Candidatus names. It hasn't happened very much in the past because we've been dealing with relatively small numbers of organisms coming into culture. But if we have a facility to name uncultivated organisms, then people might be naming at scale, as indeed you have done yourself. And then people might say, well, actually, I'm going to overwrite all those Candidatus names with my names. If people start publishing papers that have 10,000 names in them, then we could have absolute nomenclatural chaos.

But in that situation, it's down to peer reviewers and editors to do their job and say, you can't name it, that's already been named. The idea that there are these kind of Olympian gods of nomenclature deciding whether names have standing or not is irrelevant for most people.

But to use the sort of argument you would have used, if somebody proposed a whole load of names, they would challenge the editor and say, why can't I do that? These names don't have priority.

My point is, the editor would say, the first principle of the code is to aim at stability of names. You're just creating confusion.

But unless names have priority, they don't achieve that stability, because somebody can always overwrite them. I'll just finish by saying on that point that there is a reason why rules of priority exist in all of the major codes of nomenclature, for plants, for animals, and for bacteria and archaea. And the fact, therefore, that Candidatus names lack priority is their Achilles heel.

Well, I would argue that you're talking about the 2% that have been cultured up to now, this little parochial argument over the last century or so. In the millennia to come, when we discover the million other bacteria that haven't been named yet, nobody's going to care about priority, because we'll be naming tens of thousands or hundreds of thousands at a time.

Nomenclature systems obtain their authority and their stability, to go back to principle one, from their rules. So yes, the SeqCode was mentioned there, and I'll explain this as quickly as I can
through some historic context. I made a gag when we were talking earlier. It was a cheap gag, but it was a good one. The committee should really be called the International Committee for the Systematic Culture of Prokaryotes, and the code itself would be better off called the International Code for the Nomenclature of Cultured Prokaryotes. That's because Rule 30 of the code places culture at the heart of the ability to validly name bacteria and archaea under the current system. And that of course creates this headache, which has been an elephant in the room for a long time now: we cannot validly name uncultivated organisms. Among the first to really take this on were Ramon Rosselló-Móra and Kostas Konstantinidis, I hope I'm pronouncing Kostas's name right there, who published a fairly provocative article about this in The ISME Journal. At about the same time, Barny Whitman published some very high-profile proposals to amend the code of nomenclature that would allow the use of genome sequences as types and would therefore bring uncultivated organisms under the aegis, under the umbrella, of the code of nomenclature. Personally I was in favour of that, but when it went to the vote of the ICSP, it became very clear that the majority of the ICSP were not. A two-thirds majority of the voting members of the ICSP voted down the Whitman proposals, and that meant that the code stayed as it was; it meant that the valid naming of uncultivated taxa cannot be achieved under the ICNP. And so it became inevitable, I think, that a separate code of nomenclature that would allow the valid naming of uncultivated taxa would be developed, and a group of people has been working on that. In the interest of full disclosure, I am one of those people and so is Phil. We have been working on the development of a parallel code of nomenclature called the SeqCode, which would allow the naming of uncultivated taxa and provide a nomenclatural framework. A manuscript describing the first draft of that code, or version one, should we say, is currently under peer review. People interested in the SeqCode can find out a bit more about it through the ISME website; there is a link from within the ISME website. The reason for that is that, as I mentioned earlier, the ICSP operates under the umbrella of the IUMS. Codes of nomenclature do evolve over time, and they need structures to maintain them, so one of the visions that we have for the operation of the SeqCode is that it will be administered under the umbrella of the ISME society, and that is very much a work in progress. But we would like to think that we could use the SeqCode to validly name, I think very rapidly, large numbers of taxa from classifications of uncultivated organisms like the GTDB that we have been discussing earlier. So that initiative is coming, hopefully sooner rather than later. The idea is to put the first draft of the code there for community feedback.

Yeah, but how can you publish the paper without having done that first?

I think you could actually compare us to where we were in 1948 with cultivated organisms. So in
1948 the first draft of what became the Bacteriological Code was published, if my memory is correct, initially in the Journal of Bacteriology, and then there was a mirrored publication in what was then the Journal of General Microbiology. That said, well, this is a workable code, and then the community got on with it.

But the world's changed since then. We didn't have preprints in those days. Basically you sat on what you were doing until the last moment and then published it, and everyone agreed, oh, you're the expert and we can't query you. We live in the age of Twitter, we live in the age of democratisation, and the SeqCode really hasn't shown itself in a great light by hiding itself away. You say there's a website, but there's not much on that website to tell you what's going on. What I hear down the grapevine is that they've arbitrarily drawn a QC boundary to say we're only going to allow you to name these things, so they've trampled all over the whole idea of freedom of taxonomic thought. So it's a code of taxonomy rather than nomenclature. Personally, I welcome the SeqCode in principle, but in practice I'm a bit concerned about the way it's being rolled out.

I think it'll evolve, and I think it may evolve fairly rapidly. We need to put the structures in place to allow that. We are reaching out to the community. In February 2021 we had online workshops that were attended by many hundreds of delegates, which is a lot: frankly, having been to taxonomy sessions at physical meetings, you're lucky if you get 50. So we did reach a reasonably good audience with those. And the very first draft of the code was made available as reading material to delegates at those workshops. We've had very useful feedback from the community. It was broadly supportive, and a few points have been taken forward into the draft. I take the point you're making, but I think the complexity of the document meant it took real effort to come forward with a first working draft, and what's happening now is that we're taking feedback from the community on how to make the document more functional and more accessible to people. Personally, I was more in favour of establishing common standards first, but I understand the idea that you start with public standards and shape them partly with input from the community. If you start with lax standards and only later try to rein in a chaotic situation, that would do more harm than good. I think we'd have to look at whether we can put it on the website, perhaps just the public draft. We could do that while it's still under consideration. I'd say the draft should be public, because it's complete enough to go. It lets the wider community say why we shouldn't go down a particular road, and then the public draft gets developed in the open. But in my view of the world, why does the code need to be so big?
We should make it easier and more approachable. One of the big problems with the current code is that when you read through it and look at each part, it's dense and long-winded. Adding more and more clauses isn't the way to go. The way to go is to step back and say, I'm going to have public rules that work, rather than putting rules on rules on rules on rules.

Fair enough, but I think we're taking feedback on that.

Right, Mark, you've mentioned arbitrary names. What do you mean by that?

Well, if you look at the code, it says that you can use arbitrary names, and if you look back to Linnaeus, there are lots of examples of arbitrary names being used in taxonomy over decades and centuries. Then there's the problem of scale: we have some 30,000 or 40,000 species to be named in the most recent version of GTDB, and Phil tells me another 17,000 are coming in the next version, if I remember correctly. So if you spend even one minute thinking, I'm going to come up with a Latin name that means something for each of these, that would take years of work. So what we have to do is come up with a new approach, one that meets the requirement that we have Latin names that look like Latin names but are arbitrary: unique names that we can use. And that's what we need if we're going to name all the things we can now see. So what I'm doing, and you've helped me with the coding, I'm learning Python at 61 to make this happen, is that we take Latin dictionaries, de-replicate the words in them, and then pick out the Latin words that can be used to form feminine names. And then we search through 6 million existing names to make sure those names have never been used before. We screen them against all the names used in taxonomy, make sure they haven't been used before, and end up with the 60,000 names that we can use for the species that need names.
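The screening pipeline described above can be sketched as a short Python function: de-replicate a Latin word list, keep words that can form feminine names, and discard anything already in use. This is a minimal illustration under stated assumptions and is not the code actually used to generate these names. The toy word list, the crude rule that a word ending in -a can serve as a feminine name, and the small `existing_names` set standing in for the roughly 6 million existing names are all hypothetical.

```python
"""Minimal sketch of dictionary-based name generation.

All inputs here are hypothetical placeholders: a real run would use a full
Latin word list and the complete set of names already used in nomenclature.
"""

# Hypothetical, deliberately crude heuristic: treat words ending in "-a"
# (typical of first-declension feminine nouns) as usable feminine names.
FEMININE_ENDINGS = ("a",)


def generate_candidate_names(latin_words, existing_names, max_names=None):
    """Return capitalised candidate names that have not been used before.

    latin_words    -- iterable of words from a Latin word list (assumed input)
    existing_names -- iterable of names already in use (assumed input)
    max_names      -- optional cap on how many candidates to return
    """
    # Build a lower-cased set once so each "already used?" check is fast.
    taken = {name.strip().lower() for name in existing_names}

    # De-replicate: dict.fromkeys drops duplicates but keeps first-seen order.
    unique_words = dict.fromkeys(
        word.strip().lower() for word in latin_words if word.strip()
    )

    candidates = []
    for word in unique_words:
        if not word.endswith(FEMININE_ENDINGS):
            continue  # cannot (by this heuristic) form a feminine name
        if word in taken:
            continue  # already used somewhere in nomenclature
        candidates.append(word.capitalize())
        if max_names is not None and len(candidates) >= max_names:
            break
    return candidates


if __name__ == "__main__":
    toy_words = ["silva", "Silva", "rosa", "lupus", "stella", "aqua", "bellum"]
    toy_existing = {"Rosa"}  # stand-in for the real list of existing names
    print(generate_candidate_names(toy_words, toy_existing))
    # → ['Silva', 'Stella', 'Aqua']
```

The -a rule is deliberately crude. Greek-derived words such as plasma end in -a but are neuter, so a real pipeline would need grammatical information from the dictionary rather than a spelling check. Holding the existing names in a set keeps each lookup fast even when the list runs to millions of entries.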
We've done that; we've used common words, and we've used nouns. We've even gone as far as this: for the nomenclaturists who like to see protologues, we've also created protologues. The document with all the protologues for the new names, for species and the other ranks, runs to 10,000 pages. And so I put that out to the community to say, look, these names are very good, they look like Latin names. If you don't know Latin, you can't tell that they didn't come from genuine Latin roots. They're good, they do the job, what do you think? There was a discussion on Twitter, and opinion was split. Some people said, yes, this is better than placeholders, and some people clearly didn't like it. And it's out there for consideration at the moment.
As far as I'm concerned, this is the only way. I don't see any other way we can name the number of species that need naming, at the scale that's needed, and carry on as part of the Linnaean framework.

I'm broadly supportive of that plan, although as a microbial ecologist I like things like halo-something-or-other, because then I know something about the organism. I accept that a name doesn't have to reflect the physiology, but I'd say the next iteration could perhaps be improved by looking at the genome first. We have the technology. For example, if the genome contains the mreB gene, which encodes the protein that maintains rod shape, then I know it's not a coccus. If it doesn't have mreB, it's more likely to be a coccus. We could look for sulfate-reducing genes or other genes that could actually guide those names a little bit, if we wanted to capture that, because we do have the blueprints, and we do have the ability to rapidly screen them and pull out key genes that would help. I totally take the point that a name just has to be a nice-looking handle that doesn't have to reflect the taxonomy, but I was a little concerned with the early iteration of this, where there were words like cacoplasma and then other names that were pretty similar. I did notice that, but I think in this latest iteration of arbitrary names, you actually selected them to be as phonetically distinct from each other as possible. And by the way, for plasma, I would look for the absence of cell envelope genes, to indicate that the organism maybe just has a cell membrane, like a Mycoplasma, because I would have thought plasma would be associated with that kind of organism.

I'm broadly supportive of this. I absolutely agree that it's a very practical, pragmatic way of naming at scale. My concern is a little one: say I'm a PhD student who has been beavering away in a lab for two or three years. I've sequenced a genome. I've been analyzing the content of that genome. And I'm just getting around to writing my paper describing that genome and naming the organism as a Candidatus organism. And then this chap Pallen overwrites the name for that taxon in GTDB with his arbitrary name. You've had your thunder stolen. How do we prevent that happening?

There's a genome deposited in the public domain. At the time that genome is deposited, the person will have assigned a name to it if they wanted to. The fact that it's got a placeholder in GTDB means that nobody has bothered to do that. So we're not overwriting anyone's ability to give names. They can give those names perfectly well; there's a perfectly well-formed path for doing that. If you submit a MAG to NCBI, you can give it a name. They won't accept the name until the paper's out, but that's a very narrow window. And the thing is, these are Candidatus names. So if someone says, I don't like that, I want to overwrite it, or the people working on it say, we don't like that name, we're going to give it a descriptive name, they can do that. We're not forcing anyone's arm here. We're just saying, when I do an analysis of chicken gut, chicken feces, pig feces, or horse feces, and we run GTDB-Tk over it, well over half of what we get out are just these placeholders. These are names where you think, what on earth is this all about? And if I'm trying to talk to a collaborator about what we've found, it's just a mess trying to use those. It's far easier to have Latin names. So we're not trying to say to anyone, this is it, we've colonized your area of the microbial world forever and laid down our flag. We just say, look, these are effectively placeholders, because they're Candidatus names.

Since we're on
the subject of phyla, I wanted to ask Phil about this recent renaming of phyla that people do seem to care about. So Phil, I think you're the one who's been in the process.

Why are you asking me?

Because you've been in the process, and I appreciated your Twitter thread that explained the situation quite nicely. So I was wondering if you could recap on that and talk about some of the flak you've been taking.

Well, I haven't personally been taking any flak, but I thought the ICSP and NCBI Taxonomy were taking some flak. Basically this is around a very sensible proposal to include the rank of phylum in the code, the prokaryotic code.

So phylum is the thing I learned about in school. And you're saying it's not actually in the code?

It wasn't officially in the code, which means there were no real rules governing it. So it's very important that it's in the code. There are some useful properties that come with being in the code. First of all, you have to have a nomenclatural type. That's important to provide a fixed point of reference for a group. And I draw your attention here to the fact that, because phylum wasn't officially in the code, you could define a phylum pretty much as anything without any nomenclatural type. I'm as guilty as anyone of naming phyla without actually saying, well, what is that connected to? The problem is, if in a previous paper you've said, oh, these 16S sequences represent my phylum, you know, Phyllobacteria, and then somebody makes another tree and it splits up, I don't know where the name should carry through to. So that's why it's important. And the name also has no priority if it's not formally recognized. We've seen multiple times that the same group has been given multiple different phylum names. So it's actually a long overdue and important process for the ICSP, which voted on this last year. So everybody voted yes, we'll take on the rank
of phylum. And then there were some specific questions. In the prokaryotic code,
all of the other higher ranks have fixed suffixes: you have -aceae for family,
-ales for order, -ia for class. So, to standardize, you want a standardized
suffix for phylum, which was voted on as -ota. And then, in order to apply this,
the other vote was to make the nomenclatural type a genus, which is the same as
for family, for instance: a family has a type genus. This is fine for the
majority of phyla, because if you look, these non-official phylum names are
often built off the first genus, or an early genus, described in that group.
Nitrospirae, or Nitrospirota if you use the new name, is built from the genus
name Nitrospira. But there are a couple of really important
exceptions, and that's the Proteobacteria and the Firmicutes. That's what's got
everybody in a big pickle, because if you follow the new rules, the Firmicutes
become the Bacillota, after Bacillus, and the Proteobacteria become the
Pseudomonadota, after Pseudomonas. And that's what caused the big Twitter
furor. There was a very funny comment from some wit when I made a straw poll
asking people what they would like: it was the same as when people were up in
arms about Pluto no longer being a planet, made a poll, and everybody said, no,
we want it to be a planet. And that's what's happened here. The majority of
people want to keep Proteobacteria and
Firmicutes. And there's another interesting little twist before I hand over to
people who know more about nomenclature than me. That came from the fact that
in GTDB we've re-circumscribed the Proteobacteria. So now in GTDB, Proteobacteria
are just the alpha, beta and gamma classes. And we've somewhat re-circumscribed
Firmicutes as well. So some people are saying that you would actually want a
different name, so that it's clear it's a different taxonomic entity. So that's
an interesting consideration as well. If you look at the definition of
Firmicutes, it just says a phylum for Gram-positive bacteria. That was how it
was described several decades ago. What is slightly concerning is that when
these new names of phyla were published, they just said, oh, it's going to be
called Bacillota, but with the same description as Firmicutes. And you think, is
that really where we're at in the 21st century, that we define a phylum by
saying it's the phylum for Gram-positives? What should have happened is that the
phyla were given modern names and modern circumscriptions. And there's still an
opportunity to do that. There's even an
opportunity to save the old names, or nearly save them. Take one of your unnamed
genera, maybe one of the split genera in GTDB, where you stick a letter suffix
like "_A" on the end of the name to say that you don't recognise it as part of
the original genus. If we got one of those where we've got strains deposited in
two different type culture collections, we could name that as the type genus,
and we could call it Firmicutes. Then we could have Firmicutota, which would be
near enough the same as Firmicutes, and we could save the name. But the trouble
is that the people who do this stuff are not creative people.
They're very much driven by rules; they have to follow the rules and roll those
rules out. And there's an argument to be made for being consistent. But if you
combine consistency with creativity, you can get around a lot of these problems.
So I'm half-minded to go and do that, actually: publish a paper that says,
here's a new genus called Firmicutes, and here's a new genus called
Proteobacterium, and we'll name new phyla after them. And we'll circumscribe
each phylum using the techniques of GTDB, rather than waving our hands and
saying it's the phylum for Gram-positives. Sometimes the nomenclature
experts do get themselves into a situation where they say, we have to make this
right, and we don't care whether the community cares about the changes.
Twenty-odd years ago, Hans Trüper did this. He said, oh, there's a load of
bacteria that have been named where the species name is a noun that names a
thing rather than saying "of the thing". So you're calling this bacterium a
pineapple rather than "of the pineapple", and we must change that. And he went
and pushed all these changes through on names that were already established,
and people complained at the time. So it's a difficult issue. I mean, Ian's
going to say, well, people will get used to it, and maybe they will. Let's see
what Ian actually has to say. I was going to say
that, but not straight away. The thing I was going to say is that the code, and
it is a genuinely elegant document in the way it's constructed, and the new
rules are actually pretty clear now. This is where rules are important for
nomenclature. Priority will follow from this, which is what will prevent you
from proposing alternative names: the way the code is now written means that the
phylum that contains the genus Bacillus must be called the Bacillota. And that
is really quite straightforward, I think. Those historic definitions, like
"these are Gram-positive", are genuinely guff, because they're not all
Gram-positive for a start. Well, there are the Negativicutes. Yeah, and I'm sure
you could find some Gram-variable ones if you look at the original descriptions,
you know. So that system of nomenclature is really quite clear: the phylum that
will eventually be named to contain the Clostridia can be given a name, one
would hope a high-profile name like Clostridiota. That's really quite
straightforward, and people will get used to it. I'm in this
interesting situation: I got involved in a spat with some veterinary scientists
about our proposal to move Rhodococcus equi into the genus Prescottella as
Prescottella equi, which wasn't universally popular. And yet I'm long enough in
the tooth to remember that many of the same people were very upset about the
proposal to rename Corynebacterium equi as Rhodococcus equi, and they all got
used to calling it Rhodococcus equi quickly enough. I actually think the younger
generations coming through adopt the current classification and the current
names pretty quickly. I'd be interested to look back and see if there was a big
outcry when the purple bacteria were renamed Proteobacteria. The only way in
which this matters is
when someone uses the GTDB toolkit, or an equivalent from the NCBI, to name
their stuff and come up with a taxonomy. So the names that you apply, Phil, are
the ones that people are going to care about. The fact that someone in
authority has named something doesn't really have any impact; what matters is
what actually happens in practice. If you go to the NCBI taxonomy, Firmicutes
are still there. For Proteobacteria, there's no Pseudomonadota in the NCBI
taxonomy at the moment. Even though they made that declaration, if you go to
their taxonomy, they haven't changed it yet. And you haven't changed it in GTDB.
So it doesn't matter what the so-called experts say, and it would be
interesting to see what the SeqCode wants to do with it as well. But yeah, it's
an interesting question, and there's no right or wrong answer to this. Is it
like forgetting to change the year on the calendar when you get into January,
and we'll catch up? Or is it something more fundamental, that we're all used to
these names and want to stick with them? There's nothing to stop people using
the old names as vulgar names anyway. We've had decades of calling the phyla by
names that don't have any standing, so people can continue to use those names.
Well, my only final comment would be a reminder that the arguments are about
classification. The ICSP oversees a code of nomenclature, but the rules only
apply to nomenclature. What people are really getting upset about is whether a
classification matches their perception of the world or not, and that's
taxonomic opinion, which I think... There are two issues. There's a change of
names because of a change in taxonomic opinion, and there's a change of names
because of the rules of nomenclature. What's happened here is a change of names
because the nomenclature experts want to change them. That's what I'm saying.
Nobody reclassified the Firmicutes in a different way when they renamed them.
They just ported across the decades-old "Gram-positives" description. And that,
I think, is troubling.
And this is where GTDB is actually consistent: it has an approach. For the names
in GTDB, perhaps we need to just roll out protologs, even for the things that
are already named in there, to say: this is the GTDB taxonomy, and here's a
protolog to name this species according to the rules of GTDB, because this
differs from what people said in the past. And certainly at the higher levels it
will differ. I mean, none of those things have ever been defined before. As far
as I'm concerned, the world of taxonomy began with GTDB, and everything that
went before was chaos. Alison
Murray, who Phil and I have worked with on the CCO project, alerted me to this
quote, which is apparently from Bill Bryson's A Short History of Nearly
Everything: taxonomy is described sometimes as a science and sometimes as an
art, but really it's a battleground. And Cowan, who I mentioned earlier,
summarised this in the 1950s: taxonomists do like a good scrap. You know,
earlier on we quoted Darwin. Let's quote Newton now. Near the end of his life,
Newton said, "I do not know what I may appear to the world, but to myself I seem
to have been only like a boy playing on the seashore, and diverting myself in
now and then finding a smoother pebble or a prettier shell than ordinary, whilst
the great ocean of truth lay all undiscovered before me." The arguments we've
been having are about a few
pebbles of clinical importance or a few shells representing cultured organisms.
What we should rejoice in is the fact that there is this great ocean of
microbial truth, as you've called it, this sublime scale of the microbial world.
Through the techniques that Phil and others have been developing, we now have a
glimpse of that great ocean, and we now have a way of charting it as we go
forward. We should be rejoicing in that instead of arguing about this
angels-on-a-pinhead stuff. You know, the great vision is there. Darwin's dream
is real, and
it's here, and now, in sequences. On that, I think we will close. That was a
marathon effort. I want to thank our esteemed guests, Professors Ian, Phil, and
Mark. This has been almost a crash course in bacterial taxonomy. I've been
Nabil, with my co-host Andrew, and I want to thank you all for tuning in to our
holiday special of the MicroBinfeed podcast. We will have a lot of extended
references for you to read in the show notes (see the description on your
podcast platform), and we'll see you next time. Thank you so much for listening
to us at home. If you like this podcast, please subscribe and rate us on iTunes,
Spotify, SoundCloud, or the platform of your choice. Follow us on Twitter at
MicroBinFeed. And if you don't like this podcast, please don't do anything. This
podcast was recorded by the Microbial Bioinformatics Group. The opinions
expressed here are our own and do not necessarily reflect the views of CDC or
the Quadram Institute.