I work in the field of microbial genomics; we are interested in the genomes of microbes. I specialise in using computers to analyse genomes. They call me a bioinformatician. These days, these skills are in demand. Sometimes people ask me what they would need to know to do what I do. I find it difficult to explain, because I did not do a degree in Bioinformatics. The field also changes quickly, so it is difficult to give a firm answer.
I have many conversations with experts in their own field (medicine, microbiology) who struggle with using Bioinformatics in their work. Giving them a superficial answer does not seem to help. For instance, I may be asked "Which genome assembly program is the best?". Perhaps they ask something more sophisticated, like "Which metrics do you use to determine a good genome assembly?". Yet in explaining the answer, I find there is some critical piece they simply do not understand, and this blocks them from moving forward, even with trivial tasks. Colleagues have encountered this problem as well, and many argue that the area of Bioinformatics is complex and requires specific training in computing. Some will go so far as to argue that a user of Bioinformatics software must learn some level of computer programming. I disagree. I feel that there is some part of my mode of thinking that, if explained clearly, would be enough to help others get their work done.
Yes, of course, there are some areas and topics that truly are difficult. There are some areas that require a deep understanding of the data and the processes used to create them. These would require specialist skills, such as programming. That can be said for most disciplines, and it is patronising when people point this out. Let us accept this as a given. Beyond that, there are parts of Bioinformatics that are unnecessarily obscure. It is these areas I wish to clarify here.
So what are the concepts that a user, who has specific work to be done, needs to know to do Bioinformatics?
Let us start at the beginning. What is Bioinformatics?
Obscure. Bioinformatics is definitely obscure, as there are many definitions of what Bioinformatics actually encompasses. My definition leans on breaking down the word itself. Bio + Informatics. Hence,
Bioinformatics is the storage, manipulation, and presentation of data about biological systems; transforming data into information.
As for my audience of microbiologists above, this is what they really wanted to learn. It is also what I spend most of my time doing.
Some will argue that designing algorithms that facilitate these transformations is part of Bioinformatics, and it is. But being able to create these algorithms is not a mandatory skill for someone using Bioinformatics software. The algorithms just need to be understood well enough to allow the work to be done properly. I would place such algorithm development under the banner of "computational biology" to make this distinction.
Within this defined scope of Bioinformatics, I will admit, the field is not a -logy in the way biology is. It does not qualify as a major branch of knowledge. Many bioinformaticians will joke that Bioinformatics is just about converting data between file formats. In a reductive way, that's true, at least for what I do day-to-day. Our focus is engineering an effective solution for that data manipulation rather than, say, trying to test a scientific hypothesis.
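To make the file format joke concrete, here is a minimal sketch of one such conversion: sequencing reads in FASTQ format turned into FASTA format. It assumes the simple case where every FASTQ record is exactly four lines (header, sequence, '+' separator, quality scores). The function names read_fastq and fastq_to_fasta are my own, chosen for illustration; they do not come from any particular library, and for real work an established tool would be the safer choice.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTQ stream.

    Assumption: each record is exactly four lines. Multi-line FASTQ
    records are not handled by this sketch.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"Expected '@' at start of record, got: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, not needed for FASTA
        handle.readline()  # quality scores, discarded in FASTA
        yield header[1:].strip(), sequence


def fastq_to_fasta(fastq_text: str) -> str:
    """Convert FASTQ text to FASTA text (hypothetical helper)."""
    records = read_fastq(io.StringIO(fastq_text))
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
print(fastq_to_fasta(example), end="")
```

Notice what the conversion does: it throws away the quality scores. Even a "trivial" change of file format is a decision about which data to keep, which is why it helps to understand what each format actually stores.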
Indeed, anything approaching a -logy in Bioinformatics is framed in the philosophy of other fields. Good Bioinformatics to the biologist is the solution that gives the best explanation to the biological question, regardless of the implementation, whereas good Bioinformatics to the software engineer is the solution that gives a good explanation in the most computationally elegant way. Both assessments are rooted in the values of the respective field.
I am being deliberately strict with my definition of Bioinformatics because I am gearing this to a specific audience: people who need to use it to get a specific task done. For a scholar exploring this area (e.g. a PhD student), I would relax my definition and include more topics about the fundamentals of computer science and biology.
With this definition done, we can then discuss the topics that need to be known to serve our specific goal. See the next section, the process of transforming data.
Questions or comments? @ me on Twitter @happy_khan
The banner image is an AI-generated picture (Midjourney) with the prompt 'what is Bioinformatics'. You can share and adapt this image following a CC BY-SA 4.0 licence.