17th Jul 2022

Renaming human genes to stop Excel from converting them to dates

Why Excel is a scientist’s best friend (or is it?)

The latest marvel in the genetics world has been the full unravelling of the human genome: an achievement so spectacular that many deemed it impractical or even impossible when it was proposed at the end of the last century.
Since then, our knowledge of the location, function and importance of thousands of different genes has expanded greatly, and scientists have kept a neat record of their names and functions for future reference.



Or so we thought.

How are new genes named?

A couple of years ago, a major change in the way genes are named was set in motion. The way scientists decide what to call them is quite straightforward, at least for us human beings. Every gene is given a ‘symbol’: a series of letters and numbers that serves as a unique name. With the full human genome now unravelled, we can wait with a smile to see whether the ‘Excel strikes back’ problem of several years ago makes its inevitable return.

Excel and dates

But what caused roughly a fifth of papers with Excel gene lists to be affected? It’s quite simple: when a scientist sees the symbol for Membrane Associated Ring-CH-Type Finger 1 (MARCH1), they instantly recognise which gene it refers to. But to Excel, MARCH1 means the 1st of March, so it converts the cell to a date.



This might seem a little strange, as Excel is probably the most used spreadsheet application in the scientific world, but we should remember that its programmers weren’t thinking about keeping gene names from turning into dates; they probably had more boring uses for it in mind. If a scientist forgets to format the columns as text so this auto-formatting doesn’t happen, someone will surely have a great time later on, going through every cell in the spreadsheet, hunting for dates in among the genes and correcting them.
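To give a feel for that clean-up job, here is a minimal Python sketch that flags gene symbols likely to be turned into dates before they ever reach a spreadsheet. The function names (looks_like_date, flag_risky_symbols) and the pattern are made up for illustration. The pattern only approximates Excel’s behaviour, which is broader and depends on locale settings.

```python
import re

# Heuristic only: a month name or abbreviation followed by a one- or two-digit
# number. This is an assumption for illustration, not Excel's actual rule set.
_MONTHS = ("JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
           "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")
_DATE_LIKE = re.compile(
    r"^(?:%s)[-/ ]?\d{1,2}$" % "|".join(_MONTHS), re.IGNORECASE
)


def looks_like_date(symbol):
    """Return True if the symbol resembles a 'month + day' string."""
    return bool(_DATE_LIKE.match(symbol.strip()))


def flag_risky_symbols(symbols):
    """Return the symbols that a spreadsheet might auto-convert to dates."""
    return [s for s in symbols if looks_like_date(s)]


print(flag_risky_symbols(["TP53", "MARCH1", "SEPT1", "BRCA1", "DEC1"]))
# prints ['MARCH1', 'SEPT1', 'DEC1']
```

Running a check like this on a gene list before export at least tells you which cells to watch, even if it does not stop the conversion itself.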

The Excel data mishap

The Excel date error became quite a big issue once scientists started sharing information with each other, as corrupted data made its way between individuals, organisations and even research papers. That’s when something needed to change.



Truth be told, reminding people to format columns isn’t really the optimal solution, and neither is switching to a different platform: Excel is so integrated into people’s lives that a sudden change would surely cause an even bigger stir.

Sorting the mess out

So what solution did they come up with, you might ask?



That’s when the HUGO Gene Nomenclature Committee (HGNC) stepped in and proposed guidelines for gene naming. Their aim was to counter Excel’s auto-formatting of gene and protein names by altering the symbols that the application can easily mistake for dates: MARCH1, for example, became MARCHF1, and SEPT1 became SEPTIN1. Since then, this issue has been successfully kept at bay.
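The renaming idea can be sketched in a few lines of Python. The mapping below holds only three renamings (MARCH1 to MARCHF1, SEPT1 to SEPTIN1, DEC1 to DELEC1) and is nowhere near the full list of updated symbols. The names RENAMED and update_symbols are hypothetical and chosen for this example.

```python
# A tiny illustrative subset of old-to-new symbol renamings,
# not a complete or authoritative mapping.
RENAMED = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
    "DEC1": "DELEC1",
}


def update_symbols(symbols):
    """Swap any old date-like symbol for its newer name; leave the rest alone."""
    return [RENAMED.get(s.upper(), s) for s in symbols]


print(update_symbols(["TP53", "MARCH1", "SEPT1"]))
# prints ['TP53', 'MARCHF1', 'SEPTIN1']
```

None of the new symbols reads as a month followed by a day, so a spreadsheet has no reason to reinterpret them.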

Naming genes back in the day

Long before genes were named according to their function, pioneers in the then-new field of genetics decided to have some fun naming new genes and proteins.



Take, for example, the gene ‘Indy’. It certainly doesn’t have the same ring to it as SEPTIN1, but still, it sounds kind of scientific. In fact, the name has a completely different origin. A mutation in this gene can greatly extend the lifespan of a fruit fly, so it was appropriately named ‘I’m not dead yet’ (INDY).



This gene, alongside Clown, Tinman, Van Gogh and other Drosophila genes with names borrowed from literature, often makes it into lists of the top 10 funniest gene names, none of which can be mistaken for dates in Excel. But is this really how genes should be named?



It is without a doubt the more entertaining option, but we should probably stick to the more scientific method and hope names don’t become dates.