Genes Renamed to Stop Microsoft Excel From Mistaking Them for Dates
Scientists changed the official guidelines for naming genes to stop Microsoft Excel from misreading them as dates, according to the HUGO Gene Nomenclature Committee (HGNC) website.
Ridiculous but true: it's no fun when genetic studies are lost to a busybody algorithm.
The human genome holds tens of thousands of genes, stretches of DNA that help shape the characteristics and genetic traits that make each person unique. Every gene has a name and an alphanumeric code, called a symbol, which scientists use to coordinate research.
However, in the last year, roughly 27 human genes were renamed because Microsoft Excel repeatedly misread them as dates, according to The Verge.
Microsoft Excel is a workhorse of the spreadsheet world, and scientists use it all the time, not only to track their work but also to conduct clinical trials. Sadly, its default settings are designed for more ordinary applications.
In other words, when a user types a gene's alphanumeric symbol into the spreadsheet — like MARCH1, which stands for "Membrane Associated Ring-CH-Type Finger 1" — Excel misreads this and transforms the entry into a date: 1-Mar.
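To make the pattern concrete, here is a minimal Python sketch that flags symbols made of a month name or abbreviation followed by a small number. It is only a heuristic approximation of the behavior described above, not Excel's actual parsing logic; the month list and the function name `date_like_display` are assumptions made for illustration.

```python
import re

# Month prefixes assumed to trigger the conversion (a heuristic, not Excel's real parser).
_MONTHS = {
    "JAN": "Jan", "FEB": "Feb", "MAR": "Mar", "MARCH": "Mar", "APR": "Apr",
    "MAY": "May", "JUN": "Jun", "JUL": "Jul", "AUG": "Aug", "SEP": "Sep",
    "SEPT": "Sep", "OCT": "Oct", "NOV": "Nov", "DEC": "Dec",
}
_PATTERN = re.compile(r"^([A-Za-z]+)-?(\d{1,2})$")


def date_like_display(symbol):
    """Return the day-month text a symbol might turn into, or None if it looks safe."""
    match = _PATTERN.match(symbol.strip())
    if not match:
        return None
    month = _MONTHS.get(match.group(1).upper())
    day = int(match.group(2))
    # Day validity per month is not checked; this is only a rough screen.
    if month is None or not 1 <= day <= 31:
        return None
    return f"{day}-{month}"


for symbol in ["MARCH1", "SEPT1", "MARCHF1", "TP53"]:
    print(symbol, "->", date_like_display(symbol))
```

Under these assumptions, MARCH1 and SEPT1 are flagged, while a symbol such as MARCHF1 is not, because its letters no longer spell a month.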
Consequences of Excel errors
Obviously, this is frustrating, but it's also dangerous, since it corrupts data that scientists then have to sort through and repair by hand, line by line. The error is widespread, affecting even peer-reviewed scientific work, reports The Verge.
One 2016 study analyzed genetic data shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.
"It's really, really annoying," Dezső Módos, a systems biologist at the Quadram Institute in the UK, told The Verge. Módos analyzes newly sequenced genetic data, and says Excel errors are extremely common because the software is typically the first thing scientists reach for when processing numerical data. "It's a widespread tool and if you are a bit computationally illiterate you will use it," he added. "During my PhD studies I did as well!"
This isn't a simple problem to remedy. Excel has no "on/off" switch for auto-formatting, which means the only way to work around the error is to change the data type of each column. Moreover, even if one scientist fixes the data, the next person who opens the spreadsheet in Excel without knowing about the problem is just as likely to introduce new errors, corrupting the data again.
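Because converted entries can creep back into a shared file, a quick scan of an exported sheet can at least flag them. The sketch below uses only Python's standard library and assumes a CSV export with a column named `gene` and dates displayed in the "1-Mar" style; the column name, the function name `find_converted_symbols`, and the sample rows are hypothetical.

```python
import csv
import io
import re

# Matches values such as "1-Mar" or "2-Sep", the form a converted symbol takes.
# Assumption: the sheet was exported with an English day-month display.
CONVERTED = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)


def find_converted_symbols(csv_text, column="gene"):
    """Return (row_number, value) pairs whose gene column looks like a date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        value = (row.get(column) or "").strip()
        if CONVERTED.match(value):
            hits.append((row_number, value))
    return hits


# Hypothetical export in which two symbols have already been turned into dates.
sample = "gene,expression\nTP53,4.2\n1-Mar,1.7\nBRCA1,3.3\n2-Sep,0.9\n"
print(find_converted_symbols(sample))  # → [(3, '1-Mar'), (5, '2-Sep')]
```

A scan like this only points to suspicious rows. It does not restore the symbol that was originally typed, so each flagged entry still has to be checked against the source data.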
On naming: solving Microsoft Excel's data problem
This is a problem best solved top-down: the scientific body in charge of standardizing gene names, the HGNC, published new guidelines for scientists to use when naming genes. The guidelines now address "symbols that affect data handling and retrieval."
From now on, scientists will keep Excel's auto-formatting in mind when deciding what to name genes and the proteins they express. For example, the symbol MARCH1 is now MARCHF1, SEPT1 is now SEPTIN1, and so forth. HGNC will keep a record of old names and symbols to lower the risk of future confusion.
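For anyone working with older datasets, the renaming amounts to a lookup from previous symbols to current ones. The Python sketch below includes only the two renames named in this article (with the new HGNC symbol for SEPT1 spelled SEPTIN1); the full record of old and new symbols is kept by HGNC, and the function names here are hypothetical.

```python
# Only the two renames mentioned in this article; HGNC keeps the full record.
RENAMED = {"MARCH1": "MARCHF1", "SEPT1": "SEPTIN1"}

# Reverse lookup, so a current symbol can be traced back to its previous one.
PREVIOUS = {new: old for old, new in RENAMED.items()}


def current_symbol(symbol):
    """Return the updated symbol if one is known, otherwise the input unchanged."""
    return RENAMED.get(symbol.upper(), symbol)


def previous_symbol(symbol):
    """Return the earlier symbol for a renamed gene, or None if none is recorded."""
    return PREVIOUS.get(symbol.upper())


print(current_symbol("MARCH1"))     # MARCHF1
print(previous_symbol("SEPTIN1"))   # SEPT1
```

Keeping the mapping in both directions mirrors the idea of retaining old symbols, so papers published before the change can still be matched to the same gene.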
As of writing, 27 gene names have been changed in this manner over the last year, Elspeth Bruford, the HGNC coordinator, told The Verge. But only now are the changes being announced to the world at large. "We consulted the respective research communities to discuss the proposed updates, and we also notified researchers who had published on these genes specifically when the changes were being put into effect," said Bruford.
As scientific research moves onto increasingly digital media, gene research appears to be among the first fields to adapt its conventions so that they work smoothly with the software scientists rely on. And we can be sure Microsoft Excel won't be the last immovable object to force science to rethink its procedures in the name of progress.
H/T The Verge