Meta AI creates first ever database of 600 million metagenomic structures

'These structures provide an unprecedented view into the breadth and diversity of nature,' say the researchers.
Loukia Papadopoulos
Proteins as depicted by Meta.


In a world first, Meta’s artificial intelligence (AI) has predicted the structures of the metagenomic world at the scale of hundreds of millions of proteins, according to a blog post the company published on Tuesday.

“Proteins are complex and dynamic molecules, encoded by our genes, that are responsible for many of the varied and fundamental processes of life. They have an astounding range of roles in biology,” wrote the Meta research team, which also published a paper on the work on the preprint server bioRxiv.

“The rods and cones in our eyes that sense light and make it possible for us to see, the molecular sensors that underlie hearing and our sense of touch, the complex molecular machines that convert sunlight into chemical energy in plants, the motors that drive motion in microbes and our muscles, enzymes that break down plastic, antibodies that protect us from disease, and molecular circuits that cause disease when they fail — are all proteins.”

Across the planet and in our human bodies

Metagenomics uses gene sequencing to discover proteins in samples from environments across the planet and even in our own bodies. Scientists have long known that a vast number of proteins exist beyond those cataloged and annotated in well-studied organisms, and these proteins are now coming to light.

A map of tens of thousands of high-confidence predictions.

“Metagenomics is starting to reveal the incredible breadth and diversity of these proteins, uncovering billions of protein sequences that are new to science and cataloged for the first time in large databases compiled by public initiatives such as the NCBI, European Bioinformatics Institute and Joint Genome Institute, incorporating studies from a worldwide community of researchers,” continued the Meta research team.

The predictions were made using a program called ESMFold, which is built on a type of model originally designed for decoding human languages. The findings have been compiled into the open-source ESM Metagenomic Atlas and could one day be used in the production of new drugs, the characterization of unknown microbial functions, and the discovery of evolutionary links between distantly related species.

Meta shared a database of more than 600 million metagenomic structures, as well as an API that will allow scientists to easily retrieve specific protein structures relevant to their work.

Protein predictions

ESMFold is not the first program to make protein predictions. Google-owned DeepMind also has a protein-predicting program, AlphaFold, which produced structure predictions this year as well. Meta researchers, however, claim that ESMFold is 60 times faster than AlphaFold, although its results have yet to be peer-reviewed.

The scientists further state that their new atlas “is the largest database of high resolution predicted structures, 3x larger than any existing protein structure database, and the first to cover metagenomic proteins comprehensively and at scale.”

“These structures provide an unprecedented view into the breadth and diversity of nature, and hold the potential for new scientific insights and to accelerate discovery of proteins for practical applications in fields such as medicine, green chemistry, environmental applications, and renewable energy,” concluded the research team.
