Date: 2023-01-31

A study describing the research was published Thursday in the journal Nature Biotechnology. The project was a joint effort by researchers at the University of California, San Francisco, the University of California, Berkeley, and Salesforce Research, the research arm of the San Francisco-based software company Salesforce.

The significance of using a language model

The researchers say they chose a language model because it can generate protein sequences with a predictable function across large protein families, much as such models generate grammatically and semantically correct natural-language sentences on diverse topics.

"In the same way that words are strung together one-by-one to form text sentences, amino acids are strung together one-by-one to make proteins," Nikhil Naik, the Director of AI Research at Salesforce Research, told Motherboard. The team applied "neural language modeling to proteins for generating realistic, yet novel protein sequences."

The model was trained on 280 million protein sequences from more than 19,000 families, a dataset that was "augmented with control tags specifying protein properties."

According to Motherboard, the team's use of a conditional language model gives significantly more control over which types of sequences are generated, making the approach more useful for designing proteins with specific properties.
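
To make the idea of conditioning on a control tag concrete, here is a deliberately tiny sketch in Python. It is not the team's model, which is a large neural network. It is a simple frequency-count model, and the tags (`<family_A>`, `<family_B>`), the training sequences, and the function names `train` and `generate` are all made up for illustration. What it does share with the description above is the basic loop: start from a tag, then pick amino acids one at a time, with each choice depending on the tag and on what came before.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (control tag, amino-acid sequence) pairs.
# Neither the tags nor the sequences come from the study.
TRAINING = [
    ("<family_A>", "MKVLAAGK"),
    ("<family_A>", "MKVLSAGK"),
    ("<family_B>", "MDEEPWWR"),
    ("<family_B>", "MDEEPWYR"),
]
END = "$"  # marker meaning "the sequence stops here"


def train(examples):
    """Count which residue follows which, separately for each control tag."""
    counts = defaultdict(lambda: defaultdict(int))
    for tag, seq in examples:
        prev = tag  # the control tag plays the role of the first token
        for residue in seq + END:
            counts[(tag, prev)][residue] += 1
            prev = residue
    return counts


def generate(counts, tag, rng, max_len=50):
    """Sample one residue at a time, conditioned on the tag and previous residue."""
    prev, out = tag, []
    while len(out) < max_len:
        options = counts[(tag, prev)]
        residues = list(options)
        weights = [options[r] for r in residues]
        nxt = rng.choices(residues, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    model = train(TRAINING)
    for tag in ("<family_A>", "<family_B>"):
        print(tag, generate(model, tag, rng))
```

Changing the tag changes which family's statistics steer the output, which is the sense in which a conditional model offers control. A real protein language model replaces the count table with a neural network that looks at the whole preceding sequence rather than only the last residue.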

Potential use cases

The model's ability to generate functional artificial proteins across protein families has promising applications. According to the team, "additional analyses suggest that our model has learned a flexible protein sequence representation that can be applied to diverse families like lysozymes, CM, and MDH," where CM and MDH stand for chorismate mutase and malate dehydrogenase.