AI has generated first functional protein structure in exciting medical discovery

Artificial intelligence (AI) has long been at the forefront of medical developments. Now, researchers have used AI to create artificial protein sequences, a discovery that could eventually be applied to treating various diseases.

Researchers from Salesforce Research in collaboration with Tierra Biosciences and the Fraser Lab at UCSF conducted the study.

“We show that AI can learn the language of biology to create artificial proteins across multiple protein types that are functional and unseen in nature,” researchers explain.

They have created ProGen – an AI language model that learns the language of proteins and generates artificial protein sequences across multiple families, in the same way that other AI models produce pictures or text. With enough training data, AI-generated content can become indistinguishable from content made by a human.

“A key insight for our work is that proteins can be represented as a language made up of amino acids, the 20 molecules that make up every protein. In the same way that words are strung together one-by-one to form text sentences, amino acids are strung together one-by-one to make proteins.”
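To make the language analogy concrete, here is a minimal sketch of generating a protein sequence one amino acid at a time. It is not ProGen: a toy bigram counter stands in for the large language model, and the function names (`train_bigram`, `generate`) and the training strings are made up for illustration.

```python
import random
from collections import defaultdict

# The 20 standard amino acids, one letter each: the "alphabet" of the protein language.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_bigram(sequences):
    """Count how often each amino acid follows another (a toy stand-in for a language model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Extend a sequence one residue at a time, sampling each from the learned counts."""
    seq = [start]
    while len(seq) < length:
        followers = counts.get(seq[-1])
        if not followers:
            # Unseen context: fall back to a uniform choice over the alphabet.
            seq.append(rng.choice(AMINO_ACIDS))
            continue
        residues = list(followers)
        weights = [followers[r] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


# Made-up toy sequences for illustration only; these are not real lysozymes.
TOY_TRAINING_SET = ["MKALIVLGL", "MKVLIALGA", "MRALLVLGL"]

model = train_bigram(TOY_TRAINING_SET)
sample = generate(model, start="M", length=12, rng=random.Random(0))
print(sample)
```

A real model such as ProGen conditions each choice on the whole preceding sequence rather than only the previous residue, but the one-by-one generation loop follows the same idea.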

ProGen was trained on 280 million protein sequences from protein databases. It produced impressive results: not only did it learn to generate structurally viable sequences of amino acids (much as text-focused AI generates sentences), but its sequences were shown to perform their intended function in the real world.

To test ProGen, researchers chose five families of lysozymes – antibacterial proteins that were the first antibiotic ever discovered. They then selected more than 100 natural and artificial proteins from the chosen families to see which ones functioned properly.

“Among our artificial lysozymes, 73% were found to be functional antibacterial proteins, as compared to natural proteins which were 59% functional. Artificial proteins from all five evolutionary families of lysozyme showed activity,” researchers concluded.

Additionally, they found that ProGen generated proteins that differ by up to 100 amino acids from a 170-amino-acid natural lysozyme sequence without losing their antibacterial properties. Researchers compared this result to a text that retains its meaning even though about 50% of its words differ from the original.
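The comparison above comes down to counting differing positions between two sequences. The sketch below shows one simple way to do that, assuming the two sequences are already aligned and of equal length (real comparisons usually need an alignment step first). The function name `fraction_different` and the example strings are hypothetical.

```python
def fraction_different(a: str, b: str) -> float:
    """Return the share of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


# Made-up toy sequences for illustration only; these are not real lysozymes.
natural = "MKALIVLGLA"
artificial = "MKVLIALGAA"
print(fraction_different(natural, artificial))

# The figures quoted in the article: up to 100 differing amino acids out of 170.
print(round(100 / 170, 3))
```

By this simple measure, 100 differences out of 170 is roughly 59%, so the "about 50%" figure is best read as a rough analogy rather than an exact count.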

The discovery – a first-of-its-kind fully AI-generated structure of a protein, according to the study – could allow for the swift development of various disease treatments.

AI has previously been shown to detect COVID-19 in people's voices via a phone app with 89% accuracy. The AI model is also being used to develop an app that can predict exacerbations in chronic obstructive pulmonary disease.