Google details its protein folding software, and academics offer an alternative

Thanks to the development of DNA sequencing technology, it has become easy to obtain the sequence of bases that encodes a protein and translate it into the sequence of amino acids that make up the protein. But from there, we often get stuck. The actual function of a protein is specified only indirectly by its sequence. Instead, the sequence dictates how the amino acid chain folds and flexes to form a particular structure in three-dimensional space. That structure is typically what determines the function of the protein, but obtaining it can require years of laboratory work.

For decades, researchers have tried to develop software that can take a sequence of amino acids and accurately predict the structure it will form. Although this is a matter of chemistry and thermodynamics, success was limited until last year. That’s when Google’s DeepMind AI group announced the existence of AlphaFold, which can typically predict structures with a high degree of accuracy.

At the time, DeepMind said it would give everyone the details of its breakthrough in a future peer-reviewed paper, which it finally released yesterday. In the meantime, some academic researchers got tired of waiting, took some of DeepMind’s insights, and built their own software. The paper describing that effort was also released yesterday.

The details on AlphaFold

DeepMind had already described the basic structure of AlphaFold, but the new paper provides much more detail. AlphaFold’s structure involves two different algorithms that communicate with each other about their analyses, allowing each to refine its output.

One of these algorithms looks for protein sequences that are evolutionarily related to the protein in question and works out how those sequences align, adjusting for small changes, insertions, and even deletions. Even if the structures of these relatives are unknown, they can still provide important constraints, such as whether certain parts of the protein are always charged.

According to the AlphaFold team, this part needs about 30 related proteins to work effectively. It typically comes up with a basic alignment quickly and then refines it. These refinements can include shifting gaps around to put key amino acids in the right place.
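
To make the idea of reading constraints off an alignment concrete, here is a minimal sketch. It is not AlphaFold's code: the toy sequences, the `CHARGED` set (which counts histidine as charged), and both function names are assumptions made for illustration. It simply checks, column by column, whether a position contains gaps and whether every relative has a charged amino acid there.

```python
# Toy alignment (hypothetical sequences, already aligned).
# '-' marks a gap: an insertion or deletion relative to the other sequences.
ALIGNMENT = [
    "MKDLE-RA",
    "MRDLEQRA",
    "MKELD-KA",
    "MHDIEQRG",
]

# Assumption for this sketch: treat D, E, K, R and H as charged residues.
CHARGED = set("DEKRH")


def column_constraints(alignment):
    """Summarise each alignment column: gaps, charge, and identity."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    report = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = [aa for aa in column if aa != "-"]
        report.append({
            "position": i,
            "has_gap": len(residues) < len(column),
            # True only if at least one residue is present and all are charged.
            "always_charged": bool(residues) and all(aa in CHARGED for aa in residues),
            # True only if the column is gap-free and every residue matches.
            "identical": len(residues) == len(column) and len(set(residues)) == 1,
        })
    return report


def always_charged_positions(alignment):
    """Return the column indices where every relative carries a charged residue."""
    return [c["position"] for c in column_constraints(alignment) if c["always_charged"]]


if __name__ == "__main__":
    print(always_charged_positions(ALIGNMENT))
```

A real pipeline would work with far more sequences and a statistical notion of conservation, but the principle is the same: patterns that hold across relatives constrain what the structure can look like.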


The second algorithm, which runs in parallel, breaks the sequence into smaller chunks and attempts to solve the structure of each one while ensuring that each chunk’s structure is compatible with the larger structure. This is why aligning the protein with its relatives is essential: if key amino acids end up in the wrong chunk, getting the structure right becomes a real challenge. So the two algorithms communicate, allowing proposed structures to be fed back to the alignment.
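
As a rough illustration of the chunking step, the sketch below splits a sequence into fixed-size windows. The function name, the fixed window size, and the optional `overlap` parameter are all assumptions made for this example; the article does not say how either system actually chooses its chunks.

```python
def split_into_chunks(sequence, chunk_size, overlap=0):
    """Split a sequence into windows of up to chunk_size residues.

    Each window after the first shares `overlap` residues with the previous
    one (an illustrative choice, not something described in the papers).
    Returns a list of (start_index, chunk) pairs.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(sequence):
        end = min(start + chunk_size, len(sequence))
        chunks.append((start, sequence[start:end]))
        if end == len(sequence):
            break  # the final window reached the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    for start, chunk in split_into_chunks("ABCDEFGHIJ", chunk_size=4, overlap=1):
        print(start, chunk)
```

Keeping the start index alongside each chunk is what lets a later step place the solved pieces back into the full-length structure.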

Structure prediction is the harder of the two processes, and the algorithm’s initial ideas often undergo more significant changes before it settles into refining a final structure.

Perhaps the most interesting new detail in the paper is a section where DeepMind goes through and disables various parts of the analysis algorithms. The results show that, of the nine different functions the team defines, all appear to contribute at least a little to the final accuracy, but only one has a dramatic effect on it. That one involves identifying points in the proposed structure that are likely to need changes and flagging them for further attention.
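
This kind of test is usually called an ablation study: switch off one component at a time and see how much the score drops. The sketch below shows only the bookkeeping. The component names, the baseline, and the point values are invented placeholders, not figures from the paper, and `toy_accuracy` stands in for what would really be a full train-and-evaluate run.

```python
# Hypothetical components with made-up contributions, in accuracy points.
# None of these names or numbers come from the AlphaFold paper.
COMPONENTS = {
    "component_a": 2,
    "component_b": 3,
    "flag_uncertain_regions": 25,
}
BASELINE = 60  # made-up score with every component disabled


def toy_accuracy(enabled):
    """Placeholder for evaluating a model with only `enabled` components on."""
    return BASELINE + sum(COMPONENTS[name] for name in enabled)


def ablation_study(components, evaluate):
    """Disable each component in turn and record the resulting accuracy drop."""
    everything = set(components)
    full_score = evaluate(everything)
    drops = {}
    for name in components:
        drops[name] = full_score - evaluate(everything - {name})
    return drops


if __name__ == "__main__":
    drops = ablation_study(COMPONENTS, toy_accuracy)
    for name, drop in sorted(drops.items(), key=lambda item: -item[1]):
        print(f"{name}: -{drop} points")
```

In this toy setup every component's effect is independent and additive, which is a simplification; in a real system, removing one part can change how much the others contribute.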


In an announcement timed to coincide with the paper’s release, DeepMind CEO Demis Hassabis said the company was publishing the system’s complete source code and methodology.

However, Google had already described the basic structure of the system, which led some researchers in academia to wonder whether existing tools could be adapted into a system structured more like DeepMind’s. And with a seven-month delay, they had plenty of time to act on that idea.

Using DeepMind’s initial description, the researchers identified five features of AlphaFold that they felt differed from most existing methods. They then implemented various combinations of these features to find out which ones improved on current methods.

The easiest place to start was the use of two parallel algorithms: one dedicated to sequence alignment, the other to structure prediction. But the team ended up splitting the structural side of things into two distinct functions. One of these simply estimates the two-dimensional distances between individual parts of the protein, while the other handles actual positions in three-dimensional space. All three exchange information, with each providing hints about which aspects of the others’ tasks need further refinement.
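
To show how the two structural views relate, the sketch below derives a table of pairwise distances from a set of three-dimensional coordinates. It reads "two-dimensional" as a table indexed by pairs of residues, which is an interpretation rather than something the article spells out. The coordinates, the function names, and the `min_separation` parameter are illustrative assumptions, and this is not RoseTTAFold's code.

```python
import math


def distance_map(coords):
    """Build an n-by-n table of distances from n (x, y, z) positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def close_pairs(dmap, cutoff, min_separation=2):
    """List pairs (i, j) closer than `cutoff`.

    Pairs fewer than `min_separation` positions apart along the chain are
    skipped, since neighbours in the sequence are always close in space.
    """
    n = len(dmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dmap[i][j] < cutoff
    ]


if __name__ == "__main__":
    # Made-up positions for three residues, in arbitrary units.
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    dmap = distance_map(coords)
    for row in dmap:
        print([round(d, 1) for d in row])
    print(close_pairs(dmap, cutoff=13.5))
```

Going from coordinates to distances is the easy direction. A predictor has to work the other way as well, finding 3D positions consistent with estimated distances, which is one reason exchanging hints between the two views is useful.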


The problem with adding a third pipeline is that it significantly increases the hardware requirements, and academics generally don’t have access to the same kind of computing resources that DeepMind does. As a result, the system, called RoseTTAFold, did not perform as well as AlphaFold in terms of prediction accuracy, though it was better than any of the earlier systems the team could test. Given the hardware it ran on, it was also relatively fast, taking about 10 minutes on a protein 400 amino acids long.

Like AlphaFold, RoseTTAFold breaks proteins into smaller chunks, solves them individually, and then tries to assemble them into a complete structure. In this case, the research team realized that this approach could have an additional application. Many proteins form extensive interactions with other proteins in order to function; hemoglobin, for example, exists as a complex of four proteins. If the system works as it should, feeding it two different proteins should allow it to work out both their structures and where they interact with each other. Tests showed that this actually works.

Healthy competition

Both of these papers seem to describe positive developments. To start with, the DeepMind team deserves full credit for the insights it had in structuring the system in the first place. Clearly, setting things up as parallel processes that communicate with each other has produced a huge leap in our ability to estimate protein structures. And rather than simply trying to recreate what DeepMind did, the academic team took some of its key insights and carried them in new directions.

Right now, there is a clear performance gap between the two systems, both in the accuracy of the final output and in the time and computing resources that need to be dedicated to it. But both teams appear committed to openness, so it’s quite possible that each will be able to adopt the other’s best features.

Whatever the outcome, we are clearly in a new place compared to just a few years ago. People have been trying to solve protein structure prediction for decades, and our inability to do so has become more of a problem now that genomes are providing us with vast numbers of protein sequences that we have little idea how to interpret. Demand for time on these systems is likely to be intense, since a large part of the biomedical research community stands to benefit from the software.

Science, 2021. DOI: 10.1126/science.abj8754

Nature, 2021. DOI: 10.1038/s41586-021-03819-2




