Researchers develop deep learning model that outperforms Google AI system in predicting peptide structures

Schematic diagram of the PepFlow architecture. Source: Nature Machine Intelligence (2024). DOI: 10.1038/s42256-024-00860-4

Researchers at the University of Toronto have developed a deep learning model called “PepFlow” that can predict the full range of shapes a peptide can adopt. Peptides are chains of amino acids that are shorter than proteins but can perform similar biological functions.

PepFlow combines machine learning and physics to model the range of folding patterns that peptides can adopt based on their energy landscape. Unlike proteins, peptides are highly dynamic molecules that can adopt many different conformations.

“Until now, it has not been possible to model all possible conformations of a peptide,” said Osama Abdin, lead author of the study, who recently completed his PhD in molecular genetics at the University of Toronto's Donnelly Centre for Cellular and Biomolecular Research. “PepFlow uses deep learning to capture precise, accurate conformations of peptides in just a few minutes. These models have the potential to aid in drug development through the design of peptides that act as binding agents.”

The research was published today in the journal Nature Machine Intelligence.

The role of peptides in the human body is directly related to how they fold, as their 3D structure determines how they bind and interact with other molecules. Peptides are known to be highly flexible and can adopt a variety of folding patterns, which allows them to participate in many biological processes that are of interest to researchers developing therapeutics.

“The focus of the PepFlow model was on peptides because they are incredibly important biomolecules and are highly dynamic in nature, so we need to model different conformations of peptides to understand their function,” said Philip M. Kim, lead researcher on the study and professor at the Donnelly Centre. “Peptides are also important as therapeutic agents, as seen by GLP-1 analogs such as Ozempic, which are used to treat diabetes and obesity.”

Kim, who is also a professor of computer science in the University of Toronto's Faculty of Arts & Science, said peptides are also cheaper to produce than larger proteins.

The new model extends the capabilities of AlphaFold, the leading Google DeepMind AI system for predicting protein structures. PepFlow outperforms AlphaFold2 at generating a range of conformations for a given peptide, a task AlphaFold2 was not designed for.

What sets PepFlow apart is the technical innovation behind it. For example, PepFlow is a generalized model inspired by Boltzmann generators, advanced physics-based machine learning models.
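To make the physics concrete: a Boltzmann generator is trained to draw structures with probability proportional to exp(-E / kT), so low-energy conformations appear more often than high-energy ones. The sketch below illustrates only that weighting on a toy example and is not PepFlow's method. The three conformation names and their energies are hypothetical values chosen for illustration, and the function names are our own.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Return normalized Boltzmann probabilities for energies in kcal/mol."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(factors)
    return [f / partition for f in factors]


def sample_conformations(names, energies, n, temperature=300.0, seed=0):
    """Draw n conformation labels in proportion to their Boltzmann weights."""
    rng = random.Random(seed)
    weights = boltzmann_weights(energies, temperature)
    return rng.choices(names, weights=weights, k=n)


# Hypothetical toy energy landscape (illustrative values, not from the study).
toy_landscape = {"helix": 0.0, "hairpin": 0.5, "extended": 1.5}
names = list(toy_landscape)
energies = list(toy_landscape.values())

weights = boltzmann_weights(energies)
samples = sample_conformations(names, energies, n=1000)

for name, weight in zip(names, weights):
    print(f"{name}: weight={weight:.3f}, sampled={samples.count(name)}")
```

A model that returns a single structure would report only the lowest-energy entry here. Sampling the whole distribution is what lets a generative approach describe a flexible molecule.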

PepFlow can also model peptide structures that undergo unusual formations, such as ring-like structures that result from a process called macrocyclization. Peptide macrocyclization is currently a very promising area of drug development.

Although PepFlow improves on AlphaFold2, it is the first version of the model and therefore has its own limitations. The study authors pointed out several ways in which PepFlow could be improved, such as training the model with explicit data on the solvent atoms in which the peptide is dissolved, or adding constraints on the distances between atoms in ring-like structures.

PepFlow is built to be easily extended to account for additional considerations, new information, and potential applications. Even in its first version, PepFlow is a comprehensive and efficient model that may facilitate the development of therapeutics that activate or inhibit biological processes through peptide binding.

“Modeling with PepFlow gives us insight into the actual energy landscape of peptides,” said Abdin, “and although it took two and a half years to develop PepFlow and a month of training, it was worth it to move beyond models that predict only one peptide structure and into the next realm.”

Further information: Osama Abdin et al. “Direct conformational sampling from peptide energy landscapes by hypernetwork conditional diffusion.” Nature Machine Intelligence (2024). DOI: 10.1038/s42256-024-00860-4

Provided by University of Toronto

