ForceGen: Using A Diffusion Model To Help Design Novel Proteins

Although proteins are composed out of only a small number of distinct amino acids, this deceptive simplicity quickly vanishes when considering the many possible sequences across a protein, not to mention the many ways in which a single 1D protein sequence can fold into a 3D protein shape with a specific functionality. Although natural evolution has done much of the legwork here already, figuring out new sequences and their functionality is a daunting task where increasingly deep learning algorithms are being applied. As [Bo Ni] and colleagues report in a research article in Science Advances, the hardest challenge is designing a protein sequence based on the desired functionality. They then demonstrate a way to use a generative model to speed up this process.

They set out to design proteins with specific mechanical properties, for which they used the known unfolding characteristics of various protein sequences to train a diffusion model. This approach is thus more akin to the technology behind image generation algorithms like DALL-E than LLMs. Using the trained diffusion model it was then possible to generate likely sequences of which the properties could then be simulated, with favorable results.

As a large data set aid, such a diffusion model could conceivably be very useful in fields even beyond protein synthesis, automating tedious tasks and conceivably speeding up discoveries.

One thought on “ForceGen: Using A Diffusion Model To Help Design Novel Proteins

  1. I was under the impression that a deep learning system had already “figured out” protein folding and now it’s being analyzed to identify the logic behind it’s shape predictions. Whatever the case, we really need a deep learning system that can decode deep learning systems. :)

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.