How does generative chemistry work, and how can it help me?

Author

Michael Parker, PhD

Generative chemistry is one of the hottest topics in drug discovery right now, and generative AI has become a household name in the last few years. But what do these buzzwords actually mean in practical terms, and how can chemists use these tools to help them discover new medicines?

The role of generative chemistry in drug discovery

A key difficulty in finding new drugs is the sheer size of chemical space. The number of potential drug-like molecules is incomprehensibly vast: an often-quoted figure is that there are ~10⁶⁰ molecules obeying Lipinski’s rule of five (Reymond, 2015). This figure is notable for exceeding the total number of atoms in the entire solar system by a factor of several thousand. More conservative estimates are available, but even at the bottom end these are still way beyond scales that are explorable, or even conceivable, for human beings.
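
To get a feel for that scale, here is a rough back-of-the-envelope sketch. The screening rate of one billion molecules per second is a made-up illustrative assumption, not a real benchmark, and the age of the universe is rounded.

```python
# Back-of-the-envelope scale estimate.
# The screening rate below is a hypothetical, illustrative assumption.
CHEMICAL_SPACE_SIZE = 10**60        # the often-quoted estimate cited above
MOLECULES_PER_SECOND = 10**9        # hypothetical screening throughput
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years
AGE_OF_UNIVERSE_YEARS = 13.8e9      # approximate


def years_to_enumerate(space_size: int, rate_per_second: int) -> float:
    """Years needed to visit every molecule once at a fixed rate."""
    return space_size / (rate_per_second * SECONDS_PER_YEAR)


years = years_to_enumerate(CHEMICAL_SPACE_SIZE, MOLECULES_PER_SECOND)
print(f"{years:.1e} years")
print(f"{years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Even at that generous rate, exhaustive enumeration would take on the order of 10⁴³ years, which is why any practical method has to sample or steer rather than enumerate.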

The key selling point of generative chemistry is that it offers a way to automate exploration of chemical space. It can mine the best compounds for your specific problem and present you with an array of optimised solutions that you never would have considered otherwise. In practice, of course, things are rarely so simple. But generative chemistry still offers a valuable tool in the chemist’s arsenal.

Traditional generative chemistry vs. AI-driven methods

While generative AI methods steal all the headlines at the moment, it is worth remembering that generative chemistry predates the current AI boom (a good indicator that the field has value beyond the hype). Traditional generative methods typically rely on combining simple building blocks from libraries of known scaffolds, fragments, or reagents (Sadybekov, 2022). These methods offer fast, powerful, and simple ways to explore relevant chemical space. They also avoid some of the pitfalls that can occur with AI techniques. However, they tend to be less flexible than deep-learning-based methods, because they are limited to combining a fixed set of building blocks. In contrast, an AI model can provide a deeper understanding of a larger chemical space.
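
As a minimal sketch of the building-block idea, the snippet below enumerates every scaffold-linker-cap combination from three tiny, hypothetical fragment lists. Plain string concatenation only gives valid SMILES here because these particular fragments were chosen to join end to end; real tools use reaction-based or attachment-point-aware enumeration, so treat this as a toy, not as how any specific product works.

```python
from itertools import product

# Hypothetical toy libraries, chosen so that simple concatenation
# produces valid SMILES strings. Not taken from any real reagent set.
SCAFFOLDS = ["c1ccccc1", "c1ccncc1", "C1CCCCC1"]  # benzene, pyridine, cyclohexane
LINKERS = ["O", "N", "C(=O)N"]                    # ether, amine, amide
CAPS = ["C", "CC", "c1ccccc1"]                    # methyl, ethyl, phenyl


def enumerate_library(scaffolds, linkers, caps):
    """Return every scaffold-linker-cap combination as a SMILES string."""
    return ["".join(parts) for parts in product(scaffolds, linkers, caps)]


library = enumerate_library(SCAFFOLDS, LINKERS, CAPS)
print(len(library))
print(library[:3])
```

The library size is simply the product of the list lengths (3 × 3 × 3 = 27 here), and nothing outside those lists can ever appear in the output. That is both the strength of the approach (every compound is built from known pieces) and the limit on flexibility described above.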

Which AI models are used in generative chemistry?

AI approaches rely on neural networks to generate new compounds meeting some set of criteria. These networks fall broadly into two camps: diffusion models and sequence models.

Diffusion models are based on the same technology that powers image generators: iterative de-noising. The AI model is trained to remove noise from an input: an image (for an image generator) or a molecular graph (for a compound generator). By running this de-noising step repeatedly, the trained model can gradually convert pure noise into a coherent output.

Sequence models, conversely, are based on the technology used to process and generate text. They input and output compounds as chemical strings, most frequently SMILES strings. These strings are then represented internally as a sequence of vectors. In a language model like ChatGPT, those vectors each represent a word in the text, and the vector describes the properties of that word in the context of other words in the sentence. For a chemical language model, the vectors each represent a symbol in the chemical string, and the vector describes its properties in the context of the larger molecule.
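
To make the "symbols in a chemical string" idea concrete, here is a minimal sketch of the step that comes before any vectors exist: splitting a SMILES string into symbols and mapping each one to an integer index. The regular expression is a simplified assumption that covers common SMILES syntax, not the full specification, and the function names are illustrative. A real model would go on to look up a learned vector for each index.

```python
import re

# Simplified SMILES symbol pattern (an assumption; not the full grammar).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atoms such as [NH3+] or [C@@H]
    r"|Br|Cl"               # two-letter elements, matched before single letters
    r"|[BCNOPSFI]"          # one-letter aliphatic atoms
    r"|[bcnops]"            # aromatic atoms
    r"|%\d{2}"              # two-digit ring-closure labels
    r"|\d"                  # single-digit ring-closure labels
    r"|[=#\-+\\/().:~*@$]"  # bonds, branches, and other symbols
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into symbols, rejecting anything unrecognised."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists) -> dict[str, int]:
    """Assign each distinct symbol an integer index, in order of first use."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize(aspirin)
vocab = build_vocab([tokens])
ids = [vocab[token] for token in tokens]
print(tokens)
print(ids)
```

Note that `Cl` and `Br` are kept as single symbols rather than being split into two letters, which is why a plain character-by-character split is not quite enough.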

How should we measure the performance of generative models?

Exactly which type of model architecture is going to give better performance will depend on the specific problem at hand, and is arguably less important than how the model is deployed. Data scientists naturally focus on metrics. However, chemists mostly don’t care if a model is 96% accurate or 97% accurate. They care about seeing interesting, relevant compounds in a way that integrates with their existing workflows cleanly.

A key difficulty with this is determining what constitutes an interesting, relevant compound in the first place. Models can be trained to optimise for a given set of parameters or constraints. The output compounds can then be filtered for further constraints. However, ultimately, it is impossible to account for all possible variables. This necessitates some level of expert human oversight, either to filter down the generated compounds after generation, or to work alongside the AI guiding the process.
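
The "generate, then filter" step can be sketched as follows. Everything here is illustrative: the candidate names, property values, and model scores are placeholders rather than real compounds or predictions, and the filter is simply Lipinski's rule of five with one violation allowed. A real workflow would calculate the properties with a cheminformatics toolkit and apply project-specific constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors
    score: float       # hypothetical model score, higher is better


def lipinski_violations(c: Candidate) -> int:
    """Count how many of the four rule-of-five limits a candidate breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def shortlist(candidates, max_violations=1, top_n=2):
    """Keep candidates within the violation limit, best-scoring first."""
    passing = [c for c in candidates if lipinski_violations(c) <= max_violations]
    return sorted(passing, key=lambda c: c.score, reverse=True)[:top_n]


# Placeholder values for illustration only; not real compounds or predictions.
generated = [
    Candidate("cand-1", 342.4, 2.1, 2, 5, 0.71),
    Candidate("cand-2", 612.8, 6.3, 1, 9, 0.93),  # breaks two limits
    Candidate("cand-3", 455.5, 4.8, 3, 7, 0.64),
    Candidate("cand-4", 521.6, 3.9, 2, 8, 0.88),  # breaks one limit
]

for candidate in shortlist(generated):
    print(candidate.name, candidate.score)
```

The highest-scoring candidate is dropped by the filter, which is exactly the sort of decision an expert might want to revisit. The shortlist is a starting point for review, not a final answer.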

Human/AI synergy: an Augmented Chemistry® approach

An analogy I like here is chess. For a long time, the combination of an AI and an expert human player would outperform either individually. This is because the human could provide large-scale strategy and big-picture thinking that the machine lacked, and the AI could provide the brute force move evaluation that is impossible for human brains. This is no longer the case for chess, as computers have got ever more powerful. Still, drug discovery is vastly more complex and ever-shifting. Therefore, it seems likely to me that human oversight in some form will remain crucial to the process indefinitely.

AI models are not magic, and while they can be extremely powerful, they are also fallible, often in ways that a human would find laughably foolish (see Chevrolet’s AI assistant agreeing to sell a car for a dollar, or Google Gemini recommending the addition of glue to pizza toppings). We should treat generative chemistry AIs as another tool to empower expert chemists, not as a substitute for them, and build our software around that goal.

Realistic expectations for generative chemistry AI

Can generative chemistry AI instantly solve all your problems and replace half your staff? Despite the breathless claims of LinkedIn influencers, no, it can’t. What it can do is help chemists explore the vastness of chemical space, accelerating the optimisation and development of lead compounds and uncovering exciting new drugs that they might otherwise never have seen.

Learn more about Optibrium’s approach to generative chemistry

If you want to see what my colleagues and I have been up to, and understand how Optibrium’s generative chemistry methods within Nova and Inspyra work, you can watch our webinar on-demand: ‘An augmented approach to generative chemistry’.

Michael Parker, PhD

Michael is a Principal AI Scientist at Optibrium, applying advanced AI techniques to accelerate drug discovery and improve decision-making. With a Ph.D. in Astronomy and Astrophysics from the University of Cambridge, he brings a data-driven approach to solving complex scientific challenges. Michael is also a thought leader, contributing to discussions on the impact of AI in pharmaceutical research.

