Optibrium demonstrates superior molecular docking method for small molecules and macrocycles
CAMBRIDGE, UK, 22 October 2024 – Optibrium, a leading developer of software and AI solutions for molecular design today announced…
When we talk about “large molecules,” we often think of biologics like monoclonal antibodies, proteins, and nucleic acids. But the classification can also extend to macrocycles – an important class of therapeutic compounds. Macrocycles are characterised by a “large” cyclic (ring-like) structure with 12 or more atoms, and they can include cyclic peptides, peptidomimetics, and other diverse chemical structures. These molecules have gained significant attention because they can target tough-to-reach, flat protein surfaces with high specificity, which is often challenging for smaller molecules. Though smaller than typical biologics, their size, complexity, and molecular weight place them squarely in the “large molecules” category.
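As a rough illustration of the “12 or more atoms” criterion, the sketch below checks a molecular graph for a large ring using only the Python standard library. The adjacency-dictionary input, the function names, and the `ring_graph` helper are assumptions made for this example; in practice a cheminformatics toolkit would supply ring perception. The check is a heuristic: it looks for any bond whose smallest enclosing ring has at least 12 atoms.

```python
from collections import deque


def smallest_ring_through_bond(adjacency, a, b):
    """Return the size of the smallest ring containing bond a-b, or 0 if acyclic."""
    # Breadth-first search from a to b that may not use the a-b bond itself.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if atom == a and nbr == b:
                continue  # skip the bond under test
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                if nbr == b:
                    # A detour of k bonds plus the a-b bond closes a ring of k + 1 atoms.
                    return dist[nbr] + 1
                queue.append(nbr)
    return 0


def is_macrocycle(adjacency, min_ring_atoms=12):
    """True if some bond's smallest enclosing ring has at least min_ring_atoms atoms."""
    for a, nbrs in adjacency.items():
        for b in nbrs:
            if a < b and smallest_ring_through_bond(adjacency, a, b) >= min_ring_atoms:
                return True
    return False


def ring_graph(n):
    """Hypothetical helper: a single unsubstituted ring of n atoms (n >= 3)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


six_ring = ring_graph(6)        # benzene-sized ring
fourteen_ring = ring_graph(14)  # macrocycle-sized ring
```

Here `is_macrocycle(fourteen_ring)` is true and `is_macrocycle(six_ring)` is false. Heavily bridged ring systems can defeat this simple test, so it should be read as a teaching aid and not as a full ring-perception algorithm.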
Researchers often try to repurpose tools developed for “small molecules” when modelling macrocycles. However, the size and flexibility of macrocycles make it tricky to model them accurately – whether it’s their 3D structure, how they interact with targets, or their properties, such as activity, binding affinity, solubility, and permeability. Even though cyclisation can pre-organise macrocycles for specific binding, they still tend to retain significant flexibility, which means there are a lot of possible conformations to consider. Similarly, QSAR models, which work well for small molecules, often struggle with macrocycles, especially peptidic ones. This is partly due to a lack of large, high-quality datasets for these compounds, and partly because the models can’t always handle the complex conformational flexibility of large molecules.
Modelling macrocycles effectively requires leveraging advances in computational techniques to overcome the unique challenges these large molecules present. Let’s look at how we can approach different aspects of macrocycle modelling.
To predict how a macrocycle will behave biologically, we need to understand its 3D conformation. However, tools designed for small molecules often fall short when applied to macrocycles, which are larger and more flexible.
A key factor in a macrocycle’s activity is strain, which is the energy cost incurred when the molecule adopts a less-than-ideal conformation. Predicting this accurately is critical for understanding a macrocycle’s potential efficacy.
To tackle this, we need tools specifically optimised for macrocycles that are both accurate and fast. For example, ForceGen™ (part of our BioPharmics™ suite) is a force-field-based tool that generates reasonable starting structures and performs full conformational searches, considering ring strain and flexibility. It’s been shown to work well for both small molecules and larger macrocycles, and it requires only a SMILES string as input. Plus, it allows users to incorporate biophysical constraints for more refined results.
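One common way to put a number on strain, assumed here for illustration, is the energy of the conformation of interest minus the energy of the lowest-energy conformer in the ensemble. The sketch below computes that difference and the corresponding Boltzmann populations. The energies are hypothetical values in kcal/mol, and the function names are invented for this example; they are not part of ForceGen.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate k_B * T at 298 K, in kcal/mol


def strain_energy(conformer_energies, index):
    """Energy of the chosen conformer relative to the ensemble minimum."""
    return conformer_energies[index] - min(conformer_energies)


def boltzmann_populations(conformer_energies, kt=KT_KCAL_PER_MOL):
    """Fraction of molecules expected in each conformer at thermal equilibrium."""
    e_min = min(conformer_energies)
    weights = [math.exp(-(e - e_min) / kt) for e in conformer_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical force-field energies for four conformers of one macrocycle.
energies = [12.4, 10.1, 15.0, 11.3]
strain = strain_energy(energies, index=3)
populations = boltzmann_populations(energies)
```

In this made-up ensemble, the conformer at index 3 carries a strain of about 1.2 kcal/mol, and the populations show how quickly higher-energy conformers become rare. A real workflow would take its energies from a conformational search, not from a hand-written list.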
When working with macrocycles, comparing 3D structures is crucial to identify key features that might impact target binding. While 2D structure comparisons are often used, they don’t fully capture the nuances of how ligands interact with receptors. Even if two ligands look different in 2D, their 3D features could be remarkably similar, which is key for effective binding.
This is where 3D alignment comes in – matching the spatial arrangement of molecules and their key features, like shape, hydrogen bond donors/acceptors and electrostatic properties. The challenge, however, is that macrocycles are flexible, and their 3D structures may not always be known, making accurate alignment tough.
The eSim™ module from our BioPharmics suite uses ForceGen to explore the conformations of each ligand when performing 3D structure alignment. By focusing on the key features mentioned above, eSim searches for the best alignment for both macrocycles and non-macrocycles.
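To make the idea of feature-based 3D similarity concrete, here is a minimal sketch of a Gaussian-overlap Tanimoto score between two sets of typed feature points. It assumes the two conformers have already been superimposed and that each feature is a `(type, (x, y, z))` tuple; it does not optimise the alignment itself, and it is not the eSim method. All names and coordinates are hypothetical.

```python
import math


def gaussian_overlap(points_a, points_b, width=1.0):
    """Sum of Gaussian overlaps between points that share a feature type."""
    total = 0.0
    for type_a, xyz_a in points_a:
        for type_b, xyz_b in points_b:
            if type_a == type_b:
                d2 = sum((p - q) ** 2 for p, q in zip(xyz_a, xyz_b))
                total += math.exp(-d2 / (2.0 * width ** 2))
    return total


def tanimoto_3d(points_a, points_b, width=1.0):
    """Tanimoto-normalised overlap: 1.0 for identical sets, 0.0 for no shared features."""
    o_ab = gaussian_overlap(points_a, points_b, width)
    o_aa = gaussian_overlap(points_a, points_a, width)
    o_bb = gaussian_overlap(points_b, points_b, width)
    return o_ab / (o_aa + o_bb - o_ab)


def best_conformer_pair(confs_a, confs_b):
    """Score every conformer pair and return (score, index_a, index_b) for the best."""
    return max(
        (tanimoto_3d(a, b), i, j)
        for i, a in enumerate(confs_a)
        for j, b in enumerate(confs_b)
    )


# Two hypothetical conformers described by donor and acceptor positions (in angstroms).
conf_1 = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0))]
conf_2 = [("donor", (0.5, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0))]
score = tanimoto_3d(conf_1, conf_2)
```

Looping over conformer pairs, as `best_conformer_pair` does, is the simplest way to account for flexibility: two ligands are judged by their most similar pair of conformers, not by a single arbitrary 3D structure each.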
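When experimental data is lacking, predicting how a macrocycle will interact with its target is key to guiding its design. Docking is a powerful tool for this, as it helps predict how a ligand fits into a protein’s binding site. A docking algorithm has two main components: a search procedure that samples the ligand’s conformations and poses within the binding site, and a scoring function that ranks those poses.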
For macrocycles, the first component is especially challenging and can lead to docking failures. A good docking tool not only reproduces known cognate ligand poses but also accurately predicts how non-cognate ligands will interact with the target. Our Surflex-Dock™ tool excels at this by using prior knowledge of bound ligands to refine docking predictions. By combining ForceGen (for generating conformational ensembles) and eSim (for aligning macrocycles to known ligands, whenever available) with an optimised scoring function, Surflex-Dock can predict the bound conformations of macrocycles with high accuracy. We have recently shown its success, particularly for non-cognate macrocyclic ligands.
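The split between searching and scoring can be shown with a deliberately tiny toy model. The sketch below tries every conformer at every translation on a coarse grid (the search) and ranks the resulting poses with a contact-counting score (the scoring function). Everything here is an assumption made for illustration, including the distance thresholds and the grid. Real docking tools such as Surflex-Dock also sample rotations and use far more sophisticated scoring.

```python
import itertools
import math


def score_pose(ligand_xyz, pocket_xyz, clash=2.0, contact=4.5):
    """Toy score: +1 per ligand-pocket contact, -10 per steric clash (higher is better)."""
    score = 0.0
    for lig in ligand_xyz:
        for rec in pocket_xyz:
            d = math.dist(lig, rec)
            if d < clash:
                score -= 10.0
            elif d <= contact:
                score += 1.0
    return score


def translate(xyz, shift):
    """Rigidly shift every atom by the same vector."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in xyz]


def dock(conformers, pocket_xyz, grid=(-2.0, 0.0, 2.0)):
    """Try each conformer at each grid translation and keep the best-scoring pose."""
    best = None
    for conf_index, conf in enumerate(conformers):
        for shift in itertools.product(grid, repeat=3):
            pose = translate(conf, shift)
            pose_score = score_pose(pose, pocket_xyz)
            if best is None or pose_score > best[0]:
                best = (pose_score, conf_index, shift, pose)
    return best


# Hypothetical two-atom pocket and two single-atom "conformers".
pocket = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
conformers = [[(3.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)]]
best_score, best_conf, best_shift, best_pose = dock(conformers, pocket)
```

Even in this toy, the cost of the search grows with the number of conformers multiplied by the number of placements, which hints at why flexible macrocycles with large conformational ensembles are hard to dock.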
A key part of macrocycle modelling is accurately predicting their properties, like binding affinity, solubility, and permeability. While traditional or ML-based QSAR models are commonly used, they often face limitations. The main challenges are the scarcity of large, high-quality datasets for these compounds and the limits of standard molecular descriptors, which often fail to capture the flexibility and conformational diversity that macrocycles can exhibit.
One solution is imputation – a technique that pools sparse experimental data from a set of macrocycles and uses it, along with descriptors, to build ML models. By learning correlations between compound descriptors and experimental measurements, as well as between the measurements themselves, imputation can “fill in” missing data, leading to more accurate property predictions.
This method can also uncover non-linear relationships between experimental endpoints, giving deeper insights into how macrocycles work. For example, our Cerella™ imputation platform has successfully predicted macrocyclic peptide activities, helping guide optimisation and experiment design.
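The core idea of imputation, using measured endpoints to fill in missing ones, can be sketched with a much simpler model than a production platform would use. The example below fits a straight line between two endpoints on the compounds where both are measured, then uses it to fill gaps. The endpoint names and values are hypothetical. The linear fit is a stand-in chosen for brevity: it ignores descriptors and cannot capture the non-linear relationships mentioned above, and it is not how Cerella works.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_endpoint(rows, source, target):
    """Fill missing `target` values from `source`, learning from rows with both measured."""
    paired = [
        (row[source], row[target])
        for row in rows
        if row[source] is not None and row[target] is not None
    ]
    slope, intercept = fit_line([p[0] for p in paired], [p[1] for p in paired])
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input data untouched
        if new_row[target] is None and new_row[source] is not None:
            new_row[target] = slope * new_row[source] + intercept
        filled.append(new_row)
    return filled


# Hypothetical sparse data: binding is measured for all compounds, cell activity is not.
data = [
    {"name": "mc-1", "binding_pIC50": 6.0, "cell_pIC50": 5.0},
    {"name": "mc-2", "binding_pIC50": 7.0, "cell_pIC50": 6.0},
    {"name": "mc-3", "binding_pIC50": 8.0, "cell_pIC50": 7.0},
    {"name": "mc-4", "binding_pIC50": 7.5, "cell_pIC50": None},
]
completed = impute_endpoint(data, source="binding_pIC50", target="cell_pIC50")
```

Because the made-up cell values sit exactly one log unit below the binding values, the missing entry for `mc-4` is filled with 6.5. Real assay data is noisier and sparser, which is where models that combine many endpoints with descriptors become worthwhile.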
Another approach is to incorporate 3D structural data into QSAR models. When predicting binding affinities, QSAR models based on 2D features often miss the mark because they only capture correlative relationships. By including 3D structural information, we can better represent the key physical phenomena in protein-ligand interactions and so move closer to capturing causal relationships.
The challenge, though, is handling flexibility. QuanSA™, another tool in our BioPharmics suite, tackles this. It is a 3D QSAR method that predicts binding affinities by constructing field-based models of ligand binding pockets, without needing the target’s structure. QuanSA learns from the multiple ligand conformations generated, making it especially useful for complex molecules like macrocycles.
So, what’s the best way to model large molecules like macrocycles? There are multiple ways to address this question, each tailored to overcome the unique challenges posed by these complex entities. For structural modelling, coupling high-quality conformational ensembles with molecular docking, similarity assessments, and binding affinity prediction can provide robust computational support for macrocycle design. For property prediction, methods like QSAR and imputation help predict key properties such as binding affinity, solubility, and permeability. In particular, incorporating 3D structural data can lead to more accurate predictions. Together, these techniques form a powerful toolkit for modelling large molecules, driving better drug design and optimisation. As these methods continue to evolve, they hold great potential to accelerate macrocycle development.
Himani is a Senior Scientist at Optibrium, specialising in computational drug discovery. With a PhD in Bioinformatics and Computational Biology from the Indian Institute of Science, she has extensive experience in AI/ML techniques for modelling drug candidates, including small molecules and peptides. She is also involved in 3D ligand-based and structure-based drug design.
Before joining Optibrium, Himani completed a postdoctoral fellowship at the MRC Laboratory of Molecular Biology, focusing on cell reprogramming and transcriptomics.