Medicinal chemists sometimes ask us why the predictions from their QSAR models don’t always match their experimental data. It can be a real point of frustration. Not only does it waste your time rather than saving it, but when you lose trust in your models, you may end up abandoning them altogether and miss out on the benefits they can bring.

If that sounds familiar to you, let’s explore together what might be the root cause. 

Why QSAR models matter in drug discovery 

Before we start troubleshooting, let’s remember why we build and run these models in the first place. Good QSAR models help you understand the properties of a compound prior to synthesis. 

When used as part of a multiple-parameter optimisation strategy, they make it possible to prioritise compounds with the optimal balance of potency, ADMET and physicochemical properties. Just as importantly, they highlight situations where such a balance is not possible. In drug discovery, if you’re going to fail, you want to do it quickly, cheaply, and confidently. 

Often, you can’t test every compound in every assay. There just isn’t enough time or resource. Predicting properties with QSAR models can help to fill in those gaps and avoid costly late-stage surprises. 

So why don’t my predictions match my in-house data? 

When working with medicinal chemists and their data, we see five common explanations behind these discrepancies. 

1. Your data quality is letting you down 

The first thing to do is assess the quality of the data used to build your model. 

For a successful model, you need a balance of both active and inactive compounds in your training dataset. Without this, your model can’t learn to distinguish what makes a molecule active against your target, making it worthless to your project. 
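A quick sanity check on class balance can catch this before you train anything. Below is a minimal sketch in plain Python; the labels and the 20% warning threshold are illustrative assumptions, not a recommendation from any particular tool.

```python
# Minimal sketch: checking class balance in a hypothetical training set.
# The labels and the 20% warning threshold are illustrative assumptions.
from collections import Counter

def class_balance(labels):
    """Return the fraction of each class in a list of activity labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

training_labels = ["active"] * 18 + ["inactive"] * 182  # 9% active
fractions = class_balance(training_labels)
minority = min(fractions.values())
if minority < 0.2:  # arbitrary example threshold
    print(f"Warning: minority class is only {minority:.0%} of the data")
```

If the minority class is this small, options include gathering more actives, rebalancing the training set, or switching to methods designed for imbalanced data.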

It’s also important to consider the reliability of your experimental data. If it’s too noisy, your model will struggle to make confident predictions that you can rely on in your research. 

2. You’re trying to predict complex, late-stage outcomes 

Predicting properties like in vivo clearance is incredibly valuable for identifying optimal drug candidates, but also incredibly complex. These outcomes often involve multiple biological mechanisms and are determined by many interconnected factors.

Dissecting SAR this complex would require an enormous amount of data, and unfortunately these are typically the sorts of experiments where you have the least amount of data. 

When you’re trying to predict more complex biological properties, consider alternative approaches like imputation. Unlike traditional QSAR, imputation can identify relationships between different experimental endpoints as well as structural descriptors. This additional layer of information is key. If you’d like to learn more, our CEO Matthew Segall has written a helpful blog on the differences and applications of QSAR and imputation predictive models.

3. You’re modelling the wrong properties 

Sometimes you need to take a step back and ask yourself: “Should I even build a model for this property?”

For instance, enzymes that exhibit broad substrate specificity, such as cytochrome P450s and glutathione S-transferases (GSTs), will interact with so many different compounds that your model might end up predicting everything as ‘active’.

In these cases, using physicochemical properties as surrogates for activity might be a more productive approach. For example, when predicting P450 3A4 metabolism, we often observe a strong correlation with logD, which can be a more effective way to prioritise your compounds. 

4. You’ve strayed outside of the domain of applicability 

QSAR models perform best when predicting properties for compounds chemically similar to their training data. We define the chemical space within which a QSAR model can make reliable predictions as the domain of applicability.

For example, when designing new compounds, once you extrapolate into new chemical space, you might notice decreased confidence in your predictions. This is a sign that these compounds differ from those used to build the model; the model has less information on which to base its prediction. This can indicate that the model is no longer applicable and it’s time to rebuild. 
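One common way to flag compounds that may sit outside the domain of applicability is to measure their similarity to the training set. Here is a minimal sketch using Tanimoto similarity on fingerprints represented as sets of “on” bit indices; the example fingerprints and the 0.3 threshold are illustrative assumptions, not the method used by any specific product.

```python
# Minimal sketch of a similarity-based applicability check.
# Fingerprints are represented as sets of "on" bit indices; the
# example fingerprints and the 0.3 threshold are illustrative only.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints (sets of bit indices)."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def in_domain(query_fp, training_fps, threshold=0.3):
    """A query is 'in domain' if it is similar enough to any training compound."""
    best = max(tanimoto(query_fp, fp) for fp in training_fps)
    return best >= threshold, best

training_fps = [{1, 4, 7, 9}, {2, 4, 8}, {1, 3, 5, 9}]
query = {1, 4, 9, 12}  # shares several bits with the first training compound
ok, score = in_domain(query, training_fps)
```

In practice, the nearest-neighbour similarity is only one of several possible domain metrics; distance in descriptor space or model-specific confidence estimates serve the same purpose.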

There are some approaches to take during model building to ensure your model has a robust domain of applicability. For example, select the training set to cover the full diversity of the available data. One way to do this is by clustering the data and then putting the cluster heads and singletons into the training set. Divide the remaining cluster members between the training, validation and test sets. This will ensure that your training set doesn’t exclusively come from one cluster while your test set comes from another. 
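The split described above can be sketched in a few lines, assuming the clustering has already been done. Clusters here are lists of compound IDs with the cluster head first; the IDs and the round-robin assignment of remaining members are illustrative assumptions.

```python
# Minimal sketch of the cluster-based split described above, assuming
# clustering is already done. Each cluster is a list of compound IDs
# with the cluster head first; the IDs are illustrative.

def split_by_cluster(clusters):
    train, valid, test = [], [], []
    sets = [train, valid, test]
    i = 0
    for cluster in clusters:
        head, members = cluster[0], cluster[1:]
        train.append(head)        # cluster heads and singletons go to training
        for m in members:         # spread remaining members across all sets
            sets[i % 3].append(m)
            i += 1
    return train, valid, test

clusters = [["a1", "a2", "a3", "a4"], ["b1", "b2"], ["c1"]]  # c1 is a singleton
train, valid, test = split_by_cluster(clusters)
```

Because every cluster contributes its head to the training set, no region of the data is represented only in the test set.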

It’s also important here to consider local versus global models and which is most appropriate to use: 

  • Global models are built on larger, more diverse chemical datasets. They capture broader, universal relationships and are applicable across multiple projects. They might miss subtle patterns within specific series. The ADME QSAR models in StarDrop™ are examples of global models. 
  • Local models are trained on smaller datasets focusing on a particular chemical series or mechanism. They excel at distinguishing subtle SAR differences but struggle to extrapolate to new chemical space, potentially needing more frequent model rebuilding. 

5. There’s a more practical way to apply your model  

How you apply your model matters just as much as how you build it. If you’re looking to prioritise compounds for synthesis, then a categorical model can be a more practical approach. It enables you to quickly flag compounds as “make these,” “maybe,” and “reject these.” 

For example, you may have two compounds (A and B) that can’t be meaningfully distinguished. Inherent uncertainties in the model and/or assay can produce contradictory results: the compounds may be predicted as A>B but measured as B>A.

There is no need to get hung up on this or lose trust in your model. Using a categorical approach here that identifies both compounds as “make these” is still valuable for decision making.
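One simple way to implement this is to compare each prediction’s confidence interval against a project threshold. The sketch below is illustrative only: the pIC50 threshold and the uncertainty handling are assumptions, not StarDrop’s method.

```python
# Minimal sketch: turning a continuous prediction plus its uncertainty
# into "make" / "maybe" / "reject" categories. The pIC50 threshold and
# uncertainty handling are illustrative assumptions.

def categorise(prediction, uncertainty, threshold=6.0):
    """Flag a compound based on whether its confidence interval clears a threshold."""
    if prediction - uncertainty >= threshold:
        return "make"
    if prediction + uncertainty < threshold:
        return "reject"
    return "maybe"

# Compounds A and B: predicted A > B, but both comfortably clear the bar,
# so both land in the same "make" category despite the rank uncertainty.
a = categorise(7.4, 0.5)  # -> "make"
b = categorise(7.1, 0.5)  # -> "make"
```

Treating A and B as interchangeable members of the “make” category sidesteps the meaningless question of which one is fractionally better.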

Building better QSAR models 

Understanding these common pitfalls helps to establish realistic expectations for your QSAR models and when you might need to rebuild or rethink your approach. If you’re finding that your model isn’t working, make sure to run through these five checkpoints. 

Key resources for QSAR modelling in StarDrop  

With StarDrop’s ADME QSAR module, it’s easy to predict a wide range of ADME and physicochemical properties using a ready-to-implement list of models.   

How can you know if StarDrop’s predictive models work? We explain the four pillars that ensure our models behave as expected in this blog.  

Want to take a sneak peek at the new and updated models heading to StarDrop? Catch our on-demand webinar to hear the latest developments, including models for intrinsic clearance, and P-gp substrates and inhibitors.  

Need models tailored specifically to your chemistry and data? Explore our Auto-Modeller module, which provides an intuitive workspace to build and validate custom predictive models, no matter your level of expertise. 

About the author

Tamsin Mansley, PhD

President, Optibrium Inc. and Global Head of Application Science

Tamsin holds a PhD in Organic Chemistry from the University of East Anglia in the UK and pursued postdoctoral studies in the labs of Prof. Philip Magnus at the University of Texas at Austin.

She is an experienced drug discovery scientist, having worked as a medicinal chemist at Eli Lilly and UCB Research. Her interests lie in coupling machine learning and artificial intelligence techniques with generative chemistry approaches to explore chemistry space and guide compound design.

