Addressing toxicity risk when designing and selecting compounds in early drug discovery
Summary In this article, ‘Addressing toxicity risk when designing and selecting compounds in early drug discovery’, we discuss the application…
In early-stage drug discovery, medicinal chemists rely on predictive models to help guide which compounds to synthesise or test next. Ideally, these models would provide highly accurate numerical predictions of properties like potency, solubility, or metabolic stability. However, early discovery is often characterised by limited experimental data, making it challenging to build robust models that generate precise numerical outputs.
This is where categorical models, such as red/amber/green (RAG) classifications, come into play. Rather than attempting to predict an exact value, categorical models provide qualitative insights that help prioritise compounds with good properties and deprioritise weaker candidates. But how exactly do these models provide value, and why should medicinal chemists consider using them?
Building highly accurate numerical models requires large, high-quality datasets. In the early stages of discovery, however, such data is usually limited, which makes it difficult to confidently rank compounds based on numerical scores alone.
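Categorical models take a different approach by classifying compounds into broad categories based on predicted properties. A common example is the red/amber/green system, in which green marks compounds predicted to have good properties, red marks weaker candidates, and amber marks those in between.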
Instead of struggling with uncertain numerical values, chemists can use these classifications to quickly filter, prioritise, and optimise compound selection.
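As a minimal sketch of this kind of classification, assuming a property where higher values are better and two cut-offs agreed by the project team, values could be binned into RAG categories as shown here. The function name `to_rag` and the cut-off values are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Rag(Enum):
    RED = "red"
    AMBER = "amber"
    GREEN = "green"


def to_rag(value: float, amber_min: float, green_min: float) -> Rag:
    """Bin a value into a RAG category, assuming higher is better.

    amber_min and green_min are hypothetical, project-specific cut-offs.
    """
    if amber_min > green_min:
        raise ValueError("amber_min must not exceed green_min")
    if value >= green_min:
        return Rag.GREEN
    if value >= amber_min:
        return Rag.AMBER
    return Rag.RED


# Illustrative pIC50 values with assumed cut-offs of 6.0 (amber) and 7.0 (green)
values = [5.2, 6.4, 7.8]
categories = [to_rag(v, amber_min=6.0, green_min=7.0) for v in values]
print([c.value for c in categories])  # ['red', 'amber', 'green']
```

The same labels could also serve as training targets for a classifier, so that the model predicts the category directly rather than a number.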
Categorical models may seem like a compromise compared to numerical predictions, but in many cases, they are actually more practical and actionable. Here’s why:
Because categorical models focus on classification rather than exact prediction, they can still be useful when training data is limited. Even with small datasets, a well-calibrated RAG model can help guide decision-making with greater reliability than an unreliable numerical prediction.
Rather than presenting medicinal chemists with an arbitrary number, categorical models provide a clear go/no-go decision framework. This makes it easier to triage compounds and focus on the most promising ones without over-interpreting uncertain numerical values.
In drug discovery, compound progression is rarely based on a single numerical threshold. Instead, teams typically make holistic go/no-go decisions considering multiple factors. A categorical model mirrors this qualitative decision-making approach, making it a more intuitive tool for medicinal chemists.
A single numerical score often fails to capture the complexity of drug discovery. Categorical models fit naturally into multi-parameter optimisation (MPO) workflows, which evaluate multiple properties in parallel and filter out problematic compounds based on a combination of predicted factors (e.g., potency, solubility, metabolic stability).
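One simple way to combine per-property categories is a worst-case rule, where a compound's overall category is the lowest of its individual categories. The sketch below illustrates that rule only; it is an assumption made for this example, not the article's method, and real MPO schemes typically weight properties by importance. The name `overall_category` and the property keys are hypothetical.

```python
from typing import Mapping

# Ordering used to compare categories: lower rank means worse.
RANK = {"red": 0, "amber": 1, "green": 2}


def overall_category(property_categories: Mapping[str, str]) -> str:
    """Return the worst category across all predicted properties."""
    if not property_categories:
        raise ValueError("at least one property category is required")
    return min(property_categories.values(), key=RANK.__getitem__)


# Hypothetical compound with three predicted property categories
compound = {
    "potency": "green",
    "solubility": "amber",
    "metabolic_stability": "green",
}
print(overall_category(compound))  # amber
```

Under this rule a single red property is enough to filter a compound out, which suits early triage but may be too strict once more data is available.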
Imagine a medicinal chemistry team working on a novel kinase inhibitor programme. They have 500 virtual compounds to choose from, but only enough resources to synthesise 20.
Using a categorical model, the team scores each compound as red, amber, or green based on predicted potency and selectivity, then uses those categories to narrow the 500 candidates down to the 20 it will synthesise.
This streamlined decision-making process enables the team to make better choices, faster, ultimately increasing the efficiency of their discovery efforts.
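The triage in this scenario can be sketched as follows, assuming a policy of taking green compounds first, then amber, and never red. That policy, the function name `select_for_synthesis`, and the compound IDs and categories are all made up for illustration; they do not come from a real programme.

```python
from typing import List, Tuple

# Selection priority: greens first, then ambers. Reds are excluded entirely.
PRIORITY = {"green": 0, "amber": 1}


def select_for_synthesis(scored: List[Tuple[str, str]], budget: int) -> List[str]:
    """Pick up to `budget` compound IDs from (compound_id, category) pairs."""
    candidates = [(cid, cat) for cid, cat in scored if cat in PRIORITY]
    # sort() is stable, so compounds keep their input order within a category
    candidates.sort(key=lambda item: PRIORITY[item[1]])
    return [cid for cid, _ in candidates[:budget]]


# Synthetic data: 500 virtual compounds with a repeating, made-up category pattern
pattern = ["red"] * 6 + ["amber"] * 3 + ["green"]
scored = [(f"cpd-{i:03d}", pattern[i % 10]) for i in range(500)]

shortlist = select_for_synthesis(scored, budget=20)
print(len(shortlist), shortlist[:3])  # 20 ['cpd-009', 'cpd-019', 'cpd-029']
```

When there are more green compounds than synthesis slots, as in this synthetic set, the team would still need a secondary criterion, such as chemical diversity or synthetic accessibility, to choose among them.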
While categorical models offer several advantages, they are not without challenges.
In early drug discovery, where data is sparse and uncertainty is high, categorical models offer a practical and effective alternative to numerical predictions. By providing clear, interpretable classifications, these models help medicinal chemists prioritise synthesis, filter out weak candidates, and make more confident decisions.
Rather than wait for perfect numerical models, teams can leverage categorical predictions to drive faster, smarter compound selection – accelerating the path to new drug candidates.