There is a common assumption that the AI and ML models used in drug discovery are free from human biases, making only objective decisions based on pure logic and reason. But is that true?

Like all humans, drug discovery scientists suffer from inherent biases that influence our decision making. Our intuition can sometimes be a limitation: it can cause us to miss opportunities by disregarding ideas that don't fit our experience or traditional thinking. We are also susceptible to 'confirmation bias', which leads us to prioritise data that confirms our hypotheses over data that contradicts them. This can inappropriately narrow the search before a lead molecule or series is chosen.

Image title: What is confirmation bias?

As AI models are unaffected by emotion, ego, and personal experience, they can overcome these challenges. They can uncover missed opportunities with more ‘out-of-the-box’ thinking. 

However, they do inherit some of the biases baked into their training data, and even biases caused by the structure of the models themselves. I’ll explore this topic further in this blog. 

Outside of drug discovery: Where are AI biases most evident? 

Where biased AI algorithms are perhaps most evident, or most worrying, is in recruitment. The only data available to train an AI model on how to select candidates comes from existing, human-biased hiring decisions. Such models have been shown to frequently learn proxy variables that correlate with race and gender, even when those characteristics are anonymised.

Similarly, common text and image generation models, trained on huge swathes of the internet, inherit the prevailing cultural and societal norms present in the training data, including those we might wish to exclude. Later reinforcement learning stages of the training process try to counteract biases acquired in this way, but these are relatively crude. One notable example was when Google Gemini was found to inaccurately include people of colour when asked to generate images of WWII German soldiers, which led to a temporary suspension of its image-generation functionality.

An AI drug discovery model is only as good as its training data 

While these examples are far from the realm of drug discovery, they serve to illustrate the fundamental problem – a model is only as good as the data that it is trained on, and the data is never perfect.  

To take a simple example, consider a QSAR classification model trained to predict whether a molecule is active against a target. Our training data will most likely include a selection of molecules that we think are likely to be active, because otherwise we probably wouldn't have bothered testing them in the first place. This naturally induces a bias towards positive predictions of activity. We might want to keep this bias in place if it is more important to identify possible leads than to avoid false positives, and we might even want to exaggerate it by upsampling the active compounds to make sure that our model doesn't simply predict everything to be inactive.
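To make this concrete, here is a minimal sketch of the upsampling idea using scikit-learn on synthetic 'fingerprint-like' data. The dataset, features, and activity rule are purely illustrative assumptions; a real QSAR model would use descriptors or fingerprints computed from actual structures (for example with RDKit).

```python
# Minimal sketch: upsampling the active class in an imbalanced QSAR-style dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Imbalanced toy dataset: 1000 'molecules', roughly 10% labelled active
X = rng.integers(0, 2, size=(1000, 256))          # binary 'fingerprints'
y = (X[:, :16].sum(axis=1) > 10).astype(int)      # arbitrary, illustrative activity rule
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Upsample the active class so the model doesn't default to predicting 'inactive'
actives = np.flatnonzero(y_train == 1)
inactives = np.flatnonzero(y_train == 0)
actives_up = resample(actives, replace=True, n_samples=len(inactives), random_state=0)
idx = np.concatenate([inactives, actives_up])

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[idx], y_train[idx])
print("Fraction of test molecules predicted active:", model.predict(X_test).mean())
```

Whether this exaggerated bias is helpful depends entirely on the project: it trades a higher false-positive rate for a lower chance of missing a genuine lead.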

Similarly, a generative chemistry model trained on ChEMBL compounds will attempt to produce molecules that look like those in the ChEMBL database. While this database is a fantastic resource, it is not an unbiased sampling of drug-like compounds; in particular, it is biased towards compounds that made it into scientific publications and patents. This may be a desirable bias, as these are likely to be 'better' compounds than those that fell by the wayside, but it also reflects the biases of the authors who wrote the papers and the companies that filed the patents.

Inductive bias in AI/ML models

More subtle still is inductive bias. This is a form of bias that is inherent to the AI/ML model itself, rather than the data it is trained on. Inductive bias is produced by the simplifying assumptions required to make model fitting possible in the first place. Given a set of x/y values, there are infinitely many possible functions that would fit them. Without making any assumptions about what functional forms are reasonable, it would be impossible to choose.  

If we restrict ourselves to just linear functions, we collapse the possibilities to a much smaller set, and we can easily identify an optimal linear fit to the data with linear regression. However, we now have a model that will try to fit linear functions to data, regardless of the true functional form. Similarly, decision tree-derived models will fit a series of step functions, neural networks will fit continuous curves, and Gaussian processes will fit functions that match their kernels.

1D step function example: Random forest vs neural network 

How this manifests in modelling can be complex. However, we can predict that models that allow for discontinuities (like random forests or gradient boosting) will cope better with sharp transitions, such as activity cliffs. A toy example is shown below, where I’ve fitted a random forest and neural network to a 1D step function: because the neural network has an inductive bias towards smooth curves, it misses the points at the corners of the step function, while the random forest with its discontinuities does a much better job. 
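For readers who want to reproduce something similar, here is a minimal sketch of that kind of comparison using scikit-learn's RandomForestRegressor and MLPRegressor; the exact data, architectures, and settings behind the figure may differ.

```python
# Sketch: fit a random forest and a small neural network to a 1D step function
# and compare their behaviour near the discontinuity.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 200)).reshape(-1, 1)
y = (X[:, 0] > 0.5).astype(float)                 # 1D step function

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                  random_state=0).fit(X, y)

# Evaluate just either side of the step, where the inductive biases differ most:
# the random forest can jump sharply, while the neural network smooths the edge.
X_edge = np.array([[0.49], [0.50], [0.51]])
print("Random forest:", rf.predict(X_edge).round(3))
print("Neural net:   ", nn.predict(X_edge).round(3))
```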

Importantly, the inverse can also be true: where a smooth function better represents the data, the random forest is likely to underperform the neural network. Neither of these is ‘better’ in general, they just have different inductive biases that mean they perform better under different circumstances.  

Using biases to optimise our models

By understanding the biases inherent to our data and models, we can better optimise their performance and recognise their limitations. In our AutoModeller package, for example, we train and optimise a set of different ML models to predict target molecule properties, ensuring that we end up with a model whose inductive bias aligns with the problem we are trying to solve.
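As an illustration of the general idea, not the AutoModeller implementation itself, one can compare candidate models with different inductive biases by cross-validation and keep whichever fits the data best. The descriptors and target property below are placeholders.

```python
# Illustrative sketch: pick among models with different inductive biases
# by cross-validated performance on the data at hand.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                    # placeholder descriptors
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)  # placeholder property

candidates = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "gaussian_process": GaussianProcessRegressor(),
}

scores = {name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("Best-matching inductive bias for this data:", best)
```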

Summary

In conclusion, while AI can significantly reduce bias in drug discovery, identifying novel molecules that may have been disregarded by chemists, the idea that AI models are unbiased is not correct. They are biased both by their training data and by the structure of the models themselves, which predisposes them to return certain kinds of answers. These biases can be very different from human biases, especially when the training data isn't derived from human output, and can cause models to fail in unexpected ways. However, by understanding these biases and how they affect model performance, we can better optimise our models and learn when to trust their outputs.

About the author

Michael Parker, PhD

Michael is a Principal AI Scientist at Optibrium, applying advanced AI techniques to accelerate drug discovery and improve decision-making. With a Ph.D. in Astronomy and Astrophysics from the University of Cambridge, he brings a data-driven approach to solving complex scientific challenges. Michael is also a thought leader, contributing to discussions on the impact of AI in pharmaceutical research.

LinkedIn

Dr Michael Parker, Principal AI Scientist, Optibrium
