Artificial intelligence (AI) is having its moment in drug discovery. From retrosynthesis to de novo design to in silico PK/PD prediction, every team seems to be testing “the next best model”. Press releases shout about performance lifts, investment decks pitch pipelines powered by deep learning, and everyone has a slide about how AI will revolutionise early-phase drug discovery. 

And yet, talk to any bench scientist and the reaction is… complicated. 

AI may be winning the headlines, but it hasn’t won the lab. 

That’s because most teams evaluating AI tools are encountering a familiar pattern: what looked powerful in a paper or a polished demo falls apart in the messiness of real-world drug discovery data, workflows, and decision-making. And often, that disconnect comes down to a very basic misunderstanding: 

AI is not magic. It’s maths. 

Ground truth matters more than algorithm hype

In drug discovery, we deal in imperfect data. Assays are noisy. Endpoints are missing. Relationships between properties are nonlinear, context-dependent, and (more often than not) poorly understood. This isn’t a bug in your pipeline. It’s biology. 

But many AI tools assume the opposite. They treat sparse, heterogeneous drug discovery data as if it’s clean and complete. That may work on benchmark datasets or Kaggle challenges, but in a live discovery program? Not so much. 

Rather than chasing ever-newer maths, successful AI drug discovery platforms must respect the data. They need to handle uncertainty, flag missingness, and surface the most informative patterns even when the data matrix is mostly empty.

It’s not about squeezing out better accuracy metrics. It’s about making useful predictions in the face of ambiguity. 

Importantly, AI tools need to acknowledge when they don't know the answer. We often see AI confidently make incorrect statements (see our ChatGPT 'strawberry' example). That's harmless when we're dealing with spelling, but confident mistakes like this can cost you significant time and resources in the lab.
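To make this concrete, here is a minimal sketch (illustrative only, not any platform's actual method) of what "respecting the data" can mean in practice: filling gaps in a sparse assay matrix while attaching an honest uncertainty to every predicted value, so a low-confidence prediction is flagged rather than presented as fact. The compound data and the column-mean estimator are hypothetical placeholders.

```python
import math
from statistics import mean, stdev

# Hypothetical sparse assay matrix: rows = compounds, columns = assays.
# None marks an unmeasured endpoint -- the common case in a live project.
assay_matrix = [
    [7.2, None, 0.45],
    [6.8, 51.0, None],
    [None, 48.0, 0.39],
    [7.5, 55.0, 0.51],
]

def predict_missing(matrix):
    """Fill gaps with the column mean, attaching a standard-error-based
    uncertainty so downstream users see *how much* to trust each value."""
    n_cols = len(matrix[0])
    predictions = []
    for col in range(n_cols):
        observed = [row[col] for row in matrix if row[col] is not None]
        estimate = mean(observed)
        # The standard error grows as observations shrink: fewer data
        # points means a wider (more honest) uncertainty band.
        if len(observed) > 1:
            error = stdev(observed) / math.sqrt(len(observed))
        else:
            error = float("inf")  # one data point: admit we don't know
        for i, row in enumerate(matrix):
            if row[col] is None:
                predictions.append((i, col, estimate, error))
    return predictions

for compound, assay, value, uncertainty in predict_missing(assay_matrix):
    print(f"compound {compound}, assay {assay}: {value:.2f} +/- {uncertainty:.2f}")
```

A real platform would replace the column-mean estimator with a model that learns relationships between endpoints, but the principle is the same: every prediction carries its uncertainty with it.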

Usefulness > performance

The obsession with algorithmic performance is understandable. Everyone wants to beat the baseline, outscore the model next door, or claim to be “state of the art.” 

But in practice, most discovery teams don’t care about the last decimal point. They care about: 

  • Prioritising which compound to make next 
  • Avoiding unnecessary assays 
  • Designing optimal molecules that balance potency with PK 
  • Making better decisions with limited time and budget 

A model that enables any of those steps, even if it’s not the most accurate, is more valuable than a perfect model that no one uses.

The real differentiator: Integration

Here’s the hard truth: the science behind many AI tools is converging. Most platforms are using similar model architectures under the hood. What separates a platform that changes your workflow from one that gathers dust is integration.

If you’re thinking of adopting AI into your workflow, consider the following: 

  • Can it work with your data format? 
  • Can it sit inside your design cycle? 
  • Can it communicate uncertainty clearly enough for a chemist to trust it? 
  • Can you use it without reading a user manual? 

So what should you do?

If you’re evaluating AI platforms for your pipeline, stop looking for magic. Instead, start asking meaningful questions: 

  • Does this tool respect real-world data, or just perfect training sets? 
  • Does it support human decision-making, or try to replace it? 
  • Does it fit in with how your team actually works? 
  • Is it transparent about what it doesn’t know? 

You don’t need hype. You need help solving hard problems faster. AI can absolutely provide that help, as long as it’s built with scientists and their real-world data in mind. 

Cerella: Proven, deployable AI built for complex drug discovery data

Cerella wasn’t built to show off algorithms. It was built to be used. 

It’s designed to deal with imperfect, sparse, and multi-endpoint datasets. It combines compound descriptors with partial experimental data to predict unmeasured assay results, estimate uncertainty, and identify the most informative experiments to support compound progression.  

Most importantly, it tells you when it is uncertain about its answer, so you can have confidence in your decision-making. 
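Uncertainty estimates aren't only a safeguard; they can also drive experiment selection. The sketch below shows one common heuristic (an illustration, not Cerella's actual algorithm): among unmeasured compound/assay pairs, run the experiment whose prediction the model is least sure about, since that measurement teaches the model the most. The compound IDs and values are hypothetical.

```python
# Hypothetical model output: (compound_id, assay, predicted_value, uncertainty)
candidate_predictions = [
    ("CPD-001", "logD", 2.1, 0.15),
    ("CPD-002", "hERG pIC50", 5.4, 0.90),
    ("CPD-003", "solubility", -4.2, 0.40),
]

def most_informative(predictions):
    """Greedy uncertainty sampling: the highest-uncertainty prediction
    marks the experiment expected to be most informative to measure next."""
    return max(predictions, key=lambda p: p[3])

compound, assay, value, unc = most_informative(candidate_predictions)
print(f"Measure {assay} for {compound} next (widest interval: +/-{unc:.2f})")
```

This greedy, one-step strategy is the simplest member of the active-learning family; the point is that uncertainty, honestly reported, becomes a tool for prioritisation rather than just a caveat.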

See real-world case studies where Cerella has supported drug discovery decision making. 

About the author

Patrik Nikolić, MSc

Patrik is a medicinal chemist with a strong background in drug discovery, dedicated to streamlining workflows and advancing innovative therapies. At Optibrium, he serves as Scientific Strategy Partner, driving initiatives that empower researchers.

He holds a Master’s in Medicinal and Pharmaceutical Chemistry and a Bachelor’s in Biotechnology and Drug Research from the University of Rijeka, graduating summa cum laude with expertise in in silico design, enzyme assays, and biomolecular studies.

LinkedIn: Patrik Nikolić
