Cerella’s property prediction and uncertainty estimates allowed FMC to pinpoint the active compounds within their large screening data sets, guiding the purchasing of compounds and the prioritisation of experiments.
The Challenge
Agrochemical companies are constantly on the lookout for new compounds to combat rising resistance and keep pace with changing regulations. This means extensive, expensive high-throughput screening and follow-up testing. FMC wanted to use AI to increase their probability of finding active compounds.
The Solution
Cerella was used to develop models on two data sets initially, one for insecticides and one for herbicides. Both data sets were very sparse, with only a small proportion of the possible measurements taken, and covered many different species and endpoints.
Cerella ‘filled in the gaps’ in the sparse data matrices, and evaluated the correlations between different inputs (i.e. the measured endpoints) and outputs (predicted endpoints).
A Cerella model was also developed to predict the properties of a virtual library of 1.25 million untested compounds. By harnessing multi-parameter optimisation, these were then scored for likelihood of success against a required activity profile, based on Cerella’s predictions and uncertainty estimates.
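To illustrate how a prediction and its uncertainty can be combined into a single likelihood-of-success score, here is a minimal sketch. It is not Cerella's actual scoring method: it assumes each prediction is a Gaussian with a mean and standard deviation, treats endpoints as independent, and uses hypothetical endpoint names (`pest_activity`, `crop_damage`) and thresholds chosen purely for illustration.

```python
import math


def prob_meets(mean, std, threshold, higher_is_better=True):
    """Probability that the true value meets the threshold.

    Assumption: the prediction is Gaussian with the given mean and std.
    """
    if std <= 0:
        # No uncertainty: the criterion is either met or not.
        meets = mean >= threshold if higher_is_better else mean <= threshold
        return 1.0 if meets else 0.0
    z = (mean - threshold) / std
    p_above = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return p_above if higher_is_better else 1.0 - p_above


def profile_score(predictions, profile):
    """Multiply per-endpoint probabilities into one score.

    Assumption: endpoints are independent, which is a simplification.
    predictions: {endpoint: (mean, std)}
    profile:     {endpoint: (threshold, higher_is_better)}
    """
    score = 1.0
    for endpoint, (threshold, higher_is_better) in profile.items():
        mean, std = predictions[endpoint]
        score *= prob_meets(mean, std, threshold, higher_is_better)
    return score


# Hypothetical compound and activity profile (illustrative values only).
predictions = {"pest_activity": (6.5, 0.5), "crop_damage": (4.0, 1.0)}
profile = {"pest_activity": (6.0, True), "crop_damage": (5.0, False)}
score = profile_score(predictions, profile)
print(f"Likelihood of success: {score:.2f}")
```

With a score like this, a compound with a confident, favourable prediction ranks above one with the same predicted value but a wide uncertainty, which is the behaviour needed when prioritising a large virtual library.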
| | Insecticides | Herbicides | Fungicides |
|---|---|---|---|
| Number of compounds in data set | ~80,000 | ~200,000 | ~350,000 |
| Number of different measured endpoints | 30 | 77 | 200 |
| Number of species measured against | 6 | 15 | 14 |
| % of total possible values measured | ~12.5% | ~7.5% | ~4.5% |
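To give a feel for how sparse these matrices are, the short sketch below estimates the number of measured and missing values in each data set. It assumes that "total possible values" means compounds × endpoints, and it uses the approximate figures from the table, so the results are rough estimates only.

```python
# Approximate figures from the table: (compounds, endpoints, fraction measured).
datasets = {
    "Insecticides": (80_000, 30, 0.125),
    "Herbicides": (200_000, 77, 0.075),
    "Fungicides": (350_000, 200, 0.045),
}


def matrix_counts(compounds, endpoints, fraction_measured):
    """Return (possible, measured, missing) cell counts.

    Assumption: the full matrix has one cell per compound-endpoint pair.
    """
    possible = compounds * endpoints
    measured = round(possible * fraction_measured)
    return possible, measured, possible - measured


for name, (compounds, endpoints, fraction) in datasets.items():
    possible, measured, missing = matrix_counts(compounds, endpoints, fraction)
    print(f"{name}: ~{measured:,} measured of ~{possible:,} possible "
          f"(~{missing:,} to be predicted)")
```

On these assumptions, the insecticide matrix has roughly 300,000 measured values out of about 2.4 million possible, leaving the large majority of cells to be filled by prediction.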
Cerella made significantly better predictions than other machine learning approaches. It also revealed the correlations between different endpoints, and hence which measurements are most important to take in order to give useful predictions for more complex endpoints.
Building on this initial success, FMC also chose to apply Cerella to an even larger fungicide data set, again highlighting useful correlations and making predictions with confidence.
FMC were able to use Cerella’s predictions to inform the purchase and testing of their upcoming data sets, guiding their workflows to increase efficiency, and access better compounds, faster.