Cambridge, UK
September 16 – 18, 2024

Join Samar Mahmoud and Scott McDonald at the 7th RSC-BMCS AI in chemistry conference. Samar will be presenting a poster produced in collaboration with Merck. The poster, titled “Application of Deep Learning Imputation to Peptide Bioactivity and Property Prediction”, features work carried out using Cerella™, a proven tool for improving drug discovery.

Come and meet the team to learn how Optibrium’s range of drug discovery software can increase the speed, efficiency, and productivity of your chemistry discovery programs.

Abstract:

Predicting bioactivities and properties is pivotal for advancing peptide-based therapeutics. Accurate predictions would reduce the time and cost of experimentation by enabling rapid virtual screening of large peptide libraries and prioritisation of candidates with high target activity and other favourable properties, such as better stability and higher permeability. However, accurate prediction of peptide properties remains a significant challenge due to the complex structure-activity relationships between peptide sequence, conformation, and diverse biological activities, which current computational models often struggle to fully capture. Additionally, limited experimental resources and/or diverse peptide libraries hamper comprehensive testing and characterisation, leading to sparse and noisy datasets.

In this context, we introduce the application of a deep learning platform, which employs an imputation approach to derive valuable insights from limited and noisy experimental data. Imputation considers correlations between different experimental endpoints, in addition to the correlations with molecular descriptors that conventional quantitative structure-activity relationship (QSAR) models consider [1]. This enables the platform to use limited data from early experimental measurements to more accurately predict downstream, expensive experimental outcomes that are intractable with QSAR methods. We show that our platform significantly outperforms conventional machine learning methods that predict properties based on molecular descriptors alone, highlighting the benefit of imputation. Various descriptors that capture characteristics of peptide molecules were assessed, including whole-molecule descriptors, position-specific sequence-based descriptors, and 3D structure-based descriptors. The models achieved a median coefficient of determination (R²) of 0.74, compared with a median R² of 0.58 or 0.59 achieved using other methods, on datasets comprising measurements from 30 to 40 experimental assays.
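The difference between descriptor-only prediction and a model that also sees a correlated early measurement can be illustrated with a toy example. The sketch below is not the platform's deep learning method: it uses ordinary least squares on synthetic data as a stand-in, and all names (`descriptors`, `cheap_assay`, `expensive_assay`, `latent`) and numeric settings are assumptions made purely for illustration. It also shows how the coefficient of determination (R²) is computed on held-out data.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (linear stand-in model)."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


# Synthetic data (hypothetical): 400 "peptides" with 5 descriptors each.
rng = np.random.default_rng(0)
n = 400
descriptors = rng.normal(size=(n, 5))

# A latent effect that the descriptors do not capture.
latent = rng.normal(size=n)

# The cheap early assay is a noisy readout of the latent effect.
cheap_assay = latent + 0.3 * rng.normal(size=n)

# The expensive endpoint depends on both descriptors and the latent effect.
weights = np.array([1.0, -0.5, 0.3, 0.0, 0.2])
expensive_assay = descriptors @ weights + 2.0 * latent + 0.3 * rng.normal(size=n)

train, test = slice(0, 300), slice(300, n)

# Descriptor-only model (QSAR-style baseline).
pred_qsar = fit_predict(descriptors[train], expensive_assay[train], descriptors[test])

# Model that also uses the correlated cheap measurement as an input.
X_both = np.column_stack([descriptors, cheap_assay])
pred_both = fit_predict(X_both[train], expensive_assay[train], X_both[test])

r2_descriptors_only = r2_score(expensive_assay[test], pred_qsar)
r2_with_assay = r2_score(expensive_assay[test], pred_both)

print(f"R2, descriptors only:       {r2_descriptors_only:.2f}")
print(f"R2, descriptors + cheap assay: {r2_with_assay:.2f}")
```

In this toy setting the second model scores higher because the cheap assay carries information about the endpoint that the descriptors lack. A real imputation model must additionally cope with missing values across many endpoints, which this sketch leaves out.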

We further explore the low-cost experiments, identified by the platform, that aid in imputing low-throughput, high-cost experiments. Such insights guide the prioritisation of low-cost yet informative experimental measurements. We discuss the significance of our approach, underscoring its potential to improve peptide modelling in drug discovery.
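One simple way to think about which low-cost experiments are worth running is to measure how much each one improves held-out prediction of the expensive endpoint. The sketch below is a hypothetical illustration on synthetic data with a linear model, not the analysis performed by the platform. The assay names (`assay_A`, `assay_B`, `assay_C`) and all numeric settings are assumptions.

```python
import numpy as np


def holdout_r2(X, y, n_train):
    """Fit least squares on the first n_train rows, return R2 on the rest."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    residual = y[n_train:] - A[n_train:] @ coef
    centered = y[n_train:] - y[n_train:].mean()
    return 1.0 - np.sum(residual ** 2) / np.sum(centered ** 2)


# Synthetic data (hypothetical): descriptors plus three cheap assays.
rng = np.random.default_rng(1)
n, n_train = 400, 300
descriptors = rng.normal(size=(n, 4))
latent = rng.normal(size=n)  # effect not captured by the descriptors

cheap_assays = {
    "assay_A": latent + 0.2 * rng.normal(size=n),  # strongly informative
    "assay_B": 0.5 * latent + rng.normal(size=n),  # weakly informative
    "assay_C": rng.normal(size=n),                 # unrelated to the target
}

# Expensive endpoint to be predicted.
target = descriptors[:, 0] + 2.0 * latent + 0.3 * rng.normal(size=n)

# Baseline: descriptors only.
baseline = holdout_r2(descriptors, target, n_train)

# Gain in held-out R2 from adding each cheap assay as an extra input.
gains = {
    name: holdout_r2(np.column_stack([descriptors, values]), target, n_train) - baseline
    for name, values in cheap_assays.items()
}

# Assays sorted from most to least helpful.
ranking = sorted(gains, key=gains.get, reverse=True)
for name in ranking:
    print(f"{name}: R2 gain = {gains[name]:+.2f}")
```

Ranking by the gain in held-out R² is only one possible criterion. In practice, the cost of each assay and the amount of data already available would also influence which measurements to prioritise.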

[1] Irwin, B. W. J., Whitehead, T. M., Rowland, S., Mahmoud, S. Y., Conduit, G. J., and Segall, M. D. (2021). Deep imputation on large scale drug discovery data. Appl. AI Lett. 2, e31. doi:10.1002/ail2.31

Meet the team

Samar Mahmoud, PhD

Principal Scientist in Cheminformatics


Scott McDonald

Director of Business Development


Related content from across the site

7th RSC-BMCS AI in chemistry conference

The Chemical Information & Computer Applications Group (CICAG) and Biological & Medicinal Chemistry Sector (BMCS) of the Royal Society of Chemistry are once again organising a conference to present the current advances in AI and machine learning in Chemistry.