Latest Publications & Presentations

Data Imputation through Deep Learning

Monday, 09 November 2020 10:25

This article was published in Innovations in Pharmaceutical Technology magazine, Autumn/Winter 2020

Data Imputation through Deep Learning: Expanding the data available for drug discovery by up to 100x
Matthew Segall*, Benedict Irwin*, Thomas Whitehead†, Samar Mahmoud*, Greg Shields*, Graham Turner*, Alex Elliott*, Stefan-Bogdan Marcu*, Robert Parini†, Edmund Champness*, Gareth Conduit†‡
*Optibrium Ltd., Cambridge, UK, info@optibrium.com, †Intellegens Ltd., Cambridge, UK, info@intellegens.ai, ‡Cavendish Laboratory, University of Cambridge, Cambridge, UK


Machine learning (ML) methods are routinely used in drug discovery to build models that can predict the properties of compounds directly from their chemical structure. These quantitative structure-activity relationship (QSAR) models take ‘features’ of chemical structures (often referred to as ‘descriptors’) as input to predict one or more properties, including activities against biological targets or in phenotypic assays and a broad range of absorption, distribution, metabolism, excretion and toxicity (ADMET) properties. However, even the most sophisticated ML methods can struggle to produce high-quality predictions, due to the limitations of drug discovery data: The number of compounds with data for any given experimental endpoint is small when compared with machine learning data sets in many other fields; the overlap of compounds measured in different endpoints is even smaller; and the data generated by biological assays are noisy due to experimental variability.

Imputation methods take a different approach, using the limited property data that are available as inputs, to ‘fill in the gaps’ where measured values are not yet available. An example of an imputation method is Alchemite™, which applies deep learning to both compound descriptors and sparse assay data, as illustrated in Figure 1. The resulting model ‘learns’ directly from correlations between experimental endpoints, in addition to relationships between structural features of compounds and the experimental data. This approach makes better use of the sparse and noisy data in drug discovery, to produce more accurate predictions than QSAR models, which enables better targeting of the most promising compounds.

You can download this article as a PDF


Predicting Reactivity to Drug Metabolism: Beyond P450s – Modelling FMOs and UGTs

Tuesday, 02 June 2020 12:33

Preprint Paper.


We present a study based on density functional theory calculations to explore the rate-limiting steps of product formation for oxidation by Flavin-containing Monooxygenase (FMO) and glucuronidation by the UDP-glucuronosyltransferase (UGT) family of enzymes. FMOs are responsible for the modification phase of metabolism of a wide diversity of drugs, working in conjunction with the Cytochrome P450 (CYP) family of enzymes, and UGTs are the most important class of drug conjugation enzymes. Reactivity calculations are important for prediction of metabolism by CYPs, and reactivity alone explains around 70 – 85 per cent of the experimentally observed sites of metabolism within CYP substrates. In the current work we extend this approach to propose model systems which can be used to calculate the activation energies, i.e. reactivity, for the rate-limiting steps for both FMO oxidation and glucuronidation of potential sites of metabolism. These results are validated by comparison with the experimentally observed reaction rates and sites of metabolism, indicating that the presented models are suitable to provide the basis of a reactivity component within generalizable models to predict either FMO or UGT metabolism.

You can download this paper as a PDF


Advances in the Study of Drug Metabolism

Thursday, 28 May 2020 15:34


Advances in the Study of Drug Metabolism – symposium report of the 12th Meeting of the International Society for the Study of Xenobiotics (ISSX)
Published by Taylor & Francis; Drug Metabolism Reviews

ISSN: 0360-2532 (Print) 1097-9883 (Online) https://www.tandfonline.com/doi/full/10.1080/03602532.2020.1765793


The 12th International Society for the Study of Xenobiotics (ISSX) meeting, held in Portland, OR, USA from July 28 to 31, 2019, was attended by diverse members of the pharmaceutical sciences community. The ISSX New Investigators Group provides learning and professional growth opportunities for student and early career members of ISSX. To share meeting content with those who were unable to attend, the ISSX New Investigators herein elected to highlight the “Advances in the Study of Drug Metabolism” symposium, as it engaged attendees with diverse backgrounds.
This session covered a wide range of current topics in drug metabolism research including predicting sites and routes of metabolism, metabolite identification, ligand docking, and medicinal and natural products chemistry, and highlighted approaches complemented by computational modeling. In silico tools have been increasingly applied in both academic and industrial settings, alongside traditional and evolving in vitro techniques, to strengthen and streamline pharmaceutical research. Approaches such as quantum mechanics simulations facilitate understanding of reaction energetics toward prediction of routes and sites of drug metabolism. Furthermore, in tandem with crystallographic and orthogonal wet lab techniques for structural validation of drug metabolizing enzymes, in silico models can aid understanding of substrate recognition by particular enzymes, identify metabolic soft spots and predict toxic metabolites for improved molecular design. Of note, integration of chemical synthesis and biosynthesis using natural products remains an important approach for identifying new chemical scaffolds in drug discovery. These subjects, compiled by the symposium organizers, presenters, and the ISSX New Investigators Group, are discussed in this review.

You may obtain an eprint


Translational Toxicology: Data Visualisation Across Phases

Wednesday, 13 May 2020 08:49


Translational Toxicology: Data Visualisation Across Phases
Presented by Aishling Cooke


This presentation introduces our work on a visualisation application for the EU project, eTRANSAFE (https://etransafe.eu/) - a consortium that aims to build a large software system for investigating translational toxicology. It centres on the process of designing visualisation software for translational toxicology, with particular reference to the challenges that the many different sources of toxicology data pose - how can you present toxicology data from heterogeneous data sources together in visualisations to allow interesting patterns to be seen?
Toxicology data are available from a wide range of sources, from pre-clinical trials to post-marketing adverse event reports. These data are in different formats and contain information at different granularities, so creating a tool that lets the user visualise them together is complex. We will discuss these challenges and considerations in developing a visualisation tool which allows users to explore these data concurrently, without intensive transformation of the data by the end-user, and conclude with a demonstration of our current version to show examples of the functionality thus far.

You can download this presentation as a PDF


Imputation versus prediction: applications in machine learning for drug discovery

Monday, 04 May 2020 10:38


This article was published in Future Drug Discovery, Vol. 2, No. 2

Imputation Versus Prediction: Applications in Machine Learning for Drug Discovery
Benedict W J Irwin, Samar Mahmoud, Thomas M Whitehead, Gareth J Conduit & Matthew D Segall


Imputation is a powerful statistical method that is distinct from the predictive modelling techniques more commonly used in drug discovery. Imputation uses sparse experimental data in an incomplete dataset to predict missing values by leveraging correlations between experimental assays. This contrasts with quantitative structure–activity relationship methods that use only descriptor–assay correlations. We summarize three recent imputation strategies – heterogeneous deep imputation, assay profile methods and matrix factorization – and compare these with quantitative structure–activity relationship methods, including deep learning, in drug discovery settings. We comment on the value added by imputation methods when used in an ongoing project and find that imputation produces stronger models, earlier in the project, over activity and absorption, distribution, metabolism and elimination end points.
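
As a rough illustration of the matrix factorization strategy mentioned above, the sketch below fills the gaps in a small compounds-by-assays table with a low-rank approximation fitted only to the measured entries. It is not the method used in the article or in Alchemite; the function name `factorize_impute`, its parameters and the toy activity values are all assumptions made for illustration.

```python
import numpy as np


def factorize_impute(data, rank=2, steps=3000, lr=0.01, reg=0.01, seed=0):
    """Fill NaN entries of a compounds x assays matrix (hypothetical helper).

    Assumes every assay (column) has at least one measured value.
    """
    data = np.asarray(data, dtype=float)
    observed = ~np.isnan(data)

    # Centre each assay on the mean of its measured values.
    col_means = np.nanmean(data, axis=0)
    centred = np.where(observed, data - col_means, 0.0)

    # Small random starting factors: one row per compound / per assay.
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((data.shape[0], rank))
    V = 0.1 * rng.standard_normal((data.shape[1], rank))

    # Gradient descent on the squared error over measured entries only.
    for _ in range(steps):
        residual = (U @ V.T - centred) * observed
        grad_U = residual @ V + reg * U
        grad_V = residual.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V

    # Keep measured values as they are; use the model only for the gaps.
    estimate = U @ V.T + col_means
    return np.where(observed, data, estimate)


# Toy data (made up): rows are compounds, columns are assays, NaN = not measured.
nan = np.nan
toy = np.array([
    [6.1, 5.9, nan],
    [7.2, nan, 6.8],
    [5.0, 5.2, 4.9],
    [nan, 7.9, 8.1],
    [6.5, 6.4, nan],
])
completed = factorize_impute(toy)
print(np.round(completed, 2))
```

The filled-in values come only from correlations between assays, with no compound descriptors involved. This is the contrast with QSAR methods that the abstract draws.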

You can link to Future Drug Discovery here


Practical Applications of Deep Learning to Impute Heterogeneous Drug Discovery Data

Wednesday, 29 April 2020 13:59

Preprint Paper.


Contemporary deep learning approaches still struggle to bring a useful improvement in the field of drug discovery due to the challenges of sparse, noisy and heterogeneous data that are typically encountered in this context. We use a state-of-the-art deep learning method, Alchemite™, to impute data from drug discovery projects, including multi-target biochemical activities, phenotypic activities in cell-based assays, and a variety of absorption, distribution, metabolism, and excretion (ADME) endpoints. The resulting model gives excellent predictions for activity and ADME endpoints, offering an average increase in R² of 0.22 versus quantitative structure-activity relationship methods. The model accuracy is robust to combining data across uncorrelated endpoints and projects with different chemical spaces, enabling a single model to be trained for all compounds and endpoints. We demonstrate improvements in accuracy on the latest chemistry and data when updating models with new data as an ongoing medicinal chemistry project progresses.

You can download this paper as a PDF
You can download the supplementary material as a PDF


Predicting pKa Using a Combination of Quantum Mechanical and Machine Learning Methods

Monday, 18 November 2019 13:23

Journal of Chemical Information and Modeling. Publication Date (Web): May 1, 2020

Peter Hunt¹, Layla Hosseini-Gerami², Tomas Chrien¹, Jeffrey Plante³, David J. Ponting³, Matthew Segall¹

¹Optibrium Ltd. ²Department of Chemistry, Cambridge. ³Lhasa Ltd


The acid dissociation constant (pKa) has an important influence on molecular properties crucial to compound development in synthesis, formulation and optimisation of absorption, distribution, metabolism and excretion properties. We will present a method that combines quantum mechanical calculations, at a semi-empirical level of theory, with machine learning to accurately predict pKa for a diverse range of mono- and polyprotic compounds. The resulting model has been tested on two external data sets, one specifically used to test pKa prediction methods (SAMPL6) and the second covering known drugs containing basic functionalities. Both sets were predicted with excellent accuracy (root-mean-square errors of 0.7 – 1.0 log units), comparable to other methodologies that use a much higher level of theory at greater computational cost.

You can download the paper as a PDF
You can download the supplementary material as a PDF


Design of Drug-like Hepsin Inhibitors against Prostate Cancer and Kidney Stones

Friday, 11 October 2019 10:53

This article was published in ACTA Pharmaceutica Sinica B; V.9 No.5 September 2019; including results from StarDrop.

Vincent Blay, Mu-Chun Li, Sunita P. Ho, Marshall L. Stoller, Hsing-Pang Hsieh, Douglas R. Houston


Hepsin, a transmembrane serine protease abundant in renal endothelial cells, is a promising therapeutic target against several cancers, particularly prostate cancer. It is involved in the release and polymerization of uromodulin in the urine, which plays a role in kidney stone formation. In this work, we design new potential hepsin inhibitors for high activity, improved specificity towards hepsin, and promising ADMET properties. The ligands were developed in silico through a novel hierarchical pipeline...

You can link to ACTA Pharmaceutica Sinica B here


Practical Applications of Deep Learning to Imputation of Drug Discovery Data

Wednesday, 04 September 2019 15:03

Presented by Ben Irwin, on 28 August 2019 at the ACS National Meeting and Exposition in San Diego, USA

Presentation Overview

Problems with pharma data − Define solutions to these problems

Alchemite: A novel deep learning algorithm for imputation − Imputation = Filling in the blanks

Walkthrough of deep learning imputation on a real project − Early screen data − Validation − Late stage models − Comparison with standard QSAR methods

Larger applications and future prospects


You can download the presentation slides as a PDF


Mechanism and Prediction of UGT Metabolism

Wednesday, 04 September 2019 09:10

Presented by Mario Oeren, on 27 August 2019 at the ACS National Meeting and Expo in San Diego, USA

Presentation Overview

UGT metabolism − A short overview

Mechanistic studies − Ab initio − Semi-empirical

QSAR models − Results from mechanistic studies − Steric and orientation descriptors



You can download the presentation slides as a PDF


Predicting pKa Using a Combination of Quantum and Machine Learning Methods

Thursday, 01 August 2019 14:22

This poster was presented at the 12th International ISSX Meeting, 28-31 July 2019

Peter Hunt, Layla Hosseini-Gerami, Tomas Chrien, Matthew Segall

The dissociation of a proton from a heteroatom has a significant impact on the charge distribution and interactions of a molecule. These influence many important molecular properties, including binding to target and off-target proteins, absorption, distribution, metabolism and excretion (ADME) and pharmacokinetic (PK) properties such as solubility, tissue or cellular distribution and permeability. Therefore, the ability to predict the propensity of a molecule to lose or gain a proton in water is crucial for the development of new chemical entities with desirable PK, ADME and binding properties.
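
To make the link between pKa and charge state concrete, the following minimal sketch applies the Henderson-Hasselbalch relationship to estimate the fraction of a compound that is ionised at a given pH. It assumes a single ionisable group in dilute aqueous solution; the function name `fraction_ionised` and the example pKa values are hypothetical and are not taken from the poster.

```python
def fraction_ionised(pka, ph=7.4, kind="acid"):
    """Henderson-Hasselbalch estimate for one ionisable group (illustrative).

    kind="acid": fraction present as the deprotonated anion A-.
    kind="base": fraction present as the protonated cation BH+
                 (pka is that of the conjugate acid).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


# Made-up examples evaluated at pH 7.4.
acid_fraction = fraction_ionised(4.4, kind="acid")
base_fraction = fraction_ionised(9.4, kind="base")
print(f"acid, pKa 4.4: {acid_fraction:.3f} ionised")
print(f"base, pKa 9.4: {base_fraction:.3f} ionised")
```

A shift of one pKa unit changes the ratio of ionised to neutral species tenfold, which is why errors of about one log unit in a predicted pKa can matter for solubility and permeability estimates.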

You can download the poster as a PDF


Predicting Routes, Sites and Products of Drug Metabolism

Wednesday, 31 July 2019 09:10

Presented by Matt Segall at the 12th International ISSX Meeting 2019, Oregon, USA

Presentation Overview

  • Approaches to predicting metabolism − Empirical vs mechanistic
  • Predicting P450 metabolism − P450 regioselectivity − WhichP450
  • Beyond P450s − Flavin-containing monooxygenases (FMO) − UDP glucuronosyltransferases (UGT)
  • Conclusions


    You can download the presentation slides as a PDF

    A Single Deep Learning Model for Confident Imputation of Heterogeneous Drug Discovery Endpoints

    Wednesday, 24 July 2019 10:43

    This poster was presented at the Gordon Research Conference, Integrating Big Data and Macromolecular Protein Structures into Small Molecule Design; 14-19 July 2019

    Benedict Irwin, Julian Levell, Thomas Whitehead, Matthew Segall, Gareth Conduit

    We have previously described a novel deep learning method for data imputation, Alchemite™ (Whitehead et al., J. Chem. Inf. Model. (2019) 59, pp. 1197-1204). This accepts both molecular descriptors and sparse experimental data as inputs, to exploit the correlations between experimentally measured endpoints, as well as structure-activity relationships (SAR). It has been demonstrated to outperform quantitative SAR (QSAR) models, including multi-target deep learning methods, on a challenging benchmark data set of compound bioactivities. Here we will describe the application and validation of this method on drug discovery data covering two projects and diverse endpoints, including activities in both biochemical and cellular assays and absorption, distribution, metabolism and elimination (ADME) endpoints.

    You can download the poster as a PDF


    N- and S-Oxidation Model of the Flavin-containing Monooxygenases

    Wednesday, 03 July 2019 14:44

    This poster was presented at the Eighth Joint Sheffield Conference on Chemoinformatics; 17-19 June 2019

    Peter Walton, Mario Öeren, Peter Hunt, Matthew Segall

    Existing computational models of drug metabolism are heavily focused on predicting oxidation by cytochrome P450 (CYP) enzymes, because of their importance in phase I drug metabolism, reactive metabolite formation, and drug-drug interactions. Due, in part, to the success of these models, new drug candidates are typically well-optimised with respect to CYP metabolism. However, novel metabolites are observed due to other, less-studied enzyme families, such as the flavin-containing monooxygenases (FMOs). FMOs are found in multiple tissues, including the liver, and have five active isoforms (FMO 1-5). In common with CYPs, FMOs are responsible for phase I, oxidative metabolism and catalyse a variety of reaction types, including N- and S-oxidation, demethylation, desulphuration and Baeyer-Villiger oxidation.

    The objective of this study was to elucidate the reaction mechanism of FMO-mediated oxidation to inform the development of models to predict the metabolism of novel substrates.

    You can download the poster as a PDF


    Drug Discovery Today: Capturing and Applying Knowledge to Guide Compound Optimisation

    Wednesday, 19 June 2019 10:42

    This article was published in Drug Discovery Today; V24 No.5 May 2019

    Matthew Segall, Tamsin Mansley, Peter Hunt, Edmund Champness

    Successful drug discovery requires knowledge and experience across many disciplines, and no current 'artificial intelligence' (AI) method can replace expert scientists. However, computers can recall more information than any individual or team and facilitate the transfer of knowledge across disciplines. Here, we discuss how knowledge relating to chemistry and the biological and physicochemical properties required for a successful compound can be captured. Furthermore, we illustrate how, by combining and applying this knowledge computationally, a broader range of optimisation strategies can be rigorously explored, and the results presented in an intuitive way for consideration by the experts.

    You can download the Drug Discovery Today article as a PDF


    Imputing Compound Activities Based on Sparse and Noisy Data

    Monday, 08 April 2019 10:35

    Presented by Matt Segall at ACS 2019, Orlando, Florida

    Thomas Whitehead†, Matthew Segall*, Benedict Irwin*, Peter Hunt*, Gareth Conduit† (*Optibrium Ltd., †Intellegens)


    Following feedback from earlier presentations, new results express the gain in accuracy from focussing on the most confident predictions as a reduction in RMSE, rather than as an increase in R². We also illustrate the application of the Alchemite™ model to virtual compounds, i.e. based only on molecular descriptors. This shows that it is equivalent in performance to a conventional multi-target DNN, while retaining the ability to focus on the most accurate results based on the confidence in the model predictions.
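
The idea of reporting accuracy on the most confident predictions can be sketched as follows: rank predictions by the model's own uncertainty estimate, keep the most confident fraction, and compute the RMSE on that subset. This is only an illustration, not the evaluation used in the presentation; the function names and the toy numbers are assumptions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def rmse_most_confident(y_true, y_pred, uncertainty, keep_fraction):
    """RMSE over the keep_fraction of predictions with the lowest uncertainty."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(y_true))))
    # Indices of the n_keep predictions the model is most sure about.
    most_confident = np.argsort(uncertainty)[:n_keep]
    return rmse(y_true[most_confident], y_pred[most_confident])


# Made-up measured values, predictions and predicted uncertainties.
measured = [6.0, 7.0, 5.0, 8.0]
predicted = [6.1, 6.9, 5.8, 7.0]
sigma = [0.1, 0.2, 0.9, 1.0]

rmse_all = rmse(measured, predicted)
rmse_top_half = rmse_most_confident(measured, predicted, sigma, keep_fraction=0.5)
print(f"RMSE, all predictions:     {rmse_all:.3f}")
print(f"RMSE, most confident half: {rmse_top_half:.3f}")
```

In this toy case the two most confident predictions are also the two most accurate, so the RMSE falls when the less certain half is set aside. This only holds for a real model if its uncertainty estimates are well calibrated.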

    Learn more about Alchemite, a novel deep learning algorithm. Unlike many deep learning methods, this approach is capable of being trained using sparse and variable input data, typical of those available in drug discovery. This enables Alchemite to learn from correlations between experimental endpoints, as well as between molecular descriptors and protein activities, to more accurately impute the missing activities.

    You can download the presentation slides as a PDF


    N- and S-Oxidation Model of the Flavin-containing Monooxygenases

    Wednesday, 27 March 2019 15:32

    At the American Chemical Society National Meeting and Expo in Orlando, Florida, Peter Walton presented his research entitled ‘N- and S-Oxidation model of the Flavin-containing Monooxygenases’. The presentation covers the work he and his colleagues have undertaken to determine how the Flavin-containing Monooxygenase group of enzymes works to metabolise compounds. Extensive computational tests support their theory concerning the reaction mechanism, and the results can be used to predict the likely metabolites of a wide variety of drugs.

    You can download the slides here as a PDF


    New UK Collaborative Uses AI to Predict Missing Data Points in Compound Data

    Wednesday, 13 March 2019 16:52

    A new UK collaboration focuses on taking sparse data – data where a significant number of points are missing from the complete sets – or “noisy” data – data where a significant number of variables could contribute to issues and changes in results – and building predictive models that fill in the missing points with degrees of certainty, without the need for costly experimentation.

    You can download the article here as a PDF

    You can link to Rx Data here


    SBDD From a Diversified NP-Inspired Chemical Space

    Wednesday, 13 March 2019 11:44

    At the 2019 Streamlining Drug Discovery Symposium in Frankfurt, Didier Roche from Edelris presented 'SBDD From a Diversified NP-Inspired Chemical Space'.

    You can download the presentation slides here as a PDF


    Turning High Quality Data Into Actionable Insights

    Friday, 22 February 2019 14:33

    At the 2019 Streamlining Drug Discovery Symposium in Frankfurt, Rosalind Sankey (Elsevier) presented 'Helping Medicinal Chemists Identify New Opportunities during Lead ID and Optimisation - Turning High Quality Data into Actionable Insights'.

    You can download the presentation slides here as a PDF

