Predicting sites of metabolism (SoM) enables chemists to optimise the structure of new chemical entities more efficiently and helps them to identify potentially toxic metabolites early in a project. Historically, predictive models have focused on human isoforms of the Cytochrome P450 (CYP) family of enzymes because of their primary importance in the metabolism of drug-like compounds. However, predictive models for other enzymes, e.g., Aldehyde Oxidases (AOs), Flavin-containing monooxygenases (FMOs) and Uridine 5′-diphospho-glucuronosyl-transferases (UGTs), are increasing in prevalence [1,2].
Here, we present models that predict the regioselectivity of metabolism for isoforms relevant to the metabolism of drug-like compounds in humans: AO1, FMO1, FMO3, UGT1A1, UGT1A4, UGT1A9, and UGT2B7.
Reactivity Accessibility Models
Our approach combines a mechanistic element to estimate the reactivity of potential sites of metabolism with a machine learning model to capture steric and orientation effects (accessibility) within the active site. The reactivity of a potential SoM is described using quantum mechanical calculations that estimate the activation energy (Ea) of product formation. The accessibility descriptors capture distances from the potential SoM to specified functional groups (e.g., acidic and basic groups) as counts of bonds. The reactivity and accessibility descriptors for each potential SoM are then associated with the data from the experiments to build quantitative structure-activity relationship models.
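As an illustration of an accessibility descriptor expressed as a count of bonds, the following minimal sketch computes the shortest bond path from a potential SoM to the nearest atom of a functional group using a breadth-first search. The adjacency-list representation, the function names and the toy molecule are assumptions made for this example; they are not taken from the models described here.

```python
from collections import deque


def bond_distances(adjacency, source):
    """Return the shortest bond count from `source` to every reachable atom."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distances:
                distances[neighbour] = distances[atom] + 1
                queue.append(neighbour)
    return distances


def accessibility_descriptor(adjacency, som_atom, group_atoms):
    """Smallest bond count from the SoM to any atom of the functional group.

    Returns None when the group is absent or not connected to the SoM.
    """
    distances = bond_distances(adjacency, som_atom)
    reachable = [distances[a] for a in group_atoms if a in distances]
    return min(reachable) if reachable else None


# Hypothetical toy molecule: a five-atom chain 0-1-2-3-4,
# with atom 4 standing in for a basic group.
toy_chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
basic_group_atoms = [4]
distance_to_base = accessibility_descriptor(toy_chain, 0, basic_group_atoms)
```

In practice the atom indices of acidic or basic groups would come from a substructure search in a cheminformatics toolkit; here they are supplied by hand to keep the sketch self-contained.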
Experimental Data
The data is curated from public sources that provide detailed information on the experimentally observed SoM. Since the models are intended to distinguish the observed SoM from all the potential SoM, the molecules included in the datasets have two or more potential SoM, of which at least one is experimentally observed to be metabolised. Each potential SoM on a molecule is labelled according to whether or not metabolism by the corresponding isoform was experimentally observed at that site.
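The inclusion criterion can be sketched as a simple filter. The data layout (a mapping from a molecule identifier to one boolean label per potential SoM) and all names below are assumptions chosen for illustration only.

```python
def eligible_molecules(site_labels):
    """Keep molecules with at least two potential SoM, at least one observed.

    `site_labels` maps a molecule identifier to a list of booleans, one per
    potential SoM (True means metabolism was experimentally observed there).
    """
    return {
        mol_id: labels
        for mol_id, labels in site_labels.items()
        if len(labels) >= 2 and any(labels)
    }


# Hypothetical example records, not real data.
example_labels = {
    "mol_a": [True, False, False],  # kept
    "mol_b": [True],                # dropped: only one potential site
    "mol_c": [False, False],        # dropped: no observed site
}
kept = eligible_molecules(example_labels)
```

A molecule with a single potential site carries no regioselectivity information, and a molecule with no observed site gives the model nothing to rank, which is why both are excluded.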
Studies Using Density Functional Theory (DFT)
For DFT calculations, the simulated systems must be small but must still retain their chemical characteristics. We tested a series of simplifications for AO, FMO2 and UGT2 enzymes to ensure that the reactivity of the reaction centres was not significantly modified. The substrate structures were not simplified, to ensure that long-range effects within the compounds were considered. The results were validated using experimental data on site-specific rates of metabolism.
Studies Using Semi-empirical Methods
DFT calculations can take hours to days. Therefore, to calculate reactivity within a reasonable timeframe, we use semi-empirical methods, reducing the calculation time to minutes. However, semi-empirical methods are known to introduce systematic errors depending on the environment of the SoM, which must be corrected to achieve accurate predictions.
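The text does not specify how the systematic errors are corrected. One simple possibility, shown here purely as an assumed sketch, is to estimate a mean offset per SoM environment from a calibration set for which both semi-empirical and reference (e.g. DFT) activation energies are available, and to subtract that offset from new semi-empirical values. The environment labels and the numbers are invented for illustration and are not real energies.

```python
from collections import defaultdict


def fit_offsets(calibration):
    """Mean (semi-empirical - reference) error for each environment.

    `calibration` is an iterable of
    (environment, semi_empirical_ea, reference_ea) tuples.
    """
    errors = defaultdict(list)
    for environment, semi_empirical_ea, reference_ea in calibration:
        errors[environment].append(semi_empirical_ea - reference_ea)
    return {env: sum(vals) / len(vals) for env, vals in errors.items()}


def corrected_ea(environment, semi_empirical_ea, offsets):
    """Subtract the environment's mean error; unknown environments are left as-is."""
    return semi_empirical_ea - offsets.get(environment, 0.0)


# Invented calibration values in arbitrary energy units.
calibration_set = [
    ("aromatic_C", 12.0, 10.0),
    ("aromatic_C", 14.0, 12.0),
    ("aliphatic_N", 8.0, 9.0),
]
offsets = fit_offsets(calibration_set)
estimate = corrected_ea("aromatic_C", 15.0, offsets)
```

A constant offset per environment is the simplest form such a correction could take; a regression on additional descriptors would follow the same pattern.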
QSAR Models
For small data sets, the data for each isoform was split into training and test sets (80:20). For larger data sets, the data was split into training, validation and test sets (70:15:15). The split was made randomly by compound; thus, all potential SoM of one substrate were in the same subset. The models were trained using the Gaussian Processes (GP) method in StarDrop.
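Splitting by compound rather than by site can be sketched as follows: compounds are shuffled and assigned to subsets, and every potential SoM then inherits the subset of its parent compound. This is a generic illustration with assumed names and a fixed random seed; it does not represent StarDrop's own interface.

```python
import random


def split_by_compound(compound_ids, fractions=(0.8, 0.2), seed=0):
    """Randomly assign each unique compound to a subset index.

    `fractions` gives the share of compounds per subset, e.g. (0.8, 0.2)
    for training/test or (0.7, 0.15, 0.15) for training/validation/test.
    """
    unique_ids = sorted(set(compound_ids))
    random.Random(seed).shuffle(unique_ids)
    assignment = {}
    start = 0
    cumulative = 0.0
    for subset_index, fraction in enumerate(fractions):
        cumulative += fraction
        if subset_index == len(fractions) - 1:
            end = len(unique_ids)  # last subset takes whatever remains
        else:
            end = round(cumulative * len(unique_ids))
        for compound_id in unique_ids[start:end]:
            assignment[compound_id] = subset_index
        start = end
    return assignment


# Hypothetical rows: (compound identifier, site index), three sites each.
rows = [(f"cpd_{i}", site) for i in range(10) for site in range(3)]
assignment = split_by_compound([compound_id for compound_id, _ in rows])
row_subsets = [assignment[compound_id] for compound_id, _ in rows]
```

Because the subset is looked up by compound identifier, two sites of the same substrate can never end up on opposite sides of the split, which avoids leaking information about a test compound into training.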
Conclusions
The presented work adds seven novel models which predict the regioselectivity of metabolism for relevant enzyme families and isoforms for metabolism of drug-like compounds. The models show excellent performance for the prediction of the primary SoM. In combination with the existing CYP models, we can cover the majority of observed metabolic pathways.