The volume of distribution at steady state (VDss) is an in vivo pharmacokinetic parameter representing the hypothetical volume into which the dose of drug would have to be evenly distributed to give rise to the concentration observed in the blood plasma. It therefore indicates how the drug distributes in the body: a low VDss indicates high water solubility or high plasma protein binding, because more of the drug remains in the plasma; a high VDss suggests significant concentration in the tissues, for example due to tissue binding or high lipid solubility.
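The definition above can be made concrete with a small calculation: the apparent volume is the amount of drug in the body divided by the plasma concentration, and dividing by body weight gives L/kg. This is a minimal sketch assuming the total amount of drug in the body at steady state is known; in practice VDss is estimated from plasma concentration-time data rather than measured this way. The function name and the numbers are hypothetical.

```python
def apparent_volume_l_per_kg(amount_in_body_mg: float,
                             plasma_conc_mg_per_l: float,
                             body_weight_kg: float) -> float:
    """Apparent volume of distribution in L/kg (hypothetical helper).

    V = amount of drug in the body / plasma concentration, normalised by body
    weight. Assumes distribution equilibrium (steady state).
    """
    if plasma_conc_mg_per_l <= 0 or body_weight_kg <= 0:
        raise ValueError("concentration and body weight must be positive")
    volume_l = amount_in_body_mg / plasma_conc_mg_per_l
    return volume_l / body_weight_kg


# Same amount of drug in the body, very different plasma concentrations
# (illustrative numbers only)
mostly_in_plasma = apparent_volume_l_per_kg(100.0, 20.0, 70.0)   # 5 L in total
mostly_in_tissue = apparent_volume_l_per_kg(100.0, 0.05, 70.0)   # 2000 L in total
print(round(mostly_in_plasma, 3), round(mostly_in_tissue, 1))     # 0.071 28.6
```

A drug that stays in the plasma gives a small apparent volume; one that is sequestered in tissues leaves little in the plasma and so gives an apparent volume far larger than the body itself.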

Here we describe models of VDss, built with StarDrop’s Auto-Modeller from data published by Gombar and Hall [1], that can be downloaded for use within StarDrop.

Data

Gombar and Hall published an article describing the building and validation of models of human clinical pharmacokinetic parameters, namely VDss and clearance. The data sets used to build and validate these models were provided in the supplementary information to their paper [1].

The VDss models described by Gombar and Hall were trained on a set of 569 compounds with published clinical data. To build and validate models within the StarDrop Auto-Modeller, this data set was divided into independent training, validation and test sets containing 399, 85 and 85 compounds, respectively. The split was performed using the Auto-Modeller’s default clustering method with a cluster Tanimoto index of 0.7 (see the StarDrop Reference Guide for more details). In common with the approach of Gombar and Hall, the VDss data were log-transformed (base 10).
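The StarDrop Reference Guide describes the Auto-Modeller's actual clustering method; the sketch below is not that algorithm. It is a generic illustration of a cluster-based split, assuming each compound is represented by the set of "on" bits in a binary fingerprint: compounds are grouped by greedy leader clustering at a Tanimoto threshold of 0.7, and then, walking through the clusters, each compound is assigned to whichever set is furthest below its target share (70:15:15, roughly the 399/85/85 proportions above). All function names and the assignment rule are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set

Fingerprint = Set[int]  # assumed representation: indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto index |A & B| / |A | B| of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_clusters(fps: Sequence[Fingerprint], threshold: float = 0.7) -> List[List[int]]:
    """Greedy leader clustering: each compound joins the first cluster whose leader
    has Tanimoto >= threshold with it; otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for members in clusters:
            if tanimoto(fps[members[0]], fp) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_split(fps: Sequence[Fingerprint], fractions: Dict[str, float],
                  threshold: float = 0.7) -> Dict[str, List[int]]:
    """Walk through the compounds cluster by cluster, giving each one to the set
    furthest below its target size, so larger clusters are spread across all sets."""
    sets: Dict[str, List[int]] = {name: [] for name in fractions}
    seen = 0
    for members in leader_clusters(fps, threshold):
        for idx in members:
            seen += 1
            # deficit = target count so far minus current count
            target = max(fractions, key=lambda s: fractions[s] * seen - len(sets[s]))
            sets[target].append(idx)
    return sets


# Illustrative run on random 64-bit fingerprints (not real compounds)
rng = random.Random(0)
fps = [set(rng.sample(range(64), 12)) for _ in range(40)]
split = cluster_split(fps, {"training": 0.70, "validation": 0.15, "test": 0.15})
print({name: len(ids) for name, ids in split.items()})
```

Spreading each cluster across the sets, rather than assigning whole clusters to one set, is one way to keep the validation and test sets representative of the chemistry in the training set; holding out whole clusters would instead give a harsher test of extrapolation.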

The VDss models built by Gombar and Hall were tested using two external test sets: 22 compounds from a paper by Berellini et al. [2] and 9 compounds published by Poulin and Theil [3]. These data sets were also used as external test sets in this work, to allow direct comparison with the models of Gombar and Hall.

Methods

The Auto-Modeller was applied to the training, validation and test sets described above. The default descriptors and descriptor-selection parameters were used, and models were generated using partial least squares (PLS), radial basis functions (RBF), random forests (RF) and four Gaussian process methods (GPFixed, GP2DSearch, GPRFVS and GPOpt).

Details of the parameters and descriptors used are provided in the supporting information, which can be downloaded as described below.

Results

The performance of the models built with the Auto-Modeller is shown in the table below (of the Gaussian process models, only the best is included):

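Model   | Set        | R2 (log units) | RMSE (log units) | Med FD | Max FD | % <3FD
--------|------------|----------------|------------------|--------|--------|-------
PLS     | Training   | 0.43           | 0.47             | 1.9    | 149    | 74
PLS     | Validation | 0.42           | 0.47             | 2.1    | 29     | 74
PLS     | Test       | 0.52           | 0.43             | 1.6    | 37     | 73
RBF     | Training   | N/A            | N/A              | N/A    | N/A    | N/A
RBF     | Validation | 0.67           | 0.36             | 1.6    | 9      | 81
RBF     | Test       | 0.68           | 0.35             | 1.4    | 12     | 84
RF      | Training   | 0.91           | 0.19             | 1.3    | 7      | 98
RF      | Validation | 0.62           | 0.38             | 1.5    | 10     | 76
RF      | Test       | 0.63           | 0.37             | 1.5    | 16     | 84
GPFixed | Training   | 0.73           | 0.32             | 1.5    | 49     | 88
GPFixed | Validation | 0.62           | 0.38             | 1.8    | 11     | 76
GPFixed | Test       | 0.64           | 0.37             | 1.6    | 14     | 82
R2 = coefficient of determination, RMSE = Root Mean Square Error, Med FD = Median Fold Difference, Max FD = maximum fold difference, % <3FD = percentage less than 3-fold different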
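The statistics in these tables can be computed from paired observed and predicted log10 VDss values. The sketch below assumes the usual definitions rather than StarDrop's own code: the fold difference for a compound is 10 raised to the absolute error in log10 units, and R2 = 1 - SS_res/SS_tot, which becomes negative when a model does worse than simply predicting the mean observed value (as with the Gombar and Hall MLR model on the combined external test sets). The function name is hypothetical.

```python
import math
from statistics import median
from typing import Dict, Sequence


def vdss_statistics(observed_log: Sequence[float],
                    predicted_log: Sequence[float]) -> Dict[str, float]:
    """Regression statistics on log10(VDss) (assumed definitions).

    R2 = 1 - SS_res / SS_tot; RMSE in log units;
    fold difference = 10 ** |predicted - observed|.
    """
    if len(observed_log) != len(predicted_log) or not observed_log:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(observed_log)
    errors = [p - o for o, p in zip(observed_log, predicted_log)]
    mean_obs = sum(observed_log) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed_log)
    folds = [10 ** abs(e) for e in errors]
    return {
        "R2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "RMSE": math.sqrt(ss_res / n),
        "Med FD": median(folds),
        "Max FD": max(folds),
        "% <3FD": 100 * sum(f < 3 for f in folds) / n,
    }


# Four hypothetical compounds, log10(VDss in L/kg)
stats = vdss_statistics([0.0, 0.5, 1.0, -0.5], [0.1, 0.3, 1.6, -0.5])
print({k: round(v, 2) for k, v in stats.items()})
# R2 0.67, RMSE 0.32, Med FD 1.42, Max FD 3.98, % <3FD 75.0
```

Because the fold difference is computed from the absolute log error, over- and under-predictions of the same size count equally, and a prediction within 3-fold corresponds to an absolute error below log10(3), about 0.48 log units.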

These results cannot be compared directly with the models of Gombar and Hall, because reference [1] reports only the performance of models trained with the full data set of 569 compounds. However, for reference, the authors report that a model trained with support vector regression (SVR) had a median fold deviation of 1.62 and a maximum deviation of 8.86-fold on the training set. Gombar and Hall also report that a multiple linear regression (MLR) model trained with 560 compounds (after removing outliers) had an R2 of 0.78 on the training set.

To allow direct comparison between the models generated with the Auto-Modeller and the previously published models, the results of applying the models to the independent test set from Berellini et al. [2] are shown in the table below:

Model               | RMSE (log units) | Med FD | Max FD | % <3FD
--------------------|------------------|--------|--------|-------
PLS                 | 0.36             | 1.7    | 5.7    | 77
RBF                 | 0.29             | 1.7    | 4.5    | 91
RF                  | 0.30             | 1.8    | 4.3    | 91
GP Fixed            | 0.35             | 1.6    | 6.8    | 82
Gombar and Hall SVR | 0.35             | 1.9    | 4.6    | 86
Gombar and Hall MLR | 0.63             | 2.1    | 78     | 59
RMSE = Root Mean Square Error, Med FD = Median Fold Difference, Max FD = maximum fold difference, % <3FD = percentage less than 3-fold different

For further comparison, the results of applying the models to the independent test set from Poulin and Theil [3] are shown in the table below:

Model               | RMSE (log units) | Med FD | Max FD | % <3FD
--------------------|------------------|--------|--------|-------
PLS                 | 0.16             | 1.3    | 1.9    | 100
RBF                 | 0.15             | 1.2    | 2.0    | 100
RF                  | 0.16             | 1.3    | 1.8    | 100
GP Fixed            | 0.18             | 1.4    | 2.0    | 100
Gombar and Hall SVR | 0.20             | 1.6    | 2.1    | 100
Gombar and Hall MLR | 0.31             | 1.4    | 3.9    | 78
Poulin and Theil    | 0.18             | 1.2    | 2.9    | 100
RMSE = Root Mean Square Error, Med FD = Median Fold Difference, Max FD = maximum fold difference, % <3FD = percentage less than 3-fold different

The distribution of observed VDss values in the Poulin and Theil set is too narrow for a meaningful coefficient of determination (R2) to be calculated. Therefore, the two independent test sets were combined (31 compounds in total), and the resulting R2 values are shown in the following table:
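A short derivation shows why a narrow range of observed values undermines R2. Writing n for the number of compounds, y_i and ŷ_i for the observed and predicted log10 VDss, ȳ for the mean observed value and s_y^2 for the population variance of the observed values, R2 depends on the ratio of the squared error to the spread of the data:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{RMSE}^2}{s_y^2},
\qquad
s_y^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 .
```

For a fixed RMSE, R2 therefore falls as s_y^2 shrinks, and it is negative whenever RMSE exceeds s_y, that is, when the model does worse than predicting the mean observed value for every compound. As a hypothetical example, a set with RMSE = 0.16 and s_y = 0.2 log units would give R2 = 1 - 0.0256/0.04 = 0.36, even if every prediction were within 3-fold of the observed value. Pooling the two test sets widens the range of observed values, which makes R2 more informative.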

Model               | R2 (log units)
--------------------|---------------
PLS                 | 0.40
RBF                 | 0.59
RF                  | 0.56
GP Fixed            | 0.39
Gombar and Hall SVR | 0.40
Gombar and Hall MLR | -0.89
R2 = coefficient of determination

Based on the results above, the RBF model appears to have the best overall performance of the StarDrop models. However, the RF and GPFixed models also perform well and may be worth considering. The GPFixed model has the advantage of estimating the uncertainty of each prediction on a compound-by-compound basis, although its performance on the Berellini et al. and Poulin and Theil test sets is inferior to that of the RBF and RF models.
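To illustrate how a Gaussian process yields a per-compound uncertainty, the sketch below implements textbook Gaussian process regression with a squared-exponential kernel whose hyperparameters (length scale, signal variance, noise variance) are held fixed rather than optimised. Reading "GPFixed" as "fixed hyperparameters" is an assumption; this is not StarDrop's implementation, and the one-dimensional descriptor and toy log10(VDss) values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sq_exp_kernel(a: float, b: float, length: float, signal_var: float) -> float:
    """Squared-exponential covariance between two 1-D descriptor values."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L such that L @ L.T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low: Matrix, b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_upper_t(low: Matrix, b: Sequence[float]) -> List[float]:
    """Back substitution: solve L.T x = b, given lower-triangular L."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(x_train: Sequence[float], y_train: Sequence[float], x_new: float,
               length: float = 1.0, signal_var: float = 1.0,
               noise_var: float = 0.01) -> Tuple[float, float]:
    """GP posterior mean and standard deviation at x_new.

    Hyperparameters are FIXED (not fitted to the data). The prior mean is the
    training-set mean; the returned std dev excludes observation noise.
    """
    y_bar = sum(y_train) / len(y_train)
    centred = [y - y_bar for y in y_train]
    n = len(x_train)
    k = [[sq_exp_kernel(x_train[i], x_train[j], length, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, centred))  # (K + noise*I)^-1 y
    k_star = [sq_exp_kernel(x, x_new, length, signal_var) for x in x_train]
    mean = y_bar + sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)                            # L^-1 k*
    var = signal_var - sum(vi * vi for vi in v)             # k(x*,x*) - k*^T (K + noise*I)^-1 k*
    return mean, math.sqrt(max(var, 0.0))


# Toy 1-D example: descriptor values and log10(VDss) (illustrative numbers only)
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.2, 0.5, 0.4, 0.9]
for x_new in (1.5, 10.0):
    mean, sd = gp_predict(x_train, y_train, x_new)
    print(f"x = {x_new}: log VDss {mean:.2f} +/- {sd:.2f}")
```

The prediction at x = 10, far from the training data, falls back to the training mean (0.5) with a standard deviation close to 1 (the fixed signal variance), whereas the prediction at x = 1.5, between training points, is much more certain. Fixing the hyperparameters avoids an optimisation step; the trade-off is that a poorly chosen length scale can make the uncertainty estimates over- or under-confident.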

Installing and using the models

RBF model

Download RBF model

GPFixed model

Download GPFixed model

PLS model

Download PLS model

Supporting information

The data sets and detailed outputs from the modelling process can be downloaded below.

Download data set and outputs


How to use the models

To use these models within StarDrop, download each model file and save it in a convenient location.

Load a model into StarDrop using the Open button at the bottom left of the Models tab.

Alternatively, add the directory in which the model file has been saved to the paths from which models are loaded automatically when StarDrop starts: select the File->Preference menu option and add the directory under Models in the File Locations tab.

The models predict the base-10 logarithm of VDss (with VDss in L/kg). To convert a prediction back to VDss in L/kg, use the mathematical function tool in StarDrop (f(x) on the toolbar) and enter the equation for the relevant model:

Model   | Equation
--------|--------------------------------
RBF     | pow(10, {log(VDss) RBF})
RF      | pow(10, {log(VDss) RF})
GPFixed | pow(10, {log(VDss) GPFixed})
PLS     | pow(10, {log(VDss) PLS})
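Outside StarDrop, the same back-transformation is simply 10 raised to the predicted value. Because the models work in log10 units, an error of d log units corresponds to a multiplicative factor of 10^d on VDss in L/kg, so a symmetric range in log units becomes an asymmetric range in L/kg. The sketch below is illustrative only; the function names are hypothetical, and expressing a model's uncertainty as plus or minus d log units is an assumption.

```python
import math
from typing import Tuple


def vdss_l_per_kg(log_vdss: float) -> float:
    """Back-transform a predicted log10(VDss) to VDss in L/kg (same as pow(10, x))."""
    return 10 ** log_vdss


def vdss_range(log_vdss: float, d: float) -> Tuple[float, float]:
    """VDss interval in L/kg for a prediction of log_vdss +/- d log units."""
    return vdss_l_per_kg(log_vdss - d), vdss_l_per_kg(log_vdss + d)


low, high = vdss_range(0.0, math.log10(3))   # prediction of 1 L/kg, +/- 3-fold
print(round(low, 3), round(high, 3))          # 0.333 3.0
```

This is also why the fold-difference statistics in the tables above are natural for log-scale models: a constant error in log units is a constant fold error in L/kg.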

References

[1] Gombar VK, Hall SD. Quantitative Structure−Activity Relationship Models of Clinical Pharmacokinetics: Clearance and Volume of Distribution. J. Chem. Inf. Model. 2013;53(4):948–957.

[2] Berellini G, Springer C, Waters NJ, Lombardo F. In silico Prediction of Volume of Distribution in Human Using Linear and Nonlinear Models on a 669 Compound Data Set. J. Med. Chem. 2009;52(14):4488-4495.

[3] Poulin P, Theil FP. Prediction of Pharmacokinetics Prior to In Vivo Studies. 1. Mechanism-Based Prediction of Volume of Distribution. J. Pharm. Sci. 2002;91(1):129-156.