Blood-brain barrier (BBB) penetration is a measure of the ratio between the compound concentration in brain and blood. Good BBB penetration is required for compounds intended for targets in the central nervous system (CNS). Alternatively, for peripheral targets, poor BBB penetration reduces the risk of CNS side effects.

Shen et al. [J. Chem. Inf. Model. 2010, 50(6), pp. 1034-1041] published a paper describing the generation and validation of QSAR models of human intestinal absorption (HIA) and blood-brain barrier (BBB) penetration. The data sets with which these models were built and validated were provided in the supplementary information to that paper, and the BBB data have been used to build the models described herein. Models of the HIA data from that paper are described in another article.

Full details of the data, methods, results and use of these models are provided below.

Data

Shen et al. [1] published a paper describing the generation and validation of QSAR models of human intestinal absorption (HIA) and BBB penetration. The data sets with which these models were built and validated were provided in the supplementary information to that paper.

Shen et al. used a data set of 1593 compounds classified as having high or low brain penetration. This was divided into a training set of 1093 compounds (832 high and 261 low) and a test set of 500 compounds (451 high and 49 low). An additional external test set of 246 compounds (155 high and 91 low) was taken from a paper published by Li et al. [2].

As discussed below, models built with the training set selected in [1] performed poorly on the external test set from [2], suggesting that the compound structures in the external test set are not well represented in the training set. Therefore, to ensure as broad a coverage of chemical diversity as possible, the three sets were combined to create a single set containing 1838 compounds, which was split using StarDrop’s Auto-Modeller into training, validation and test sets in the proportions 70:15:15, using Y-based sampling. The resulting data sets are summarised in Table 1 and are included in the supporting information, as described below.

Data set      Number High   Number Low
Training      1042          246
Validation    201           74
Test          194           81

Table 1 Overview of data set split generated using the Auto-Modeller.
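
As an illustration of a 70:15:15 split that takes the class label into account, the sketch below performs a simple class-stratified random split in Python. It is not the Auto-Modeller's Y-based sampling algorithm, which is not described here; the function name stratified_split and the seed are hypothetical, and the resulting subset sizes will not reproduce the counts in Table 1.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split item indices into subsets, keeping class proportions similar.

    Generic stand-in only: this is NOT StarDrop's Y-based sampling.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    subsets = [[] for _ in fractions]
    for indices in by_class.values():
        rng.shuffle(indices)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last subset takes the remainder so that no item is dropped
            if k == len(fractions) - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            subsets[k].extend(indices[start:end])
            start = end
    return subsets

# Class totals taken from Table 1: 1437 high and 401 low compounds
labels = ["high"] * 1437 + ["low"] * 401
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))
```

Splitting each class separately keeps the ratio of high to low compounds roughly the same in every subset, which matters here because the low class is much smaller than the high class.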

Methods

The Auto-Modeller was first applied to the original data sets provided by Shen et al. [1] to allow direct comparison with the models generated in that paper. The Auto-Modeller was subsequently applied to the revised data set split, as described above.

In both cases, the default descriptors and parameters for descriptor selection were used and models were generated using the decision tree (DT) and random forest (RF) methods.

Details of the parameters and descriptors used are provided in the supporting information, as described below.

Results

Shen et al. data set split

Table 2 Results of the RF model generated with the Auto-Modeller using the data set split in [1], compared with the best models published in [1] and [3]. TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives. k is the kappa statistic as described in Section 6.8.7 of the StarDrop Reference Guide.

Model                Training                   Test                      External test
                     TP    TN   FP  FN  k       TP   TN  FP  FN  k        TP   TN  FP  FN  k
Shen et al.          832   258  3   0   0.99    449  42  7   2   0.89     149  19  72  6   0.20
Zhao et al.          815   246  15  17  0.92    443  43  6   8   0.84     -    -   -   -   -
Auto-Modeller (RF)   832   261  0   0   1.00    449  38  11  1   0.85     145  27  64  10  0.26
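
The k values can be checked against the TP, TN, FP and FN counts. The sketch below assumes that k is the standard Cohen's kappa for a two-class confusion matrix; the definition in the StarDrop Reference Guide is not reproduced here, and the function name cohen_kappa is illustrative only.

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a two-class confusion matrix (assumed definition of k)."""
    n = tp + tn + fp + fn
    # Observed agreement: fraction of compounds classified correctly
    observed = (tp + tn) / n
    # Agreement expected by chance, from predicted and actual class totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)

# Counts for the Shen et al. model on the split from [1]
print(round(cohen_kappa(449, 42, 7, 2), 2))   # test set: 0.89
print(round(cohen_kappa(149, 19, 72, 6), 2))  # external test set: 0.2
```

Because kappa corrects for chance agreement, it is much lower on the external test set than the raw fraction of correct predictions (168 of 246), since most of the low-penetration compounds are misclassified as high.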
Table 2 compares the random forest (RF) model generated by the Auto-Modeller with the best model of Shen et al. [1] and a model of the same data set reported in a previous work by Zhao et al. [3].

Revised data set split

As noted above, the poor performance on the external test set of both the model generated by Shen et al. and the RF model generated by the Auto-Modeller suggests that the compound structures in the external test set are not well represented in the training set. Therefore, the Auto-Modeller was applied to the revised data set split, as described above. The best model resulting from this split was an RF model and its performance is summarised in Table 3.

Table 3 Results of the RF model generated with the Auto-Modeller using the revised data set split. TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives. k is the kappa statistic as described in Section 6.8.7 of the StarDrop Reference Guide.

Model                Training                   Validation                Test
                     TP    TN   FP  FN  k       TP   TN  FP  FN  k        TP   TN  FP  FN  k
Auto-Modeller (RF)   1040  246  0   1   1.00    200  56  18  1   0.93     194  66  15  0   0.95
The results for this model are excellent, with a k statistic above 0.9 on both the validation and test sets.

Installing and using the model

BBB Shen training model

Download BBB Shen training model

The RF model generated with the data set split in Shen et al. [1].

BBB Shen full set model

Download BBB Shen full set model

The RF model generated using the revised split of the full set published in Shen et al. [1].

Supporting information

The data sets and detailed outputs from the modelling process may be downloaded.

Download data set and outputs

How to use the models

To use these models within StarDrop, download the model files and save them in a convenient place.

Load them into StarDrop using the file open button at the bottom left of the Models tab.

Alternatively, the directory in which the model files have been saved can be added to the paths from which models are automatically loaded when StarDrop starts by selecting the File->Preference menu option and adding the directory under Models in the File Locations tab.

References

[1] Shen et al. J. Chem. Inf. Model. 2010, 50(6), pp. 1034-1041

[2] Li et al. J. Chem. Inf. Model. 2005, 45(5), pp. 1376-1384

[3] Zhao et al. J. Chem. Inf. Model. 2007, 47(1), pp. 170-175