This model predicts aquatic toxicity of a compound against Tetrahymena pyriformis expressed as pIGC50 (=-log IGC50). Built using the StarDrop Auto-Modeller, this model is based upon literature data and can be downloaded and used with StarDrop 4.2 and onwards.
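For readers unfamiliar with the pIGC50 scale, the conversion can be sketched in Python as follows. The function names are hypothetical, and the page does not state the concentration unit of IGC50, so the unit is left to the caller; the only thing taken from the text is the definition pIGC50 = -log IGC50, with the logarithm assumed to be base 10.

```python
import math


def pigc50_from_igc50(igc50: float) -> float:
    """Convert an IGC50 concentration to pIGC50 = -log10(IGC50).

    The unit of `igc50` is an assumption left to the caller; the source
    page does not specify it.
    """
    if igc50 <= 0:
        raise ValueError("IGC50 must be a positive concentration")
    return -math.log10(igc50)


def igc50_from_pigc50(pigc50: float) -> float:
    """Invert the transformation: IGC50 = 10 ** (-pIGC50)."""
    return 10.0 ** (-pigc50)
```

On this scale a higher pIGC50 corresponds to a lower IGC50, that is, to a more toxic compound.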
Data source
To build this model we used the data set reported by Zhu et al. [1]. The data set consists of 1093 diverse organic compounds with experimentally measured toxicity against Tetrahymena pyriformis. Zhu et al. [1] reported QSAR modelling results for this set from six academic groups. Each group used its own QSAR approach but built its model on the same training set (644 compounds) and validated it on the same validation set I (339 compounds) and validation set II (110 compounds). We used the same split into three subsets as in the paper [1] to build our model.
Model details
StarDrop’s Auto-Modeller was applied to the data set using the predefined split into training, validation and test sets. All other parameters were left at the software defaults. The best model was produced by the Gaussian Processes technique with forward variable selection (GPFVS). The model uses 73 StarDrop descriptors including logP, molecular weight, PSA, charge descriptors and various counts of atoms and functional groups. The performance of the model is summarised in Table 1.
Data set | Number of compounds | R² | RMSE |
---|---|---|---|
Training | 644 | 0.92 | 0.30 |
Validation (val I) | 339 | 0.87 | 0.38 |
Test (val II) | 110 | 0.70 | 0.49 |
The graphs of predicted versus observed pIGC50 values for the validation and test sets are given in Figures 1 and 2.
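The statistics in Table 1 can be reproduced for any set of predicted and observed pIGC50 values with a few lines of Python. This is a minimal sketch under an assumption: the page does not define R², so it is taken here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares). The function names are hypothetical and are not part of StarDrop.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not taken from the data set.
    obs = [0.0, 1.0, 2.0]
    pred = [0.0, 1.0, 3.0]
    print(r_squared(obs, pred), rmse(obs, pred))
```

Both functions expect the observed and predicted values in the same order and on the same pIGC50 scale.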
Uncertainty in prediction
Together with each prediction the model provides an individual uncertainty of prediction, the standard deviation in prediction. The uncertainty value σ given by the model is suitable for a new molecule with an unknown observed pIGC50 value. If an experimental value for the molecule is known and is compared with the predicted value, then the standard deviation should also include the noise in the observed values.
In this model the estimated standard deviation of error in the observed values was determined as 0.346 (the level of noise present in the training and validation sets). Therefore, for a molecule in the validation set the uncertainty in prediction will be equal to $\sqrt{\sigma^2 + 0.346^2}$, assuming the prediction error and the experimental noise are independent so that their variances add.
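As a rough illustration of combining the two error sources, the Python sketch below adds the model's σ and the 0.346 observation noise in quadrature. It assumes the two are independent; the function name and the example σ of 0.30 are hypothetical and not part of StarDrop.

```python
import math

# Estimated standard deviation of error in the observed pIGC50 values,
# as quoted for this model.
OBSERVATION_NOISE_SD = 0.346


def combined_sd(prediction_sd: float,
                noise_sd: float = OBSERVATION_NOISE_SD) -> float:
    """Standard deviation to use when comparing a prediction with an
    experimental value: sqrt(prediction_sd**2 + noise_sd**2).

    Assumes the prediction error and the experimental noise are independent.
    """
    if prediction_sd < 0 or noise_sd < 0:
        raise ValueError("standard deviations must be non-negative")
    return math.hypot(prediction_sd, noise_sd)


if __name__ == "__main__":
    sigma = 0.30  # hypothetical per-compound uncertainty reported by the model
    print(f"combined standard deviation: {combined_sd(sigma):.3f}")
```

The combined value is never smaller than either input, so a comparison against experiment is always at least as uncertain as the measurement noise alone.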
Comparison to other toxicity models
Papers [1] and [2] give a detailed description of QSAR toxicity models built on this data set, compare their performance and discuss applicability domain issues. Briefly, the R² on the validation set (val I) ranges from 0.71 to 0.87 and the R² on the test set (val II) ranges from 0.38 to 0.83. Our model compares well with these models: on the validation set its R² of 0.87 matches the best model published in [1], and on the test set its R² of 0.70 lies within the published range.
Installing and using the model
Model file
Save the model file into the StarDrop model files directory (in a default installation this will be “C:\Program Files\StarDrop\modelfiles”).
When you start StarDrop the model will appear in the list of available models.
References
1. H. Zhu, A. Tropsha, D. Fourches, A. Varnek, E. Papa, P. Gramatica, T. Öberg, P. Dao, A. Cherkasov and I.V. Tetko. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis. J. Chem. Inf. Model., 2008, 48 (4), pp 766–784.
2. I.V. Tetko, I. Sushko, A.K. Pandey, H. Zhu, A. Tropsha, E. Papa, T. Öberg, R. Todeschini, D. Fourches and A. Varnek. Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection. J. Chem. Inf. Model., 2008, 48 (9), pp 1733–1746.