This model predicts aquatic toxicity of a compound against Tetrahymena pyriformis expressed as pIGC50 (=-log IGC50). Built using the StarDrop Auto-Modeller, this model is based upon literature data and can be downloaded and used with StarDrop 4.2 and onwards. If you’ve not used custom models before, details on how the model was built and how to install it are available on the following pages, along with the model file…
- Save the model file (Toxicity Tpyriformis.aim) into the StarDrop model files directory (in a default installation this will be “C:\Program Files\StarDrop\modelfiles”).
- When you start StarDrop the model will appear in the list of available models.
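The copy step above can also be scripted. The following is a minimal Python sketch, not part of StarDrop itself: the function name `install_model` is hypothetical, and the default directory is the one quoted above for a default installation.

```python
import shutil
from pathlib import Path

# Default location quoted in the installation steps; adjust this for a
# non-default StarDrop installation.
DEFAULT_MODEL_DIR = Path(r"C:\Program Files\StarDrop\modelfiles")


def install_model(model_file, model_dir=DEFAULT_MODEL_DIR):
    """Copy a downloaded .aim model file into the StarDrop model files directory.

    Hypothetical helper for illustration; returns the path of the copied file.
    """
    model_file = Path(model_file)
    model_dir = Path(model_dir)
    if not model_file.is_file():
        raise FileNotFoundError(f"Model file not found: {model_file}")
    if not model_dir.is_dir():
        raise NotADirectoryError(f"Model directory not found: {model_dir}")
    # copy2 keeps the file name when the destination is a directory
    return Path(shutil.copy2(model_file, model_dir))
```

For example, `install_model(Path.home() / "Downloads" / "Toxicity Tpyriformis.aim")` would copy the file from an assumed download location. Writing under "C:\Program Files" may require administrator rights on Windows.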
To build this model we used the data set reported by Zhu et al. The data set consists of 1203 diverse organic compounds with experimentally measured toxicity against Tetrahymena pyriformis. Zhu et al. reported QSAR modelling results for this set from six academic groups. Each group used its own QSAR approach but built its model on the same training set (644 compounds) and validated it on the same validation set I (449 compounds) and validation set II (110 compounds). We used the same three-way split as in that paper to build our model.
StarDrop’s Auto-Modeller was applied to the data set using this predefined split into training, validation and test sets. The other parameters were based upon the software's default values. The best model was produced by the Gaussian Processes technique with forward variable selection (GPFVS). The model uses 73 StarDrop descriptors, including logP, molecular weight, PSA, charge descriptors and various counts of atoms and functional groups. The performance of the model is summarised in Table 1.
Table 1. Aquatic toxicity model performance on training, validation and test sets
The graphs of predicted versus observed pIGC50 values for the validation and test sets are given in Figures 1 and 2.
Note that the test set (validation set II) is an external test set for the model. It became available only after Zhu et al. had split the original set into training and validation sets (see their paper for details). We used the same splits and did not use the test set for model building or selection.
Uncertainty in prediction
With each prediction the model also provides an individual uncertainty: the standard deviation of the prediction, σ. This value is appropriate for a new molecule whose observed pIGC50 value is unknown. If an experimental value for the molecule is known and is compared with the predicted value, the standard deviation should also include the noise in the observed values.
In this model, the standard deviation of the error in the observed values was estimated as 0.346 (the level of noise present in the training and validation sets). Therefore, for a molecule in the validation set, the uncertainty in prediction will be equal to

$$\sqrt{\sigma^2 + 0.346^2}$$
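Assuming the prediction error and the experimental noise are independent, so that their variances add, the combined standard deviation can be sketched in Python as below. The function and constant names are illustrative and are not part of StarDrop.

```python
import math

# Estimated standard deviation of the error in the observed values,
# as quoted in the text for this model.
NOISE_SD = 0.346


def sd_vs_observed(sigma_pred, noise_sd=NOISE_SD):
    """Standard deviation to use when a prediction is compared with a measured value.

    Assumes independent errors, so the two variances are summed.
    """
    return math.sqrt(sigma_pred ** 2 + noise_sd ** 2)
```

For instance, a prediction reported with σ = 0.4 gives `sd_vs_observed(0.4)` of about 0.53. With σ = 0 the result reduces to the noise level, 0.346.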
Comparison to other toxicity models
Zhu et al. and Tetko et al. give a detailed description of QSAR toxicity models built on this data set, compare their performance and discuss applicability domain issues. Briefly, the R² on the validation set (val I) ranges from 0.71 to 0.87 and the R² on the test set (val II) ranges from 0.38 to 0.83. Our model performs well in comparison with these models. On the validation set it matched the best of these published models.
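For readers who want to compute comparable statistics on their own predictions, here is a minimal Python sketch. It assumes R² means the coefficient of determination (1 minus the ratio of residual to total sum of squares). The cited papers may define R² differently, so values are only comparable once the definitions are confirmed to match.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))
```

Both functions take equal-length sequences of observed and predicted pIGC50 values, for example the columns exported from a validation or test set.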
Figure 1. Predicted versus observed pIGC50 values for the validation set
Figure 2. Predicted versus observed pIGC50 values for the test set
- H. Zhu, A. Tropsha, D. Fourches, A. Varnek, E. Papa, P. Gramatica, T. Öberg, P. Dao, A. Cherkasov and I.V. Tetko. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis. J. Chem. Inf. Model., 2008, 48 (4), pp 766–784.
- I.V. Tetko, I. Sushko, A.K. Pandey, H. Zhu, A. Tropsha, E. Papa, T. Öberg, R. Todeschini, D. Fourches and A. Varnek. Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection. J. Chem. Inf. Model., 2008, 48 (9), pp 1733–1746.