This model predicts the aquatic toxicity of a compound against Tetrahymena pyriformis, expressed as pIGC50 (= -log IGC50). It was built with the StarDrop Auto-Modeller from literature data and can be downloaded and used with StarDrop 4.2 and later.

Summary

This model predicts aquatic toxicity of a compound against Tetrahymena pyriformis expressed as pIGC50 (=-log IGC50).
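
The pIGC50 scale can be illustrated with a short conversion sketch. It assumes the logarithm is base 10 (the usual convention for "p" scales) and that IGC50 is supplied in whatever concentration unit the source data uses; the unit is not stated here, and the function names are illustrative only.

```python
import math


def igc50_to_pigc50(igc50: float) -> float:
    """Convert an IGC50 concentration to pIGC50 = -log10(IGC50).

    Assumption: base-10 logarithm; the concentration unit is whatever
    the underlying data set uses (not specified in this description).
    """
    if igc50 <= 0:
        raise ValueError("IGC50 must be a positive concentration")
    return -math.log10(igc50)


def pigc50_to_igc50(pigc50: float) -> float:
    """Invert the transformation: IGC50 = 10 ** (-pIGC50)."""
    return 10.0 ** (-pigc50)
```

A higher pIGC50 corresponds to a lower IGC50, that is, to a more toxic compound.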

Data source

To build this model we used the data set reported by Zhu et al. [1]. The data set consists of 1093 diverse organic compounds with experimentally measured toxicity against Tetrahymena pyriformis. Zhu et al. [1] reported QSAR modelling results for this set from six academic groups. Each group used its own QSAR approach but built its model on the same training set (644 compounds) and validated it on the same validation set I (339 compounds) and validation set II (110 compounds). We used the same three-way split to build our model.

Model details

StarDrop’s Auto-Modeller was applied to the data set using the predefined split into training, validation and test sets. All other parameters were left at the software defaults. The best model was produced by the Gaussian Processes technique with forward variable selection (GPFVS). The model uses 73 StarDrop descriptors, including logP, molecular weight, PSA, charge descriptors and various counts of atoms and functional groups. The performance of the model is summarised in Table 1.

Set                   Number of compounds   R²     RMSE
Training              644                   0.92   0.30
Validation (val I)    339                   0.87   0.38
Test (val II)         110                   0.70   0.49
Table 1. Aquatic toxicity model performance on training, validation and test sets
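
Statistics of the kind reported in Table 1 can be computed from paired observed and predicted pIGC50 values. The sketch below assumes R² is the coefficient of determination (1 - SS_res / SS_tot); the exact definition used by the Auto-Modeller is not stated here, and the numbers in the usage note are made up for illustration, not taken from the data set.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

For example, with the illustrative values observed = [0.0, 1.0, 2.0, 3.0] and predicted = [0.1, 0.9, 2.2, 2.8], `r_squared` gives approximately 0.98 and `rmse` approximately 0.158.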

The graphs of predicted versus observed pIGC50 values for the validation and test sets are given in Figures 1 and 2.

Uncertainty in prediction

Together with each prediction, the model provides an individual uncertainty: the standard deviation of the prediction, σ. This value is appropriate for a new molecule whose observed pIGC50 is unknown. If an experimental value for the molecule is known and is compared with the predicted value, the standard deviation should also include the noise in the observed values.

In this model, the standard deviation of the error in the observed values was estimated as 0.346 (the level of noise present in the training and validation sets). Therefore, for a molecule in the validation set, the uncertainty in prediction will be equal to

$$\sqrt{\sigma^{2} + 0.346^{2}}$$
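
A minimal sketch of how the two sources of uncertainty might be combined when comparing a prediction with a measured value. It assumes the prediction error and the experimental noise are independent, so that their variances add; the function and constant names are illustrative and are not part of StarDrop.

```python
import math

# Estimated standard deviation of the noise in the observed pIGC50 values,
# as quoted in the text above.
OBSERVATION_NOISE_SD = 0.346


def combined_sd(prediction_sd: float,
                observation_sd: float = OBSERVATION_NOISE_SD) -> float:
    """Combine prediction and observation uncertainty.

    Assumption: the two error sources are independent, so variances add
    and the combined standard deviation is sqrt(sigma^2 + noise^2).
    """
    if prediction_sd < 0 or observation_sd < 0:
        raise ValueError("standard deviations must be non-negative")
    return math.sqrt(prediction_sd ** 2 + observation_sd ** 2)
```

Under this assumption the combined value can never be smaller than 0.346, however confident the model is about an individual prediction.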

Comparison to other toxicity models

References 1 and 2 give a detailed description of QSAR toxicity models built on this data set, compare their performance and discuss applicability domain issues. Briefly, the R² on the validation set (val I) ranges from 0.71 to 0.87 and the R² on the test set (val II) ranges from 0.38 to 0.83. Our model performs well in comparison: on the validation set its R² of 0.87 matches the best model published in reference 1, and on the test set its R² of 0.70 lies within the published range.

Figure 1. Predicted versus observed pIGC50 values for the validation set
Figure 2. Predicted versus observed pIGC50 values for the test set

Installing and using the model

Model file

Download model file

How to use the model

Save the model file into the StarDrop model files directory (in a default installation, this is “C:\Program Files\StarDrop\modelfiles”).
The next time you start StarDrop, the model will appear in the list of available models.
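
The copy step can also be scripted. The sketch below is only an illustration: the default directory is the one quoted above, the helper name `install_model` is hypothetical, and the path of the downloaded model file has to be supplied by the user. Writing to “C:\Program Files” usually requires administrator rights.

```python
import shutil
from pathlib import Path

# Default StarDrop model files directory, as quoted in the text above.
DEFAULT_MODEL_DIR = Path(r"C:\Program Files\StarDrop\modelfiles")


def install_model(model_file, model_dir=DEFAULT_MODEL_DIR) -> Path:
    """Copy a downloaded model file into the StarDrop model files directory.

    Hypothetical helper: checks that both paths exist, copies the file
    and returns the path of the installed copy.
    """
    source = Path(model_file)
    target_dir = Path(model_dir)
    if not source.is_file():
        raise FileNotFoundError(f"model file not found: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"model directory not found: {target_dir}")
    # copy2 keeps the file's timestamps and returns the destination path.
    return Path(shutil.copy2(source, target_dir))
```

StarDrop must be restarted after the copy for the model to appear in the list of available models.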

References

  1. H. Zhu, A. Tropsha, D. Fourches, A. Varnek, E. Papa, P. Gramatica, T. Öberg, P. Dao, A. Cherkasov and I.V. Tetko. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis. J. Chem. Inf. Model., 2008, 48 (4), pp 766–784.
  2. I.V. Tetko, I. Sushko, A.K. Pandey, H. Zhu, A. Tropsha, E. Papa, T. Öberg, R. Todeschini, D. Fourches and A. Varnek. Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection. J. Chem. Inf. Model., 2008, 48 (9), pp 1733–1746.