ewubbo Registered
Interestingly, I recently read a short item in New Scientist announcing that casinos are beginning to implement computer systems to 'unmask' card counters (so that they can subsequently be thrown out of the casino, I presume). Nature, fortunately, may not take such deliberate sabotaging measures.

The low intuitiveness of QSAR is indeed one of its biggest problems. But as you are probably also aware, another big problem is that the training set usually has so few compounds relative to the number of descriptors and descriptor combinations (products/squares/splines) that can be chosen, that a large amount of overfitting often takes place. QSARs using very different formulas (which would therefore probably yield conflicting predictions for new compounds) can have almost the same 'predictive' quality on the training set, at least within the accuracy that biological experiments allow. In some of my darkest moments, I fear that most published QSAR is a sort of veiled similarity measure, asserting that if compounds are very much alike, they will have very much the same activity. The extrapolation value of QSAR (whether a formula can predict which compounds are more active than any in the training set) is, to my knowledge, thought to be rather low.
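
To make the overfitting point concrete, here is a toy sketch in plain Python. It is not a real QSAR: the "descriptors" and the "activity" are all random numbers, and the sizes (6 training compounds, 12 descriptors) and every name in it are assumptions chosen purely for illustration. Two linear models built on completely disjoint descriptor sets both reproduce the training activities essentially exactly, yet they give different predictions for new compounds.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def predict(coeffs, compounds, cols):
    """Linear model without intercept: activity = sum(coeff * descriptor)."""
    return [sum(w * row[c] for w, c in zip(coeffs, cols)) for row in compounds]


random.seed(0)
N_TRAIN, N_NEW, N_DESC = 6, 4, 12  # hypothetical sizes: few compounds, many descriptors


def random_compounds(n):
    """Each compound is just a list of N_DESC random 'descriptor' values."""
    return [[random.gauss(0, 1) for _ in range(N_DESC)] for _ in range(n)]


train = random_compounds(N_TRAIN)
new = random_compounds(N_NEW)
# The 'measured activity' is pure noise, unrelated to any descriptor.
activity = [random.gauss(0, 1) for _ in range(N_TRAIN)]

cols_a = list(range(0, 6))   # model A uses descriptors 0..5
cols_b = list(range(6, 12))  # model B uses descriptors 6..11

# With as many descriptors as compounds, each model can fit the training set exactly.
model_a = solve([[row[c] for c in cols_a] for row in train], activity)
model_b = solve([[row[c] for c in cols_b] for row in train], activity)

train_err_a = max(abs(p - y) for p, y in zip(predict(model_a, train, cols_a), activity))
train_err_b = max(abs(p - y) for p, y in zip(predict(model_b, train, cols_b), activity))

new_a = predict(model_a, new, cols_a)
new_b = predict(model_b, new, cols_b)
disagreement = max(abs(p - q) for p, q in zip(new_a, new_b))

print("max training error, model A:", train_err_a)
print("max training error, model B:", train_err_b)
print("predictions for new compounds, model A:", new_a)
print("predictions for new compounds, model B:", new_b)
print("largest disagreement on new compounds:", disagreement)
```

Real QSAR work uses fewer descriptors than compounds plus cross-validation, so the effect is less extreme than in this sketch, but the mechanism is the same: a good fit on the training set says little about which of several competing formulas to trust for new compounds.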

I do think, however, that the assertion that data mining can help us is correct, even though it probably must be handled with more sophistication than most people (including me, with my substructure mining experiments) have achieved so far.