Welcome to the Optibrium Community





Tautomeric form(s) in training set
#297
Tautomeric form(s) in training set
alexisparenty
I am using the “Check for Duplicates” functionality of StarDrop to remove duplicates from my dataset after a merge. I have noticed that StarDrop does not recognize tautomeric forms (as far as StarDrop is concerned, a pair of tautomers is wrongly assumed to be two different molecules). I can remove the tautomers using other tools, such as KNIME, so that is not a problem. My question is: should I actually remove them?

Could my model actually get better if I train it with all possible tautomeric forms? Sorry if this is a naïve question! I am new to this field, but I have a feeling that I should remove the extra tautomeric form(s) of the same molecule, since the model builder's learning algorithm only uses physicochemical descriptors that are independent of the structural representation that humans use. Is that right? It hurts my eyes to see most ketones in their enolic forms, but does it matter to a computer?

If tautomeric forms are treated in the same way by the model, I think they should be removed as they would bias the calculation. Is that correct?
Many thanks,
Alexis
 
#299
Re:Tautomeric form(s) in training set
huntpe1
Dear Alexis,

I saw your question and I thought I would take the opportunity to carry on the conversations we were having at Novartis before I left.

My response is that different tautomeric forms should be considered as different molecules because they will have different electrostatic potential distributions around them. In this regard I think StarDrop does the correct thing in treating them as different molecules. When I have considered tautomers in QSAR model building (using different software, I should add), I have included the tautomers in the data table as separate rows, but then manually selected one of the possible tautomers for each molecule and included only that form in the model building process. Now, if you have many tautomeric forms and many molecules in your data set, then the number of possible models that you could generate by this method would be huge. So if your data set looks like that, I wouldn't recommend this approach.
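To put a number on "huge": if one form is picked per molecule, the count of distinct training sets is the product of the number of forms per molecule. A minimal sketch, assuming a made-up table of molecules and form labels (the names `tautomers`, `count_training_sets` and `enumerate_training_sets` are illustrative only and do not come from StarDrop or any other package):

```python
from itertools import product
from math import prod

# Hypothetical data: each molecule maps to the tautomeric forms drawn for it.
tautomers = {
    "mol_A": ["keto", "enol"],
    "mol_B": ["lactam", "lactim"],
    "mol_C": ["form_1", "form_2", "form_3"],
}

def count_training_sets(forms_by_molecule):
    """Number of distinct training sets when one form is picked per molecule."""
    return prod(len(forms) for forms in forms_by_molecule.values())

def enumerate_training_sets(forms_by_molecule):
    """Yield every one-form-per-molecule selection as a dict."""
    names = list(forms_by_molecule)
    for choice in product(*(forms_by_molecule[name] for name in names)):
        yield dict(zip(names, choice))

n_sets = count_training_sets(tautomers)  # 2 * 2 * 3 selections
```

With only two forms each, 20 tautomeric molecules already give 2**20 (over a million) possible training sets, which is why picking forms by hand only works for small tables.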

For large data sets I would consider tautomers in one form (e.g. all keto) and generate a model for that form. I would then generate a different model for the all-enol form of the molecules and see what difference (if any) it made to the predictive ability of the model. In the past it has made very little difference compared to the overall inaccuracy of the models. If you are able to use tautomer prediction algorithms, then you will have a predicted relative abundance figure for each tautomer. This figure could be used in some QSAR methods to weight the contribution that a particular molecule (row of the table) makes to any QSAR model.
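As a rough illustration of the weighting idea, here is a minimal sketch of a one-descriptor weighted least-squares fit in which each row's weight is its predicted relative abundance. The data, the column layout and the function name `weighted_linear_fit` are assumptions made for the example; a real QSAR method would use many descriptors and its own weighting scheme.

```python
def weighted_linear_fit(x, y, w):
    """Fit y = intercept + slope * x by weighted least squares."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical rows: (molecule, form, descriptor value, measured activity,
# predicted relative abundance). Both forms of a molecule share one
# measured activity; only the descriptor and the weight differ.
rows = [
    ("mol_A", "keto", 1.2, 5.2, 0.9),
    ("mol_A", "enol", 2.0, 5.2, 0.1),
    ("mol_B", "keto", 3.1, 6.8, 0.6),
    ("mol_B", "enol", 3.9, 6.8, 0.4),
]

descriptor = [r[2] for r in rows]
activity = [r[3] for r in rows]
abundance = [r[4] for r in rows]
intercept, slope = weighted_linear_fit(descriptor, activity, abundance)
```

A row with abundance 0 drops out of the fit entirely, and a minor tautomer pulls the model only in proportion to its predicted population.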

All the above depends, as you say, on which molecular descriptors you use in your model building process. If you use electrostatics-based descriptors, or those that take into account bond orders or the number of hydrogens attached to any atom, then yes, the tautomeric form will have an impact. If you use descriptors like LogP then it won't matter at all (unless the LogP program you use doesn't recognise the fragment in its tautomeric form). So, to your final point: if you are using descriptors that cannot distinguish different tautomers, then I would remove the extra tautomeric forms from the model building (using a consistent method like the "all keto" approach mentioned above).
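The "keep one consistent form" step could be sketched as follows. This assumes an external tool (KNIME, for example) has already assigned every row a tautomer-independent key, so that all tautomers of one molecule share it. The field names `tautomer_key` and `form`, and the function `keep_one_form_per_molecule`, are hypothetical.

```python
def keep_one_form_per_molecule(rows, preferred_form="keto"):
    """Keep one row per tautomer key, favouring the preferred form.

    If a molecule has no row in the preferred form, its first row is kept.
    """
    chosen = {}
    for row in rows:
        key = row["tautomer_key"]
        current = chosen.get(key)
        if current is None:
            chosen[key] = row
        elif row["form"] == preferred_form and current["form"] != preferred_form:
            chosen[key] = row
    return list(chosen.values())

# Hypothetical merged data set with tautomeric duplicates.
dataset = [
    {"tautomer_key": "K1", "form": "enol", "name": "cpd-1 (enol drawing)"},
    {"tautomer_key": "K1", "form": "keto", "name": "cpd-1 (keto drawing)"},
    {"tautomer_key": "K2", "form": "enol", "name": "cpd-2 (enol drawing)"},
]

curated = keep_one_form_per_molecule(dataset)
```

Here `curated` holds two rows: the keto drawing of cpd-1 and the only available drawing of cpd-2.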

I hope this answers your questions but if you have any others then please get back in touch.
 