In StarDrop, you can choose to cluster your data by common substructure, by chemical structure or by property values. Structure-based and property-based clustering both use the ‘dbclus’ algorithm developed by Butina (J. Chem. Inf. Comput. Sci. 1999, 39, 4, 747–750); they differ only in how the similarity between compounds is measured.
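
The sketch below illustrates the general shape of Butina's sphere-exclusion procedure; it is not StarDrop's implementation. It assumes fingerprints are represented as Python sets of "on" bit positions and uses an illustrative neighbour cut-off of 0.55; the names `butina_cluster`, `is_neighbor` and `tanimoto` are hypothetical. Each compound's neighbours are found once, compounds are visited in order of decreasing neighbour count, and each unassigned compound becomes a cluster centroid that claims its still-unassigned neighbours. Because the clustering step only asks whether two compounds are neighbours, swapping the predicate is enough to switch between a similarity criterion and a distance criterion.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def butina_cluster(items: Sequence[T],
                   is_neighbor: Callable[[T, T], bool]) -> List[List[int]]:
    """Sphere-exclusion clustering in the style of Butina (1999).

    Returns clusters as lists of item indices; the first index in each
    cluster is its centroid. Illustrative sketch only.
    """
    n = len(items)
    # Step 1: find every item's neighbours (pairs that pass the predicate).
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if is_neighbor(items[i], items[j]):
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Step 2: visit items in order of decreasing neighbour count
    # (counts are computed once here; ties keep their original order).
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = [False] * n
    clusters: List[List[int]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        # Step 3: the centroid claims all of its still-unassigned neighbours.
        members = [centroid] + [j for j in neighbors[centroid] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters


def tanimoto(a: set, b: set) -> float:
    """Tanimoto index of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Hypothetical toy fingerprints: two related groups plus one outlier.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3},
       {10, 11, 12, 14}, {10, 11, 12, 13}, {20}]
clusters = butina_cluster(fps, lambda a, b: tanimoto(a, b) >= 0.55)
print(clusters)  # [[0, 1, 2], [3, 4], [5]]
```

Compounds with no neighbours end up as singleton clusters, as index 5 does here. This sketch computes neighbour counts once before assigning clusters; other descriptions and implementations of the method may handle this step differently.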

When you cluster by structure, molecules are compared using the Tanimoto index; when you cluster by properties, the comparison is based on the Euclidean distance between compounds, calculated from the properties you have selected (so a smaller distance means more similar compounds).
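
As a rough illustration of the two measures, the following sketch assumes a fingerprint is a set of "on" bit positions and a compound's selected properties are a list of numbers; StarDrop's actual fingerprint type and any property scaling are not described here, and the function names are hypothetical. Note the difference in direction: a higher Tanimoto index means more similar compounds, whereas for Euclidean distance a lower value does.

```python
import math
from typing import Sequence, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto index: shared 'on' bits divided by all 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints: treated as identical (a convention)
    return len(fp_a & fp_b) / union


def euclidean(props_a: Sequence[float], props_b: Sequence[float]) -> float:
    """Euclidean distance between two compounds' selected property values."""
    if len(props_a) != len(props_b):
        raise ValueError("both compounds need the same selected properties")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(props_a, props_b)))


print(tanimoto({1, 2, 3, 4}, {1, 2, 3, 5}))     # 3 shared bits / 5 total bits
print(euclidean([2.0, 300.0], [5.0, 304.0]))   # sqrt(3**2 + 4**2)
```

If the selected properties are on very different numeric scales, the one with the largest range will dominate the distance; whether StarDrop normalises properties before clustering is not stated here.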

Clustering by common substructure uses a maximum common substructure (MCS) algorithm to group compounds that share a significant common substructure. To speed this up, a Tanimoto similarity is first used to identify pairs of compounds that will not have a substantial common substructure, so that the full comparison based on the chemical graph is performed only for the remaining pairs.
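
The speed-up described above follows a two-stage filter pattern: a cheap fingerprint comparison rejects pairs first, and only the survivors go through the expensive graph comparison. In this sketch the expensive step is passed in as a function, and the stand-in used in the example (bit overlap relative to the smaller fingerprint) is a hypothetical placeholder, not a maximum common substructure calculation; in practice that step would come from a cheminformatics toolkit, for example RDKit's `rdFMCS.FindMCS`. The names `shared_pairs` and `placeholder_score` and both cut-off values are assumptions for illustration.

```python
from typing import Callable, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto index of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def shared_pairs(fps: List[Set[int]],
                 expensive_score: Callable[[int, int], float],
                 prefilter_cutoff: float,
                 score_cutoff: float) -> Tuple[List[Tuple[int, int]], int]:
    """Return the pairs (i, j) that pass the expensive test, plus the
    number of expensive comparisons that were actually run.

    Pairs whose Tanimoto index is below `prefilter_cutoff` are assumed
    unable to share a substantial common substructure and are skipped.
    """
    passed: List[Tuple[int, int]] = []
    expensive_calls = 0
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if tanimoto(fps[i], fps[j]) < prefilter_cutoff:
                continue  # cheap rejection: skip the graph comparison
            expensive_calls += 1
            if expensive_score(i, j) >= score_cutoff:
                passed.append((i, j))
    return passed, expensive_calls


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3},
       {10, 11, 12, 14}, {10, 11, 12, 13}, {20}]


def placeholder_score(i: int, j: int) -> float:
    # Hypothetical stand-in for an MCS-based score; NOT a real MCS.
    a, b = fps[i], fps[j]
    return len(a & b) / min(len(a), len(b))


pairs, calls = shared_pairs(fps, placeholder_score,
                            prefilter_cutoff=0.5, score_cutoff=0.8)
print(pairs, calls)  # [(0, 2), (1, 2)] 4
```

Here only 4 of the 15 possible pairs reach the expensive step. The prefilter is only safe if its cut-off is low enough that pairs sharing a substantial substructure are not rejected; how StarDrop chooses that threshold is not stated here.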

More details about the clustering methods can be found in Section 5.1 of the StarDrop Reference Guide, which you can access from the Help menu.