Abstract
Using the DUD-E+ benchmark, we explore the impact of using a single protein pocket or ligand for virtual screening compared with using ensembles of alternative pockets, ligands, and sets thereof. For both structure-based and ligand-based approaches, the precise characterization of the binding site in question had a significant impact on screening performance. Using the single original DUD-E protein, Surflex™-Dock yielded a mean ROC area of 0.81 ± 0.11. Using the cognate ligand instead, with the eSim method for screening, yielded 0.77 ± 0.14. Moving to ensembles of five protein pocket variants increased docking performance to 0.84 ± 0.09. The analogous ligand-based approach, using the five crystallographically aligned cognate ligands, yielded 0.83 ± 0.11. Using the same ligands, but with an automatically generated mutual alignment, yielded a mean ROC area nearly as good as that from single-structure docking: 0.80 ± 0.12. Detailed results and statistical analyses show that structure- and ligand-based methods are complementary and can be fruitfully combined to enhance screening efficiency. A hybrid approach combining ensemble docking with eSim-based screening produced the best and most consistent performance (mean ROC area of 0.89 ± 0.08 and 1% early enrichment of 46-fold). Based on results from both the docking and ligand-similarity approaches, it is clearly unwise to rely on a single arbitrarily chosen protein structure for docking or a single ligand query for similarity-based screening.
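For readers unfamiliar with the two metrics quoted above, the following is a minimal Python sketch of how a ROC area and an early-enrichment factor can be computed from screening scores. It is an illustration only, not the procedure used in the study: the function names `roc_auc` and `enrichment_factor` are hypothetical, higher scores are assumed to be better, and the enrichment definition used here (hit rate in the top-ranked fraction divided by the overall hit rate) is one common convention that may differ from the one behind the 46-fold figure.

```python
import math


def roc_auc(active_scores, decoy_scores):
    """ROC area as the probability that a randomly chosen active
    outscores a randomly chosen decoy (ties count as half a win)."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def enrichment_factor(active_scores, decoy_scores, fraction=0.01):
    """Hit rate among the top-ranked `fraction` of the library,
    divided by the hit rate of the whole library.

    Assumed definition for illustration; other conventions exist.
    Tied scores are not treated specially: the stable sort keeps
    actives ahead of decoys with the same score.
    """
    ranked = sorted(
        [(s, 1) for s in active_scores] + [(s, 0) for s in decoy_scores],
        key=lambda pair: pair[0],
        reverse=True,
    )
    n_top = max(1, math.ceil(fraction * len(ranked)))
    hits = sum(label for _, label in ranked[:n_top])
    overall_rate = len(active_scores) / len(ranked)
    return (hits / n_top) / overall_rate


# Toy scores, invented purely to exercise the functions.
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.3, 0.2, 0.1]
print(roc_auc(actives, decoys))
print(enrichment_factor(actives, decoys, fraction=0.25))
```

The pairwise loop is quadratic in library size, which is fine for a sketch; a rank-based formulation would be preferable for benchmark-scale decoy sets.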