Introduction to QuanSA: Quantitative Surface-field Analysis

Affinity prediction challenges:

The molecules we most want to predict lie in the future. They do not come from the same statistical population as the molecules and activity data from which models are induced. This violates the central assumption of machine learning: that predictions are made on examples drawn from the same population as the training examples.

QuanSA uses a surface representation:

To address these challenges, physics-driven domain knowledge must be built into the model induction process. The atom/bond depictions used to symbolize molecules do not represent actual molecular surfaces and their properties well: molecules whose 2D structures look dissimilar can still present congruent surfaces.

QuanSA: Quantitative Surface-field Analysis

The QuanSA method

To define a ‘pocket field’, an initial alignment of all training molecules is constructed, and the parameters of the functions at the observer points surrounding the aligned molecules are learned from activity data [1].
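To make the idea of a learned pocket field concrete, here is a minimal sketch. It is a simplified illustration, not QuanSA's actual functional form: each hypothetical `Observer` sits at a fixed 3D position around already-aligned molecules, measures the distance to the nearest atom (a crude stand-in for the molecular surface), and turns it into a Gaussian-shaped feature. One weight per observer plus a bias is then fitted to activity data by gradient descent. The names `Observer`, `features`, `fit_field` and `predict`, and all parameter values, are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


class Observer:
    """Hypothetical observer point: responds to how far an aligned
    molecule's nearest atom is from a fixed position."""

    def __init__(self, pos: Point, preferred: float = 1.5, width: float = 0.5):
        self.pos = pos
        self.preferred = preferred  # distance at which the response peaks
        self.width = width          # how sharply the response falls off

    def feature(self, atoms: Sequence[Point]) -> float:
        # Nearest-atom distance is a crude stand-in for a surface distance.
        d = min(math.dist(self.pos, a) for a in atoms)
        return math.exp(-((d - self.preferred) ** 2) / (2 * self.width ** 2))


def features(observers: Sequence[Observer], atoms: Sequence[Point]) -> List[float]:
    return [o.feature(atoms) for o in observers]


def fit_field(observers, molecules, activities, lr=0.1, epochs=2000):
    """Fit one weight per observer plus a bias by gradient descent on squared error."""
    X = [features(observers, m) for m in molecules]
    n = len(X)
    w = [0.0] * len(observers)
    b = sum(activities) / n  # start the bias at the mean activity
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, activities):
            err = b + sum(wj * xj for wj, xj in zip(w, x)) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(observers, w, b, atoms) -> float:
    """Predicted activity of an aligned molecule under the fitted field."""
    return b + sum(wj * fj for wj, fj in zip(w, features(observers, atoms)))
```

Because the molecules are aligned before fitting, each observer always "looks at" the same region of space, which is what lets a per-observer weight be meaningful.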

  • The QuanSA pocket field is iteratively refined using multiple-instance machine learning; because multiple poses are considered for each compound, no assumption is made about the ‘right’ pose.
  • Building or refining a model is tractable, typically taking just hours.
  • QuanSA models do not require a known target: they can be informed by protein structure when it is available, or built from purely phenotypic data.
  • A new molecule can typically be scored in seconds, so very large-scale applications are possible.
  • Each prediction comes with a score, a pose and quality metrics.
  • Structurally novel molecules often fall well within the domain of applicability, supporting accurate scaffold hopping.
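The multiple-instance idea in the first bullet can be sketched generically: each molecule contributes a bag of candidate poses, a molecule is scored by its best-scoring pose, and each training step updates the model using whichever pose it currently prefers. The sketch below assumes poses are already encoded as fixed-length feature vectors and uses a linear scorer trained by subgradient descent; it illustrates the general technique (multiple-instance regression with a max over instances), not QuanSA's own optimization procedure. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical fixed-length pose descriptor


def score(w: Sequence[float], b: float, pose: Pose) -> float:
    return b + sum(wj * xj for wj, xj in zip(w, pose))


def mi_predict(w, b, poses: Sequence[Pose]) -> Tuple[float, Pose]:
    """A molecule's prediction is the score of its best pose (ties: first pose)."""
    best = max(poses, key=lambda p: score(w, b, p))
    return score(w, b, best), best


def fit_multiple_instance(molecules: Sequence[Sequence[Pose]],
                          activities: Sequence[float],
                          lr: float = 0.05,
                          steps: int = 3000) -> Tuple[List[float], float]:
    """Alternate between picking each molecule's best pose under the current
    model and taking a gradient step using only the picked poses."""
    n_features = len(molecules[0][0])
    w, b = [0.0] * n_features, 0.0
    n = len(molecules)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for poses, y in zip(molecules, activities):
            pred, pose = mi_predict(w, b, poses)  # pose re-chosen every step
            err = pred - y
            grad_b += err
            for j, xj in enumerate(pose):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b
```

Because the selected pose is re-chosen on every step, no pose is fixed in advance as the ‘right’ one; the pose that best explains the activity under the current model wins.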

QuanSA benchmarking vs FEP+

Schindler 2020 and Abel 2015 FEP+ comparison

A critical application is accurately predicting affinities for future molecules. QuanSA and FEP+ models were built and evaluated [2] for sixteen targets from two published datasets using temporal segregation of training and test compounds. Training set compounds were selected based on their similarity to the FEP+ reference ligand, forcing the QuanSA models to extrapolate. The study compared accuracy across the targets, as summarized in the plots below.

  • QuanSA and FEP+ have similar accuracy.
  • The two methods are highly synergistic: a hybrid score (the mean of both predictions) is more accurate than either method alone.
  • QuanSA is ~1000x faster than FEP+, alleviating screening bottlenecks.
[Figure: QuanSA benchmarking against FEP+]
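Two evaluation ingredients from this comparison, a temporal split and a mean-based hybrid score, are easy to express in code. The sketch below assumes each compound record carries an ISO-8601 `date` string, uses RMSE as the accuracy metric (the study's own metrics may differ), and defines the hybrid as the arithmetic mean of two methods' predictions. The function names are hypothetical; this illustrates the protocol and is not the study's code.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def temporal_split(records: Sequence[Dict],
                   train_fraction: float = 0.5) -> Tuple[List[Dict], List[Dict]]:
    """Train on the earliest compounds, test on the later ones.
    Assumes ISO-8601 'date' strings, so string order equals time order."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * train_fraction)
    return list(ordered[:cut]), list(ordered[cut:])


def hybrid(pred_a: Sequence[float], pred_b: Sequence[float]) -> List[float]:
    """Hybrid score: the arithmetic mean of two methods' predictions."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))
```

Averaging helps when the two methods make partly independent errors. In a made-up example where one method errs by +1, -1, +1, -1 and the other by +1, +1, -1, -1, each has an RMSE of 1.0, while the hybrid errs by 1, 0, 0, -1 and its RMSE drops to about 0.71.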

QuanSA project application

Active learning to identify a mimic of a macrocyclic natural product

Scaffold replacement during optimization is a complex challenge. Using a dataset of ~1,100 time-stamped compounds, we applied an iterative procedure that refined a QuanSA model, starting from a macrocyclic natural product lead (UK-2A), and rapidly identified a non-macrocyclic, fully synthetic, broad-spectrum crop antifungal (FPX) [3].

  • Iterative model refinement efficiently guided candidate selection to the desired product.
  • FPX was identified in round 5 as one of the molecules predicted to be most active.
  • The model effectively learned the non-macrocyclic scaffold.
  • Only 100 molecules were selected, compared with over 1,000 in the project: a 10x improvement in efficiency.
[Figure: FPX]
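The iterative procedure behind this case study can be outlined as a retrospective active-learning loop: fit a model on the compounds revealed so far, rank the remaining time-stamped compounds by predicted activity, reveal the measured activities of the top-ranked batch, and repeat until the compound of interest is selected. In the sketch below the model is a toy k-nearest-neighbour regressor (`fit_knn`) standing in for a QuanSA model; `active_learning`, the batch size and the round limit are hypothetical, and the selection rule used in the published study may differ.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Tuple[float, ...]
Record = Tuple[Features, float]  # (descriptor, measured activity)
Model = Callable[[Features], float]


def fit_knn(train: Dict[str, Record], k: int = 1) -> Model:
    """Toy stand-in for a QuanSA model: k-nearest-neighbour regression."""
    items = list(train.values())

    def predict(x: Features) -> float:
        nearest = sorted(items, key=lambda it: math.dist(it[0], x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def active_learning(pool: Dict[str, Record], initial: Sequence[str],
                    batch_size: int, max_rounds: int,
                    target: Optional[str] = None,
                    fit: Callable[[Dict[str, Record]], Model] = fit_knn,
                    ) -> Tuple[Optional[int], List[List[str]]]:
    """Retrospective loop over a pool whose activities are already known.
    Returns the round in which `target` was selected (or None) and the batches."""
    train = {cid: pool[cid] for cid in initial}
    batches: List[List[str]] = []
    for rnd in range(1, max_rounds + 1):
        model = fit(train)
        candidates = [cid for cid in pool if cid not in train]
        if not candidates:
            break
        # Rank untested compounds by predicted activity, highest first.
        ranked = sorted(candidates, key=lambda cid: model(pool[cid][0]), reverse=True)
        batch = ranked[:batch_size]
        for cid in batch:
            train[cid] = pool[cid]  # "measure": reveal the known activity
        batches.append(batch)
        if target is not None and target in batch:
            return rnd, batches
    return None, batches
```

Passing `fit` as a parameter keeps the selection loop independent of the model, so a real affinity model could be substituted without changing the loop; the total number of compounds selected (the sum of the batch sizes plus the initial set) is the quantity compared against the project size in the efficiency claim above.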

Conclusions

  • QuanSA builds physically realistic causal models based on ligand structures alone.
  • QuanSA and FEP+ are comparable in accuracy and synergistic, but QuanSA is ~1000x faster and has a broader domain of applicability.
  • Active learning with QuanSA enables more efficient lead-to-candidate design (a 10x gain in this case study).

References

  1. Cleves, A.E. and Jain, A.N. (2018). JCAMD, 32, 731-757. https://doi.org/10.1007/s10822-018-0126-x
  2. Cleves, A.E., Johnson, S.R., and Jain, A.N. (2021). JCIM, 61, 5948-5966. https://doi.org/10.1021/acs.jcim.1c01382
  3. Cleves, A.E., Jain, A.N., Demeter, D.A., et al. (2024). JCAMD, 38, 19. https://doi.org/10.1007/s10822-024-00555-3