Conventional ligand fitting and refinement in X-ray electron density maps relies on single conformers and B-factors, which often yields ligands with unrealistically high conformational strain. xGen is a real-space ligand fitting and refinement method that balances electron density fit against ligand conformational strain. It is applicable to small molecules and macrocyclic peptides alike, and it produces occupancy-weighted ensembles with substantially lower strain energies than the deposited structures.

Applying the xGen method to over 3,000 protein-ligand complexes revealed that strain estimates calculated using PDB ligand coordinates were unusually high. It further showed that strain increases superlinearly with ligand size and established a strong inverse correlation between ligand efficiency and per-atom strain, demonstrating strain as a predictive factor in drug design.

The xGen method

Refinement and de novo fitting using xGen

xGen ensembles achieve better density fits and reduce the ligand strain of deposited PDB models by ~50% for both refinement and de novo fitting. Average strain (xGen vs. deposited PDB model) for:

150 macrocycles: 3.7 kcal/mol vs 6.8 kcal/mol

76 non-macrocycles: 2.5 kcal/mol vs 4.2 kcal/mol

Real-space refinement of macrocycles (left) with 3DV1 shown. xGen ensemble (orange) vs. PDB reference coordinates (coloured by B-factors) showing improved RSCC/RSR.
De novo fitting (right) with 3O57 shown. xGen ensemble (orange) captures both primary (cyan) and alternate (dark blue) PDB conformers with improvement in RSCC.
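
The two fit metrics quoted above can be illustrated with a minimal sketch. It uses the standard definitions, RSCC as the Pearson correlation between observed and calculated density and RSR as sum|obs - calc| / sum|obs + calc|, and is not a description of how xGen scores density internally. The density values and variable names are made up for illustration.

```python
from math import sqrt


def rscc(obs, calc):
    """Real-space correlation coefficient: Pearson correlation of density samples."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_calc = sum(calc) / n
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(obs, calc))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_calc = sum((c - mean_calc) ** 2 for c in calc)
    return cov / sqrt(var_obs * var_calc)


def rsr(obs, calc):
    """Real-space R-factor: sum|obs - calc| / sum|obs + calc| (lower is better)."""
    numerator = sum(abs(o - c) for o, c in zip(obs, calc))
    denominator = sum(abs(o + c) for o, c in zip(obs, calc))
    return numerator / denominator


# Hypothetical density values at six grid points around a ligand.
observed = [0.9, 1.4, 2.1, 1.7, 0.8, 0.3]
single_model = [0.5, 1.9, 1.6, 2.2, 0.4, 0.6]    # stand-in for a single conformer
ensemble_model = [0.8, 1.5, 2.0, 1.8, 0.7, 0.4]  # stand-in for an ensemble

for name, calc in [("single", single_model), ("ensemble", ensemble_model)]:
    print(f"{name}: RSCC = {rscc(observed, calc):.3f}, RSR = {rsr(observed, calc):.3f}")
```

A better fit shows up as a higher RSCC together with a lower RSR, which is the sense in which the panels report an improvement.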

Ligand strain, size, and efficiency relationships

Applying xGen to ~3000 protein-ligand complexes revealed that strain energies calculated using deposited PDB ligand structures are artifactually high.

Grazoprevir-NS3/4A protease (3SUE). The PDB ligand (green) fits the electron density well (left; RSCC = 0.95) but shows high strain (16.1 kcal/mol), calculated as the difference between the surrogate conformer (yellow) energy (Esurr) and the global minimum conformer energy (Egmin). The xGen ensemble (orange) maintains fit quality (improved RSCC/RSR) while reducing strain by 75% to 3.9 kcal/mol (right).
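
The strain arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and the energies are expressed relative to the global minimum (Egmin = 0), which is an assumption made only so that the quoted strain values can be reused directly. The averages from the refinement section are included on the assumption that the first value of each pair is the xGen result.

```python
def strain(e_surr, e_gmin):
    """Strain as defined above: surrogate conformer energy minus global minimum energy."""
    return e_surr - e_gmin


def percent_reduction(before, after):
    """Relative reduction in strain, in percent."""
    return 100.0 * (before - after) / before


# Energies in kcal/mol, relative to the global minimum (assumed reference of 0.0).
pdb_strain = strain(16.1, 0.0)   # deposited grazoprevir model (3SUE)
xgen_strain = strain(3.9, 0.0)   # xGen ensemble

print(f"3SUE: {percent_reduction(pdb_strain, xgen_strain):.1f}% lower strain")
print(f"Macrocycles: {percent_reduction(6.8, 3.7):.1f}% lower average strain")
print(f"Non-macrocycles: {percent_reduction(4.2, 2.5):.1f}% lower average strain")
```

The grazoprevir case works out to roughly 76%, consistent with the quoted 75%, and the two averages to roughly 46% and 40%, consistent with the approximate 50% figure.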

Ligand strain increases superlinearly with molecular size, following a predictable distribution.

The distributional model provides practical upper bounds for conformational search protocols and design strategies: for example, 4.5 kcal/mol for 25 atoms and 9.4 kcal/mol for 40 atoms (green arrows).
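
A quick check shows why these two bounds imply superlinear growth. The sketch below compares strain per atom at the two sizes and derives the exponent of a power law passing through both points. The power-law form is an assumption used only for illustration; it is not the distributional model itself.

```python
from math import log

# Upper bounds quoted above: atom count -> strain in kcal/mol.
bounds = {25: 4.5, 40: 9.4}

# Linear growth would keep strain per atom constant; here it rises with size.
per_atom = {n_atoms: s / n_atoms for n_atoms, s in bounds.items()}

# Exponent k of an assumed power law, strain ~ n_atoms ** k, through both points.
exponent = log(bounds[40] / bounds[25]) / log(40 / 25)

for n_atoms, value in per_atom.items():
    print(f"{n_atoms} atoms: {value:.3f} kcal/mol per atom")
print(f"Implied power-law exponent: {exponent:.2f}")
```

An exponent above 1 (about 1.6 for these two points) means that a 60% increase in size roughly doubles the strain bound.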

There is also a strong inverse relationship between ligand efficiency (i.e., how tightly a ligand binds for its size) and ligand strain per atom (τ = −0.35, p ≪ 0.001).

High-efficiency ligands seldom have high per-atom bound conformational strain, whereas low-efficiency ligands have variable strain estimates. This further highlights the importance of minimising strain during ligand optimisation.
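
A rank correlation of this kind can be computed as in the following sketch, which implements Kendall's tau in its simplest form (tau-a, with no tie correction). Which variant was used for the reported value is not stated here, so treat that choice as an assumption. The six data points are invented purely to show an inverse trend and are not taken from the study.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ligands: efficiency (kcal/mol per atom) and strain per atom.
ligand_efficiency = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20]
strain_per_atom = [0.05, 0.08, 0.06, 0.15, 0.10, 0.22]

tau = kendall_tau(ligand_efficiency, strain_per_atom)
print(f"Kendall tau = {tau:.2f}")
```

A negative tau means that ligands ranked higher in efficiency tend to rank lower in per-atom strain. The toy data give a much stronger trend than the reported value of −0.35.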

Conclusions

  • xGen offers a paradigm shift for ligand modelling, producing physically realistic ligand conformer ensembles
  • Ensemble-based fitting yields ligands with lower strain estimates, suggesting greater biological relevance
  • Ligand strain increases superlinearly with ligand size and is a predictive factor for drug design and optimisation: if a ligand has high strain relative to the expected distribution, aim to optimise its geometry; if it already has low strain, improve its protein-ligand interaction footprint

References

Jain AN, Cleves AE, et al. (2020) J. Med. Chem. 63(18). https://doi.org/10.1021/acs.jmedchem.0c01373

Jain AN, Brueckner AC, et al. (2023) J. Med. Chem. 66(3). https://doi.org/10.1021/acs.jmedchem.2c01744

Acknowledgements

Merck: Alexander C. Brueckner, Mikhail Reibarkh, and Edward C. Sherer

Optibrium: Mario Öeren and Kyle Kroeck