# Research

We use statistical thermodynamics concepts to develop models and computational algorithms to study biophysical processes. Given the complexity of biological systems, it is important to strike a balance between theoretical rigor and physiochemical intuition to capture essential features of the systems and deliver accurate yet computationally tractable models. Because computer hardware and algorithms continue to advance, the rigor/computational complexity "sweet spot" is a continuously moving target. Many models that were computationally intractable only a few years ago are not routine. Therefore in this area, it is important to be on top of both theoretical concepts as well as the latest technological advances.

### Thermodynamics of Protein-Ligand Binding

**Background.** Molecular recognition is an essential component for virtually all biological processes. Many pharmaceutical drugs act by binding to enzymes and signaling proteins, thereby altering their activity. There is great interest in the development of computer models capable of predicting accurately the strength of protein-ligand association. Modeling protein-ligand equilibria, however, is a very challenging and still largely unsolved problem. Ideally, such a model should incorporate enough detail to address important medicinal questions such as drug specificity, resistance, and toxicity. Often these properties are very sensitive to subtle changes (sometimes involving only a few atoms) in ligand composition and protein sequence. Models that describe key parts of the system at the atomic level have the best chance of resolving these differences.**Statistical Thermodynamics Theory and Computer Models.** Thermodynamically, the strength of the association between a ligand molecule and its target receptor is measured by the standard free energy of binding. Rigorous statistical mechanics theories of molecular association equilibria exist. I have authored a book chapter (Gallicchio & Levy 2011a) reviewing the latest theoretical developments on the theory of non-covalent association and their implications for computer models aimed at binding free energy estimation. The theoretical account covers the role of conformational heterogeneity, and entropic and conformational reorganization concepts that, although crucial for the understanding of binding equilibria, are generally under-appreciated and rarely properly accounted for in computer models. We have recently developed an analytical model of alchemical molecular binding (Kilburg & Gallicchio 2018) based on the idea that the interaction energy between the receptor and the ligand is the convolution of two random processes, one that follows the central limit statistics and the other that follows max statistics. Even more recently we used this analytical model to identify novel perturbation potentials and soft-core functions to accelerate the convergence of alchemical binding free energy calculations (Pal & Gallicchio 2019). Computational binding free energy models are commonly based on molecular mechanics force fields and ways to efficiently sample molecular configurations (Gallicchio & Levy 2011b). Some of the most promising approaches we have identified employ extended ensembles combined with Hamiltonian-hopping techniques.

**The Single-Decoupling Binding Free Energy Method (SDM)** While at Rutgers, I led the development of a novel approach to absolute binding free energy estimation and analysis we called the Binding Energy Distribution Analysis Method (BEDAM)(Gallicchio et al. 2010). More recently, we implemented the method in the OpenMM molecular simulation software and we renamed it the Single-Decoupling Binding Free Energy method to underscore the fact that it requires only one free energy leg rather than two with the more commonly used Double-Decoupling Method. The model is based on sound statistical mechanics theory of molecular association and efficient computational strategies built upon parallel Hamiltonian replica exchange sampling and multi-state thermodynamic reweighting. The ability to carry out extensive conformational sampling is one of the main advantages of SDM over existing FEP and absolute binding free energies protocols with explicit solvation which suffer from a limited exploration of conformational space (Pal & Gallicchio 2019).

### Implicit Solvent Modelling

**Importance of Solvation Effects.** It is hard to overstate the importance of water-solute interactions. Virtually all biological processes occur in water solution and often water molecules play a fundamental role in enzymatic reactions, and in modulating binding and conformational equilibria. Atomistic models of biomolecules need to include some description of water-solute interactions to achieve at least a qualitative level of fidelity. A variety of approaches have been tried to model solvation effects. **Why implicit solvation?** Solvent models used in molecular simulations can be roughly divided into two camps. Models in the first category explicitly include individual solvent molecules with their interactions described at the same level of theory as solute interactions. These are called *explicit* solvent models. Models in the second camp describe the solution as a continuous medium often described by macroscopic parameters such as density, dielectric permittivity, and surface tension. These *implicit* solvent models take a variety of forms and are often described as approximate versions of explicit models. Although this is often true in practice, statistical thermodynamics theory tells us that the level of accuracy achievable by implicit models is no more nor less high than explicit models. This is because both models are to be judged on how accurately they describe the solvent potential of mean force of the solute, a well-defined statistical thermodynamic quantity which measures the distribution of conformations of the solute in solution.

I have been using both kind of solvation models in my research. However, lately, I got interested in implicit solvent models for two main reasons. The first is that advanced conformational sampling algorithms such as replica exchange molecular dynamics are more easily applied with implicit solvation. More fundamentally, however, implicit solvation simplifies the modeling of conformational equilibria and, in particular, protein-ligand binding. Calculations of protein-ligand binding affinities using explicit solvent are complicated by issues such as limited conformational sampling, slow equilibration of water molecules between the bulk and the binding site, and the need to perform calculations for both the solution and receptor environments. As we have shown (Gallicchio et al. 2010), these limitations are circumvented to a significant extent with an implicit solvent description, allowing us to acquire insights into dynamical and entropic aspects of binding that is qualitatively important and conceptually valid regardless of the specific representation of solvation (Kilburg & Gallicchio 2018). The quality of an implicit solvent description is, however, a concern for quantitative predictions.

**The AGBNP model.** The Analytical Generalized Born plus Non-Polar (AGBNP) model (Gallicchio & Levy 2004) (Gallicchio et al. 2009) originated from the need for an implicit solvent model that would incorporate as much realism as possible and at the same time would be usable with Molecular Dynamics, which requires analytical (that is differentiable) and computationally efficient energy functions. AGBNP is based on an efficient implementation of the pairwise descreening form of the Generalized Born (GB) model, an approximate description of the continuum dielectric electrostatic model. GB basically augments conventional fixed charge force field interactions (Coulomb interactions and Lennard-Jones interactions) with "GB interaction" terms that depend on parameters, called Born radii, that depend on the solute geometry. The calculations of the Born radii is by far the most complex part of the computation of the GB energy. AGBNP also includes non-electrostatic terms incorporating lessons learned over a decade of research. We found that models based on the decomposition of non-polar hydration into a cavitation free energy (described by the solute surface area) and van der Waals dispersion forces (modeled using a function based on the atomic Born radii) give a better description of fundamental processes such as protein folding and association. The research community is recognizing the benefits of this decomposition and is progressively abandoning the traditional surface area (SA) models of non-polar hydration in favor of these new models.

One of the design principles of AGBNP is to reduce as much as possible the use of empirically adjusted parameters. For example, many commonly used GB/SA implicit solvent models employ empirically adjusted functional forms not only for energetic terms but also for essentially geometrical quantities, such as surface areas and Born radii. I believe that this practice leads to inaccuracies and poor transferability. We were able to show that advanced computational geometry algorithms make the parametrization of geometrical quantities unnecessary and, by doing so, leads to models that are accurate not only on average but also in the details. Being able to accurately model both large and small conformational changes are important in many applications and in particular in protein-ligand binding. We are continuously updating the AGBNP model to expand its range of applicability. Recent efforts have been focused on the GPU implementation of AGBNP as part of OpenMM molecular simulation software (Zhang et al 2017).