Protein-Ligand Binding | Implicit Solvation | HIV-related

[Figures: HRV chimeric virus bound to 2F5 Fab; binding energy distributions]

We use statistical thermodynamics concepts to develop models and computational algorithms for studying biophysical processes. Given the complexity of biological systems, it is important to strike a balance between theoretical rigor and physicochemical intuition in order to capture the essential features of these systems and deliver accurate yet computationally tractable models. Because computer hardware and algorithms continue to advance, the "sweet spot" between rigor and computational cost is a continuously moving target. Many models that were computationally intractable only a few years ago are now routine. In this area it is therefore important to stay on top of both theoretical concepts and the latest technological advances.

Thermodynamics of Protein-Ligand Binding

Background. Molecular recognition is an essential component of virtually all biological processes. Many pharmaceutical drugs act by binding to enzymes and signaling proteins, thereby altering their activity. There is great interest in the development of computer models capable of accurately predicting the strength of protein-ligand association. Modelling protein-ligand equilibria, however, is a very challenging and still largely unsolved problem. Ideally such a model should incorporate enough detail to address important medicinal questions such as drug specificity, resistance, and toxicity. Often these properties are very sensitive to subtle changes (sometimes involving only a few atoms) in ligand composition and protein sequence. Models that describe key parts of the system at the atomic level have the best chance of resolving these differences.

Statistical Thermodynamics Theory and Computer Models. Thermodynamically, the strength of the association between a ligand molecule and its target receptor is measured by the standard free energy of binding. Rigorous statistical mechanics theories of molecular association equilibria exist. I have recently authored a book chapter (Gallicchio & Levy 2011a) reviewing the latest developments in the theory of non-covalent association and their implications for computer models aimed at binding free energy estimation. The theoretical account covers the role of conformational heterogeneity, as well as entropic and conformational reorganization concepts that, although crucial for the understanding of binding equilibria, are generally under-appreciated and rarely properly accounted for in computer models. Computational binding free energy models are commonly based on molecular mechanics force fields and on ways to efficiently sample molecular configurations of the complex. We have recently published a review in Current Opinion in Structural Biology (Gallicchio & Levy 2011b) covering some aspects of the conformational sampling problem. Some of the most promising approaches we have identified employ extended ensembles combined with Hamiltonian-hopping techniques.

The Binding Energy Distribution Analysis Method (BEDAM). We have recently developed a novel approach to absolute binding free energy estimation and analysis, which we call the Binding Energy Distribution Analysis Method (BEDAM) (Gallicchio et al. 2010). It is based on a sound statistical mechanics theory of molecular association and on efficient computational strategies built upon parallel Hamiltonian replica exchange sampling and histogram reweighting. The method takes its name from the technique it employs to extract standard binding free energies: the statistical analysis of the probability distributions of the energies of association over a series of conformational ensembles connecting the bound and unbound states. The ability to carry out extensive conformational sampling is one of the main advantages of BEDAM over existing free energy perturbation (FEP) and absolute binding free energy protocols in explicit solvent, which suffer from limited exploration of conformational space. Benchmarking calculations illustrate the power and accuracy of the methodology (Lapelosa et al. 2012). The BEDAM method has been employed in the international SAMPL blind challenges, in which computational methods are tested on their ability to predict undisclosed experimental data. The method has earned top marks in all SAMPL challenges so far (Gallicchio & Levy 2012; Gallicchio et al. 2014; Gallicchio et al. 2015). Ongoing development is aimed at optimizing and automating BEDAM calculations so that they can be used for binding free energy-based screening of large ligand libraries, to complement and build upon conventional virtual screening docking protocols.
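The two ingredients named above, Hamiltonian replica exchange and the analysis of binding energy distributions, can be illustrated with a minimal Python sketch. It is not the BEDAM implementation. It assumes a potential of the form V0 + λu, where u is the ligand-receptor binding energy and λ runs from 0 (unbound) to 1 (bound), and it assumes a binding site volume `v_site` in cubic angstroms. All function names are hypothetical. The free energy estimate uses a direct exponential average over the λ = 0 ensemble, which is the formal relation only; in practice it converges poorly, and histogram reweighting over all λ states is used instead.

```python
import math

KT = 0.596  # kcal/mol, roughly 300 K (assumed temperature)


def swap_probability(lam_i, lam_j, u_i, u_j, kT=KT):
    """Metropolis probability of exchanging configurations between two replicas.

    Replica i samples V0 + lam_i * u and currently has binding energy u_i;
    replica j samples V0 + lam_j * u and currently has binding energy u_j.
    """
    # Change in total reduced energy if the two configurations are swapped.
    delta = (lam_i - lam_j) * (u_j - u_i) / kT
    return 1.0 if delta <= 0 else math.exp(-delta)


def excess_binding_free_energy(u_samples, kT=KT):
    """-kT ln <exp(-u/kT)> over binding energies sampled at lambda = 0."""
    m = min(u_samples)  # shift by the minimum for numerical stability
    s = sum(math.exp(-(u - m) / kT) for u in u_samples)
    return m - kT * math.log(s / len(u_samples))


def standard_binding_free_energy(u_samples, v_site, kT=KT):
    """Add the standard-state term -kT ln(C0 * V_site) to the excess term."""
    c0 = 1.0 / 1660.5  # 1 M expressed in molecules per cubic angstrom
    return -kT * math.log(c0 * v_site) + excess_binding_free_energy(u_samples, kT)
```

For example, `swap_probability(1.0, 0.0, -10.0, 0.0)` is very small, because the swap would hand the fully coupled replica a configuration with no favorable binding energy, while the reverse situation is always accepted.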

Implicit Solvent Modelling

Importance of Solvation Effects. It is hard to overstate the importance of water-solute interactions. Virtually all biological processes occur in aqueous solution, and water molecules often play a fundamental role in enzymatic reactions and in modulating binding and conformational equilibria. Atomistic models of biomolecules need to include some description of water-solute interactions to achieve at least a qualitative level of fidelity. A variety of approaches have been tried to model solvation effects.

Why implicit solvation? Solvent models used in molecular simulations can be roughly divided into two camps. Models in the first camp explicitly include individual solvent molecules, with their interactions described at the same level of theory as solute interactions. These are called explicit solvent models. Models in the second camp describe the solvent as a continuous medium characterized by macroscopic parameters such as density, dielectric permittivity, and surface tension. These implicit solvent models take a variety of forms and are often described as approximate versions of explicit models. Although in practice this is often the case, statistical thermodynamics tells us that the accuracy achievable by implicit models is in principle neither higher nor lower than that of explicit models. This is because both kinds of model are to be judged on how accurately they describe the solvent potential of mean force of the solute, a statistical thermodynamic quantity that determines the distribution of conformations of the solute in solution.
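For reference, the solvent potential of mean force mentioned above can be written in its standard textbook form. The symbols are introduced here for illustration and do not appear elsewhere on this page: x denotes the solute coordinates, y the solvent coordinates, and W(x) the solvent potential of mean force.

```latex
% \beta = 1/(k_B T)
% U_{uv}: solute-solvent energy, U_{vv}: solvent-solvent energy,
% U_u: solute intramolecular energy
e^{-\beta W(x)} =
  \frac{\int dy\, e^{-\beta [U_{uv}(x,y) + U_{vv}(y)]}}
       {\int dy\, e^{-\beta U_{vv}(y)}},
\qquad
\rho(x) \propto e^{-\beta [U_u(x) + W(x)]}
```

An explicit solvent simulation samples the solvent integral on the fly, whereas an implicit model supplies an approximation to W(x) directly. Either way, the solute distribution is governed by the same function.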
    I have been using both kinds of solvation models in my research. Lately, however, I have become interested in implicit solvent models for two main reasons. The first is that advanced conformational sampling algorithms such as replica exchange molecular dynamics are more easily applied with implicit solvation. The second, more fundamental reason is that implicit solvation simplifies the modelling of conformational equilibria and, in particular, of protein-ligand binding. Calculations of protein-ligand binding affinities using explicit solvent are complicated by issues such as limited conformational sampling, slow equilibration of water molecules between the bulk and the binding site, and the need to perform calculations for both the solution and receptor environments. As we have shown (Gallicchio et al. 2010), these limitations are circumvented to a significant extent with an implicit solvent description, allowing us to gain insights into dynamical and entropic aspects of binding that are qualitatively important and conceptually valid regardless of the specific representation of solvation. The quality of the implicit solvent description is, however, a concern for quantitative predictions.

The AGBNP model. The Analytical Generalized Born plus Non-Polar (AGBNP) model (Gallicchio & Levy 2004; Gallicchio et al. 2009) originated from the need for an implicit solvent model that would incorporate as much realism as possible and at the same time be usable with molecular dynamics, which requires analytical (that is, differentiable) and computationally efficient energy functions. AGBNP is based on an efficient implementation of the pairwise descreening form of the Generalized Born (GB) model, an approximate description of the continuum dielectric electrostatic model. In essence, GB augments the conventional fixed-charge force field interactions (Coulomb and Lennard-Jones) with "GB interaction" terms that depend on parameters, called Born radii, which in turn depend on the solute geometry. The calculation of the Born radii is by far the most complex part of the computation of the GB energy.
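To make the role of the Born radii concrete, here is a minimal Python sketch of a GB electrostatic energy using the widely used functional form of Still and co-workers. It is not the AGBNP implementation: the Born radii are taken as given inputs, whereas computing them (by pairwise descreening in AGBNP) is the hard part and is not shown. The function name, units (angstroms, elementary charges, kcal/mol), and default dielectric constants are assumptions made for illustration.

```python
import math

COULOMB = 332.06  # kcal mol^-1 angstrom e^-2, electrostatic conversion factor


def gb_energy(coords, charges, born_radii, eps_in=1.0, eps_out=80.0):
    """Generalized Born solvation energy for given Born radii (Still-type form).

    coords: list of (x, y, z) tuples in angstroms
    charges: partial charges in units of e
    born_radii: Born radii in angstroms, assumed precomputed elsewhere
    """
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    # The double sum includes i == j, which yields the Born self-energy terms.
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            bb = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + bb * math.exp(-r2 / (4.0 * bb)))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total
```

For a single ion the expression reduces to the Born formula, and for two atoms much farther apart than their Born radii the cross term approaches a screened Coulomb interaction. The geometry dependence of the Born radii is what lets the same formula interpolate between these limits.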
    AGBNP also includes non-electrostatic terms incorporating lessons learned in over a decade of research. We found that models based on the decomposition of non-polar hydration into a cavitation free energy (described by the solute surface area) and van der Waals dispersion forces (modelled using a function based on the atomic Born radii) give a better description of fundamental processes such as protein folding and association. The research community is recognizing the benefits of this decomposition and is progressively abandoning the traditional surface area (SA) models of non-polar hydration in favor of these new models.
    One of the design principles of AGBNP is to reduce as much as possible the use of empirically adjusted parameters. For example, many commonly used GB/SA implicit solvent models employ empirically adjusted functional forms not only for energetic terms but also for essentially geometrical quantities, such as surface areas and Born radii. I believe that this practice leads to inaccuracies and poor transferability. We were able to show that advanced computational geometry algorithms make the parametrization of geometrical quantities unnecessary and, in doing so, lead to models that are accurate not only on average but also in the details. Being able to accurately model both large and small conformational changes is important in many applications, and in protein-ligand binding in particular.
    We are continuously updating the AGBNP model to expand its range of applicability. We introduced new terms (Gallicchio et al. 2009) to better describe solvation effects beyond the standard linear continuum dielectric approximation. We have also been working on improving the quality of geometrical solute descriptors such as atomic surface areas. The AGBNP model is currently yielding promising results for protein-ligand binding free energies (see above), and we intend to continue optimizing it to improve its accuracy.

HIV Modelling

Background. The AIDS pandemic remains one of the most serious challenges to world health. AIDS is caused by the HIV virus, a highly infectious and fast-mutating retrovirus which compromises the immune system and evades our natural defenses. Several strategies are being pursued to find a cure. The development of an effective vaccine against HIV is probably still the only viable long-term strategy to completely eliminate AIDS. Despite numerous attempts, however, the discovery of an AIDS vaccine has remained elusive. Antiviral drugs have been very useful to slow the progress of the disease. Due to the uncanny ability of the HIV virus to quickly develop resistance mutations, many older drugs are now ineffective and new drugs are continuously being developed to try to be a step ahead of the evolution of the virus.
    For several years now we have been applying computer modelling tools to study some elements of the HIV virus to help uncover new ways to jam their functions. This work has been carried out in close collaboration with the Arnold laboratory at the Center for Advanced Biotechnology and Medicine (CABM) at Rutgers University. The Arnold lab has been at the forefront of the development of antiviral drugs that inhibit the reverse transcriptase (RT) enzyme of the HIV virus. The lab has also actively pursued the development of HIV vaccine candidates based on a rhinovirus vehicle (the virus of the common cold) carrying a small section of the HIV virus on its surface.

HIV-RT Inhibition. Reverse transcriptase (RT) is an essential enzyme in the life cycle of the HIV virus. As in all retroviruses, the genetic information of HIV is stored in RNA, which needs to be reverse-transcribed into DNA before the virus can replicate. Inhibition of RT, which carries out this transcription, blocks HIV replication and slows down the progress of AIDS. RT is a fairly large protein complex with multiple sites and functionalities targeted by antiviral drugs.
    In collaboration with the Arnold lab we have been focusing on the so-called Non-Nucleoside binding site of RT. Drugs targeting this site are called Non-Nucleoside Reverse Transcriptase Inhibitors (NNRTIs for short). Efavirenz (Sustiva) and Nevirapine (Viramune) are two examples of NNRTIs currently used in AIDS therapy. These drugs function by inserting themselves into a buried region of the RT protein, thereby causing a conformational change that interferes with viral DNA synthesis.
    We have been studying several aspects of the NNRTI inhibition mechanism (Gallicchio 2012; Frenkel et al. 2009; Su et al. 2007). Computer modelling is not yet capable of describing the binding reaction of NNRTIs in its entirety. Probably the greatest challenge is modelling the formation of the Non-Nucleoside (NN) binding pocket. This pocket is not observed in the absence of the bound drug. It is the binding of the drug molecule that induces the conformational change of RT that creates the binding site and in turn blocks enzymatic activity.

Vaccine Design. The discovery of an effective vaccine against AIDS, although so far elusive, remains the best option for eradicating the disease, especially in poor countries. In collaboration with the Arnold group we applied, for the first time, molecular simulations and binding free energy concepts to aid the formulation of HIV vaccine constructs. The idea is to display HIV epitopes on the coat of a rhinovirus (the virus of the common cold) to create a chimeric virus that would confer protection against the HIV virus. The initial goal was to optimize antigenicity by presenting the epitope in such a way that it would bind strongly to a known human neutralizing antibody (2F5). Here the composition of the binding interface was biologically constrained, and therefore preorganization of the epitope into the bound conformation was the only viable route for optimizing the binding affinity. We hypothesized that the presentation constructs with the highest fraction of epitope conformations compatible with antibody complexation would minimize the reorganization free energy and present the highest binding affinity for the antibody. We therefore conducted molecular simulations to guide the optimization of the presentation of the epitope, which resulted in a series of detailed predictions regarding the length and positioning of the epitope that would result in good binding (Lapelosa et al. 2009). Subsequent biochemical work in the Arnold laboratory confirmed the computational predictions and, remarkably, yielded some of the most antigenic vaccine constructs of this kind to date (Lapelosa et al. 2010). To this day this study is one of only a few examples of modelling applications to vaccine design and, to our knowledge, the only computational study that has successfully applied reorganization free energy concepts to macromolecular binding. It also represents a fine example of the potential benefits of tight collaboration between modelers and structural biologists.
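The hypothesis above links the fraction of binding-compatible epitope conformations to the reorganization free energy. A minimal Python sketch of that link follows. It assumes that compatibility is judged by an RMSD cutoff from the antibody-bound epitope conformation and that the free energy cost is -kT ln(fraction). The cutoff value, the function name, and the RMSD criterion itself are illustrative assumptions, not the protocol of the cited studies.

```python
import math


def reorganization_free_energy(rmsd_values, cutoff=1.5, kT=0.596):
    """Estimate the free energy cost (kcal/mol) of restricting the epitope
    to bound-like conformations, from the fraction of sampled conformations
    whose RMSD to the bound structure is within the cutoff (angstroms).
    """
    n_compatible = sum(1 for r in rmsd_values if r <= cutoff)
    if n_compatible == 0:
        raise ValueError("no binding-compatible conformations were sampled")
    fraction = n_compatible / len(rmsd_values)
    return -kT * math.log(fraction)
```

Under these assumptions, a construct whose unbound epitope spends half of its time in bound-like conformations pays about 0.4 kcal/mol, while one that visits them 1% of the time pays about 2.7 kcal/mol, which is why preorganization translates into higher predicted affinity.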