Research Blog

A CentOS Docker Image for OpenMM/SDM Development

posted Mar 29, 2020, 5:11 PM by Emilio Gallicchio   [ updated Mar 29, 2020, 5:16 PM ]

It is difficult to maintain an OpenMM build environment. There are a number of strict requirements for tools, libraries, and compilers that are not easy to satisfy on every operating system. An obvious workaround is to maintain a development build box. However, the build box consumes space and resources and, as during the ongoing emergency, it might not always be available over the network. For unknown reasons, we lost access to our lab's internal network at Brooklyn College and no one can go there to fix it.

A better alternative is to implement the software development environment as a Docker image. Docker is a framework to create and run lightweight, virtual-machine-like environments ("containers") from templates called "images" in Docker's parlance. I summarize here the steps I took to create one such image, and how we use it to build OpenMM and related utilities and to prepare molecular systems for the Single Decoupling Method for protein-ligand binding free energy estimation.

The latest image is available at this link: centos610-anvil-7.tar.gz. To load it do:

$ gunzip centos610-anvil-7.tar.gz && docker load -i centos610-anvil-7.tar

How to Use the Docker Image to Run the SDM workflow

Run the image:

$ docker run -it condaforge/centos610-anvil-7

Follow the usual procedures to run the SDM workflow. Skip the minimization/thermalization in the script as there is no GPU in the docker image to do it. For some reason, mae2dms needs the conda libs:

> export LD_LIBRARY_PATH=/opt/conda/lib/:$LD_LIBRARY_PATH
> bash ./

Finally, rsync the working directory to a computational server/cluster to do the calculations.

Building the Docker Image

We have based our image on the one used by conda-forge, which is based on CentOS 6.10:

$ docker pull condaforge/linux-anvil
$ docker run -it condaforge/linux-anvil

You will be dropped into a bash shell as user conda. A python 3.7 conda environment stored in /opt/conda is automatically activated. This user is authorized to install packages via sudo and yum. The Red Hat devtoolset-2 is also installed. The sudo command in this toolset conflicts with the system sudo. To run sudo, use /usr/bin/sudo explicitly. To gain root do (while the image is running):

$ docker ps

to get the id of the running image, 8d0865c61538, say. Then:

$ docker exec -it 8d0865c61538 bash

To compile OpenMM we need gcc from devtoolset-6, also some reasonable text editors and such:

> /usr/bin/sudo yum install devtoolset-6
> scl enable devtoolset-6 'bash'
> /usr/bin/sudo yum install nano emacs
> /usr/bin/sudo yum install wget rsync

This is how we built msys:

> conda install -c conda-forge boost
> conda install -c conda-forge scons
> mkdir src && cd src
> git clone

I edited the SConscript file to add /opt/conda/include to the include path, also added the needed libraries in LIBS. The key section looks like:

if True:
    for p in env['CPPPATH']:
        if p.startswith('/proj') or p.startswith('/gdn'):
            flg.append('-I%s' % p)
    env.Append(CFLAGS=flg, CXXFLAGS=flg)


> scons -j4
> scons -j4 PYTHONVER=37
> scons -j4 PYTHONVER=37 install PREFIX=$HOME/local

The msys python tools such as dms-info fail complaining about some kind of python/C++ function argument mismatch. Here is a post that explains how to fix the problem if we need to. We mostly care about mae2dms, which works.

The next steps are to gather the tools necessary to compile OpenMM.

After a number of failed attempts related to cmake incompatibilities, I ended up installing cmake 3.6.3 from sources. As root:

# cd ~/src/
# wget
# tar xzvf v3.6.3.tar.gz
# yum install ncurses-devel
# cd CMake-3.6.3
# ./bootstrap && make && make install

I installed CUDA as root as well. I skipped the installation of the NVIDIA driver since the image does not have an NVIDIA GPU card:

# wget
# rm

The doxygen app packaged with conda-forge is broken: it does not scan directories with header files. It took a while to debug this. The CentOS version of doxygen is fine:

> /usr/bin/sudo yum install doxygen

The next steps are for building OpenMM. As the conda user:

> cd ~/src
> wget
> tar zxvf 7.3.1.tar.gz
> cd openmm-7.3.1

I modified CMakeLists.txt in src/openmm-7.3.1/wrappers/python/ to do the python installation under local/openmm-7.3.1/lib. Here is an excerpt:

#set(PYTHON_SETUP_COMMAND "install --root=\$ENV{DESTDIR}/")
set(PYTHON_SETUP_COMMAND "install --prefix=/home/conda/local/openmm-7.3.1/")

Next, do the actual build. For whatever reason CUDA_CUDA_LIBRARY needs to be specified explicitly:

> mkdir -p ~/local/openmm-7.3.1 && mkdir -p ~/devel/build_openmm_7.3.1
> cd ~/devel/build_openmm_7.3.1
> ccmake -i ../../src/openmm-7.3.1/ -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/

In the ccmake interface I turned off the C and Fortran wrappers and pointed the installation directory to /home/conda/local/openmm-7.3.1. Then did the usual:

> make install && make PythonInstall

Now OpenMM is installed under ~/local/openmm-7.3.1. I followed similar steps to install the AGBNP and SDM plugins (see README).

> cd ~/src
> git clone
> git clone
> git clone

Etc. The SDM workflow does not need building. For the AGBNP and SDM plugins, the python wrapper CMakeLists.txt was modified to do the installation of python libraries under ~/local/openmm-7.3.1/lib as above.

The OpenMM installation is now ready for shipment for deployment on computational servers:

> cd ~/local
> tar zcvf openmm-7.3.1.tgz openmm-7.3.1
> scp openmm-7.3.1.tgz me@myfavoriteserver:~/software/

On the server, untar the distribution. Then use a launch script (runopenmm) such as:

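#!/bin/bash
# Example locations (assumptions): adjust both to match your server
openmm_dir=$HOME/software/openmm-7.3.1
pythondir=$HOME/miniconda3
export OPENMM_PLUGIN_DIR=${openmm_dir}/lib/plugins
export LD_LIBRARY_PATH=${openmm_dir}/lib:${openmm_dir}/lib/plugins:$LD_LIBRARY_PATH
export PYTHONPATH=${openmm_dir}/lib/python3.7/site-packages:$PYTHONPATH
${pythondir}/bin/python "$@"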

For example:

 $ ~/software/bin/runopenmm
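A quick way to confirm that the untarred distribution is visible to the interpreter is to list the OpenMM platforms from Python. The sketch below is only an illustration: the file name check_openmm.py is hypothetical, and it assumes the simtk.openmm module layout used by OpenMM 7.3.1.

```python
def list_openmm_platforms():
    """Return the names of the OpenMM platforms visible to this interpreter.

    Returns an empty list if OpenMM cannot be imported, which usually
    means PYTHONPATH or LD_LIBRARY_PATH is not set as in the launch script.
    """
    try:
        from simtk import openmm  # module layout of OpenMM 7.3.1
    except ImportError:
        return []
    return [
        openmm.Platform.getPlatform(i).getName()
        for i in range(openmm.Platform.getNumPlatforms())
    ]


if __name__ == "__main__":
    names = list_openmm_platforms()
    if names:
        print("OpenMM platforms:", ", ".join(names))
    else:
        print("OpenMM could not be imported; check the launch script paths")
```

Saved as check_openmm.py, it could be run through the launch script, for instance as ~/software/bin/runopenmm check_openmm.py.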

To finish up the development image, I installed a version of the academic Desmond-Maestro needed by the SDM workflow:

> mkdir -p ~/schrodinger/installers && cd ~/schrodinger/installers
> export SCHRODINGER=~/schrodinger/Desmond_Maestro_2018.4
> scp me@myfavoriteserver:~/software/Desmond_Maestro_2018.4.tar .
> tar xf Desmond_Maestro_2018.4.tar

and then proceed as usual with the Maestro installation.

Finally, exit from the docker image and commit the changes:

> exit
$ docker commit -m "build box" -a "Emilio Gallicchio" <container id> egallicchio/centos610-anvil-7

where <container id> is the id of the Docker container one gets from docker ps -a.

Protonation of an α-Hydroxytropolone HIV RNase H Inhibitor through QM/MM Methods

posted Feb 1, 2020, 1:42 PM by Judy Wei   [ updated Feb 1, 2020, 1:47 PM ]

α-Hydroxytropolones are potential anti-HIV drugs that inhibit the processing of the DNA/RNA hybrid by the RNase H enzyme. It is known that these molecules bind to the Mg²⁺ metal ion cofactors of the enzyme. However, the specific protonation state of the bound inhibitor is unclear. The system we study is β-thujaplicinol bound to HIV-1 RNase H. To determine the protonation state of the oxygen substituents, we proposed four different states and used QM/MM geometry optimization to find out which one is most consistent with the known crystal structure.

Figure 3. Structure comparison of the calculated molecule to the original structure.

Application of Heating and Cooling Strategy on Trp-cage Folding

posted Jan 31, 2020, 10:42 AM by VJay Molino   [ updated Jan 31, 2020, 10:45 AM ]

In our body, newly synthesized proteins fold to their native structure at physiological temperature. Heating a protein breaks interactions between the amino acid residues, and extreme temperatures cause a protein to unfold and denature. What if the system is heated during protein folding? How will it affect the folding process? Will this strategy provide a better conformational search during folding that will eventually lead to the native structure of the protein?

Figure 2. Line representation (top) and cartoon representation (bottom) extended structure of the peptide Trp-cage from the sequence of PDB ID 1RIJ.


Calculating Free Energy Change by Displacing Water Molecules

posted Jan 6, 2020, 12:10 PM by Joe Z Wu   [ updated Jan 6, 2020, 1:55 PM ]

While it is most common to consider direct ligand-protein interactions, recent research also tries to consider the solvation effect of water displacement in the binding region. Our protein of choice for this project is the bromodomain of “Pleckstrin homology domain interacting protein” (PHIP)3, a small protein module that recognizes acetylated lysines on histones and plays an important role in the study of the regulation of gene expression. We have chosen to study the mediation effects of the specific W1 water molecule located within the bromodomain binding region. By displacing this W1 water molecule from neat water and from the bromodomain active site with a bubble potential, we were able to calculate the free energy penalty and, as such, understand the amount of net work necessary to introduce a ligand into the site. The resulting free energy change of displacing W1 in neat water was 1.3 kcal mol⁻¹, while for the bromodomain binding pocket this value was 0.384 kcal mol⁻¹. To understand more extensively the energetic effects of water molecules on pocket stability and ligand binding affinity, further studies using additional water molecules can be considered.

Figure 1. Spherical bubble (translucent) in bromodomain binding pocket surrounded by water.

Maximum Likelihood Inference of the Symmetric Double-Well Potential

posted Nov 10, 2019, 4:19 PM by Solmaz Azimi

The double-well potential serves as a convenient one-dimensional model for exploring physical phenomena. This project aimed to estimate the probability distribution of the double-well potential fitted to a Gaussian mixture model by maximum likelihood inference. Hypothetical datasets of the quadratic function were generated using smart-darting Monte-Carlo simulations under the Metropolis acceptance criterion. A major challenge in studying free energy landscapes is sampling efficiency. Therefore, a larger displacement was implemented in the Monte-Carlo simulations, which are generally done at smaller displacements. Although a conventional Monte-Carlo approach can accomplish displacement from one free energy minimum to the next, this is done at the expense of accuracy, such that the acceptance rate of the simulation deviates from an optimal 50%. The generated datasets closely resembled normalized Boltzmann distribution functions at two temperatures. TensorFlow was used to obtain the optimal parameters of the Gaussian mixture model for the generated datasets. Our results demonstrated that TensorFlow finds a set of optimal parameters for the Gaussian mixture model that accurately reproduces the Boltzmann distribution at 300 K. However, the method is unable to do so with similar accuracy at 2000 K.
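As an illustration of the sampling step described above, here is a minimal Metropolis Monte-Carlo sketch on a symmetric double well. It is not the project's code: the functional form of the potential, its parameters, and the simple reflection move x → -x used as a stand-in for smart darting are all assumptions made for this example.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def double_well(x, a=1.0, b=5.0):
    """Assumed symmetric double well U(x) = b (x^2 - a^2)^2, minima at x = +/-a."""
    return b * (x * x - a * a) ** 2


def metropolis(n_steps, temperature, max_disp=0.3, jump_prob=0.1, seed=0):
    """Sample the double well with local moves plus occasional well-to-well jumps.

    The jump is a reflection x -> -x, a symmetric proposal that stands in
    for a darting move between the two minima.
    Returns the list of sampled positions and the overall acceptance rate.
    """
    rng = random.Random(seed)
    kT = KB * temperature
    x = 1.0
    u = double_well(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        if rng.random() < jump_prob:
            x_new = -x  # jump to the mirror-image position
        else:
            x_new = x + rng.uniform(-max_disp, max_disp)  # small local move
        u_new = double_well(x_new)
        # Metropolis acceptance criterion
        if u_new <= u or rng.random() < math.exp(-(u_new - u) / kT):
            x, u = x_new, u_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


samples, rate = metropolis(20000, 300.0)
left = sum(1 for x in samples if x < 0) / len(samples)
print(f"acceptance rate {rate:.2f}, fraction in left well {left:.2f}")
```

A histogram of the samples could then be compared with the normalized Boltzmann distribution or passed to a Gaussian mixture fit.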

The Standard State in Binding Free Energy Calculations

posted Jul 30, 2019, 4:10 PM by Emilio Gallicchio   [ updated Jul 30, 2019, 7:01 PM ]

Phys. Chem. Intro

Q: What is the standard free energy of binding?

It is the free energy of binding between a ligand and a receptor when they are in an ideal solution at the standard concentration (C° = 1 M). The standard free energy of binding is usually denoted by ΔG°b.

Q: How is ΔG°b measured in practice?

There are many ways to measure a standard binding free energy. The most straightforward is to measure the binding constant Kb, or equivalently the dissociation constant Kd = 1/Kb, and use the relation

$$\Delta G^\circ_b = -k_B T \ln K_b = k_B T \ln K_d$$

Q: But wait, the relation above does not make sense because it takes the log of a quantity in concentration units.

In Physical Chemistry, equilibrium reaction constants are dimensionless. The dimensionless nature of the equilibrium constant for binding is clear from its definition:

$$K_b = \frac{[RL]/C^\circ}{([R]/C^\circ)\,([L]/C^\circ)}$$

where R is the receptor, L is the ligand, RL is the complex, and C° is the standard concentration. Changing the units of concentration does not change the binding constant.

Q: yeah ... I am not sure, my friend works in this lab and they measured the Kd of a drug and he said it's 1.8 nM

In fields such as Biochemistry, Medicinal Chemistry, and Biology, equilibrium constants are very confusingly reported with units, sometimes very strange units. Unless they are doing something really odd, such as redefining the standard state, read the above as Kd = 1.8 × 10⁻⁹, no units, and you'll be fine.
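To make this concrete, the dimensionless Kd can be converted into a standard binding free energy with ΔG°b = kB T ln Kd. The snippet below is a small sketch; the temperature of 300 K is an assumption, since the measurement temperature is not stated.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def standard_binding_free_energy(kd, temperature=300.0):
    """Standard binding free energy (kcal/mol) from a dimensionless Kd.

    Kd is expressed relative to the 1 M standard concentration,
    so Kd = 1.8 nM is entered as 1.8e-9.
    """
    return KB * temperature * math.log(kd)


dg = standard_binding_free_energy(1.8e-9)
print(f"{dg:.1f} kcal/mol")  # prints -12.0 kcal/mol
```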

Q: really, I can change the standard state?

If you want, sure, but maybe don't call it "standard". A standard is just a reference that people have agreed on, a bit like the origin of a coordinate system. You can certainly define your own reference solution state, I don't know ... 34,512 molecules per cubic millimeter. (As an example, for acid/base reactions, biologists like to use 10⁻⁷ M as the standard concentration of the hydronium ion, causing endless confusion.) Anyway, if you do use your own standard, do not expect to get the same "standard" free energy of binding and equilibrium constant as the rest of the world, which uses molar units, for the same reason that you can't expect to get the same distance from the origin if you change the origin.

Let's stick to the 1 M standard state, please ...
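For the curious, the effect of moving the reference can be sketched in a few lines. Because the binding constant scales linearly with the reference concentration, the free energy shifts by -kB T ln(C'/1 M). The function name and the 300 K temperature are assumptions made for this illustration.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def shift_standard_state(dg_1m, c_new_molar, temperature=300.0):
    """Binding free energy (kcal/mol) referred to a new reference concentration.

    dg_1m is the standard value at 1 M and c_new_molar is the new
    reference concentration in mol/L.
    """
    return dg_1m - KB * temperature * math.log(c_new_molar)


# The same complex looks about 4 kcal/mol weaker with a 1 mM reference
print(f"{shift_standard_state(-12.0, 1.0e-3):.2f} kcal/mol")  # prints -7.88 kcal/mol
```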

The Standard State in Binding Free Energy Calculations

Q: Why do I need to report the standard free energy of binding?

The binding free energy depends on the concentrations of receptor and ligand. For example, the binding free energy is zero at equilibrium concentrations. It is meaningless to report a free energy of binding without specifying the concentrations it corresponds to. By stating that a free energy of binding is "standard", we tell people that it corresponds to the standard state at 1 M concentrations.
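One way to see why the binding free energy vanishes at equilibrium is the usual reaction-quotient form (the symbol Q is introduced here only for illustration):

```latex
\Delta G_b = \Delta G^\circ_b + k_B T \ln Q ,
\qquad
Q = \frac{[RL]/C^\circ}{([R]/C^\circ)\,([L]/C^\circ)}
```

At the standard state all concentrations equal C°, so Q = 1 and ΔGb = ΔG°b. At equilibrium Q = Kb, and since ΔG°b = -kB T ln Kb the two terms cancel.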

Q: Do I need to do a simulation at the standard state? That is, do I need to place enough ligand and receptor molecules in the simulation box so that they are at 1 M concentration?

No. You can do a calculation at any concentration you want. As the simulation progresses, monitor the number of complexes, RL, and the number of free R and L to measure their equilibrium concentrations. That will give you the equilibrium constant and standard free energy of binding using the relation above.
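A minimal sketch of this counting estimate follows. It assumes that average numbers of complexes and of free molecules have already been extracted from the trajectory and that the box volume is known. The function name and the numbers in the example are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
LITERS_PER_A3 = 1.0e-27   # one cubic angstrom in liters


def binding_constant_from_counts(n_rl, n_r, n_l, box_volume_a3):
    """Dimensionless Kb (1 M standard state) from average species counts.

    n_rl, n_r and n_l are the average numbers of complexes, free receptors
    and free ligands in a box of volume box_volume_a3 (cubic angstroms).
    """
    liters = box_volume_a3 * LITERS_PER_A3

    def molar(n):
        return n / (AVOGADRO * liters)  # concentration in mol/L

    # Dividing each concentration by C = 1 M leaves the numbers unchanged
    return molar(n_rl) / (molar(n_r) * molar(n_l))


# One receptor and one ligand in an 80 A cubic box, bound 90% of the time
kb = binding_constant_from_counts(0.9, 0.1, 0.1, 80.0 ** 3)
print(f"Kb = {kb:.0f}")
```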

However, the binding constant is rarely computed by "counting" in this way. We like to do computational alchemy. It is faster and more reliable.

Q: Okay, I did an alchemical calculation. I decoupled the ligand from the solution then I coupled it to the receptor. I fed the data into MBAR etc. which spit out a free energy. Can I call it a standard free energy of binding?

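Generally, no. What you have calculated is the excess component of the free energy of binding, ΔGexcess. To turn it into a standard free energy of binding, add the ideal term:

$$\Delta G^\circ_b = \Delta G_{\mathrm{ideal}} + \Delta G_{\mathrm{excess}} , \qquad \Delta G_{\mathrm{ideal}} = -k_B T \ln \left( C^\circ V_{\mathrm{site}} \right)$$

where Vsite is the volume of the binding "region".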
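As a small numerical sketch, the ideal term -kB T ln(C° Vsite) can be evaluated once Vsite is known, using C° = 1 M, which corresponds to one molecule per roughly 1660.5 Å³. The 5 Å spherical site and the 300 K temperature are assumptions chosen for the example.

```python
import math

KB = 0.0019872041          # Boltzmann constant in kcal/(mol K)
C0 = 6.02214076e23 / 1e27  # 1 M expressed in molecules per cubic angstrom


def dg_ideal(v_site_a3, temperature=300.0):
    """Ideal (standard state) term in kcal/mol for a site volume in cubic angstroms."""
    return -KB * temperature * math.log(C0 * v_site_a3)


# Hypothetical spherical binding site of radius 5 A
v_site = 4.0 / 3.0 * math.pi * 5.0 ** 3
print(f"Vsite = {v_site:.1f} A^3, dG_ideal = {dg_ideal(v_site):.2f} kcal/mol")
```

A site smaller than 1660.5 Å³ gives a positive ideal term, reflecting the loss of translational freedom relative to the 1 M solution.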

Q: Okay, that sounds easy. Where does the quantity ΔGideal come from?

ΔGideal is the reversible work for transferring a ligand from an ideal solution at concentration C° to the binding site of volume Vsite. Read these papers:
    1. MK Gilson, JA Given, BL Bush, JA McCammon. The statistical thermodynamic basis for computation of binding affinities: a critical review, Biophysical Journal, 72, 1047 (1997).
    2. M Mihailescu, MK Gilson. On the theory of noncovalent binding. Biophysical Journal 87 (1), 23-36 (2004).
    3. E Gallicchio and RM Levy, Recent Theoretical and Computational Advances for Modeling Protein-Ligand Binding Affinities. Advances in Protein Chemistry and Structural Biology, 85, 27-80, (2011). 

Q: But how do I get  Vsite?

Getting Vsite is easy. You have set it yourself. It is whatever volume the ligand is allowed to explore during the process of alchemically coupling the ligand to the receptor.

Q: What do you mean? I did not restrain the ligand.

Then Vsite is the volume of the simulation box. Did you run the simulation long enough so that the ligand visited the whole box? Probably not, eh!? Then the simulation is not converged. Even more troubling is that you also have implicitly defined the complex RL as any conformation in which R and L are in a region of solution of the same volume as the simulation box. This is probably not what you wanted. You probably wanted to measure the standard free energy of binding of the ligand to a specific region of the receptor, not for the whole receptor and even including the solvent. You should have restrained the ligand to that region. Vsite  is the volume of whatever that region is that you meant to consider.

Q: Right, right ... I forgot about it. I did apply a restraint potential to avoid the "wandering ligand" problem when the ligand and receptor are decoupled.

Ok, great. Then you have a Vsite after all. Is the restraint potential based on the distance between the centers of mass (CM-CM) of ligand and receptor atoms? If so, simply integrate the Boltzmann factor of the restraint potential over the three coordinates of space to get Vsite. Or is the restraining potential something more complicated?
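For a restraint that depends only on the CM-CM distance r, the three-dimensional integral reduces to a radial one, Vsite = ∫ 4π r² exp(-U(r)/kB T) dr. The sketch below evaluates it numerically for a flat-bottom harmonic restraint. The tolerance, force constant, temperature, and integration settings are hypothetical values picked for the example.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def flat_bottom(r, r0=5.0, k=3.0):
    """Flat-bottom harmonic restraint in kcal/mol: zero up to r0 (A), harmonic beyond."""
    return 0.0 if r <= r0 else 0.5 * k * (r - r0) ** 2


def site_volume(restraint, temperature=300.0, r_max=30.0, n_bins=30000):
    """Integrate 4 pi r^2 exp(-U(r)/kT) from 0 to r_max with the midpoint rule."""
    kT = KB * temperature
    dr = r_max / n_bins
    total = 0.0
    for i in range(n_bins):
        r = (i + 0.5) * dr
        total += 4.0 * math.pi * r * r * math.exp(-restraint(r) / kT) * dr
    return total


v_site = site_volume(flat_bottom)
print(f"Vsite = {v_site:.1f} A^3")
```

The result is somewhat larger than the volume of the flat region alone, because the harmonic walls still let the ligand stray slightly beyond r0.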

Q: Yeah ... more complicated. The orientation of the ligand was also restrained.

No problem. Integrate over the orientational angles as well to get the angular binding site volume Ωsite, and add -kB T ln(Ωsite/8π²) to the ideal standard state factor. Read the papers I told you about.
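Written out, the ideal term with both the positional and the orientational contributions reads as follows (same symbols as in the text):

```latex
\Delta G_{\mathrm{ideal}} =
  -k_B T \ln \left( C^\circ V_{\mathrm{site}} \right)
  -k_B T \ln \left( \frac{\Omega_{\mathrm{site}}}{8 \pi^2} \right)
```

When the orientation is unrestrained, Ωsite = 8π² and the second term vanishes.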

Q: No, no. I restrained the orientation of the ligand by tethering two atoms of the ligand to two receptor atoms.

That is not allowed. It would perturb the intramolecular conformational distributions of receptor and ligand when they are not coupled. Read this paper:

Boresch S, Tettinger F, Leitgeb M, Karplus M. Absolute binding free energies: A quantitative approach for their calculation. J. Phys. Chem. B. 107, 9535–9551 (2003)

Q: I see. But after coupling the ligand to the receptor I turned off the restraints. Does that change  Vsite?

The binding restraints are not meant to be turned off. They are on during the whole coupling alchemical leg and they stay on.

Q: But then I am not simulating the real thing! The real complex does not have restraints!

I never said that you are simulating the real thing. We are doing computational alchemy! The binding restraints define the complex. With the binding site restraints, you are essentially specifying the set of conformations of the receptor-ligand system that form what you call the "complexed" state. You can't change the definition of the complex depending on the alchemical state of the system.

Q: Okay, now I know you finally went off the deep end. You cannot be right because then the standard free energy of binding would depend on how I set the restraints!

Precisely. The free energy of the complex depends on how you defined it. This is not different from, say, having to define the ranges of φ and ψ backbone angles of the alpha helix state in order to compute the alpha-helix population of a peptide. Change the range of the angles, and the population, as well as the corresponding free energy, changes. You need to define the complexed state before you can measure its free energy!

Q: But then how do I compare the calculated free energy to the experiment? The experimental system does not have restraints. What does the experiment measure?

Good question and a very interesting topic. Read paper no. 2 above.

Q: Okay, but now I am worried that I have used harmonic restraints centered on a specific pose of the ligand which I got from docking and I am not sure it is the correct bound pose. Is it a problem?

It could be. It is safer to use flat-bottom harmonic restraints for the binding site volume. That way you have more tolerance and a greater chance of including the "correct" pose.

Q: I am starting to get this. But I heard that imposing restraints and then releasing them can improve convergence. Is that wrong?

It is absolutely fine to impose and then release additional restraining potentials along the alchemical path to improve convergence. These do not need to obey the same strict rules for the binding site restraints discussed above. Just make sure that the binding site restraints stay on throughout the alchemical simulation.

Q: Now I understand why many people prefer to compute relative binding free energies (FEP) rather than the standard binding free energies! They do not need to worry about standard states, the binding site volume, and all this stuff!

Actually, all of this applies to relative binding free energies as well. The relative binding free energy is the difference between two standard binding free energies. Because these depend on the definition of the binding site, so does the relative binding free energy. In principle, without restraints, a ligand undergoing an FEP transformation will eventually dissociate and wander around the simulation box adversely affecting the relative free energy estimate. The longer the simulation the higher the chance that this will happen. You want your estimate to improve, not worsen, as you make the simulation longer. Binding site restraints should be used in FEP as well ... but people rarely do it. 

Don't ask. I really don't know why.

Binding Free Energy of Posaconazole in Two Protonation States to a Model of Captisol

posted May 17, 2019, 6:15 PM by Sheenam   [ updated May 19, 2019, 10:50 AM ]

Posaconazole is a triazole antifungal drug that is used to treat invasive infections by Candida species and Aspergillus species in severely immunocompromised patients. However, being weakly basic and poorly soluble in water, it has poor bioavailability and variable absorption. Hence, for better intravenous administration under aqueous conditions, the drug is formulated with a solubilizing agent, a modified β-cyclodextrin such as Captisol, which is thought to solubilize the drug in acidic medium. To test this hypothesis, we calculate the binding free energy of Posaconazole to Captisol in different protonation states to understand, via computational methods, the pH dependence of the observed solubility of Posaconazole in the presence of Captisol. The results of the computational experiments conducted here confirm the hypothesis that this effect is due to the higher affinity of the protonated form of Posaconazole relative to the unprotonated one.

Fig 1: Captisol-Posaconazole Complex (Posaconazole in Green)

Conformational Propensities of FXR Drug Inhibitors in Water Solution

posted May 7, 2019, 7:18 AM by Brenda Neuman   [ updated May 8, 2019, 5:59 AM ]

Hypercholesterolemia is a major cause of heart disease. A potential drug therapy for hypercholesterolemia is FXR inhibition because of the role of FXR receptors in bile acid and cholesterol metabolism. Drug molecules often need to reorganize their conformations to bind protein receptors. The free energy penalty of reorganization is related to how often in solution the drug molecule adopts the same conformation as in the protein. By comparing the conformations of the FXR inhibitors in an aqueous environment to the conformation of these inhibitors in the enzyme-substrate complex, the probable effectiveness of the inhibitor as a drug can be determined. Different force fields were used to evaluate the distribution of conformations of two FXR inhibitors in an aqueous environment in order to analyze their reorganization penalty. The results from the different force fields were compared, and it was determined that there were similarities and also significant differences between the conformations the different force fields produced and the reorganization penalties associated with them.
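The link between the reorganization penalty and the conformational population can be illustrated with a short sketch. It assumes the penalty is estimated as -kB T ln p, where p is the fraction of solution snapshots classified as bound-like. The function name, the counts, and the 300 K temperature are hypothetical.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reorganization_free_energy(n_bound_like, n_total, temperature=300.0):
    """Reorganization penalty (kcal/mol) from the population of bound-like conformers.

    n_bound_like is the number of solution snapshots whose conformation
    matches the receptor-bound one, out of n_total snapshots.
    """
    population = n_bound_like / n_total
    return -KB * temperature * math.log(population)


# If 5% of the solution snapshots look like the bound conformation
print(f"{reorganization_free_energy(50, 1000):.2f} kcal/mol")  # prints 1.79 kcal/mol
```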

Figure 3. FXR inhibitor in Water Box

Research Poster that was presented at Brooklyn College Science Day on 05/03/19 can be viewed through the following link 

Heating and Cooling Strategy to Improve Estimates of CYFIP1p-Derived Peptides’ Helicity Probabilities in Biased Molecular Dynamics Simulations

posted Feb 24, 2019, 2:31 PM by Irene R   [ updated Feb 26, 2019, 6:32 PM ]

A CYFIP1p-derived peptide in VMD: bond lengths of 2.5 Å (units used by VMD) or less for the middle three bonds would indicate a helical structure.

By Irene Rostovsky

Since the binding region of CYFIP1p can be used as a model for a precursor molecule for the development of a drug to treat Fragile X syndrome and cancer, we wanted to calculate the probability of finding the peptide corresponding to this binding region, and another peptide differing by one residue, in a helical conformation. Because it was difficult to reach convergence between random and helical starting conformations for one of our peptides, we performed a series of simulations at 6.25 kJ/mol bias force (previously determined as the optimal force by Megan Wang), in which we manipulated temperatures in a manner that is conceptually similar to replica exchange but simpler to perform. We alternated between 400 K and 300 K in successive simulations to disturb the initial states and speed up exploration of different conformations. Our results so far suggest that alternating heating and cooling is an effective way to help peptides break out of their initial conformations and form or fully break alpha helices at a reasonable rate over 1 ms. The next step is to perform statistical analysis on our results and to test the possible convergence we have seen with our new heating/cooling method.

Water energy contributions to structural inhomogeneity of cucurbit[8]uril

posted Jun 15, 2018, 4:54 PM by Fatlum Hajredini   [ updated Jun 17, 2018, 7:09 AM ]

Expulsion of water from cavities on host binding sites can contribute a great deal of energy to the binding of ligands. This can be attributed to the gain in entropy as the waters become unstructured. Accounting for such effects in binding free energy calculations greatly improves the prediction accuracy. This effect is likely to also contribute to the structural reorganization of macromolecules. Using molecular dynamics simulation and the recently developed SSTMap suite, we investigate the contribution of water thermodynamics to the tendency of cucurbit[8]uril (cb8) to assume a compacted structure. Hydration site analysis shows that structuring of waters at the cavity of cb8 carries an entropic penalty, which was expected to be relieved upon compaction of cb8. A cb8-ligand complex had overall more favorable hydration upon assuming a more compacted structure compared to its more extended counterpart, suggesting that the compaction was due to water expulsion. To investigate whether the same phenomenon would be observed in cb8 alone, simulations were carried out in an explicit solvation model and in an implicit solvation model, which lacks water structure information. Contrary to expectation, when simulated in the implicit solvation model cb8 is predominantly compacted, and also assumes an overall more compact structure compared to any of the conformations in the explicit water simulation. Simulations in explicit solvent show a wide distribution of states, with the extended conformation being much more populated than its counterpart in the implicit solvent simulation. Taken together, these findings suggest that structural networks of water, when treated explicitly, can have a significant impact on the structural reorganization of macromolecules.

State distribution of cb8 when simulated in implicit solvent (blue) or explicit solvent (red), as determined by differences in their radii of gyration.
