System Setup for AToM Calculations
Since it does not require alchemical topologies, preparing protein-ligand systems for AToM is essentially the same as for standard molecular dynamics. Granted, preparing protein-ligand complexes for molecular dynamics is not always straightforward. AToM does not make the job any easier or significantly more difficult. This section will not teach you how to properly prepare a molecular system. We will focus on some of the key aspects that are specific to AToM:
The binding site definition
The ligand displacement
The ligand alignment restraints and other restraints
There are aspects specific to absolute binding free energy estimation (ABFE) and aspects specific to relative binding free energy estimation (RBFE). We will assume ABFE to start and go from there.
The Binding Site Definition
Like any other calculation, a binding free energy calculation starts by defining the object of the calculation. In alchemical binding free energy calculations, this is the complex between a ligand and a receptor. More specifically, we need to specify the set of configurations of the protein-ligand system that we call a "complex." In AToM, this is commonly accomplished by restraining the ligand within a binding site region. Doing so prevents the ligand from wandering away from the receptor when it is not coupled to it. This step also has the more profound significance of defining the standard state of the binding equilibrium. The results of absolute binding free energy (ABFE) calculations are meaningless unless the binding site region is properly defined.
While it is still required in theory, the definition of the binding site is less of a concern for most relative binding free energy (RBFE) calculations. If the ligands in the pair are strong binders, they are unlikely to leave the binding site even if they are not restrained to it. In these cases, it is relatively safe to skip the definition of the binding site. (If you do not add binding site restraints and see one of the ligands leave the binding site, it is safe to assume that the ligand in question is a non-binder.) If specified, AToM's RBFE protocol imposes binding site restraining potentials to the first ligand in the binding site and to the second ligand in the solvent. The restraining potential in the solvent is the same as the one for the binding site but translated by the displacement vector h (see below).
There are many ways to define the binding site volume. With AToM, it is common to define the binding site region as a spherical region centered on the centroid of a set of receptor atoms, typically a set of Cα atoms. A flat-bottom harmonic restraining potential keeps the centroid of a set of ligand atoms within this region. See the figure below. The centroid of the ligand could be defined in terms of all of the atoms of the ligand, but, especially for large ligands, it could make more sense to consider the centroid of only the core of the ligand, or even only one reference atom of the ligand.
The relevant keywords in the AToM control file for the binding site definition are RCPT_CM_ATOMS, LIGAND_CM_ATOMS (ABFE), LIGAND1_CM_ATOMS/LIGAND2_CM_ATOMS (RBFE), CM_KF, CM_TOL.
AToM also supports binding site definitions that, in addition to the position, limit the ligand's orientation relative to the receptor using Boresch-style restraints. Doing so is significantly more complex and will be described in later versions of this document.
Example of the definition of the binding site of the Estrogen Receptor α. The center of the binding site (blue ball) is the center of mass of the chosen Cα atoms (gray balls). The reference atom of the ligand (gree ball) is restrained within the yellow circle.
The Ligand Displacement Vector
AToM is based on the Alchemical Transfer concept; the idea to formulate the alchemical binding process as the translation of the coordinates of the ligand from the solvent bulk to the binding site (or the other way around). The translation is accomplished by means of a constant displacement vector h.
The Alchemical transfer setup. The displacement vector h transfers the ligand from the receptor binding site to the solvent. The opposite displacement -h brings the ligand from the solvent to the binding site.
It is often more natural to prepare the system with the ligand bound to the receptor. For example, from a crystal structure or a docking run. In this case, the displacement vector is chosen so as to bring the ligand from the binding site to a position in the solvent far away from the receptor. See the ABFE tutorials for simple examples. Some illustrate the use of the Ambertools package to solvate the complex in a large enough solvent box to accommodate the translation of the ligand in the solution.
But where should the ligand go? How far from the receptor? Well, it is really up to you. In principle, it does not matter where the ligand is placed in the solvent as long as it is sufficiently far away from the receptor that you are comfortable calling the resulting configuration "unbound." Three layers of water molecules, or ~10 Å, is probably the minimum distance between any two pairs of ligand/receptor atoms to ensure a reasonable degree of decorrelation between the ligand and the receptor.
When choosing a location in the solvent, consider that the ligand and receptor atoms can move around and can make contact during the molecular dynamics run even if they do not in the starting configuration. The ligand in the solvent is a translated "image" of the ligand in the binding site. As the ligand moves around in the binding site so does its image in the solvent. When it is decoupled from the receptor, the ligand freely visits the spherical binding site region. Hence, its image in the solvent will span a similar spherical region.
It is tempting to fall on the side of caution and adopt a translation vector that places the ligand far, far away from the receptor. However, that comes at a price in computational cost. The further away the ligand from the receptor, the larger the solvent box needs to be to accommodate the complex together with the image of the ligand in the solvent. The choice of the translation vector is, as in many personal and professional matters, a trade-off between righteousness and feasibility.
In Relative Binding Free Energy Calculations (RBFE), the first ligand of the pair (ligand A) is initially placed into the binding site and the second ligand (ligand B) is placed in the solvent at a distance h from the first. Hence, the target final state in which the ligands' positions are switched, the one where ligand B is bound in the receptor site and ligand A is in the solvent, is obtained by translating ligand A out of the receptor binding site by means of the displacement h while, simultaneously, ligand B is translated into the receptor binding site by the opposite displacement -h.
The nature of the coordinate transformation (a translation of one ligand vs. the translation of two ligands in opposite directions) is the only fundamental difference between ATM's ABFE and RBFE protocols.
See the CDK2 RBFE tutorial for an example of how to set up a protein-ligand system for RBFE calculations. Starting with a structure in which the two ligands of the pair are aligned into the receptor binding site, there we use Ambertools to displace the second ligand in the pair into the solvent by the displacement vector h, followed by the solvation of the resulting molecular system.
The displacement vector is specified by the DISPLACEMENT control file keyword.
The Ligand Alignment Restraints (RBFE only)
At the initial state of RBFE calculations, when ligand A is bound to the receptor and ligand B is in the solvent, the position and orientation of ligand A are naturally limited by interactions with receptor atoms. However, ligand B in solution is limited only by the binding site restraints (if present) and can otherwise move and rotate freely. The wandering in the solvent of the unbound ligand can seriously slow down the rate of mixing of alchemical states and the convergence of the relative binding free energy. This is because it takes time for ligand B to randomly assume a position and orientation compatible with binding when translated into the receptor binding site.
If placed sufficiently far away from the receptor, any position and orientation of the unbound ligand in solution are equivalent. Hence, intuitively, it should not come as a surprise (as we have shown) that the position and orientation of the two ligands can be kept in sync to speed up convergence without affecting the relative binding free energy.
AToM employs a specific form of ligand alignment restraining potentials based on ligand coordinate frames defined by 3 reference atoms of each ligand:
The first reference atoms of each ligand (named 1A and 1B in the figure above) define the origin of the respective reference frames. They essentially define the "center", the position in space of each molecule. The alignment restraints try to keep these two atoms near each other when ligand B is translated into the binding site.
The second reference atoms (named 2A and 2B in the figure above), define the orientation of each ligand. Specifically, the axis that goes from the first reference atom to the second defines the z-axis of the internal coordinate frame of each ligand. AToM's alignment restraints try to keep these two axes pointing in the same direction.
Finally, the third reference atoms (named 3A and 3B in the figure above), define the rotation of each ligand around their internal z-axes. (This corresponds to the "roll" of a plane if the z-axis is the direction of flight.) This reference atom, together with the other reference atoms, defines the xz plane of the internal coordinate frames of the ligands. AToM's alignment restraints try to keep these two planes aligned.
To give you an idea, here is a video of the dynamics of two mutually aligned ligands :
Let's now turn to the issue of how to best choose the ligand reference atoms.
Actually, the most straightforward, but not necessarily the most efficient, choice is to decline to make a choice. As discussed above, the alignment restraints are, in principle, optional. Given enough time, the AToM simulation would eventually yield the correct relative binding free energy. If the ligands are small, or you have a lot of patience and spare electricity to burn, skip the rest of this section. Otherwise, read on.
Start by looking at the two ligands aligned in the receptor binding site. (You must have docked/aligned the ligands in the receptor binding site somehow. If not, do it now. It is a pre-requisite for any RBFE calculation.) Find a region of commonality between the two ligands in which you can identify corresponding atoms when aligned as in the figure above.
The region of commonality is rigid (such as an aromatic ring) and near the center of each ligand. This is what we would call a "rigid common core." Compared to other free energy packages, AToM tends to be rather forgiving about the degree of similarity between the two rigid common cores. Perhaps the common core of one ligand is made of two fused 6-membered rings and the other of fused 5- and 6-membered rings? No problem. Maybe only one ring is aligned? And it happens to be an alkyl ring for one ligand and an aromatic for the other, like in the video above? No problem ... okay more challenging, but it is probably still doable.
Pick the first reference atoms as those near the center of the rigid common core. Remember this selection sets the origins of the coordinate frames, that the centers of the two ligands. They should be near each other when the ligands are aligned.
Next, find an atom of one ligand as far as possible away from the first reference atom, but still within the common rigid core, such that the axis connecting it to the first reference atom can serve as a proxy for the orientation in space of the rigid common core. Now identify a similar atom of the second ligand approximately along the axis of the first ligand. The corresponding reference atoms of the two ligands do not have to be necessarily on top of each other. However, the axes of the two ligands should be approximately colinear. AToM will try to keep the two axes pointing in the same direction.
It is important that the reference atoms be within the rigid common core. If, for example, the second reference atom of a ligand belongs to a flexible region of the ligand, the orientation axis could move around not because the overall orientation of the ligand changes but because the conformation of the flexible region changes.
Finally, to pick the third pair of reference atoms, imagine rolling the ligands around their orientation axis. Select two corresponding atoms that would keep the roll of the common rigid cores of the two ligands in sync if the sets of reference atoms stayed in the same plane. Again, the atoms that define the plane do not have to necessarily be on top of each other. It is sufficient that they are approximately in the same plane.
The two ligands do not share a common rigid core. In this case, general recommendations are hard to come by. Maybe there is a common rigid core but is very small. AToM's ligand alignment restraints can work reasonably well even when the rigid core is made of only three (non-colinear) atoms. When even that does not make sense, you might want to consider partial alignment restraints that keep only the position and orientation, or only the position of the two ligands in sync. In the most extreme cases, do not apply alignment restraints at all.
If you are confronting a case in which there is no reasonable way to keep the two ligands in alignment, ask yourself if the two ligands are just too different to be modeled reliably by the RBFE approach. Consider the option to alchemically connect the two ligands using a series of intermediate molecules that can be modeled more easily. Maybe the two ligands are so unrelated to warrant two separate ABFE calculations.
We formulated AToM to make it as generally applicable as possible. Miracles, however, are not within its scope!
Ligand alignment restraints are set by the ALIGN_LIGAND1_REF_ATOMS, ALIGN_LIGAND2_REF_ATOMS, ALIGN_KF_SEP, ALIGN_K_THETA, ALIGN_K_PSI keywords of the AToM control file.
In our protein-ligand binding free energy calculations, we like to impose weak positional restraints on the Cα atoms of the structurally-rigid portion of the receptor. Receptor positional restraints are optional. They are employed to limit the range of receptor conformational space that needs to be explored to reach convergence of the binding free energy estimate. Without positional restraints, AToM is asked, in principle, to visit all possible conformational states of the protein including, for example, unfolded states, which is, of course, not feasible.
Performing a protein-ligand binding free energy calculation without receptor restraints implicitly demands the thorough exploration of all possible conformational states and often results in a slow drift of binding free energy estimates. Positional receptor restraints allow the determination of a binding free energy estimate conditional on a specific conformational state of the receptor.
It is better to ask a narrow (albeit possibly irrelevant) question for which there is an answer than to ask an open-ended question for which there is no definite answer!
Positional restraints are set by the POS_RESTRAINED_ATOMS, POSRE_FORCE_CONSTANT, and POSRE_TOLERANCE keywords of the AToM control file.
Summary of Restraints in AToM-OpenMM
By now you could be a little confused and a bit overwhelmed by the zoo of different restraints used in AToM. Here is a summary of restraints why there are used:
Binding site restraints: define the binding site volume as required by the statistical mechanics theory of binding and cannot be avoided. In RBFE calculations they can be neglected at one's peril.
Ligand alignment restraints: keep the two ligands in approximate alignment so that the one currently in the solution phase does not wander away and slow down convergence. Ligand alignment restraints are used to speed up the convergence of binding free energy estimates and, in principle, are optional.
Receptor positional restraints: they define the conformational state of the receptor. Optional, but highly recommended for macromolecular receptors.