# The Science Behind AToM

AToM-OpenMM computes binding free energies using an alchemical perturbation scheme. The general idea of free energy perturbation is to craft an alchemical potential energy function Uλ(x) that progressively modifies the potential energy function of the system along a progress variable λ to interpolate between an initial state with reference potential U0(x) to a final state with perturbed potential U1(x). The perturbation energy:

is sampled by molecular dynamics at various λ-states, and the resulting probability distributions are analyzed with a free energy estimator to compute the free energy difference between each λ-state and the reference state.

In the case of molecular binding, the reference state is where the ligand interacts with the solvent bulk, and in the final "perturbed" state, the ligand interacts with a receptor.

## Alchemical Transfer

In conventional alchemical approaches, potential energy functions are constructed by changing the parameters of interaction potentials. Alchemical Transfer is different. It represents the potential energy function of the bound state as the result of translating the ligand. In math notation:

Where -h is a translation vector that brings the ligand from a chosen position in the solvent bulk to the receptor binding site. During the simulation, the perturbation energy is obtained by u(x) = U0(x-h) - U0(x), that is, by computing the potential energy of the unbound state, translating the ligand to the binding site, and taking the difference of the two values. Essentially, AToM describes the alchemical perturbation in terms of a coordinate transformation.

It is relatively easy to apply a coordinate transformation during MD. After all, that is what MD codes do. They evaluate energy and derivatives given the atomic coordinates and change the coordinates of the atoms as time progresses. The AToM scheme does not require alchemical customizations of the MD energy routines. And it works with any energy function.

Translating a molecule from the solution to the binding site yields the absolute binding free energy (ABFE) of the complex. But what about relative binding free energies (RBFEs) that are so popular in drug design? RBFEs are easily implemented in Alchemical Transfer. We place one ligand in solution (call it A) and another ligand (B) in the binding site. We then construct the potential of the final state by swapping the positions of the two ligands:

That's it. In Alchemical Transfer, ABFEs and RBFEs are treated in the same way. The only thing that changes is the coordinate transformation: translation for ABFE and swapping for RBFE.

Here is a video illustrating a typical AToM RBFE setup (CHK1 kinase):

The positions of the green and gray ligands are switched at each time step to compute the binding energy. The shading illustrates the λ-state, which changes as time progresses.

## The Alchemical Perturbation Function

So far, so good. Let's now turn to the problem of formulating an alchemical potential energy function. Basically, of a λ-dependent hybrid function that goes from U0(x) to U1(x) as λ goes from 0 to 1. Here is a very general expression:

The perturbation potential function Wλ(u) can be anything you want as long as it is zero at λ=0, and it is equal to u=U1-U0 at λ=1. If these two conditions are met, the alchemical potential energy function does what we asked it to do: interpolate from U0(x) to U1(x). Here is a possible choice (linear perturbation):

And here is another (softplus perturbation):

The parameters of the softplus perturbation function (λ1, λ2, 𝛼, and u0) depend on λ. The softplus non-linear perturbation function satisfies the correct limits when λ1=λ2=λ at λ=0 and λ=1. The parameters of the perturbation function can be chosen freely anywhere in between. The softplus function is nice because it is linear for u<<u0 and u>>u0. but with different slopes λ1 and λ2, in these two regimes. This allows for more flexibility to create smoother and better-connected alchemical paths that lead to faster convergence.

See the chart below for an example of Wλ(x) with u0 = 0, λ1=0.5, and λ2=0.95. As you can see, the softplus function can apply a different linear perturbation depending on the value of the perturbation energy. This is a very valuable feature, as we have shown.

The linear perturbation is a special case of the softplus perturbation function when λ1=λ2=λ for all λ.

## The Soft-Core Perturbation Energy

Alchemical methods suffer from this thing called the endpoint catastrophe. AToM is no exception. However, AToM avoids the catastrophe using a different approach than the soft-core pair potentials of conventional alchemical methods. So let's talk about it.

First, what is the issue? In mathematical terms, the catastrophe exists because the ensemble average, <dUλ(x)/dλ>, of the derivative of the alchemical potential energy function Uλ(x) with respect to λ diverges at λ=0. This is caused by the steep 1/12 power law dependence of the Lennard-Jones interatomic potential. It is a long, complicated story, which I will not discuss further. But, intuitively, at λ close to 0, the ligand interacts mainly with the solvent bulk and very weakly with the receptor. Hence, the atoms of the receptor and water molecules are free to occupy the binding site volume where the ligand would normally sit. Then, because of atomic overlaps, very large perturbation energies are produced when the ligand is translated from the solvent bulk to the binding site. These are, in fact, so large (we are talking numbers like 1012 kcal/mol) that averaging them to extract free energies becomes almost impossible. A similar issue occurs near the other end-point at λ close to 1; for similar reasons.

One way to address this problem is to employ softer pair potentials than Lennard-Jones'. Atomic overlaps still occur, but, with appropriate choices of soft-core pair potentials, the divergences can be eliminated.

The big problem with soft-core pair potentials is that they require extensive customization of MD energy routines, which are often incompatible with multi-body potentials such as Ewald's long-range electrostatic summation and polarizable potentials. Moreover, maintaining and porting to various computing architectures different energy routines that track which atoms interact with standard pair potentials and which ones interact with soft-core pair potentials is, frankly, a nightmare and a source of buggy implementations.

Worse, because the form of soft-core pair potentials depends on λ, the perturbation energy becomes λ-dependent:

causing a host of theoretical and practical headaches. For example, to evaluate the perturbation energy of a sample collected at one λ-state at a range of λ-states--a necessary step to estimate free energies--one has to save not only the perturbation energy but also the whole set of coordinates of the system and to submit them back to the MD engine to compute the new perturbation energies.

There's got to be a better way.

Instead of eliminating the large perturbation energies at the endpoints by softening pair potentials, we can soften the perturbation energy itself. This works by imposing an upper bound to the perturbation energy with a continuous mapping function usc(u). The idea is that this mapping function reaches a maximum value umax at u=+∞ and does not modify u below a certain cutoff ucore. After some tinkering, we found a functional form that works particularly well. See the graph below for an example:

In this example, the soft-core map kicks in at u=ucore=10 (arbitrary units) and never exceeds umax=30. It does not modify moderate perturbation energies below 10. As we will see below, this makes it possible to compute accurate binding free energies by simply replacing the raw perturbation energy u wherever it appears with its soft-core counterpart usc. There are no other changes that are needed. Intuitively this is the case because samples of u and usc contain the same information. There is a 1 to 1 invertible relationship between the two. It's just that distributions of u at large energies are compressed below umax.

## Two Legs in Explicit Solvent

Alchemical Transfer applies equally well with implicit solvent and explicit solvent. However, with explicit solvent, there is a slight complication that requires the alchemical transformation to be split into two legs. This split is still in a single simulation box and occurs transparently for the user. But it is good to be aware of it. Recognizing and solving the problem was the critical breakthrough that made AToM possible.

Okay, it goes like this. Consider, for example, the linear alchemical potential function Uλ(x)=U0(x)+λu(x). When λ=0, the system sees only the reference potential, the one corresponding to the ligand in solution. No problem there. However, as λ increases, the alchemical potential looks more and more like U1(x), which corresponds to the ligand in the binding site. For λ close to 1, the ligand no longer interacts significantly with solvent bulk, and the solvent molecules are free to occupy the space previously occupied by the ligand. But that implies that there overlaps between solvent molecules and the ligand pop up in solution, and that the reference potential U0(x) is very large and ill-defined. Molecular dynamics near λ=1 becomes impossible. The potential energy varies too quickly; the forces are gigantic, and MD crashes.

What to do? Well, our solution is to start from 0 and stop at some large enough value of λ where ligand-solvent atomic overlaps do not occur. A useful stopping point is λ=1/2, the mid-point of the binding transformation. We call this state the alchemical intermediate. It is a strange state. Think about it, according to Uλ(x)=U0(x)+λu(x), with u(x) = U1(x) - U0(x), at λ=1/2 the potential energy function is U1/2(x) = (1/2) U0(x) + (1/2) U1(x). It describes a ligand that interacts 50% with the solvent bulk and 50% with the receptor. It's in both places at the same time. Equally so. Odd. Nevertheless, we would not expect to observe severe atomic overlaps in either place. After all, halving the interaction between atoms is a bit like doubling the temperature. In going from 300K to 600K, atoms do not just start penetrating each other's volumes. Estimating the free energy difference between λ=0 and λ=1/2 is not a problem.

Okay, but how do we go from λ=1/2 and λ=1 to complete the binding transformation?

We don't! We go back from the final bound state to the alchemical intermediate, this time using U1(x) as the reference potential. That is, in this second leg, we use the alchemical potential energy function Uλ(x)=U1(x)+λ[-u(x)], again from λ=0 and λ=1/2, where the minus sign take into account the fact that, relative to the bound state, the perturbation energy is U0(x) - U1(x) = -u(x).

Finally, if ΔG1 and ΔG2 are the free energy differences for legs 1 and 2 between the alchemical intermediate and the two endpoints, the binding free energy is their difference: ΔGb = ΔG1 - ΔG2. A diagram is helpful:

AToM distinguishes between the two legs using a direction alchemical parameter. direction=1 is from the unbound state to the bound state (leg 1) and direction=-1 (leg 2) is from the unbound state to the bound state. However, these definitions are not absolute and are reversed if one wishes to think in terms of unbinding instead of binding.

We are now ready to clear up the last piece of the puzzle: why would replacing the perturbation energy u with its soft-core counterpart usc not hurt the accuracy of the binding free energy estimate? It is because the soft-core modification does not affect the potential energies of the endpoints and the alchemical intermediate. First, at the endpoints, the potential energy is U0(x) or U1(x), corresponding to the unbound and bound physical states. These do not contain the perturbation energy function u(x) at all. So it really does not matter if u(x) is replaced with the soft-core version usc(x).

Next, consider the alchemical intermediate at λ=1/2. Here we have to make sure that usc(x) = u(x) = U1(x)-U0(x); otherwise, the alchemical potential energy would not have the nice symmetric form (1/2) U0(x) + (1/2) U1(x). But this can always be guaranteed by picking a large enough value for the ucore soft-core parameter. Remember: usc = u = for u<ucore. The perturbation energy measures the change in energy for transferring the ligand from the solution to the binding site. At the alchemical intermediate, the ligand interacts with 50% intensity in both places. So the perturbation energy cannot be ever extremely large. If it did, the ligand would be a very lousy binder, and it would not be of interest anyway. In practice, in our experience, a value of ucore of 50 kcal/mol has sufficed with no exception. If you feel anxious, set ucore to 100 kcal/mol and stop fretting about it. We validate that things are okay by checking that the distribution of perturbation energies collected at the alchemical intermediate never exceeds ucore. If that is the case, the free energy of the alchemical intermediate with the soft-core perturbation energy is virtually indistinguishable from the one with the unmodified perturbation energy. Hence, the accuracy of the binding free energy is not affected.

## Relative Binding Free Energies

To keep things simple, the discussion above has mainly focused on absolute binding. However, everything we mentioned applies to relative binding as well. In this case, the endpoints correspond to one ligand of the pair in solution and the other in the binding site, or viceversa. And the alchemical intermediate corresponds to the weird and nonphysical state in which both ligands interact at 50% strength with the solution and the receptor.

In a sense, at the alchemical intermediate, both ligands are present at 50% intensity in both places. But they do not interact with each other. This is because the perturbation energy is calculated as the potential energy change before and after switching the ligands' positions. So the ligands are always far from each other when either potential energy is calculated.

Yeah, it's weird. But no matter; it's just an alchemical intermediate state that, in principle, does not affect the binding free energy. Don't look at it if it makes you dizzy.

Otherwise, here is an animation, of the same CHK1 system as above but focused on the binding site:

The two ligands are present in the binding site with varying intensities regulated by the λ-value. The atoms of the receptor interact with some fractional intensity with the atoms of one ligand and with complementary intensity with the atoms of the other. When a ligand interacts weakly with the receptor, it is free to visit alternative orientations.

## Conclusions, so Far

That's it. If you made it this far, you have hopefully gained an understanding of the foundations of AToM's binding free energy estimation method and how it achieves the same result as much more complicated alchemical implementations. Within reason, AToM yields absolute and relative binding free energy estimates with one simulation box, with standard chemical topologies, without atom mapping, for standard and scaffold-hopping transformations, without splitting the transformation into van der Waals and electrostatic interactions, without double-decoupling from the vacuum state, without soft-core pair potentials, with, in principle, any potential energy function, and without any modification of the energy routines of the MD engine.