As summarized in the post on AGBNP2 Single-Core Speed Optimization, the AGBNP2 model is made of a collection of algorithms which we have been looking to optimize by means of vectorization techniques on CPU devices.
One of the primary algorithms seeks to compute the "self-volume" of each atom, defined as the contribution of that atom to the molecular volume. A volume element exclusively inside the atom contributes fully to its self-volume, a volume element inside two atoms contributes half to each, a volume element inside three atoms contributes one third to each, etc. It turns out that an efficient computational algorithm for the self-volumes can be derived from the Poincaré formula of a set of overlapping objects:
This involves computing all the overlap volumes between two, three, four, etc. atoms. To simplify the calculations atoms are described by Gaussian densities.
The original algorithm is based on a depth-first traversal of the overlap tree (start with atom 1, compute the overlap between atoms 1 and 2, if not zero compute the overlap between 1, 2, and 3, and so on). This algorithm does not vectorize well so we switched to a breadth-first algorithm in which we first compute all of the overlaps between two atoms, then three, etc.
The speed results for the same systems as in the previous post are below.
The calculation of self-volumes is accelerated by more than a factor of 2 using the breadth-first vectorized algorithm. The overall speed is now starting to clearly lift away from the the speed at the start of the project. Now the main remaining speed bottlenecks are the short-range hydration sites and neighbor list construction. Updates to come.