Monthly Archives: October 2013

New Publication: Energetics of Multi-Ion Conduction Pathways in Potassium Ion Channels

Can we predict the conductance of a potassium ion channel from an experimental structure?

In this paper we examine the kinetic barriers experienced by potassium ions (and waters) as they move through the narrowest part of two different potassium ion channels. We assess the reproducibility of our results and test the sensitivity of the approach to changes in the method. We conclude that we are currently unable to accurately calculate the kinetic barriers to conduction for potassium channels, and that other channels (such as sodium channels) may be more amenable to this approach.

This article is published open access. It carries out the preparatory work necessary for a subsequent study of how potassium ions and water molecules move through the selectivity filter of a voltage-gated potassium ion channel.

GROMACS 4.6: Scaling of a very large coarse-grained system

So if I have a particular system I want to simulate, how many processing cores can I harness to run a single GROMACS 4.6 job? If I use only a few, the simulation will take a long time to finish; if I use too many, the cores will end up waiting for communications from other cores and the simulation will be inefficient (and also take a long time to finish). In between is a regime where the code, in this case GROMACS, scales well. Ideally, of course, you’d like linear scaling, i.e. running on 100 cores in parallel is 100x faster than running on just one.

The rule of thumb for GROMACS 4.6 is that it scales well until there are only ~130 atoms/core. In other words, the more atoms or beads in your system, the larger the number of computing cores you can run on before the scaling performance starts to degrade.
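As a worked example (just a quick shell calculation, assuming the ~130 atoms/core rule of thumb above), the 2.1-million-bead coarse-grained system discussed below should in principle scale to roughly 16,000 cores:

```bash
# Rough estimate of the largest efficient core count using the
# ~130 atoms (or beads) per core rule of thumb.
beads=2107010            # the 54,000-lipid coarse-grained benchmark
echo $(( beads / 130 ))  # prints 16207, i.e. roughly 16,000 cores
```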

As you might imagine, there is a hierarchy of computers we can run our simulations on; this starts at humble workstations, passes through departmental, university and regional computing clusters, and ends up at national (Tier 1) and international (Tier 0) high-performance computing (HPC) facilities.

In our lab we applied for, and were awarded, time on a set of European Tier 0 supercomputers through PRACE. These are currently amongst the most powerful machines in Europe. We tested five supercomputers in all: CURIE (Paris, France; green lines), MareNostrum (Barcelona, Spain; black line), FERMI (Bologna, Italy; lilac), SuperMUC (Munich, Germany; blue) and HERMIT (Stuttgart, Germany; red). Each has a different architecture and inevitably some are slightly newer than others. CURIE has three different partitions, called thin, fat and hybrid. The thin nodes constitute the bulk of the system; the fat nodes have more cores per node, whilst the hybrid nodes combine conventional CPUs with GPUs.

We tested a coarse-grained bilayer of 54,000 lipids (2.1 million beads) on all seven architectures and the performance is shown in the graph (note that the axes are logarithmic). Some machines did better than others; FERMI, which is an IBM BlueGene, appears not to be well-suited to our benchmark system, but then one doesn’t expect fast per-core performance on a BlueGene as that is not how they are designed. Of the others, MareNostrum was fastest for small numbers of cores, but its performance began to suffer if more than 256 cores were used. SuperMUC and the CURIE thin nodes were the fastest conventional supercomputers, with the CURIE thin nodes performing better at large core counts. Interestingly, the CURIE hybrid GPU nodes were very fast, especially bearing in mind the CPUs on these nodes are older and slower than those in the thin nodes. One innovation introduced in GROMACS 4.6 that I haven’t discussed previously is that one can now run using either pure MPI processes or a hybrid parallelisation that combines MPI processes with OpenMP threads. We were somewhat surprised to find that, in nearly all cases, the pure MPI approach remained slightly faster than the new hybrid parallelisation.
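To make this concrete, here is a minimal sketch of how the two modes might be launched (the binary name mdrun_mpi, the input file name and the core counts are placeholders, and the exact mpirun/scheduler syntax differs from machine to machine):

```bash
#!/bin/bash
# Pure MPI: one MPI process (rank) per core, here 256 cores in total.
mpirun -np 256 mdrun_mpi -deffnm bilayer

# Hybrid MPI/OpenMP: fewer MPI ranks, each running 4 OpenMP threads
# (still 256 cores in total).
export OMP_NUM_THREADS=4
mpirun -np 64 mdrun_mpi -ntomp 4 -deffnm bilayer
```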

Of course, you may see very different performance using your system with GROMACS 4.6. You just have to try and see what you get! In the next post I will show some detailed results on using GROMACS on GPUs.

GROMACS 4.6

GROMACS is a scientific code designed to simulate the dynamics of small boxes of stuff, which usually contain a protein, water, perhaps a lipid bilayer and a range of other molecules depending on the study. It assumes that all the atoms can be represented as points with a mass and an electrical charge, and that all the bonds can be modelled using simple harmonic springs. There are some other terms that describe the bending and twisting of molecules, and all of these, when combined with two long-range terms that take into account the repulsion and attraction between electrical charges, allow you to calculate the force on any atom due to the positions of all the other atoms. Once you know the force, you can calculate where the atom will be a short time later (often 2 fs), but of course the positions have then changed so you have to recalculate the forces. And so on.
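Written out as equations, that loop is essentially the following (this is the generic leapfrog scheme commonly used as the default integrator in molecular dynamics codes such as GROMACS; the real force field contains many more terms than this sketch suggests):

```latex
% Force on atom i from the total potential energy U of all the positions
F_i(t) = -\nabla_{r_i} U\big(r_1(t), \dots, r_N(t)\big)

% Leapfrog update with timestep \Delta t (often 2 fs)
v_i\left(t + \tfrac{\Delta t}{2}\right) = v_i\left(t - \tfrac{\Delta t}{2}\right) + \frac{F_i(t)}{m_i}\,\Delta t

r_i(t + \Delta t) = r_i(t) + v_i\left(t + \tfrac{\Delta t}{2}\right)\Delta t
```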


Anyway, I use GROMACS a lot in my research and the most recent major version, 4.6, was released earlier this year. In this post I’m going to briefly describe my experience with some of the improvements. First off, so much has changed that I think it would have been more accurate to call this GROMACS 5.0. For example, version 4.6 is a lot faster than version 4.5. I typically use three different benchmarks when measuring the performance: one is an all-atom simulation of a bacterial peptide transporter in a lipid bilayer (78,033 atoms); the other two are both coarse-grained models of a lipid bilayer, the difference being that one has 6,000 lipids (137,232 beads) and the other 54,000 (2,107,010 beads). OK, so how much faster is version 4.6? It is important here to bear in mind that GROMACS was already very fast, since a lot of effort had been put into optimising the loops in which the code spends most of its time. Even so, version 4.6 is between 20% and 120% faster on the first two benchmarks, and in some cases faster still. How? Well, it seems the developers have rewritten the performance-critical kernels using SIMD instructions. One important consequence of this is that the binary is tuned to a particular CPU architecture at compile time and, since you have to specify which SIMD instruction sets to use, you may need several different versions of the key binary, mdrun. For example, you may want a version compiled using an AVX SIMD instruction set for recent CPUs, but also a version compiled using an older SSE SIMD instruction set. The latter will run on newer architectures, but it will be slower. You must never run a version compiled with no SIMD instruction sets as this can be 10x slower!
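As a rough sketch of what this means in practice, here is how one might configure two separate builds (the GMX_CPU_ACCELERATION values are as I remember them from the 4.6 install guide and the install prefixes are placeholders; check the documentation for your exact version, as these option names have changed between releases):

```bash
# AVX build of GROMACS 4.6 for recent CPUs
cmake .. -DGMX_CPU_ACCELERATION=AVX_256 \
         -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-4.6-avx
make -j 8 && make install

# SSE4.1 fallback build for older nodes (runs on newer CPUs too, just slower)
cmake .. -DGMX_CPU_ACCELERATION=SSE4.1 \
         -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-4.6-sse4
make -j 8 && make install
```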

The other big performance improvement is that GROMACS 4.6 now uses GPUs seamlessly. The calculations are shared between any GPUs and the CPUs, and GROMACS will even shift the load to try and share it equally. One of the GROMACS developers gave a talk on this subject in April 2013. A GPU here just means a reasonable consumer graphics card, such as an NVIDIA GTX680, with compute capability 2.0 or higher. So, how much of a performance boost do we see? I typically see a speed-up of 2.1-2.7x for the atomistic benchmark and 1.4-2.2x for the first, smaller coarse-grained benchmark. Just for fun, you can try running a version of GROMACS compiled with no SIMD instructions both with and without a GPU; then you really can see a performance increase of 10x.
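A minimal sketch of what this looks like on the command line (the file names are placeholders, and the flags are as I recall them from 4.6, so check mdrun -h for your build):

```bash
# Run with the non-bonded calculations offloaded to the first GPU;
# in 4.6 mdrun detects CUDA-capable GPUs automatically, -gpu_id just
# makes the choice explicit.
mdrun -deffnm membrane_protein -gpu_id 0

# For comparison, keep everything on the CPU cores.
mdrun -deffnm membrane_protein -nb cpu
```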

Before I finish, I was given some good advice on running GROMACS benchmarks. Firstly, make sure you use the -noconfout mdrun option, since this prevents it from writing a final .gro file, which takes some time. Secondly, set up a .tpr file that will run for a long (wallclock) time even on a large number of cores, and then use the -resethway option in combination with a time limit, such as -maxh 0.25; this resets the timers halfway through, so mdrun records how many steps were calculated between 7.5 and 15 minutes. From experience, a bit of time spent writing some good BASH scripts to automatically set up, run and analyse the benchmarking simulations really pays off in the long run. Something like the sketch after this paragraph is enough to get started.
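Here is that sketch (the .tpr name, core counts, binary name and mpirun syntax are placeholders for your own setup):

```bash
#!/bin/bash
# Run the same benchmark .tpr on a range of core counts and pull out the
# performance line that mdrun writes at the end of each log file.
for ncores in 16 32 64 128 256; do
    mpirun -np $ncores mdrun_mpi -s bench.tpr -deffnm bench_${ncores} \
           -noconfout -resethway -maxh 0.25
    grep "Performance" bench_${ncores}.log
done
```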

In future posts I’ll talk about the scaling of GROMACS 4.6 (that is where the third benchmark comes in) and also look at the GPU performance in a bit more detail.


New Publication: Detailed examination of a single conduction event in a potassium channel.

What can we learn using computational methods about how potassium ions and water molecules move through the narrowest part of a potassium channel?

In this paper, we calculate the average force experienced by three potassium ions as they move through the selectivity filter of a voltage-gated potassium channel. This allows us to identify the most probable mechanism, which includes two “knock-on” events, just like a Newton’s cradle. By examining the behaviour of the conducting waters and the protein in detail, we can see how the waters rotate to coordinate one or other of the conducting potassium ions, and even get squeezed between two potassium ions during a knock-on event. We also see how the coordination number of each potassium ion changes.

The article is published open access, and there is an accompanying piece published alongside it.