Monthly Archives: January 2016

New Publication: Predicting affinities for peptide transporters

PepT1 is a nutrient transporter found in the cells that line your small intestine. It is responsible not only for the uptake of di- and tri-peptides, and therefore much of your dietary protein, but also for the uptake of most β-lactam antibiotics. This serendipity ensures that we can take (many of) these important drugs orally.

Our ultimate goal is to develop the capability to predict modifications to drug scaffolds that will improve or enable their uptake by PepT1, thereby improving their oral bioavailability.

In our new paper, just published online in the new journal Cell Chemical Biology (and free to download), we show that it is possible to predict how well a series of di- and tri-peptides bind to a bacterial homologue of PepT1 using a hierarchical approach that combines an end-point free energy method with thermodynamic integration. Since there is no structure of PepT1, we then tried our method on a homology model we published previously. We found that the method lost its predictive power. By studying a range of homology models of intermediate quality, we showed that it is highly likely an experimental structure of hPepT1 will be required for accurate in silico predictions of transport.

This is the second paper that Firdaus Samsudin has published as part of his DPhil here in Oxford.

GROMACS on AWS: compiling against CUDA

If you want to compile GROMACS to run on a GPU instance, please first read these instructions on how to compile GROMACS on an AMI without CUDA. The instructions below then explain how to install the CUDA toolkit and compile GROMACS against it.

The first few steps are loosely based on an existing guide, except rather than downloading the NVIDIA driver on its own, we shall download the CUDA toolkit since this includes an NVIDIA driver. First we need to make sure the kernel development files match the running kernel.

sudo yum install kernel-devel-`uname -r`
sudo reboot

Safest to do a quick reboot here. Assuming you are in your HOME directory, move into your packages folder.

cd packages/

And download the CUDA toolkit (version 7.5 at present)

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
sudo /bin/bash cuda_7.5.18_linux.run

It will ask you to accept the license and then ask you a series of questions. I answer Yes to everything except installing the CUDA samples. Now add the following to the end of your ~/.bash_profile using a text editor

export PATH="/usr/local/cuda-7.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH"
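
To check the toolkit is now on your PATH, reload your profile and ask nvcc for its version; it should report release 7.5 (a quick sanity check, assuming the toolkit installed into /usr/local/cuda-7.5 as above).

source ~/.bash_profile
nvcc --version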

Now we can build GROMACS against the CUDA toolkit. I’m assuming you’ve already downloaded a version of GROMACS and probably installed a non-CUDA version of GROMACS (so you’ll already have one build directory). Let’s make another build directory. You can call it what you want, but some kind of consistent naming can be helpful. The -j 4 flag assumes you have four cores to compile on – this will depend on the EC2 instance you have deployed. Obviously the more cores, the faster, but GROMACS only takes minutes, not hours.

mkdir build-gcc48-cuda75
cd build-gcc48-cuda75
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/5.0.7-cuda/  -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
make -j 4
sudo make install

To load all the GROMACS tools into your $PATH, run this command and you are done!

source /usr/local/gromacs/5.0.7-cuda/bin/GMXRC
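
If you want to confirm that this build really was compiled with CUDA, the version header GROMACS prints should mention GPU support (a quick check; the exact wording can vary between versions).

gmx -version | grep -i "gpu support"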

If you run this mdrun binary on a GPU instance it should automatically detect the GPU and run on it, assuming your MDP file options support this. If it does, you will see something like this zip by in the log file as GROMACS starts up

1 GPU detected:
  #0: NVIDIA GRID K520, compute cap.: 3.0, ECC:  no, stat: compatible

1 GPU auto-selected for this run.
Mapping of GPU to the 1 PP rank in this node: #0

Will do PME sum in reciprocal space for electrostatic interactions.
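
For completeness, here is a minimal way to launch such a run and pin it explicitly to the detected GPU (the file name benchmark.tpr is a placeholder for your own run input).

source /usr/local/gromacs/5.0.7-cuda/bin/GMXRC
gmx mdrun -deffnm benchmark -ntomp 8 -gpu_id 0    # 8 OpenMP threads, GPU #0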

Depending on the size of your system and the forcefield you are using, you should get a speedup of at least a factor of two, and realistically three, using a GPU in combination with the CPUs. For example, see these benchmarks.

GROMACS on AWS: compiling GCC

These are some quick instructions on how to build a more recent version of GCC than is provided by the devel-tools package on the CentOS-based AMI (currently GCC 4.8.3). You may, for example, wish to use a more recent version to compile GROMACS – that is my interest. If so, these instructions assume you have done all the steps up to, but not including, compiling GROMACS in this post. Compiling GCC needs several GB of disk space, so if you use the default 8 GB for an EC2 AMI it will run out of disk space; increasing this to 12 GB is sufficient.
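
It is worth checking you actually have that headroom before you start (a quick check; the device names will depend on your instance).

df -h /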

First let’s find out what versions of GCC are available.

[ec2-user@ip-172-30-0-42 ~]$ svn ls svn://gcc.gnu.org/svn/gcc/tags | grep gcc | grep release
...
gcc_4_8_3_release/
gcc_4_8_4_release/
gcc_4_8_5_release/
gcc_4_9_0_release/
gcc_4_9_1_release/
gcc_4_9_2_release/
gcc_4_9_3_release/
gcc_5_1_0_release/
gcc_5_2_0_release/
gcc_5_3_0_release/

As you can see, when I wrote this 5.3.0 was the most recent stable version, so let’s try that one. I’m going to compile everything inside a folder called packages/ so let’s create that and then use subversion to check out version 5.3.0 (this is going to download a lot of files so will take a minute or two)

[ec2-user@ip-172-30-0-42 ~]$ mkdir ~/packages
[ec2-user@ip-172-30-0-42 ~]$ cd ~/packages
[ec2-user@ip-172-30-0-42 packages]$ svn co svn://gcc.gnu.org/svn/gcc/tags/gcc_5_3_0_release/
A    gcc_5_3_0_release/config-ml.in
A    gcc_5_3_0_release/libitm
...
A    gcc_5_3_0_release/fixincludes/fixopts.c
A    gcc_5_3_0_release/install-sh
A    gcc_5_3_0_release/ylwrap
 U   gcc_5_3_0_release
Checked out revision 232268.
[ec2-user@ip-172-30-0-42 packages]$ cd gcc_5_3_0_release/

GCC needs some prerequisites which are installed by this script.

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release]$ ./contrib/download_prerequisites 
--2016-01-12 13:24:23--  ftp://gcc.gnu.org/pub/gcc/infrastructure/mpfr-2.4.2.tar.bz2
       => ‘mpfr-2.4.2.tar.bz2’
Resolving gcc.gnu.org (gcc.gnu.org)... 209.132.180.131
...
isl-0.14.tar.bz2    100%[=====================>]   1.33M   693KB/s   in 2.0s   

2016-01-12 13:24:39 (693 KB/s) - ‘isl-0.14.tar.bz2’ saved [1399896]

Go up a level, make a build directory and move there.

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release]$ cd ..
[ec2-user@ip-172-30-0-42 packages]$ mkdir gcc_5_3_0_release_build/
[ec2-user@ip-172-30-0-42 packages]$ cd gcc_5_3_0_release_build/

Now we are in a position to compile GCC 5.3.0. This took about 50 minutes using all eight cores of a c3.2xlarge instance, so this is a good moment to go and have lunch. Note that since the instance I am compiling on has 8 virtual CPUs, I can use the -j 8 flag to tell make to use up to 8 threads during compilation, which will speed things up. If you are using a micro instance, just omit the -j 8 (but good luck, as that would take a long time).

[ec2-user@ip-172-30-0-42 gcc_5_3_0_release_build]$ ../gcc_5_3_0_release/configure && make -j 8 && sudo make install && echo "success" && date
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
...

Hopefully you now have a newer version of GCC to compile binaries with. With any luck it might even give you a performance boost.
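
One thing to watch: by default the new GCC installs under /usr/local, so check which compiler is found first and, if necessary, point cmake at it explicitly (the paths below assume that default prefix).

which gcc       # /usr/local/bin/gcc should now be found before /usr/bin/gcc
gcc --version
# if the old 4.8.3 is still picked up, tell cmake which compilers to use, e.g.
# cmake .. -DCMAKE_C_COMPILER=/usr/local/bin/gcc -DCMAKE_CXX_COMPILER=/usr/local/bin/g++ ...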

GROMACS on AWS: Performance and Cost

So we have created an Amazon Machine Image (AMI) with GROMACS installed. In this post I will examine the sort of single-core performance you can expect and how much this is likely to cost compared to other compute options you might have.

Benchmark

To test the different types of instance you can deploy our GROMACS image on, we need a benchmark system. For this I’ve chosen a peptide MFS transporter in a simple POPC lipid bilayer solvated by water, very similar to simulations we have published previously. Or to put it another way: 78,000 atoms in a cube, most of which are water, some belong to lipids and the rest, protein. It is fully atomistic and is described using the CHARMM27 forcefield.

Computing Resources Tested

I tried to use a range of compute resources to provide a good comparison for AWS. First, and most obviously, I used the workstation on my desk, which is a Mac Pro with 12 Intel Xeon cores. In our department we also have a small compute cluster, each node of which has 16 cores; some of these nodes also have a K20 GPU. I also have access to the high performance computing facility run by the University. Unfortunately, since the division I am in has decided not to contribute to its running, I have to pay for any significant usage.

Rather than test all the instance types available on EC2, I tested an example from each of the current (m4) and older (m3) generations of non-burstable general purpose instances. I also tested an example from the latest generation of compute instances (c4) and, finally, the smaller of the GPU instances (g2).

Performance

[Figure: GROMACS performance, in ns/day per core, for each compute resource]

The performance, in nanoseconds per day for a single compute core, is shown above (bigger is better).

One worry about AWS EC2 is that, for a highly-optimised compute code like GROMACS, performance might suffer due to the layers of virtualisation but, as you can see, even the current generation of general purpose instances is as fast as my Mac Pro workstation. The fastest machine, perhaps unsurprisingly, is the new University compute cluster. On AWS, the compute c4 class is faster than the current general purpose m4 class, which in turn is faster than the older generation general purpose m3 class. Finally, as you might expect, using a GPU boosts performance by slightly more than 3x.


Cost

[Figure: cost per core-hour for each compute resource]

I’m going to do a "real" comparison. So if I buy a compute cluster and keep it in the department, I only have to pay the purchase cost but none of the running costs. I’m therefore assuming the workstation costs £2,500, a single 16-core node costs £4,000 and both have a five-year lifetime. Alternatively I can use the university’s high performance computing clusters at 2p per core-hour. This is obviously unfair on the university facility, as its price does include operational costs like electricity and staff, and you can see that reflected in the difference in costs.
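
As a back-of-the-envelope check of what those hardware figures translate to per core-hour (using only the assumptions above, and ignoring electricity, admin and downtime):

echo "scale=5; 2500 / (5 * 365 * 24 * 12)" | bc    # £2,500 workstation, 12 cores, 5 years
# ≈ £0.0048, i.e. roughly 0.5p per core-hour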

So is AWS EC2 more or less expensive? This hinges on whether you use it in the standard "on demand" manner or instead get access by bidding on the spot market. The latter is significantly cheaper, but you only have access whilst your bid price is above the current "spot price", so you can’t guarantee access and your simulations have to be able to cope with restarts. Since the spot price varies with time, I took the average of two prices at different times on Wed 13 Jan 2016.

As you can see, AWS is more expensive per core-hour if you use it "on demand", but is cheaper than the university facility if you are willing to surf the market. Really, though, we should be considering the cost efficiency, i.e. the cost per nanosecond, as this also takes into account the performance.
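
To make that conversion concrete (with made-up illustrative numbers, not the ones behind the plot below):

# cost per ns = (price per core-hour x cores used x 24 h) / (ns per day)
echo "scale=3; (0.02 * 16 * 24) / 10" | bc    # a 16-core node at 2p per core-hour producing 10 ns/day
# ≈ £0.77 per nanosecond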


Cost efficiency

[Figure: cost per nanosecond of simulation for each compute resource]


When we do this an interesting picture emerges: using AWS EC2 via bidding on the spot market is cheaper than using the university facility and can be as cheap as buying your own hardware, even if you don’t have to pay the running costs. Furthermore, as you’d expect, using a GPU decreases the cost per nanosecond and so should be a no-brainer for GROMACS.

Of course, this assumes lots of people don’t start using the EC2 market, thereby driving the spot price up…

GROMACS on AWS

In this post I’m going to show how I created an Amazon Machine Image (AMI) with GROMACS 5.0.7 installed for use in the Amazon Web Services cloud.

I’m going to assume that you have signed up for Amazon Web Services (AWS), created an Identity and Access Management (IAM) user (each AWS account can have multiple IAM users), created an SSH key pair for that user, downloaded it, given it an appropriate name with the correct permissions and placed it in ~/.ssh. Amazon have tutorials that cover all of the above actions. One thing that confused me is that if you already have an ordinary Amazon account then you can use it to sign up to AWS. In other words, depending on your mood, you can order a book or 10,000 CPU hours of compute. I felt a bit nervous about setting up an account backed by my credit card – if you also feel nervous, then Amazon offer a free tier which at present permits you to use up to 750 hours a month, as long as you only use the smallest virtual machine instance (t2.micro). If you use more than this, or use a more powerful instance, then you will be billed.

First, log in to your AWS console. This will have a strange URL like

https://123456789012.signin.aws.amazon.com/console

where 123456789012 is your AWS account number. You should get something that looks like this.

[Screenshot: AWS Management Console]

Next we need to create an EC2 (Elastic Compute Cloud) instance based on one of the standard virtual machine images, and then download and compile GROMACS on it. In the AWS Management Console, choosing "EC2" in the top left should bring you here

[Screenshot: AWS EC2 dashboard]

Now click the blue "Launch Instance" button.

Step 1. Choose an Amazon Machine Image (AMI).

Here we can choose one of the standard virtual machine images to compile GROMACS on. Let’s keep it simple and use the standard Amazon Linux AMI.


Step 2. Choose an Instance Type.

The important thing to remember here is that the image we create can be run on any instance type. So if we want to compile on multiple cores to speed things up, we can choose an instance with, say, 8 vCPUs, or if we don’t want to be billed and are willing to wait, we can choose the t2.micro instance. Let’s choose a c4.2xlarge instance, which has 8 vCPUs. You can at this stage hit "Review and Launch" but it is worth checking the amount of storage allocated to the instance, so hit "Next: Configure Instance Details". I’m not going to fiddle with these options. Hit "Next: Add Storage".


Step 4. Add storage.

What I have found here is that if you use the version of gcc installed via yum (4.8.3) then 8 GB is fine, but if you want to compile a more recent version you will need at least 12 GB. I’m going to accept the defaults for the rest of the steps, so will click "Review and Launch" now.


Step 7. Review Instance Launch.

Check it all looks OK and hit "Launch". This will bring up a window. Here it is crucial that you choose the name of the key pair you created and downloaded. As you need a different key pair for each IAM user in each AWS Region, it is worth naming them carefully, as you will otherwise rapidly get very confused. Also, Amazon don’t let you download a key pair again, so you have to be careful with them. You can see mine is called

PhilFowler-key-pair-euwest.pem

which contains the name of my IAM user and the AWS region it will work for, here EU West (Ireland). Hit Launch.


Launch Status

This window gives you some links on how to connect to the AWS instance. Hit "View Instances". It may take a minute or two for your instance to be created; during this time the status is given as "Initializing". When it is finished, you can click on your new instance (you should have only one) and it will give you a whole host of information. We need the public IP address and the name of our SSH key pair so we can ssh to the instance (note that the default user is called ec2-user).
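
If ssh later complains that the permissions on the key file are too open, tighten them first (assuming you placed the key in ~/.ssh as described above).

chmod 400 ~/.ssh/PhilFowler-key-pair-euwest.pem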


lambda 508 $ ssh -i "PhilFowler-key-pair-euwest.pem" ec2-user@54.229.73.128
The authenticity of host '54.229.73.128 (54.229.73.128)' can't be established.
ECDSA key fingerprint is SHA256:N+B3toLxLE3vRuuzLZWF44N9qb3ucUVVU/RD00W3iNo.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.229.73.128' (ECDSA) to the list of known hosts.

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
11 package(s) needed for security, out of 27 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-30-0-42 ~]$

Installing pre-requisites

Amazon Linux is based on CentOS and so uses the yum package manager; you might be more familiar with apt-get if you use Ubuntu, but the principles are similar. It is worth following their recommendation and applying all the updates – this will spew out a lot of information to the terminal and ask you to confirm.

[ec2-user@ip-172-30-0-42 ~]$ sudo yum update
Loaded plugins: priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package aws-cli.noarch 0:1.9.1-1.29.amzn1 will be updated
---> Package aws-cli.noarch 0:1.9.11-1.30.amzn1 will be an update
---> Package binutils.x86_64 0:2.23.52.0.1-30.64.amzn1 will be updated
---> Package binutils.x86_64 0:2.23.52.0.1-55.65.amzn1 will be an update
---> Package ec2-net-utils.noarch 0:0.4-1.23.amzn1 will be updated
...
sudo.x86_64 0:1.8.6p3-20.21.amzn1
vim-common.x86_64 2:7.4.944-1.35.amzn1
vim-enhanced.x86_64 2:7.4.944-1.35.amzn1
vim-filesystem.x86_64 2:7.4.944-1.35.amzn1
vim-minimal.x86_64 2:7.4.944-1.35.amzn1

Complete!

This instance is fairly basic and there is no version of gcc, cmake etc., but we can install them via yum

[ec2-user@ip-172-30-0-42 ~]$ sudo yum install gcc gcc-c++ openmpi-devel mpich-devel cmake svn texinfo-tex flex zip libgcc.i686 glibc-devel.i686
...
texlive-xdvi.noarch 2:svn26689.22.85-27.21.amzn1
texlive-xdvi-bin.x86_64 2:svn26509.0-27.20130427_r30134.21.amzn1
zziplib.x86_64 0:0.13.62-1.3.amzn1

Complete!

Next we need to add the OpenMPI executables to the $PATH. These will only persist for this session; to make them permanent, add them to your .bashrc.

export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib
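
One way to make them permanent is simply to append the same two lines to your .bashrc (this just repeats the exports above verbatim).

cat >> ~/.bashrc << 'EOF'
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib
EOF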

Now we hit a potential problem. The version of gcc installed by yum is fairly old

[ec2-user@ip-172-30-0-42 ~]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Having said that, 4.8.3 should be good enough for GROMACS. I’ll push ahead using this version, but in a subsequent post I also detail how to download and install GCC 5.3.0.

Compiling GROMACS

First, let’s get the GROMACS source code using wget. I’m going to compile version 5.0.7 since I’ve got benchmarks for this one, but you could equally install 5.1.X.

[ec2-user@ip-172-30-0-42 ~]$ mkdir ~/packages
[ec2-user@ip-172-30-0-42 ~]$ cd ~/packages
[ec2-user@ip-172-30-0-42 packages]$ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0.7.tar.gz
[ec2-user@ip-172-30-0-42 packages]$ tar zxvf gromacs-5.0.7.tar.gz
[ec2-user@ip-172-30-0-42 packages]$ cd gromacs-5.0.7

Now let’s make a build directory, move there and then issue the cmake directive

[ec2-user@ip-172-30-0-42 gromacs-5.0.7]$ mkdir build-gcc48
[ec2-user@ip-172-30-0-42 gromacs-5.0.7]$ cd build-gcc48
[ec2-user@ip-172-30-0-42 build-gcc48]$ cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX='/usr/local/gromacs/5.0.7/'

The compilation step will take a good few minutes on a single-core machine, but as I’ve got 8 virtual CPUs to play with I can give make the "-j 8" flag, which is going to speed things up.

[ec2-user@ip-172-30-0-42 build-gcc48]$ make -j 8
...
Building CXX object src/programs/CMakeFiles/gmx.dir/gmx.cpp.o
Building CXX object src/programs/CMakeFiles/gmx.dir/legacymodules.cpp.o
Linking CXX executable ../../bin/gmx
[100%] Built target gmx
Linking CXX executable ../../bin/template
[100%] Built target template

This took 90 seconds using all 8 cores. Now we can install the binaries. Note that I told cmake to install into /usr/local/gromacs/5.0.7, rather than just /usr/local/gromacs, so I can keep track of different versions.

[ec2-user@ip-172-30-0-51 build-gcc48]$ sudo make install
...
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_velacc
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_wham
-- Installing: Creating symbolic link /usr/local/gromacs/5.0.7/bin/g_wheel

To add this version of GROMACS to your $PATH (add this to .bashrc to avoid doing this each time)

[ec2-user@ip-172-30-0-51 build-gcc48]$ source /usr/local/gromacs/5.0.7/bin/GMXRC

Now you have all the GROMACS tools available!
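
As a quick sanity check that the binary on your PATH is the one you just built (gmx is the single wrapper binary for all the tools in GROMACS 5.x), you can ask it for its version.

which gmx
gmx -version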