How to setup a Gramble

This is a Gramble, which of course is short for a , or, in other words, a model B cluster running . Given the ARM processor in a Raspberry Pi 2 does not allow like the more complex (and expensive) Intel chips, why would I want to do such a thing? Well, I wanted to learn how to setup a simple compute cluster.

And this is what I did. Unless stated otherwise you need to do this on both machines (or how ever many you are using).

1. Install Ubuntu 14.04 LTS Server onto the microSD cards

Each Raspberry Pi 2 runs off a microSD card; on a computer with a microSD slot (I used an iMac) download the Ubuntu 14.04 LTS image and copy it onto the microSD card . Then you simply push it back into the slot on the Raspberry Pi and power it up. Note that Ubuntu does not run on the model A and, at the time of writing, only ran on the model B. If your microSD card is bigger than 2GB you might want to .

2. Update

First, let’s update the installed software and also install the ssh server so we can remotely connect.

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install openssh-server

3. Setup the network

Now my setup was a bit strange. I was using an old Apple Time Capsule I had; both Raspberry Pis were connected to this via ethernet cables and the Time Capsule itself was in “Extend Wireless Network�? mode since our main wireless router is somewhere else in the house. Ideally, I’d want a dual-homed headnode with one public IP address and then a private network for communication within the cluster. Instead each of the two Raspberry Pis have their own IP that is dynamically assigned by my router, but this will do for now.

sudo nano /etc/hosts

so it reads

127.0.0.1 localhost
192.168.0.12 rasp0
192.168.0.18 rasp1

Also edit the hostname

sudo nano /etc/hostname

so it matches /etc/hosts

rasp0

and reboot

sudo reboot

4. Add an MPI user

We will need a special user that can log in without passwords to all the nodes that SLURM will use later on. As I understand it, giving it a uid less than 1000 stops the user appearing in any login GUI.

sudo adduser mpiuser --uid 999

5. Install NFS

We will share a folder on the headnode with all the compute nodes using the NFS protocol. This means we’ll only need to install applications on the headnode and they will be accessible from any compute node. Also this is where the GROMACS output files will be written.

on the headnode (rasp0)

sudo apt-get install nfs-kernel-server

on the compute node (rasp1)

sudo apt-get install nfs-common

on the headnode (rasp0) add the following to /etc/exports

/home/mpiuser *(rw,sync,no_subtree_check)
/apps *(rw,sync,no_subtree_check)

This will export the folders /apps and /home/mpiuser on rasp0 to all the compute nodes (in this case just rasp1). You need to make sure all folders shared by NFS exist on both machines. So on both machines

sudo mkdir /apps

You don’t need to mkdir /home/mpiuser as creating this user will have automatically created a home directory for it. Now on the headnode

sudo service nfs-kernel-restart

On the compute node (rasp1),

sudo ufw allow from 192.168.1.0/24
sudo mount rasp0:/home/mpiuser /home/mpiuser
sudo mount rasp0:/apps /apps

The first line opens a port in the firewall. although I’m definitely sure I needed to do this. The last two manually mount the NFS share from the headnode (rasp0). To set it up so this happens automatically

sudo nano /etc/fstab

and add

rasp0:/home/mpiuser /home/mpiuser nfs
rasp0:/apps /apps nfs

then we can force a remount via

sudo mount -a

6. Create an SSH key pair to allow passwordless login

Because we now have /apps and /home/mpiuser shared with all nodes of the cluster (ok, just rasp1, but you know what I mean) we can simply on the headnode create an ssh keypair as mpiuser and it will be shared with all the compute nodes. So on rasp0

su mpiuser
ssh-keygen -t rsa
cd .ssh/
cat id_rsa.pub >> authorized_keys

I didn’t use a passphrase during key generation. I expect this is a bad thing and I did read you could use a key chain, but as this is a toy cluster I’m going to stick my fingers in my ears and pretend I didn’t read that. If you haven’t created an ssh keypair before, it is fairly simple – it creates a public and a private key. This is described in more detail . The key things are that the private key (.ssh/id_rsa) should only be readable by the mpiuser and no-one else. In Linux-land, this means it should have permissions of 400 – this is how it will be created. Secondly, any remote machine will allow a passwordless login if the public key for that user is in .ssh/authorized_keys; this explains the last line above.

Let’s test it. Since we are already the mpiuser and we are on rasp0

ssh rasp1

Should automatically log you into rasp1. If you try the same thing as the default ubuntu user it will prompt you for your password as that user doesn’t have an ssh keypair setup.

7. Compile GROMACS

As we are thinking about NFS, let’s compile GROMACS in /apps so the gmx binary can be run from any of the compute node(s). We need a few things before we begin.

sudo apt-get install build-essential cmake

The first package contains the compilers you’ll need to, well, compile GROMACS and cmake is the build tool GROMACS uses. So as the mpiuser,

cd /apps
mkdir src
cd src/
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
cd gromacs-5.1.2/
mkdir build-gcc48
cd build-gcc48
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=‘/apps/gromacs/5.1.2’ -DBUILD_SHARED_LIBS=off
make -j 4
sudo make install

Note the unusual -DCBUILD_SHARED_LIBS flag in the GROMACS cmake command; this is to get around an error when compiling GROMACS on the Raspberry Pi. You shouldn’t normally need this flag. Now the make command will take at least ten minutes, so put the kettle on.

Because of dynamic linking, you’ll also need to install the compilers on the compute nodes via

sudo apt-get install build-essential

I suspect using might avoid this issue; I’m going to play with these and if I can get it to work will write another post.

Once you’ve done this, then on any machine you should be able to run GROMACS via

source /apps/gromacs/5.1.2/bin/GMXRC
gmx mdrun

8. Install a cluster management and job scheduling system (SLURM)

Despite the fact the machines I’ve used in the past have tended to use PBS or SGE (and so my fingers can type qstat really, really fast), I chose to use as

it is available as an Ubuntu package
our have recently started using it and they recommended it
it is actively developed and I can’t work out, or at least remember for more than five minutes, what is going on with SGE
it has ! and !
it is open source (GPL2)
I liked the name

To install on rasp0

sudo apt-get install slurm-llnl

This also installs MUNGE as a pre-requisite (see the next section). Now SLURM appeared to want to use /usr/bin/mail and complained when it couldn’t find it so I also installed

sudo apt-get install mailutils

which drops you into a setup screen and I chose the “local�? option.

Also on rasp1

sudo apt-get install slurm-llnl

9. Get MUNGE working

creates and validates credentials and SLURM uses it. On the headnode

sudo /usr/sbin/create-munge-key

This creates a key /etc/munge/munge.key. Now copy this key to /etc/munge/ on all nodes (you may need to fiddle with permissions etc to use the NFS share). There appears to be a bug with Ubuntu and MUNGE, but the workaround is to do the following on all nodes

sudo nano /etc/default/munge

and add the line

OPTIONS=“—force"

now start the service

sudo service munge start

Check it is running

ps -e | grep munge

10. Get SLURM working

This was the bit I wasn’t looking forward to as job schedulers have, frankly, scared me. But it turns out this was one of the easiest steps. If we are the mpiuser on rasp0. SLURM comes with a very simple configuration file that we can edit.

cp /usr/share/doc/slurm-llnl/examples/slurm.conf.simple.gz .
gunzip slurm.conf.simple.gz
nano slurm.conf.simple

All I did was change the lines so they read

ControlMachine=rasp0
..
NodeName=rasp[0-1] Procs=4 State=UNKNOWN
PartitionName=test Nodes=rasp[0-1] Default=YES MaxTime=INIFINITE State=UP

You’ll notice I’ve identified rasp0 as both the ControlMachine (i.e. headnode) and also a Node (compute node) belonging to the test Partition. On a regular cluster you probably don’t want the headnode also being a compute node, but I only had two Raspberry Pis so I thought why not? This also shows the syntax for referring to multiple nodes. If you want a more complex configuration an is provided. There is also an . A note of caution: these may not work with the version of SLURM installed by apt-get (2.6.5) since the current version is 15.08. That doesn’t mean 2.6.5 is old; they’ve changed the numbering system recently.

Finally copy the file to the right place on all nodes

 sudo cp slurm.conf.simple /etc/slurm-llnl/slurm.conf

and (on the headnode, rasp0)

sudo service slurm start

on the compute node

sudo slurmd -c

Test!

srun -N1 hostname

sinfo

11. Submit a GROMACS job to the queue

I’m going to assume you have prepared a TPR file called md.tpr and have copied it into /home/ubuntu (and we are now the default user, ubuntu).

Let’s do some simple benchmarking – remember a Raspberry Pi has 4 cores. So first, let’s create a series of TPR files

cp md.tpr md-1.tpr
cp md.tpr md-2.tpr
cp md.tpr md-4.tpr

Now let’s create some SLURM job submission files. This is the one for running on two cores – you’ll need to change the --cpus-per-task, the --job-name SBATCH flags and the -deffnm and -ntmpi GROMACS flags depending on the number of cores.

 sudo nano md-2.slurm.sh

and copy in

 #!/bin/bash
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=1
 #SBATCH --cpus-per-task=2
 #SBATCH --time=00:15:00
 #SBATCH --job-name=md-2

 source /apps/gromacs/5.1.2/bin/GMXRC

 srun gmx mdrun -deffnm md-2 -ntmpi 2 -ntomp 1 -maxh 0.1 -resethway -noconfout

This will run for 6 minutes, resetting the GROMACS timers after 3 minutes. It won’t write out a final GRO file as this can affect the timings. Hopefully you’ll find that, whilst useful and fun machines, Raspberry Pis are really slow at running GROMACS! To submit the jobs

sbatch md-2.slurm.sh

To check the queue we can issue

squeue

and to cancel we can use scancel.

Ta da!

Philip W Fowler