This is a Gramble, which of course is short for a GROMACS Bramble, or, in other words, a cluster of Raspberry Pi 2 Model B machines running GROMACS. Given the ARM processor in a Raspberry Pi 2 doesn’t offer anything like the performance of the more complex (and expensive) Intel chips, why would I want to do such a thing? Well, I wanted to learn how to set up a simple compute cluster.
And this is what I did. Unless stated otherwise you need to do this on both machines (or however many you are using).
1. Install Ubuntu 14.04 LTS Server onto the microSD cards
Each Raspberry Pi 2 runs off a microSD card; on a computer with a microSD slot (I used an iMac), download the Ubuntu 14.04 LTS image and copy it onto the microSD card. Then you simply push the card back into the slot on the Raspberry Pi and power it up. Note that Ubuntu does not run on the Model A and, at the time of writing, only ran on the Model B. If your microSD card is bigger than 2GB you might want to resize the partition so Ubuntu can use the whole card.
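On a Mac the image can be written straight onto the card with dd. The following is only a sketch – the device name (/dev/disk2 here) and the image filename will almost certainly be different on your machine, so check with diskutil list first.
diskutil list
diskutil unmountDisk /dev/disk2
sudo dd if=ubuntu-14.04-server.img of=/dev/rdisk2 bs=1m
diskutil eject /dev/disk2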
2. Update
First, let’s update the installed software and also install the ssh server so we can remotely connect.
sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install openssh-server
3. Setup the network
Now my setup was a bit strange. I was using an old Apple Time Capsule I had; both Raspberry Pis were connected to this via ethernet cables and the Time Capsule itself was in “Extend Wireless Network” mode since our main wireless router is somewhere else in the house. Ideally, I’d want a dual-homed headnode with one public IP address and then a private network for communication within the cluster. Instead each of the two Raspberry Pis has its own IP that is dynamically assigned by my router, but this will do for now.
Edit the hosts file
sudo nano /etc/hosts
so it reads
127.0.0.1 localhost
192.168.0.12 rasp0
192.168.0.18 rasp1
Also edit the hostname
sudo nano /etc/hostname
so it matches /etc/hosts
rasp0
and reboot
sudo reboot
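Once both machines are back up it’s worth a quick check that the names resolve; from rasp0, for example
ping -c 3 rasp1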
4. Add an MPI user
We will need a special user that can log in to all the nodes without a password; SLURM will use it later on. As I understand it, giving it a UID less than 1000 stops the user appearing in any login GUI.
sudo adduser mpiuser --uid 999
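You can check the user was created as intended; id should report a uid of 999.
id mpiuser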
5. Install NFS
We will share a folder on the headnode with all the compute nodes using the NFS protocol. This means we’ll only need to install applications on the headnode and they will be accessible from any compute node. Also this is where the GROMACS output files will be written.
on the headnode (rasp0)
sudo apt-get install nfs-kernel-server
on the compute node (rasp1)
sudo apt-get install nfs-common
on the headnode (rasp0) add the following to /etc/exports
/home/mpiuser *(rw,sync,no_subtree_check)
/apps *(rw,sync,no_subtree_check)
This will export the folders /apps and /home/mpiuser on rasp0 to all the compute nodes (in this case just rasp1). You need to make sure all folders shared by NFS exist on both machines. So on both machines
sudo mkdir /apps
You don’t need to mkdir /home/mpiuser as creating this user will have automatically created a home directory for it. Now on the headnode
sudo service nfs-kernel-server restart
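If you want to double-check the folders really are being exported, you can ask the NFS server
sudo exportfs -v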
On the compute node (rasp1),
sudo ufw allow from 192.168.0.0/24
sudo mount rasp0:/home/mpiuser /home/mpiuser
sudo mount rasp0:/apps /apps
The first line opens up the firewall to the local network, although I’m not entirely sure I needed to do this. The last two manually mount the NFS shares from the headnode (rasp0). To set it up so this happens automatically
sudo nano /etc/fstab
and add
rasp0:/home/mpiuser /home/mpiuser nfs defaults 0 0
rasp0:/apps /apps nfs defaults 0 0
then we can force a remount via
sudo mount -a
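To reassure yourself the shares are visible and mounted on rasp1, something like the following should list both exports and show them mounted (showmount comes with the nfs-common package).
showmount -e rasp0
df -h | grep rasp0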
6. Create an SSH key pair to allow passwordless login
Because we now have /apps and /home/mpiuser shared with all nodes of the cluster (ok, just rasp1, but you know what I mean) we can simply create an ssh keypair as mpiuser on the headnode and it will be shared with all the compute nodes. So on rasp0
su mpiuser
ssh-keygen -t rsa
cd ~/.ssh/
cat id_rsa.pub >> authorized_keys
I didn’t use a passphrase during key generation. I expect this is a bad thing and I did read you could use a key chain, but as this is a toy cluster I’m going to stick my fingers in my ears and pretend I didn’t read that. If you haven’t created an ssh keypair before, it is fairly simple – it creates a public key and a private key. This is described in more detail elsewhere. The key things are, first, that the private key (.ssh/id_rsa) should only be readable by the mpiuser and no-one else; in Linux-land, this means it should have permissions of 600, which is how ssh-keygen creates it. Secondly, any remote machine will allow a passwordless login if the public key for that user is in .ssh/authorized_keys; this explains the last line above.
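If you want to check the permissions look sensible (still as the mpiuser)
ls -ld ~/.ssh ~/.ssh/id_rsa ~/.ssh/authorized_keys
The .ssh directory should be 700 and id_rsa should be 600.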
Let’s test it. Since we are already the mpiuser and we are on rasp0
ssh rasp1
This should automatically log you into rasp1. If you try the same thing as the default ubuntu user it will prompt you for a password as that user doesn’t have an ssh keypair set up.
7. Compile GROMACS
As we are thinking about NFS, let’s compile GROMACS in /apps so the gmx binary can be run from any of the compute node(s). We need a few things before we begin.
sudo apt-get install build-essential cmake
The first package contains the compilers you’ll need to, well, compile GROMACS and cmake is the build tool GROMACS uses. So as the mpiuser,
cd /apps
mkdir src
cd src/
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
tar xzf gromacs-5.1.2.tar.gz
cd gromacs-5.1.2/
mkdir build-gcc48
cd build-gcc48
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/apps/gromacs/5.1.2 -DBUILD_SHARED_LIBS=off
make -j 4
sudo make install
Note the unusual -DBUILD_SHARED_LIBS=off flag in the GROMACS cmake command; this is to get around an error when compiling GROMACS on the Raspberry Pi. You shouldn’t normally need this flag. Now the make command will take at least ten minutes, so put the kettle on.
Because of dynamic linking, you’ll also need to install the compilers on the compute nodes via
sudo apt-get install build-essential
I suspect using static builds might avoid this issue; I’m going to play with these and if I can get it to work will write another post.
Once you’ve done this, then on any machine you should be able to run GROMACS via
source /apps/gromacs/5.1.2/bin/GMXRC
gmx mdrun
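As a quick check that the NFS share and the dynamically linked libraries are behaving, it’s worth running the binary from the compute node as well, e.g. as the mpiuser
ssh rasp1
source /apps/gromacs/5.1.2/bin/GMXRC
gmx --version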
8. Install a cluster management and job scheduling system (SLURM)
Despite the fact that the machines I’ve used in the past have tended to use PBS or SGE (and so my fingers can type qstat really, really fast), I chose to use SLURM because
- it is available as an Ubuntu package
- our local HPC folk have recently started using it and they recommended it
- it is actively developed and I can’t work out, or at least remember for more than five minutes, what is going on with SGE
- it has ! and !
- it is open source (GPL2)
- I liked the name
To install on rasp0
sudo apt-get install slurm-llnl
This also installs MUNGE as a pre-requisite (see the next section). Now SLURM appeared to want to use /usr/bin/mail and complained when it couldn’t find it, so I also installed
sudo apt-get install mailutils
which drops you into a setup screen and I chose the “local” option.
Also on rasp1
sudo apt-get install slurm-llnl
9. Get MUNGE working
MUNGE creates and validates credentials and SLURM uses it. On the headnode
sudo /usr/sbin/create-munge-key
This creates a key /etc/munge/munge.key
. Now copy this key to /etc/munge/
on all nodes (you may need to fiddle with permissions etc to use the NFS share). There appears to be a bug with Ubuntu and MUNGE, but the workaround is to do the following on all nodes
sudo nano /etc/default/munge
and add the line
OPTIONS="--force"
now start the service
sudo service munge start
Check it is running
ps -e | grep munge
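A more convincing test is to encode a credential on one node and decode it on the other; as the mpiuser (so the ssh hop is passwordless) both of the following should report a successful decode
munge -n | unmunge
munge -n | ssh rasp1 unmunge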
10. Get SLURM working
This was the bit I wasn’t looking forward to as job schedulers have, frankly, scared me. But it turns out this was one of the easiest steps. SLURM comes with a very simple example configuration file; as the mpiuser on rasp0 we can copy and edit it.
cp /usr/share/doc/slurm-llnl/examples/slurm.conf.simple.gz .
gunzip slurm.conf.simple.gz
nano slurm.conf.simple
All I did was change the lines so they read
ControlMachine=rasp0
..
NodeName=rasp[0-1] Procs=4 State=UNKNOWN
PartitionName=test Nodes=rasp[0-1] Default=YES MaxTime=INFINITE State=UP
You’ll notice I’ve identified rasp0 as both the ControlMachine (i.e. headnode) and also a Node (compute node) belonging to the test Partition. On a regular cluster you probably don’t want the headnode also being a compute node, but I only had two Raspberry Pis so I thought why not? This also shows the syntax for referring to multiple nodes. If you want a more complex configuration, a fuller example is provided with the package. There is also an online configuration tool. A note of caution: these may not work with the version of SLURM installed by apt-get (2.6.5) since the current version is 15.08. That doesn’t mean 2.6.5 is as old as the numbers suggest; they’ve changed the numbering system recently.
Finally copy the file to the right place on all nodes
sudo cp slurm.conf.simple /etc/slurm-llnl/slurm.conf
and (on the headnode, rasp0)
sudo service slurm start
on the compute node
sudo slurmd -c
Test!
srun -N1 hostname
or
sinfo
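If both nodes have registered, running across the pair should also work, and scontrol gives more detail if sinfo reports a node as down or drained
srun -N2 hostname
scontrol show nodes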
11. Submit a GROMACS job to the queue
I’m going to assume you have prepared a TPR file called md.tpr and have copied it into /home/ubuntu (and we are now the default user, ubuntu).
Let’s do some simple benchmarking – remember a Raspberry Pi has 4 cores. So first, let’s create a series of TPR files
cp md.tpr md-1.tpr
cp md.tpr md-2.tpr
cp md.tpr md-4.tpr
Now let’s create some SLURM job submission files. This is the one for running on two cores – you’ll need to change the --cpus-per-task and --job-name SBATCH flags and the -deffnm and -ntmpi GROMACS flags depending on the number of cores.
sudo nano md-2.slurm.sh
and copy in
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:15:00
#SBATCH --job-name=md-2
source /apps/gromacs/5.1.2/bin/GMXRC
srun gmx mdrun -deffnm md-2 -ntmpi 2 -ntomp 1 -maxh 0.1 -resethway -noconfout
This will run for 6 minutes, resetting the GROMACS timers after 3 minutes. It won’t write out a final GRO file as this can affect the timings. Hopefully you’ll find that, whilst useful and fun machines, Raspberry Pis are really slow at running GROMACS! To submit the jobs
sbatch md-2.slurm.sh
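Assuming you’ve also written md-1.slurm.sh and md-4.slurm.sh, you can submit the whole set in one go
for n in 1 2 4; do sbatch md-$n.slurm.sh; done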
To check the queue we can issue
squeue
and to cancel we can use scancel.
Ta da!