geek

BPS15: Twitter and conferences: an ideal match or a nuisance?

I’m at the BPS15 meeting in Baltimore, which is large (6,500 scientists) with multiple parallel sessions. You might have thought that Twitter would be the ideal platform for providing a feed for all the questions, reactions and suggestions, but very few people are using it, although there is definitely more tweeting than last year. You can read my musings on it. This is part of my series of posts as one of their guest bloggers.

Getting an ext3 Drobo 5D to play nicely with Ubuntu 12.04

Our lab has recently bought two Drobo 5Ds to give us some large storage. They work out of the box with Macs, but getting them to play nicely with Linux, specifically Ubuntu 12.04, has been a bit more work, so I thought I’d share the recipe that, for us at least, appears to work. Much of this has been cobbled together from the page and also from a very helpful earlier . One thing I could not get to work, unfortunately, is USB3. There appeared to be problems with USB3 and Linux when I was trying this out. Finally, I should mention that the Drobo here was set up on a Mac, so was formatted HFS+ to begin with. And, of course, follow these commands at your own risk: they worked for me, but they might not work for you.

 

First plug the Drobo into the power and connect with the USB lead to your Ubuntu machine. Don’t use any blue USB ports – these are USB3 and I couldn’t get them to work with the Drobo. After a while the Drobo should appear as a USB disk drive in a window. You can check what Ubuntu is doing by looking at this log

$ dmesg | tail

It will show something like

[250886.772714] usb 1-1.1: new high-speed USB device number 10 using ehci_hcd
[250887.331458] scsi19 : usb-storage 1-1.1:1.0
[250888.328628] scsi 19:0:0:0: Direct-Access Drobo 5D 5.00 PQ: 0 ANSI: 0
[250888.329605] sd 19:0:0:0: Attached scsi generic sg3 type 0
[250888.330168] sd 19:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
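If you would rather not pick the device name out of the log by eye, a rough sketch like the one below will do it. It assumes the kernel prints the name in the "[sdX]" bracket form shown above; the function name is just my own label.

```shell
# Print the sdX device names mentioned in dmesg-style lines read from stdin.
# Assumption: the kernel log uses the "[sdb]" bracket form shown above.
devices_from_dmesg() {
    sed -n 's/.*\[\(sd[a-z][a-z]*\)\].*/\1/p' | sort -u
}

# Usage (nothing is modified, this only reads the log):
#   dmesg | tail -n 50 | devices_from_dmesg
```

Treat the result as a hint only and confirm it against /dev/disk/by-uuid/ before pointing any destructive command at it.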

First we need to install the latest version of the Linux Drobo tools, so we will probably need git, and let’s get Qt as well so we can check the GUI.

$ sudo apt-get install git
$ sudo apt-get install python-qt4

Now cd to somewhere where you put packages etc and run

$ git clone git://drobo-utils.git.sourceforge.net/gitroot/drobo-utils/drobo-utils

This will download all the files and binaries you need

$ cd drobo-utils/

Just check it is all up to date

$ git pull

Check it is all working by seeing if this works (warning: this can take about a minute)

$ sudo ./drobom status

In theory, we can bring up the GUI as below, but on my machine I just got python errors about KeyError: 'UseStaticIPAddress'. Check it if you want.

$ sudo ./drobom view

Next we need to know which device the Drobo is currently plugged into. This will probably change every time you plug the Drobo in.

$ ls -lrt /dev/disk/by-uuid/

There should be a long alphanumeric string, which I will call foo, pointing to something like /dev/sdb. This foo should match the name that appears when I type

$ ls /media/

If so, then we know that the Drobo is connected to /dev/sdb. Next we need to set the Logical Unit Size (LUNS). This is the largest volume the Drobo will appear as, and if we run df it will show this as the physical size of the Drobo even if there are not enough disks inside to make it this size. Since the Drobo 5D has five slots and we are using 4TB disks at present, if we run with single disk redundancy the maximum size is 16 TB. You could make this smaller, but then you would have multiple “drobo partitions” mounted, all pointing to the same machine. The disadvantage of a large LUNS is that the startup time is long, as is any disk checking time. The units in the line below are TB! Caution: these commands can take a while to run and I’ve not pasted in the usual “are you sure?” prompts.

$ sudo ./drobom set lunsize 16 PleaseEraseMyData
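The 16 TB figure is just (slots minus redundant disks) times the disk size. Here is that arithmetic as a small sketch, in case you have a different mix. It assumes all the disks are the same size and ignores any overhead the Drobo itself takes, so treat it as a rough upper bound; the function name is my own.

```shell
# Rough upper bound on usable capacity in TB.
# Assumptions: identical disks, no allowance for the Drobo's own overhead.
usable_tb() {
    slots=$1
    disk_tb=$2
    redundancy=$3
    echo $(( (slots - redundancy) * disk_tb ))
}

# Usage: five slots, 4 TB disks, single disk redundancy
#   usable_tb 5 4 1
```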

Now we need to set up a partition on the disk using parted, which should already be installed. This has its own command line. Although we are setting up an ext3 disk, ext3 is just ext2 with journalling, so we ask parted for an ext2 disk.

$ sudo parted /dev/sdb
GNU Parted 1.8.9
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel
New disk label type? gpt
(parted) mkpart ext2 0 100%
(parted) quit
Information: You may need to update /etc/fstab.
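If you have several Drobos to do, the same steps can probably be given to parted in one go using its script mode (-s). The sketch below only builds and prints the command so you can look at it before running it yourself with sudo. Note it is not identical to the session above: I have used 0% rather than 0 for the start so parted can choose the alignment, and "primary" is only a partition name. The function name is my own.

```shell
# Build (but do not run) a scripted parted command that creates a GPT label
# and a single partition spanning the whole device.
# Assumption: the device argument is the one you identified earlier.
parted_cmd() {
    device=$1
    echo "parted -s $device mklabel gpt mkpart primary ext2 0% 100%"
}

# Usage: print it, check the device is right, then run it by hand with sudo
#   parted_cmd /dev/sdb
```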

Now we need to format the disk. Again remember to use the right device. Also note it is sdb1 since we are formatting the first and only partition, not the disk itself. Also note again we are formatting as ext2 but with the -j flag for journalling, hence ext3. Again, this will ask whether you are sure etc and could take a few hours.

$ sudo mke2fs -j -i 262144 -L Drobo -m 0 -O sparse_super,^resize_inode /dev/sdb1
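A note on the -i 262144 option: this is the bytes-per-inode ratio, so it fixes roughly how many files the filesystem can ever hold. A large ratio means fewer inodes, which keeps formatting and checking quicker on a volume this size. The sketch below is just that division; it assumes the 16 TB volume is counted in binary units (TiB), which may not be how the Drobo counts.

```shell
# Approximate number of inodes mke2fs will create for a given -i ratio.
inode_estimate() {
    size_bytes=$1
    bytes_per_inode=$2
    echo $(( size_bytes / bytes_per_inode ))
}

# Usage: a 16 TiB volume with -i 262144
#   inode_estimate $((16 * 1024 * 1024 * 1024 * 1024)) 262144
```

That comes out at about 67 million inodes, which is plenty if, like us, you mostly store large files.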

Nearly there. If you remount the Drobo it should appear in /media/Drobo (or whatever name you gave it above). Now we need to make sure you have permissions to write to the disk. For this we need to know your user and group numeric ids.

$ id
uid=9009(fowler) gid=100 groups=100

So my user id is 9009 and my group id is 100. Hence

$ sudo chown -R 9009:100 /media/Drobo/
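If you don't want to copy the numbers by hand, id can print them directly. A small sketch (the function name is my own):

```shell
# Print "uid:gid" for the current user, in the form chown expects.
owner_spec() {
    echo "$(id -u):$(id -g)"
}

# Usage:
#   sudo chown -R "$(owner_spec)" /media/Drobo/
```

Note that the command substitution runs before sudo, so you get your own ids rather than root's.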

If we want to mount the Drobo somewhere else, we need to edit /etc/fstab. First we need to know the UUID of the disk (this was the foo).

$ ls -lrt /dev/disk/by-uuid/

Copy the foo into the clipboard and open

$ sudo emacs /etc/fstab

Add a line at the end that looks like

# mount the ext3 Drobo
UUID=b278aff6-db1a-436b-995b-8808c2c82f9e /drobo1 ext3 defaults 0 2
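To avoid typos when copying the UUID, you could generate the line instead. This is only a sketch: it prints the line rather than editing /etc/fstab, and it assumes the same ext3, defaults and 0 2 fields as above. The function name is my own.

```shell
# Print an fstab entry for an ext3 filesystem identified by UUID.
fstab_line() {
    uuid=$1
    mount_point=$2
    printf 'UUID=%s %s ext3 defaults 0 2\n' "$uuid" "$mount_point"
}

# Usage (blkid reads the UUID of the first partition):
#   fstab_line "$(sudo blkid -s UUID -o value /dev/sdb1)" /drobo1
```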

Make sure the mount point exists!

$ sudo mkdir /drobo1

Now remount the disk, either by rebooting or by issuing

$ sudo mount -a

and voila, you should find the disk by

$ ls /drobo1/

Check you can make a file

$ touch /drobo1/hello-world.txt

Check it appears in your list of disks using df etc. I’ve checked you can be a bit rough with it e.g. just pulling out the USB cable and then reconnecting to a different port. Seemed ok but I did need to remount it using

$ sudo mount -a

and then I could add and edit files as normal and there was no complaining in dmesg about read-only filesystems or anything like before with HFS+.
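If the Drobo gets unplugged a lot, a small check saves remounting by hand. This sketch looks the mount point up in /proc/mounts; the second argument is only there so it can be pointed at a different file for testing, and the function name is my own.

```shell
# Succeed if the given mount point is listed in the mounts table.
# Assumption: the mount point is written without a trailing slash.
is_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Usage:
#   is_mounted /drobo1 || sudo mount -a
```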

GROMACS 4.6

GROMACS is a scientific code designed to simulate the dynamics of small boxes of stuff, which usually contain a protein, water, perhaps a lipid bilayer and a range of other molecules depending on the study. It assumes that all the atoms can be represented as points with a mass and an electrical charge and that all the bonds can be modelled using simple harmonic springs. There are some other terms that describe the bending and twisting of molecules, and all of these, when combined with two long range terms, which take into account the repulsion and attraction between electrical charges, allow you to calculate the force on any atom due to the positions of all the other atoms. Once you know the force, you can calculate where the atom will be a short time later (often 2 fs), but of course the positions have then changed, so you have to recalculate the forces. And so on.
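For the curious, that "work out the forces, move the atoms, repeat" loop can be written as a leap-frog update, which is the scheme behind the default md integrator in GROMACS. The symbols are my own labels: r_i, v_i and m_i are the position, velocity and mass of atom i, F_i is the force on it, and Δt is the time step (the 2 fs above).

```latex
\begin{aligned}
\mathbf{v}_i\!\left(t + \tfrac{\Delta t}{2}\right) &= \mathbf{v}_i\!\left(t - \tfrac{\Delta t}{2}\right) + \frac{\Delta t}{m_i}\,\mathbf{F}_i(t) \\
\mathbf{r}_i(t + \Delta t) &= \mathbf{r}_i(t) + \Delta t\,\mathbf{v}_i\!\left(t + \tfrac{\Delta t}{2}\right)
\end{aligned}
```

The velocities live half a step out of phase with the positions, hence the name.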

 

Anyway, I use GROMACS a lot in my research and the most recent major version, 4.6, . In this post I’m going to briefly describe my experience with some of the improvements. First off, so much has changed that I think it would have been more accurate to call this GROMACS 5.0. For example, version 4.6 is a lot faster than version 4.5. I typically use three different benchmarks when measuring the performance; one is an all-atom simulation of a bacterial peptide transporter in a lipid bilayer (78,033 atoms). The other two are both coarse-grained models of a lipid bilayer using the – the difference is one has 6,000 lipids (137,232 beads), the other 54,000 (2,107,010 beads). OK, so how much faster is version 4.6? It is important here to bear in mind that GROMACS was already very fast since a lot of effort had been put into optimising the loops that the code spends most of its time running. Even so, version 4.6 is between 20% and 120% faster when using either of the first two benchmarks, and in some cases even faster. How? Well, it seems the developers using commands. One important consequence of this is that it is and, since you have to specify which SIMD instruction sets to use, you may need several different versions of the key binary, mdrun. For example, you may want a version compiled using AVX SIMD instruction sets for recent CPUs, but also a version compiled using an older SSE SIMD instruction set. The latter will run on newer architectures, but it will be slower. You must never run a version compiled with no SIMD instruction sets as this can be 10x slower!
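If you do end up with several mdrun binaries, you can pick between them by looking at what the CPU advertises. The sketch below takes the flags line from /proc/cpuinfo and names the best SIMD level it finds. The mdrun_* binary names in the usage comment are purely hypothetical, so use whatever you called yours.

```shell
# Name the best SIMD level present in a space-separated list of CPU flags
# (the format of the "flags" line in /proc/cpuinfo on Linux).
simd_level() {
    flags=" $1 "
    case "$flags" in
        *" avx "*)    echo avx ;;
        *" sse4_1 "*) echo sse4.1 ;;
        *" sse2 "*)   echo sse2 ;;
        *)            echo none ;;
    esac
}

# Usage (mdrun_avx, mdrun_sse4.1 etc. are hypothetical binary names):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   echo "use mdrun_$(simd_level "$flags")"
```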

The other big performance improvement is that GROMACS 4.6 now uses GPUs seamlessly. The calculations are shared between any GPUs and the CPUs, and GROMACS will even shift the load to try and share it equally. , one of the GROMACS developers, gave an on this subject in April 2013. A GPU here just means a reasonable consumer graphics card, such as an NVIDIA GTX680, that has a compute capability of 2.0 or higher. So, how much performance boost do we see? I typically see a boost of 2.1-2.7x for the atomistic benchmark and 1.4-2.2x for the first, smaller coarse-grained benchmark. Just for fun, you can try running a version of GROMACS compiled with no SIMD instructions with a GPU (and without a GPU) and then you can get a performance increase of 10x.

Before I finish, I was given some good advice on running GROMACS benchmarks. Firstly, make sure you use the -noconfout mdrun option, since this prevents it from writing a final .gro file, which takes some time. Secondly, set up a .tpr file that will run for a long (wallclock) time even on a large number of cores and then use the -resethway option in combination with a time limit, such as -maxh 0.25, as this would then reset the timers after 7.5 min and record how many steps were calculated between 7.5 and 15 minutes. From experience, a bit of time spent writing some good Bash scripts to automatically set up, run and analyse the benchmarking simulations really pays off in the long run.
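As a flavour of what such a script might contain, here is a sketch of the analysis end. It pulls the ns/day figure out of an mdrun log, assuming the log ends with a line of the form "Performance: <ns/day> <hour/ns>"; check what your own version writes. The file name bench is hypothetical.

```shell
# Print the ns/day value from a GROMACS log read on stdin.
# Assumption: the value is the second field of the last "Performance:" line.
ns_per_day() {
    awk '$1 == "Performance:" { value = $2 } END { if (value != "") print value }'
}

# Usage (bench.tpr / bench.log are hypothetical file names):
#   mdrun -deffnm bench -noconfout -resethway -maxh 0.25
#   ns_per_day < bench.log
```

Looping this over core counts and collecting the numbers in one file is most of what you need for a scaling plot.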

In future posts I’ll talk about the scaling of GROMACS 4.6 (that is where the third benchmark comes in) and also look at the GPU performance in a bit more detail.

 

 

 

 

Software Carpentry Feedback


As well as asking the attendees how they thought the workshop had gone, I sent them a questionnaire before the workshop. The idea was to see what their expectations were and if the workshop then met them. For example we asked “How would you describe your expertise in the following tools?” and the results are on the right. Overall most people didn’t feel they knew much about the tools we had identified as being potentially most useful. We also asked “What you would like the workshop to cover?” and the answers indicated these tools were relevant (results not shown).

 

So, how did the workshop do? Well, 92% of the attendees agreed or strongly agreed with the statement “I enjoyed the Software Carpentry Workshop” and 96% “[felt they] learnt something useful from the workshop that will help my research”. Everyone who had come from an experimental lab thought that “other members of my lab would benefit from a workshop like this”. A good start, but did it improve their understanding? So we also asked “I understand enough to try using the following tools” and most people agreed (see left)! Promising, but maybe it was the sugar from the donuts kicking in.

To try and resolve things we then asked “I intend using the tools to help my research” and lo, some of those agrees, not surprisingly, sneak to the left and join the disagrees (see the graph on the right). I’m happy, and seeing as 92% agreed with “A workshop like this should be run annually in Biochemistry”, maybe I’ll be running another one.

A few comments:

“The course was very informative and useful for my research! Thanks”

“I now see the value of a more ‘scientific’ approach to programming in science, in terms of version tracking, reproducibility and validity. I try to be thorough in my approach to my research and that should extend to my programming. This workshop has been an excellent first step in that direction.”

“Excellent course, thanks for letting me take part.”

Running my first Software Carpentry workshop

“Can you email me that script you used to do your analysis?”

“Sure. It isn’t very well commented but you should be able to work out what it’s doing. I’ve tested it on a few things and it seems to work.”

Sound familiar? Of course, the story normally ends happily but….

The idea behind Software Carpentry, a small but fast-growing movement, is to teach scientists some of the tools and methods of software engineering so that they write code that is easy to understand, tested and so can be shared more readily.

 

I joined one of their online courses a few years ago and found it very useful, although inevitably I only managed to complete half the exercises before I had a bad week and fell off the back of the course.

So back in April 2012 when I was talking to Neil Chue-Hong at the in Oxford and he mentioned Software Carpentry my ears pricked up. Neil is the director of the who were running the workshop and he mentioned that they were helping Software Carpentry run two-day intensive courses in the UK. I thought it would be just the thing for our department and, well, we have just finished running the first ever two-day Software Carpentry workshop at the University of Oxford.

Interest in the workshop has been high; although the plan was to limit it to Biochemistry, we ended up with helpers and observers from other university departments as well as one of the companies on the science park, and if we’d opened it up we could have filled the room at least twice over. In the end we only had enough chairs and desks for the attendees and everyone else had to perch.

The first day covered shell scripting using bash and awk, version control, and automation courtesy of GNU make. I think most people had seen shell scripting but everyone sat up a bit straighter during version control… I always think a good course is like a good physics lecture; you sit there at the beginning nodding, thinking “this is easy”, then the difficulty slowly ratchets up and at the end you realise you’ve learnt a lot. Yesterday it was the turn of Python, including unit tests and some of the more relevant modules to us such as numpy, scipy and .

I’ll describe some of the feedback in a later post, but overall it appears to have been well-received. Just remains for me to say thank you to the instructors, all the helpers and, of course, the attendees.

Here’s to another one in 2013.

This blog…

…is where I shall put thoughts that at least might be of interest to other people. Any opinions are my own and are not representative of my department or university in any way.