DOCKER is cool. But what is it? From the DOCKER website:
Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.
I like to think of it as somewhere in between and a . Although the DOCKER website is focussed on commercial software development, and so talks about building and shipping applications, DOCKER could be of huge use to me as a computational scientist. For example, rather than making a series of input files for my simulations available, along with a list of the software versions I used, I could instead simply make a DOCKER image available that contains all the compiled software I used along with all the input files. Then anyone should, in principle, be able to reproduce my research.
Make no mistake: reproducibility is, rightly, a coming trend. But surely all scientific results are reproduced? It turns out that if the experiment or simulation was difficult to do, the answer is: not so much. And when concerted efforts have been made to reproduce results reported in high-impact journals, the answer is often, well, disconcerting at the very least. In a now famous study, a pharmaceutical company, Amgen, reported that its in-house scientists were unable to reproduce 47 out of 53 landmark experimental studies in haematology and oncology. They were looking at novel, exciting findings, which are more likely to be challenging to reproduce (although the pressure to over-sell is also stronger). I have no reason to think computational studies are much better. In the past few years there has been a flurry of , and . One can even now via GitHub with a so it can be cited independently of an article.
As I’d like to do this in the future, I’ve started to play with DOCKER and GROMACS. Since my workstation is a Mac, the DOCKER host has to run within a lightweight Linux virtual machine. First I . Then I opened a DOCKER Quick Terminal and checked everything was working by downloading the hello-world image and running it:
    $ docker run hello-world
    Unable to find image 'hello-world:latest' locally
    latest: Pulling from library/hello-world
    4276590986f6: Pull complete
    a3ed95caeb02: Pull complete
    Digest: sha256:4f32210e234b4ad5cac92efacc0a3d602b02476c754f13d517e1ada048e5a8ba
    Status: Downloaded newer image for hello-world:latest
    Hello from Docker.
    This message shows that your installation appears to be working correctly.
Let’s try something more real, like an Ubuntu 16.04 Server image.
$ docker run -it ubuntu bash
This drops me into a shell inside an Ubuntu container. Let’s compile GROMACS!
    root@<container-id>:/# apt-get update -y
    root@<container-id>:/# apt-get upgrade -y
    root@<container-id>:/# apt-get install build-essential cmake wget openssh-server -y
    root@<container-id>:/# wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz
    root@<container-id>:/# tar zxvf gromacs-5.1.2.tar.gz
    root@<container-id>:/# cd gromacs-5.1.2
    root@<container-id>:/# mkdir build
    root@<container-id>:/# cd build
    root@<container-id>:/# cmake .. -DGMX_BUILD_OWN_FFTW=ON
    root@<container-id>:/# make
    root@<container-id>:/# make install
    root@<container-id>:/# cd
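As an aside, the interactive steps above could probably be captured in a Dockerfile, so that the image can be rebuilt without retyping them. What follows is only a sketch of that idea, not something I ran for this post: the directory name `gromacs-docker` and the image tag `gromacs-5.1.2-sketch` are made up for illustration, and I have pinned the base image to `ubuntu:16.04` on the assumption that this is what `ubuntu:latest` resolved to at the time.

```shell
#!/usr/bin/env bash
# Sketch only: write a Dockerfile that mirrors the interactive build steps.
# The directory name "gromacs-docker" is a hypothetical choice.
mkdir -p gromacs-docker

cat > gromacs-docker/Dockerfile <<'EOF'
# Pin the base image so rebuilds start from the same Ubuntu release.
FROM ubuntu:16.04

# Compiler toolchain and download tool used in the interactive session.
RUN apt-get update -y && \
    apt-get install -y build-essential cmake wget

# Download, build and install GROMACS 5.1.2 with its own FFTW.
RUN wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz && \
    tar zxf gromacs-5.1.2.tar.gz && \
    mkdir gromacs-5.1.2/build && \
    cd gromacs-5.1.2/build && \
    cmake .. -DGMX_BUILD_OWN_FFTW=ON && \
    make && \
    make install
EOF

# The build itself needs Docker and network access, so it is only printed here.
echo "Wrote gromacs-docker/Dockerfile; build it with:"
echo "  docker build -t gromacs-5.1.2-sketch gromacs-docker"
```

The advantage over committing a hand-built container is that the recipe itself becomes a text file that can be versioned alongside the input files.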
Now let’s copy over a TPR file to see how fast GROMACS is within a DOCKER container:
    root@<container-id>:/# scp <user>@<host>:benchmark.tpr .
    root@<container-id>:/# source /usr/local/gromacs/bin/GMXRC
    root@<container-id>:/# gmx mdrun -s benchmark -resethway -noconfout -maxh 0.1
Note that this is a single-CPU DOCKER image. I was worried that, since the DOCKER host runs inside a Linux VM, GROMACS would be slow compared to running natively in Mac OS X. So I ran three repeats of each, and DOCKER was only 1.7% slower…
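For anyone wanting to repeat this kind of comparison, here is a rough sketch of the arithmetic: average the ns/day from each set of repeats and express the difference as a percentage of the native figure. It assumes that `md.log` reports throughput on a line starting with `Performance:` with ns/day as the first number. The helper names are my own, and the numbers in the example are invented placeholders, not the measurements behind the 1.7% above.

```shell
#!/usr/bin/env bash
# Pull the ns/day figure out of a GROMACS log.
# Assumption: the log has a line like "Performance:   <ns/day>   <hour/ns>".
ns_per_day() {
  awk '/^Performance:/ { print $2 }' "$1"
}

# Mean of the numbers given as arguments, to three decimal places.
mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f", sum / NR }'
}

# Percentage by which throughput $2 falls below the reference throughput $1.
percent_slower() {
  awk -v ref="$1" -v val="$2" 'BEGIN { printf "%.1f", 100 * (ref - val) / ref }'
}

# Hypothetical ns/day values for three repeats each (NOT real measurements).
native=(10.0 10.2 9.8)
in_docker=(9.9 9.8 9.7)

native_mean=$(mean "${native[@]}")
docker_mean=$(mean "${in_docker[@]}")

echo "native:    ${native_mean} ns/day"
echo "in Docker: ${docker_mean} ns/day"
echo "Docker is $(percent_slower "$native_mean" "$docker_mean")% slower"
```

With real runs, each array entry would come from something like `ns_per_day run1/md.log`, where the `run1` directory is again hypothetical.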
To save this DOCKER image locally, quit the session and commit the container:
    $ docker commit -m "Installed GROMACS 5.1.2 for benchmarking" -a "Philip W Fowler" c5f1cf30c96b philipwfowler/gromacs-5.1.2
    $ docker images
    REPOSITORY                    TAG      IMAGE ID       CREATED          SIZE
    philipwfowler/gromacs-5.1.2   latest   73e44c120bfa   6 seconds ago    809 MB
    ubuntu                        latest   c5f1cf30c96b   2 weeks ago      120.8 MB
    hello-world                   latest   94df4f0ce8a4   3 weeks ago      967 B
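To hand a committed image like this to someone else without going through a registry, `docker save` and `docker load` should do the job. The sketch below is untested against this particular image, and the archive name is a hypothetical choice. By default it only prints the commands (dry run); setting `DRY_RUN=0` would execute them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

image="philipwfowler/gromacs-5.1.2"
archive="gromacs-5.1.2.tar"   # hypothetical file name

# Write the committed image, with all its layers, to a tar archive.
run docker save -o "$archive" "$image"

# On another machine with Docker installed: load the archive and open a shell.
run docker load -i "$archive"
run docker run -it "$image" bash
```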
Done. More soon on multiple cores, can-we-use-the-GPU? and using DOCKER on Amazon Web Services.