Build it and you will learn: Building an RPi2 cluster
External Storage for NFS
The microSD card in the master node could be used for cluster storage, but it doesn't offer a great deal of space. SD cards also aren't known for long life compared with hard drives or solid-state drives (SSDs). I had a 120GB SSD handy, so I decided to use it as attached storage for the master node. The drive was placed in an external USB case and attached to the master node.
One of the critical items for the best SSD performance is to make sure the partitions are aligned. See the box titled "Aligning with Block Boundaries."
Aligning with Block Boundaries
To maximize performance for SSD devices, you should make sure the partitions are aligned with block boundaries. If the partitions are not aligned on a block boundary, writing a single page to the SSD could result in two blocks being written. This doubles the work of the controller and wears out the blocks faster.
Two articles online [3] [4] explain how to align SSD partitions. Raspbian ships with fdisk version 2.17.1 or later, so the following command was used to partition the SSD:
pi@raspberrypi ~ $ sudo fdisk -c -u /dev/sdb
The defaults for the first and last sectors were used to create a single partition. After partitioning is completed, the results can be checked by the command in Listing 4. From the output, you can tell the disk is aligned on a block basis because the start sector is divisible by 2,048. With 512-byte sectors, this means they are aligned on 1MB boundaries (2,048 sectors x 512 bytes/sector = 1,048,576 bytes).
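The divisibility check described above is easy to reproduce with a little shell arithmetic. This is a minimal sketch; the start sector of 2,048 and the 512-byte sector size are taken from the fdisk output in Listing 4.

```shell
# Check whether a partition's start sector falls on a 1MB boundary.
# With 512-byte sectors, 1MB = 2,048 sectors, so an aligned start
# sector must be divisible by 2,048.
start=2048                      # start sector reported by fdisk
if [ $((start % 2048)) -eq 0 ]; then
    echo "aligned at byte offset $((start * 512))"
else
    echo "NOT aligned"
fi
```

For the start sector of 2,048, this prints a byte offset of 1,048,576, matching the 1MB boundary calculated in the text.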
After partitioning, you can format the single partition with whatever filesystem you like. For this project, I used ext4. So that the drive is mounted every time the system reboots, add the following line to the /etc/fstab file:
/dev/sda1 /work ext4 defaults 0 0
A simple
sudo mount -a
command mounts the drive.
Listing 4
Partition Alignment Results
pi@raspberrypi ~ $ sudo fdisk -l /dev/sda

Disk /dev/sda: 120.0 GB, 120034123776 bytes
182 heads, 30 sectors/track, 42938 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x56f56d8d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   230688767   115343360   83  Linux
I will use the classic Unix/Linux tool NFS (Network File System) to make the SSD accessible to all the nodes in the cluster. Be sure NFS is installed on the master node, then edit the NFS configuration file /etc/exports on the master node to look like Listing 5. The uncommented line NFS-exports the /work directory to the other compute nodes. (Note that I used CIDR notation.)
Listing 5
/etc/exports
pi@raspberrypi ~ $ more /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
#               to NFS clients.  See exports(5).
#
# <A number of commented examples ...>
#
/work 10.0.1.0/24(rw,subtree_check)
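Each compute node needs a matching client-side mount of the export. As a sketch, the compute node's /etc/fstab entry might look like the following; the master node address 10.0.1.1 is my assumption, so adjust it (and create the /work mount point) for your network:

```
10.0.1.1:/work  /work  nfs  defaults  0 0
```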
It is a good idea to NFS-export /home from the master node to the compute nodes as well. Although it's not a good idea to run applications from the master node's /home directory (because it resides on an SD card), making the directory accessible will make life much easier. The /etc/exports file should also include the line:
/home 10.0.1.0/24(rw,subtree_check)
at the end. The reason to NFS-export /home is that it contains all of the SSH information for the other nodes in the cluster. Passwordless SSH requires that this information be available on every node.
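Because /home is shared over NFS, passwordless SSH only needs to be set up once: the key pair and the authorized_keys file land on the export and are visible to every node. A minimal sketch follows; the RSA key type and empty passphrase are my assumptions, not something mandated here.

```shell
# Generate a key pair with an empty passphrase and authorize it.
# Since /home is NFS-mounted on every node, authorizing your own
# public key once enables passwordless SSH across the whole cluster.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, `ssh <node> hostname` from any node should succeed without a password prompt.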
Adding Cluster Tools to the Master Node
The real magic of a cluster is the software. An HPC system is designed to leverage the power of parallel computing. As I described earlier in this article, an HPC cluster consists of several computers that all work on the same problem together. Specialized software tools perform the background tasks necessary for getting the cluster to behave like a single system.
The applications that run on parallel systems use the Message Passing Interface (MPI) protocol to pass information and data among the nodes in the cluster. Open MPI is a free library that provides many tools that let a programmer write software that will leverage the system's parallel capabilities. The operator of an HPC cluster also needs specialized tools for managing the cluster and scheduling jobs that will run on the system.
You can think of the tools in an HPC cluster as belonging to software "layers." For this cluster, I'm going to go just above the basic layer to include some important cluster tools. In particular, I'm going to install:
- Pdsh – a parallel shell tool that allows you to run commands across multiple nodes
- MPICH and Open MPI – tools and libraries that are used to build parallel applications
- Lmod – an environment module tool that allows you to manipulate the user environment, so that you can build and run applications that use different compilers or libraries
- OpenLava – a job scheduler (also referred to as a resource manager). The job scheduler prepares and manages the workload distributed to the compute nodes.
I've chosen to use OpenLava in this cluster because I've found it to be a great tool, and it's derived from LSF (Load Sharing Facility), so its lineage and capability are well-tested. Moreover, OpenLava works well with a shared filesystem such as NFS.
All of these tools are stored in /work, which is an NFS-shared filesystem for all nodes. The source code for all tools is installed in /work/pi/src. The binaries are installed in various locations, as recommended by Robert McLay, the developer of Lmod.
McLay recommends putting the binaries in specific locations according to version number, as he clarified in a recent email [5]. In particular, he recommends using only two digits of the version for the install path (e.g., 3.1.4 becomes 3.1 and 1.4.5 becomes 1.4), because releases that change only the minor version number (e.g., 1.4.5 to 1.4.6) are supposed to be compatible with one another. Therefore, if you update the minor version, applications should remain compatible and should not need to be upgraded or rebuilt.
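The two-digit convention is easy to automate when scripting installs. As a small sketch, the shell parameter expansion below strips the last dotted component from a full version string; the variable names and the mpich path are my own illustration of the scheme.

```shell
# Derive the two-digit install directory from a full version string
# following McLay's scheme, e.g., 3.1.4 -> 3.1.
ver=3.1.4
short=${ver%.*}            # drop the trailing ".4"
echo "/work/apps/mpich/$short"   # -> /work/apps/mpich/3.1
```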
With this best practice in mind, the applications were built and installed in /work/apps. The following sections take a closer look at these important HPC tools.