Build it and you will learn: Building an RPi2 cluster

External Storage for NFS

The microSD card in the master node could be used for cluster storage, but it doesn't offer a great deal of space. SD cards also aren't known for having a long life compared with hard drives or solid-state drives (SSDs). I had a 120GB SSD handy, so I decided to use it as attached storage for the master node. The drive was placed in an external USB enclosure and connected to the master node.

One of the critical items for the best SSD performance is to make sure the partitions are aligned. See the box titled "Aligning with Block Boundaries."

Aligning with Block Boundaries

To maximize performance for SSD devices, you should make sure the partitions are aligned with block boundaries. If the partitions are not aligned on a block boundary, writing a single page to the SSD could result in two blocks being written. This doubles the work of the controller and wears out the blocks faster.

Two articles online [3] [4] explain how to align SSD partitions. Raspbian ships with fdisk version 2.17.1 or newer, which aligns new partitions properly by default, so the following command was used to partition the SSD:

pi@raspberrypi ~ $ sudo fdisk -c -u /dev/sda

The defaults for the first and last sectors were used to create a single partition. After partitioning completed, the results were checked with the command in Listing 4. From the output, you can tell the partition is properly aligned because its start sector (2,048) is evenly divisible by 2,048. With 512-byte sectors, this means it is aligned on a 1MiB boundary (2,048 sectors x 512 bytes/sector = 1,048,576 bytes).
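
If you want to double-check the alignment without reading the fdisk output, the partition's start sector is also exposed under /sys. A minimal check, assuming the drive shows up as /dev/sda with its first partition sda1:

pi@raspberrypi ~ $ cat /sys/block/sda/sda1/start
2048
pi@raspberrypi ~ $ echo $(( $(cat /sys/block/sda/sda1/start) % 2048 ))
0

A result of 0 means the partition starts on a 1MiB boundary.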

After partitioning, you can format the single partition with whatever filesystem you like; for this project, I used ext4. So that the drive is mounted every time the system reboots, add the following line to the /etc/fstab file:

/dev/sda1   /work   ext4  defaults   0     0

A simple

sudo mount -a

command mounts the drive.
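
Pulling the storage steps together, the whole sequence looks roughly like the following sketch; the device name /dev/sda1 is assumed, and the final chown (optional) simply lets the pi user write to the new filesystem:

sudo mkfs.ext4 /dev/sda1    # format the new partition with ext4
sudo mkdir /work            # create the mount point
sudo mount -a               # mount everything listed in /etc/fstab
sudo chown pi:pi /work      # let the pi user write to the drive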

Listing 4

Partition Alignment Results

pi@raspberrypi ~ $ sudo fdisk -l /dev/sda
Disk /dev/sda: 120.0 GB, 120034123776 bytes
182 heads, 30 sectors/track, 42938 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x56f56d8d
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   230688767   115343360   83  Linux

I will use the classic Unix/Linux Network File System (NFS) to make the SSD accessible to all the nodes in the cluster. Be sure NFS is installed on the master node, then edit the NFS configuration file /etc/exports on the master node so that it looks like Listing 5. The uncommented line NFS-exports the /work directory to the other compute nodes. (Note that I used CIDR notation.)

Listing 5

/etc/exports

pi@raspberrypi ~ $ more /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
#               to NFS clients.  See exports(5).
#
# <A number of commented examples ...>
#
/work     10.0.1.0/24(rw,subtree_check)

It is also a good idea to NFS-export /home from the master node to the compute nodes. Although you shouldn't run applications from the master node's /home directory (because it resides on an SD card), making the directory accessible to all nodes will make life much easier. The /etc/exports file should also include the line:

/home 10.0.1.0/24(rw,subtree_check)

at the end. The reason to NFS-export /home is that it contains the SSH keys and host information for the other nodes in the cluster, and passwordless SSH requires that this information be available on every node.
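
A quick sketch of the remaining NFS plumbing: the server package is installed on the master node and the exports are activated, and the client package is installed on each compute node. The package names are Raspbian's standard ones, and 10.0.1.1 is assumed here to be the master node's address:

sudo apt-get install nfs-kernel-server   # master node only
sudo exportfs -ra                        # re-read /etc/exports

sudo apt-get install nfs-common          # each compute node

Each compute node then gets two extra lines in its /etc/fstab,

10.0.1.1:/work   /work   nfs   defaults   0   0
10.0.1.1:/home   /home   nfs   defaults   0   0

followed by a sudo mount -a (after creating the /work mount point).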

Adding Cluster Tools to the Master Node

The real magic of a cluster is the software. An HPC system is designed to leverage the power of parallel computing. As I described earlier in this article, an HPC cluster consists of several computers that all work on the same problem together. Specialized software tools perform the background tasks necessary for getting the cluster to behave like a single system.

The applications that run on parallel systems use the Message Passing Interface (MPI) protocol to pass information and data among the nodes in the cluster. Open MPI is a free MPI library that gives programmers the tools to write software that leverages the system's parallel capabilities. The operator of an HPC cluster also needs specialized tools for managing the cluster and scheduling jobs that will run on the system.
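
To give a flavor of what this looks like in practice, an MPI program is compiled with the wrapper compiler supplied by the MPI library and launched across the nodes with mpirun. The source file and host list below are only placeholders, and the --hostfile syntax shown is Open MPI's:

mpicc hello_mpi.c -o hello_mpi                        # wrapper around the compiler plus the MPI library
mpirun -np 16 --hostfile /work/pi/hosts ./hello_mpi   # start 16 processes across the listed nodes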

You can think of the tools in an HPC cluster as belonging to software "layers." For this cluster, I'm going to go just above the basic layer to include some important cluster tools. In particular, I'm going to install:

  • Pdsh – a parallel shell tool that allows you to run commands across multiple nodes (see the short example after this list)
  • MPICH and Open MPI – tools and libraries that are used to build parallel applications
  • Lmod – an environment module tool that allows you to manipulate the user environment, so that you can build and run applications that use different compilers or libraries
  • OpenLava – a job scheduler (also referred to as a resource manager). The job scheduler prepares and manages the workload distributed to the compute nodes.
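
For example, once Pdsh knows about the compute nodes, a single command can be fanned out to all of them at once; the address range below is a placeholder for however your nodes are numbered:

pdsh -w 10.0.1.[2-5] uptime    # run uptime on every compute node in the range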

I've chosen to use OpenLava in this cluster because I've found it to be a great tool, and it's derived from LSF (Load Sharing Facility), so its lineage and capability are well-tested. Moreover, OpenLava works well with a shared filesystem such as NFS.
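
Because OpenLava keeps LSF's familiar command set, submitting and monitoring work looks roughly like the following sketch; the slot count and script name are placeholders:

bsub -n 8 -o job.out ./run_job.sh   # submit a job that wants eight slots
bjobs                               # list your jobs and their state
bhosts                              # show the state of each compute node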

All of these tools are stored in /work, which is an NFS-shared filesystem for all nodes. The source code for all tools is installed in /work/pi/src. The binaries are installed in various locations, as recommended by Robert McLay, the developer of Lmod.

McLay recommends putting the binaries in specific locations according to version number, and he clarified this in a recent email [5]. In particular, he recommends using only two digits of the version for the directory name (e.g., 3.1.4 becomes 3.1 and 1.4.5 becomes 1.4), because releases that differ only in the last, patch-level digit (e.g., 1.4.5 to 1.4.6) are supposed to be compatible with one another. Therefore, if you update to a new patch release, applications built against the library should remain compatible and should not need to be rebuilt.
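
For example, with an install root of /work/apps (used below), that convention produces install prefixes along these lines; the tools and version numbers shown are only illustrative:

/work/apps/openmpi/1.8
/work/apps/mpich/3.1
/work/apps/lmod/5.9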

With this best practice in mind, the applications were built and installed in /work/apps. The following sections take a closer look at these important HPC tools.
