Build it and you will learn: Building an RPi2 cluster


A resource manager, commonly referred to as a job scheduler, is another critical component of a cluster. Although a resource manager is not absolutely required, it is extremely helpful, even if you are the only user on the system, because it allows you to queue up applications to run even when you aren't logged in. A resource manager also matches applications that have specific resource requirements to nodes that meet those requirements.

A number of resource managers are available, from open source to commercial. The resource manager I used for this article is OpenLava [14] [15], an open source tool based on LSF [16]. Building and installing OpenLava (in /work/pi/src/openlava-<version>) is as simple as for the other tools:

$ ./configure --prefix=/work/openlava-3.0
$ make
$ sudo make install

OpenLava needs the following post-installation configuration:

$ cd config
$ sudo cp lsb.hosts \
  lsb.params lsb.queues lsb.users \
  lsf.cluster.openlava lsf.conf \
  lsf.shared lsf.tasks openlava.* \
  /work/openlava-3.0/etc

These configuration files will be edited later; before that, you should create an openlava user as root:

# useradd -r openlava

Checking the /etc/passwd file will confirm whether the user was created. Now you need to switch the ownership of the OpenLava installation files to the openlava user and copy some files to the local OS (as root):

# chown -R openlava:openlava /work/openlava-3.0
# cp /work/openlava-3.0/etc/openlava /etc/init.d
# cp /work/openlava-3.0/etc/openlava.* /etc/profile.d
# chkconfig openlava on
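
Before moving on, the account check mentioned above can be scripted. The following is a minimal sketch rather than part of the OpenLava procedure: user_in_passwd is a hypothetical helper name, and it assumes a standard colon-separated passwd file.

```shell
# Hypothetical helper (not an OpenLava tool): succeeds if the user named
# in $1 has an entry in a passwd-format file. $2 defaults to /etc/passwd.
user_in_passwd() {
    # Field 1 of each passwd line is the login name, terminated by ':'.
    grep -q "^$1:" "${2:-/etc/passwd}"
}

if user_in_passwd openlava; then
    echo "openlava user exists"
else
    echo "openlava user is missing; rerun useradd"
fi
```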

Now you can configure OpenLava. As root, first edit the file lsf.cluster.openlava in the /work/openlava-3.0/etc directory to define the master node as a host (Listing 10). The only line that changes is the one after the HOSTNAME header; it gives the hostname of the master node (raspberrypi), which is used to run jobs.

Listing 10


pi@raspberrypi /work/openlava-3.0/etc $ more lsf.cluster.openlava
Begin   Host
HOSTNAME                model          type  server  r1m  RESOURCES
raspberrypi             !              !     1       -    -
End     Host

Now, as root, the file /work/openlava-3.0/etc/lsb.hosts also needs to be edited (Listing 11). The only change is adding the line that starts with raspberrypi after the HOST_NAME header. The maximum number of jobs that can run on the master node (the MXJ column) is set to 4, one per core; choose the value that suits your own policies and hardware.

Listing 11


pi@raspberrypi /work/openlava-3.0/etc $ more lsb.hosts
# <a lot of comments>
# Don't use non-default thresholds unless job dispatch needs to be controlled.
Begin Host
HOST_NAME     MXJ JL/U   r1m    pg    ls     tmp  DISPATCH_WINDOW  # Keywords
raspberrypi   4   ()     ()     ()    ()     ()   ()
#<commented examples>
#default       !   ()     ()     ()    ()     ()   ()               # Example
End Host
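
If you rebuild the master node often, the edit shown in Listing 11 can be scripted. This is only a sketch under assumptions: add_lsb_host is a hypothetical helper, it expects the column header line to begin with HOST_NAME as in the listing, and it writes the result to standard output instead of changing the file in place.

```shell
# Hypothetical helper: print the file with a host entry added directly
# after the HOST_NAME header line of lsb.hosts.
#   $1 = path to lsb.hosts, $2 = hostname, $3 = MXJ (maximum job slots)
add_lsb_host() {
    awk -v host="$2" -v mxj="$3" '
        { print }                      # copy every original line through
        /^HOST_NAME/ {                 # then add the entry after the header
            printf "%-13s %-3s ()     ()     ()    ()     ()   ()\n", host, mxj
        }
    ' "$1"
}
```

For example, add_lsb_host lsb.hosts raspberrypi 4 > lsb.hosts.new writes a candidate file that you can inspect before copying it into place as root.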

At this point, you're ready to start OpenLava. Once it is running, each of the four daemons should have an associated PID. If one or more daemons lack a PID, you will have to debug your installation (the OpenLava community is pretty good at helping with this). The commands lsid and bhosts should output information indicating that the cluster is running.
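
To see at a glance which daemons are missing, a short filter over the process list can help. This is a sketch under assumptions: the daemon names lim, res, sbatchd, and mbatchd are the usual OpenLava set but are not spelled out in this article, and missing_daemons is a hypothetical helper name.

```shell
# Assumed daemon names; adjust if your OpenLava build differs.
EXPECTED_DAEMONS="lim res sbatchd mbatchd"

# Hypothetical helper: reads running process names (one per line) on
# standard input and prints each expected daemon that is absent.
missing_daemons() {
    running="$(cat)"
    for d in $EXPECTED_DAEMONS; do
        # -x matches whole lines only, so "res" does not match "resolvd"
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# On a live system, feed it the process list:
#   ps -e -o comm= | missing_daemons
```

An empty result means every expected daemon was found in the process list.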

For a first test job from the command line, Listing 12 shows a job that simply "sleeps" for 60 seconds on each specified host. For this particular example, the output from the job is sent to the user via mail (hence the need to install mailtool on the master node). To get a list of completed jobs, use the bjobs -d command.

Listing 12

First Test

pi@raspberrypi ~ $ bsub 'echo my first job;sleep 60'
Job <103> is submitted to default queue .
pi@raspberrypi ~ $ bjobs
103     pi      RUN   default    raspberrypi raspberrypi *;sleep 60 Jun 14 10:44
pi@raspberrypi ~ $ bjobs
103     pi      RUN   default    raspberrypi raspberrypi *;sleep 60 Jun 14 10:44
pi@raspberrypi ~ $ bjobs
No unfinished job found
You have new mail in /var/mail/pi
pi@raspberrypi ~ $ mail
"/var/mail/pi": 1 message 1 new
>N   1 OpenLava           Sun Jun 14 10:45  41/1266  Job 103:
<... mail headers here ...>
Job  was submitted from host  by user .
Job was executed on host(s) , in queue , as user .
 was used as the home directory.
 was used as the working directory.
Started at Sun Jun 14 10:44:09 2015
Results reported at Sun Jun 14 10:45:09 2015
Your job looked like:
# LSBATCH: User input
echo my first job;sleep 60
Successfully completed.
Resource usage summary:
    CPU time   :      0.04 sec.
    Max Memory :         3 MB
    Max Swap   :        15 MB
    Max Processes  :         3
The output (if any) follows:
my first job

Testing the Master Node

For the master node, I'll test a simple parallel SOR 2D Laplace solver [17]. I downloaded the code and used the Fortran MPI script from MPICH to build it (it uses gfortran). Then I created the simple OpenLava job script in Listing 13. Notice that the job script includes loading the GNU compilers and the MPICH module.

Listing 13

Master Node Test

#!/bin/bash
#
# MPI Test script for openlava
#

#BSUB -P jacobi_test                    # Project jacobi_test
#BSUB -n 4
#BSUB -o jacobi_test.out                # output filename
#BSUB -e jacobi_test.err                # error filename
#BSUB -J jacobi_test                    # job name

# Change to correct directory (full path)
#  Not strictly necessary but a good practice
cd /home/pi/src/TEST/3

# Load needed modules here
. /etc/profile.d/
module load gnu/4.6
module load mpich/3.1

# Write hosts to a file
for h in `echo $LSB_HOSTS`
do
   echo $h >> pgfile
   echo "host name: $h"
done

# Calculate the number of processors allocated to this run.
NPROCS=`wc -l < ./pgfile`

# Calculate the number of nodes allocated.
NNODES=`uniq ./pgfile | wc -l`

### Display the job context
echo "Running on host `hostname` "
echo "Start Time is `date` "
echo "Directory is `pwd` "
echo "Using ${NPROCS} processors across ${NNODES} nodes "

# Execute mpi command
mpiexec -f ./pgfile -n 4 ./jacobi_parallel < input

# erase file with node names
rm ./pgfile
echo "End time is `date` "
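
The host bookkeeping in Listing 13 can be tried outside the scheduler. The sketch below is an illustration, not the article's script: summarize_hosts is a hypothetical helper, and the host list is passed as an argument where a real job would read $LSB_HOSTS. It overwrites the host file rather than appending, so a leftover pgfile from an earlier run cannot inflate the count, and it uses sort -u so repeated hosts are counted once even if they are not adjacent.

```shell
# Hypothetical helper: write one host per line to a file and print
# "<processes> <nodes>".
#   $1 = space-separated host list (inside a job: "$LSB_HOSTS")
#   $2 = path of the host file to create
summarize_hosts() {
    # $1 is deliberately unquoted so the shell splits it into words;
    # '>' truncates any stale file left by a previous run.
    printf '%s\n' $1 > "$2"
    nprocs=$(wc -l < "$2")
    nnodes=$(sort -u "$2" | wc -l)
    # Arithmetic expansion strips the padding some wc versions add.
    echo "$((nprocs)) $((nnodes))"
}

# Four slots on a single node, as in the master node test: prints "4 1"
summarize_hosts "raspberrypi raspberrypi raspberrypi raspberrypi" ./pgfile.demo
rm -f ./pgfile.demo
```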

To queue the job, you just have to submit it to OpenLava (Listing 14). OpenLava queues the job and continually checks the status of the resources. If resources are available that match the needs of a queued job, OpenLava runs it. The command bjobs lists both running and queued jobs.

Listing 14

Submitting a Job

pi@raspberrypi ~/src/TEST/3 $ bsub <
Job <109> is submitted to default queue .
pi@raspberrypi ~/src/TEST/3 $ bjobs
109     pi      RUN   default    raspberrypi raspberrypi *cobi_test Jun 14 11:59
pi@raspberrypi ~/src/TEST/3 $ bjobs
No unfinished job found
pi@raspberrypi ~/src/TEST/3 $ ls -s
total 316
 4 backup              4 jacobi_parallel.f90   4 runit_jacobi_parallel
 4 input              36 jacobi_parallel.out   4
 4 jacobi.f90          0 jacobi_test.err       4 sor_module.f90
84 jacobi_module.mod  36 jacobi_test.out      88 sor_module.mod
 8 jacobi.o            4 machinefile           4 types_module.f90
20 jacobi_parallel     4 pgfile                4 types_module.mod

Two new files, jacobi_test.err and jacobi_test.out, are created by the job. The first file is the error log (notice that the file is zero length, indicating no errors). The second file contains the output from OpenLava and the application.
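
The zero-length check on the error log is easy to automate. A minimal sketch, assuming the file names from the job script above; job_log_clean is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the error log exists and is empty.
job_log_clean() {
    # -f: a regular file exists; -s: the file has a size greater than zero
    [ -f "$1" ] && [ ! -s "$1" ]
}

if job_log_clean jacobi_test.err; then
    echo "no errors reported"
else
    echo "check jacobi_test.err (missing or non-empty)"
fi
```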
