Subnavigation: 

Metanavigation: 

AACSB accredited

wissen.leben | WWU Münster 


The Münster Skeleton Library Muesli

The Muenster Skeleton Library Muesli is a C++ template library enabling the hassle-free programming of multi-node, multi-core cluster computers by implementing the concept of so-called algorithmic skeletons (skeletons for short). In essence, skeletons are higher order functions and encapsulate typical parallel computation/communication patterns. By predefining them in a library, the user does not need to bother about programming problems typically encountered when using MPI and/or OpenMP manually, such as deadlocks, starvation, mutual exclusion etc. Instead, when using Muesli, all communication details are encapsulated inside our library, such that parallel programming is taken to a higher level of abstraction. Users do not need to bother with MPI and/or OpenMP, but can simply implement parallel programs as if they were sequential thanks to the SPMD execution model underlying our library. In essence, Muesli makes parallel programming easier, safer, and less error-prone.

Currently, our library is available in version 2.2 and is published under the MIT License. It can be downloaded here. If you would like to have a look at some older version, please follow the links below:


Key features

The main features of our library are as follows:
  • Support of task and data parallel skeletons. A complete list of all skeletons can be found here.
  • Scalability across nodes and cores by simultaneously using MPI and OpenMP, respectively.
  • Nesting. Users can build complex skeletal topologies by arbitrary nesting both task and data parallel skeletons.
  • Sequential programming. When using our library, parallel programming reduces to sequentially calling skeleton functions which are internally implemented in parallel.
  • Safety. Users do not have to bother with typical parallel programming problems such as starvation or deadlocks. All these problems are taken care of inside our library.

Skeletons

Our library offers task parallel as well as data parallel skeletons:
  • The following task parallel skeletons are offered by our library: Pipeline, Farm, BranchAndBound, and DivideAndConquer. Auxiliary skeletons are Atomic, Filter, Final, and Initial.
  • Data parallel skeletons are available as functions of a distributed data structure. Currently, Muesli provides the following distributed data structures: DistributedArray, DistributedMatrix, and DistributedSparseMatrix. All of them provide data parallel skeletons such as fold, map, scan, and zip and their variants.

Technical Aspects

Our library is written in standard C++ and makes use of MPI and OpenMP. While MPI is used for the communication between nodes, OpenMP is used to parallelize certain functions such that they are executed more efficiently on multi-core cluster computers. Thus, our library scales on multi-node, single-core as well as on multi-node, multi-core computer architectures. The scalability across nodes is activated at compile-time by providing a certain switch to your C++ compiler. Depending on the compiler, the switch is as follows:

  • GNU C Compiler (GCC) 4.4.2: -fopenmp
  • IBM XL C/C++ 10.1: -qsmp
  • Intel ICC 11.1: -openmp
  • Microsoft Visual C 2008: /openmp
  • Sun Studio 12, Update 1: -xopenmp

If you only want to use a single core per node or you only have a single-core cluster at hand, simply omit the switch and compile/link the program as usual. At this point it is crucial to note, that your program will only benefit from a multi-node environment, if exactly one MPI process is started per node. The remaining cores will come into play when data parallel skeletons are executed (see Example). In a multi-node environment, the administrator of your cluster should have already deactivated the so-called CPU affinity, such that this prerequisite is met.


Downloads

You can download our library in two different zip-files which contain all necessary header and source files:

  • The first version contains all necessary header and source files, but no test cases and can be downloaded here.
  • The second version contains all header and source files including various test cases and can be downloaded here.

Changelog

Version 2.2:
  • Muesli now is published under the MIT License
  • Major and minor bugfixes
Version 2.1:
  • A couple of bugfixes were made.
  • Full compatibility with the GNU C Compiler Collection.
  • Added a Makefile for automated building/rebuilding/running of all test cases
  • Changed the building/running behaviour of all test cases.

Compared to version 1.79, the following major changes have been implemented in the current version 2.0:

  • All classes are now declared in the namespace msl. Thus, you either have to use the fully qualified name of a class, e.g. msl::DistributedArray, or simply insert a so-called using directive into your source code, i.e. using namespace msl. However, the former is preferable.
  • All data parallel skeletons of our distributed data structures now efficiently scale on multi-node, multi-core computer architectures by additionally using OpenMP. How to activate this feature is described in section Technical Aspects and Example.
  • All task parallel skeletons have been declared in separate files.
  • All static global variables and some constants have been moved to the new class msl::Muesli.
Additionally, there are some minor improvements/enhancements:
  • All tests have been moved to the namespace msl::test. Furthermore, additonal tests have been implemented.
  • The header file Serializable.h declares and implements two functions msl::read and msl::write which help to handle serialized data and void* arrays. Additionally, numerous auxiliary constants are defined.

Building and running Muesli

For Unix-based operating systems just extract the zip-file with the test cases included, go to the location where you extracted the file and type

make

for building Muesli. See the Makefile for more information.
For Windows-based operating systems see the example below.

Example

Let's have a look at a very simple example of how to use the class DistributedSparseMatrix to calculate the Frobenius norm of the matrix (the Frobenius norm of a matrix A is defined as the square root of the sum of all values squared):

/* This is file "FrobeniusNorm.cpp"
*/
#include <cmath>
#include "DistributedSparseMatrix.h"
#include "Muesli.h"

double square(double x) {
  return x * x;
}

double sum(double x, double y) {
  return x + y;
}

int main(int argc, char** argv) {
  int n = 4; int m = 4; int r = 2; int c = 2;
  msl::InitSkeletons(argc, argv);
  msl::DistributedSparseMatrix<double> dsm(n, m, r, c, 0);

  for(int i = 0; i < n; i++) {
    for(int j = 0; j < m; j++) {
      dsm.setElement(i + j + 1, i, j);
    }
  }

  dsm.mapInPlace(&square);
  printf("||A||_F = %f\n", sqrt(dsm.fold(&sum)));
  msl::TerminateSkeletons();
  return 0;
}

To compile the program on a multi-node, multi-core cluster computer using mpiCC (mpicxx, mpic++), type in the following:

mpiCC -openmp -c BsrIndex.cpp Distribution.cpp FrobeniusNorm.cpp Muesli.cpp OAL.cpp RoundRobinDistribution.cpp

After compiling, you need to link your application by typing in the following:

mpiCC -openmp -o FrobeniusNorm BsrIndex.o Distribution.o FrobeniusNorm.o Muesli.o OAL.o RoundRobinDistribution.o

Finally, you can start your application using the job scheduler installed on your cluster. Keep in mind that only one MPI process must be started per node, if you want to make use of multiple cores per node. As explained above, this is usually done by deactivating the so-called CPU affinity which should have already been taken care of by the administrator of your cluster.


Publications



Impressum | © Praktische Informatik