The Münster Skeleton Library Muesli
The Muenster Skeleton Library Muesli is a C++ template library enabling the hassle-free programming of multi-node, multi-core cluster computers by implementing the concept of so-called algorithmic skeletons (skeletons for short). In essence, skeletons are higher order functions and encapsulate typical parallel computation/communication patterns. By predefining them in a library, the user does not need to bother about programming problems typically encountered when using MPI and/or OpenMP manually, such as deadlocks, starvation, mutual exclusion etc. Instead, when using Muesli, all communication details are encapsulated inside our library, such that parallel programming is taken to a higher level of abstraction. Users do not need to bother with MPI and/or OpenMP, but can simply implement parallel programs as if they were sequential thanks to the SPMD execution model underlying our library. In essence, Muesli makes parallel programming easier, safer, and less error-prone.
Currently, our library is available in version 2.2 and is published under the MIT License. It can be downloaded here. If you would like to have a look at some older version, please follow the links below:
Key features
The main features of our library are as follows:- Support of task and data parallel skeletons. A complete list of all skeletons can be found here.
- Scalability across nodes and cores by simultaneously using MPI and OpenMP, respectively.
- Nesting. Users can build complex skeletal topologies by arbitrary nesting both task and data parallel skeletons.
- Sequential programming. When using our library, parallel programming reduces to sequentially calling skeleton functions which are internally implemented in parallel.
- Safety. Users do not have to bother with typical parallel programming problems such as starvation or deadlocks. All these problems are taken care of inside our library.
Skeletons
Our library offers task parallel as well as data parallel skeletons:- The following task parallel skeletons are offered by our library: Pipeline, Farm, BranchAndBound, and DivideAndConquer. Auxiliary skeletons are Atomic, Filter, Final, and Initial.
- Data parallel skeletons are available as functions of a distributed data structure. Currently, Muesli provides the following distributed data structures: DistributedArray, DistributedMatrix, and DistributedSparseMatrix. All of them provide data parallel skeletons such as fold, map, scan, and zip and their variants.
Technical Aspects
Our library is written in standard C++ and makes use of MPI and OpenMP. While MPI is used for the communication between nodes, OpenMP is used to parallelize certain functions such that they are executed more efficiently on multi-core cluster computers. Thus, our library scales on multi-node, single-core as well as on multi-node, multi-core computer architectures. The scalability across nodes is activated at compile-time by providing a certain switch to your C++ compiler. Depending on the compiler, the switch is as follows:
- GNU C Compiler (GCC) 4.4.2: -fopenmp
- IBM XL C/C++ 10.1: -qsmp
- Intel ICC 11.1: -openmp
- Microsoft Visual C 2008: /openmp
- Sun Studio 12, Update 1: -xopenmp
If you only want to use a single core per node or you only have a single-core cluster at hand, simply omit the switch and compile/link the program as usual. At this point it is crucial to note, that your program will only benefit from a multi-node environment, if exactly one MPI process is started per node. The remaining cores will come into play when data parallel skeletons are executed (see Example). In a multi-node environment, the administrator of your cluster should have already deactivated the so-called CPU affinity, such that this prerequisite is met.
Downloads
You can download our library in two different zip-files which contain all necessary header and source files:
- The first version contains all necessary header and source files, but no test cases and can be downloaded here.
- The second version contains all header and source files including various test cases and can be downloaded here.
Changelog
Version 2.2:- Muesli now is published under the MIT License
- Major and minor bugfixes
- A couple of bugfixes were made.
- Full compatibility with the GNU C Compiler Collection.
- Added a Makefile for automated building/rebuilding/running of all test cases
- Changed the building/running behaviour of all test cases.
Compared to version 1.79, the following major changes have been implemented in the current version 2.0:
- All classes are now declared in the namespace msl. Thus, you either have to use the fully qualified name of a class, e.g. msl::DistributedArray, or simply insert a so-called using directive into your source code, i.e. using namespace msl. However, the former is preferable.
- All data parallel skeletons of our distributed data structures now efficiently scale on multi-node, multi-core computer architectures by additionally using OpenMP. How to activate this feature is described in section Technical Aspects and Example.
- All task parallel skeletons have been declared in separate files.
- All static global variables and some constants have been moved to the new class msl::Muesli.
- All tests have been moved to the namespace msl::test. Furthermore, additonal tests have been implemented.
- The header file Serializable.h declares and implements two functions msl::read and msl::write which help to handle serialized data and void* arrays. Additionally, numerous auxiliary constants are defined.
Building and running Muesli
For Unix-based operating systems just extract the zip-file with the test cases included, go to the location where you extracted the file and type
make
for building Muesli. See the Makefile for more information.For Windows-based operating systems see the example below.
Example
Let's have a look at a very simple example of how to use the class DistributedSparseMatrix to calculate the Frobenius norm of the matrix (the Frobenius norm of a matrix A is defined as the square root of the sum of all values squared):
To compile the program on a multi-node, multi-core cluster computer using mpiCC (mpicxx, mpic++), type in the following:
After compiling, you need to link your application by typing in the following:
mpiCC -openmp -o FrobeniusNorm BsrIndex.o Distribution.o FrobeniusNorm.o Muesli.o OAL.o RoundRobinDistribution.o
Finally, you can start your application using the job scheduler installed on your cluster. Keep in mind that only one MPI process must be started per node, if you want to make use of multiple cores per node. As explained above, this is usually done by deactivating the so-called CPU affinity which should have already been taken care of by the administrator of your cluster.
Publications
- Philipp Ciechanowicz and Herbert Kuchen: Enhancing Muesli's Data Parallel Skeltons for Multi-Core Computer Architectures. In: Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC). Melbourne, Victoria, Australia, pp. 108-113, DOI 10.1109/HPCC.2010.23.
- Philipp Ciechanowicz, Michael Poldner, Herbert Kuchen: The Münster Skeleton Library Muesli - A Comprehensive Overview. ERCIS Working Paper No. 7, 2009. Abstract Paper
- Philipp Ciechanowicz: Algorithmic Skeletons for General Sparse Matrices on Multi-Core Processors. In: Proceedings of The 20th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS). Orlando, Florida, USA, 2008, p. 188-197. Abstract (pdf) Buy
- Philipp Ciechanowicz, Stephan Dlugosz, Herbert Kuchen, Ulrich Müller-Funk: Exploiting Training Example Parallelism with a Batch Variant of the ART 2 Classification Algorithm. In: Proceedings of The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) as part of The 26th IASTED International Multi-Conference on Applied Informatics. Innsbruck, Austria, 2008, p. 195-201. Abstract (pdf) Buy
- Michael Poldner, Herbert Kuchen: Task Parallel Skeletons for Divide and Conquer. Proceedings of Workshop of the Working Group Programming Languages and Computing Concepts of the German Computer Science Association GI, Bad Honnef, 2008.
- Michael Poldner, Herbert Kuchen: Optimizing Skeletal Stream Processing for Divide and Conquer. Proceedings of the 3rd International Conference on Software and Data Technologies (ICSOFT), pages 181-189, INSTICC PRESS, 2008.
- Michael Poldner, Herbert Kuchen: Algorithmic Skeletons for Branch and Bound. ICSOFT 2006, CCIS 10, pages 204–219, Springer, 2008.
- Michael Poldner, Herbert Kuchen: On Implementing the Farm Skeleton. Parallel Processing Letters, Vol. 18, No. 1, pages 117-131, March 2008.
- Michael Poldner, Herbert Kuchen: Skeletons for Divide and Conquer Algorithms. Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN), Innsbruck, Austria, February, IASTED/ACTA Press 2008.
- Michael Poldner, Herbert Kuchen: Algorithmic Skeletons for Branch & Bound. Proceedings of 1st International Conference on Software and Data Technology (ICSOFT), Vol. 1, pages 291-300, Setubal, Portugal, 2006.
- Michael Poldner, Herbert Kuchen: Scalable Farms. Proceedings of the International Conference ParCo 2005, NIC Series, Vol. 33, pages 795-802, 2006.
- Michael Poldner, Herbert Kuchen: On Implementing the Farm Skeleton. Proceedings of 3rd International Workshop on High-level Parallel Programming and Applications (HLPP), Warwick, 2005.



