A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming
Ernsting Steffen, Kuchen Herbert
Multi-core processors and clusters of multi-core processors are ubiquitous. They provide scalable performance yet introducing complex and low-level programming models for shared and distributed memory programming. Thus, fully exploiting the potential of shared and distributed memory parallelization can be a tedious and error-prone task: programmers must take care of low-level threading and communication (e.g. message passing) details. In order to assist programmers in developing performant and reliable parallel applications Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel and distributed programming patterns, thus shielding programmers from low-level aspects of parallel and distributed programming. In this paper we take on the design and implementation of the well-known Farm skeleton. In order to address the hybrid architecture of multi-core clusters we present a two-tier implementation built on top of MPI and OpenMP. On the basis of three benchmark applications, including a simple ray tracer, an interacting particles system, and an application for calculating the Mandelbrot set, we illustrate the advantages of both skeletal programming in general and this two-tier approach in particular.
algorithmic skeletons; parallel programming; farm