A Scalable Farm Skeleton for Heterogeneous Parallel Programming
Ernsting Steffen, Kuchen Herbert
Multi-core processors and GPUs with thousands of cores are now ubiquitous. Fully exploiting their resources requires dealing with low-level concepts of parallel programming, which still constitute a high barrier to the efficient development of parallel applications. High-level tools for parallel programming are therefore needed. Algorithmic skeletons have been proposed to assist programmers in developing performant and reliable parallel applications. They encapsulate well-defined, frequently recurring parallel programming patterns, thereby shielding programmers from the low-level aspects of parallel programming. In this paper, we address the design and implementation of the well-known Farm skeleton. To target heterogeneous computing platforms, we present a multi-tier implementation on top of MPI, OpenMP, and CUDA. Using two benchmark applications, an interacting particles system and a ray tracing application, we illustrate the advantages of both skeletal programming in general and this multi-tier approach in particular.
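
To make the Farm pattern concrete, the following is a minimal single-node sketch in C++ and not the multi-tier implementation presented in this paper. It assumes plain `std::thread` workers in shared memory instead of MPI, OpenMP, or CUDA, and the function name `farm` and its parameters are hypothetical. A pool of workers repeatedly claims independent tasks from a shared counter and applies the same user-supplied function to each of them.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (an illustration, not the paper's API): applies
// `worker` to every element of `tasks`, using `num_workers` threads.
// Assumes the result type is default-constructible and is not `bool`
// (std::vector<bool> does not allow concurrent writes to distinct elements).
template <typename In, typename Worker>
auto farm(const std::vector<In>& tasks, Worker worker, unsigned num_workers) {
    using Out = decltype(worker(tasks.front()));
    std::vector<Out> results(tasks.size());
    std::atomic<std::size_t> next{0};  // index of the next unclaimed task

    auto run = [&]() {
        // Each worker claims one task at a time until none are left,
        // so faster workers automatically process more tasks.
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1)) {
            results[i] = worker(tasks[i]);
        }
    };

    std::vector<std::thread> pool;
    const unsigned n = std::max(1u, num_workers);  // always use at least one worker
    for (unsigned w = 0; w < n; ++w) pool.emplace_back(run);
    for (auto& t : pool) t.join();  // wait until all tasks are processed
    return results;
}
```

A call such as `farm(tasks, [](const int& x) { return x * x; }, 4)` returns the squared inputs in their original order, because each result is written to the slot of the task it belongs to. The caller supplies only the sequential worker function, while task distribution and synchronization stay hidden inside the skeleton.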