Data Parallel Algorithmic Skeletons with Accelerator Support

Ernsting, S; Kuchen, H

Abstract

Hardware accelerators such as GPUs or the Intel Xeon Phi comprise hundreds or thousands of cores on a single chip and promise to deliver high performance. They are widely used to boost the performance of highly parallel applications. However, because their architectures diverge, programmers face diverging programming paradigms. They also have to deal with low-level concepts of parallel programming, which makes development cumbersome. To assist programmers in developing parallel applications, algorithmic skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel programming patterns, thereby shielding programmers from the low-level aspects of parallel programming. The main contribution of this paper is a comparison of two skeleton library implementations, one in C++ and one in Java, in terms of library design and programmability. In addition, on the basis of four benchmark applications, we evaluate the performance of the presented implementations on two test systems: a GPU cluster and a Xeon Phi system. The two implementations achieve comparable performance, with a slight advantage for the C++ implementation. Xeon Phi performance ranges between CPU and GPU performance.
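To illustrate what "encapsulating a parallel programming pattern" can mean, the following is a minimal sketch of two data parallel skeletons, map and fold, written in Java. It is not the API of either library compared in the paper: the class name `SkeletonSketch` and the method signatures are hypothetical, and the sketch assumes plain CPU threads via Java parallel streams rather than GPU or Xeon Phi offloading. The point is only that the caller supplies the per-element function while the skeleton hides how the work is distributed.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical illustration of data parallel skeletons.
 * Assumption: CPU-only execution through Java parallel streams;
 * no accelerator support is modelled here.
 */
final class SkeletonSketch {
    private SkeletonSketch() {}

    /** map skeleton: applies f to every element independently, preserving order. */
    static int[] map(int[] data, IntUnaryOperator f) {
        return Arrays.stream(data).parallel().map(f).toArray();
    }

    /**
     * fold skeleton: combines all elements with op.
     * op must be associative and identity must be its neutral element,
     * otherwise the parallel result is not well defined.
     */
    static int fold(int[] data, int identity, IntBinaryOperator op) {
        return Arrays.stream(data).parallel().reduce(identity, op);
    }
}
```

A caller would write, for example, `SkeletonSketch.fold(SkeletonSketch.map(values, x -> x * x), 0, Integer::sum)` to compute a sum of squares without mentioning threads, partitioning, or synchronization.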

Cite as

Ernsting, S., & Kuchen, H. (2016). Data Parallel Algorithmic Skeletons with Accelerator Support. International Journal of Parallel Programming, 2016, 1–17.

Details

Publication type

Research article (journal)

Peer reviewed

Yes

Publication status

Published

Year

2016

Journal

International Journal of Parallel Programming

Volume

2016

Start page

1

End page

17

Language

English

ISSN

1573-7640

DOI

https://doi.org/10.1007/s10766-016-0416-7
