Denis Mayr Lima Martins

Reverse AD for an Array Language with Nested Parallelism

Redundant Execution: A Practical Backpropagator
Tuesday, 14. December 2021 - 12:00 to 13:00, Zoom

Title: Reverse AD for an Array Language with Nested Parallelism - Redundant Execution: A Practical Backpropagator

Speaker affiliation: Prof. Dr. Cosmin Oancea, University of Copenhagen, Denmark.

Talk abstract: Automatic differentiation (AD) is a practical way for computing derivatives of functions that are expressed as programs. AD has been recognized as one of the key pillars of the current machine learning (ML) revolution, and has key applications in other domains such as financial algorithms, computational fluid dynamics, atmospheric sciences, and engineering design optimization. This talk presents a technique for applying reverse AD on a higher-order functional array language, which allows programs to be written as a nested composition of parallel operators such as map, reduce, scan (also together with loops with in-place updates). We start by implementing the "tape" (required by reverse AD) by means of a code-generation technique that performs redundant computation, but which still preserves the work-depth asymptotic of the original program. More importantly, our technique revels a very useful time-space tradeoff that can be navigated by traditional compiler transformations: Essentially, strip-mining a look k times increases the re-execution overhead by a factor of k but exponentially decreases the memory footprint of checkpointing.   Having designed the glue that binds scopes together, we then discuss how to efficiently differentiate standard parallel constructs such as map, scan, (multi-)reduce. We report the implementation of both the forward- and reverse-mode AD as code transformations in the optimising compiler of the Futhark language, which allows to generate high-performance code for parallel execution on GPUs.   We present an experimental evaluation of our technique on nine benchmarks, including the applications from AD-Bench, but also a recurrent neural network (LSTM), and k-means. We compare against other state-of-the-art implementations, each tested on their own home ground, and demonstrate competitive performance on both sequential CPU and parallel GPU execution.

Short bio: Cosmin Oancea is an Associate Professor with the Department of Computer Science (DIKU), University of Copenhagen. His main research interests lie in the general field of programming language design and implementation, with focus on compiler optimizations aimed at parallel execution on highly-parallel hardware. He is one of the main architects of the optimizing compiler of the Futhark functional language. Previous to that, he has studied with Alan Mycroft (University of Cambridge) and Lawrence Rauchwerger (Texas A & M) various compiler analyses---ranging from static to entirely dynamic---related to automatic parallelization of (imperative) loop-based code. Cosmin holds a PhD from the University of Western Ontario, where he has studied topics related to language interoperability under the supervision of Stephen M. Watt.