Nagare Media Engine: Task Error Recovery in MPEG NBMP Workflows Through Event Sourcing
Neugebauer, Matthias
Abstract
Multimedia workflows have become complex distributed systems that are deployed in multi-cloud and multi-edge environments. Such systems are prone to both hardware and software errors, so modern multimedia workflows need an appropriate error-handling strategy. Because tasks are often computationally expensive and run for a long time, simply restarting a failed task and redoing its prior work is an inadequate solution. Instead, tasks should create regular checkpoints and continue from the last good state after a restart. We propose an event sourcing approach that records state changes as published events and replays them when needed. In this paper, we apply this approach to Network-Based Media Processing (NBMP), an MPEG standard published as ISO/IEC 23090-8 that defines APIs and data models for network-distributed multimedia workflows. We implemented our approach as an extension to Nagare Media Engine, our existing open-source NBMP implementation based on the Kubernetes platform. We evaluated it in scene detection and video encoding scenarios with simulated disruptions and observed significant speedups.
Keywords
nbmp; network-distributed multimedia processing; error recovery; event sourcing; encoding
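Illustrative sketch
The recovery idea summarized in the abstract (publish state changes as events, then replay them after a restart to continue from the last good state) can be illustrated with a minimal, self-contained sketch. It is not the Nagare Media Engine implementation and does not use any NBMP API: the names EventLog, SegmentProcessed, replay, and run_task are hypothetical, the event log is an in-memory list standing in for a durable event stream, and the disruption is simulated by raising an exception.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SegmentProcessed:
    """Hypothetical event: one unit of work (e.g. a video segment) finished."""
    index: int


@dataclass
class EventLog:
    """In-memory stand-in (assumption) for a durable event stream."""
    events: list = field(default_factory=list)

    def publish(self, event):
        self.events.append(event)


def replay(log):
    """Rebuild the task state (finished segment indices) from recorded events."""
    return {event.index for event in log.events}


def run_task(log, total_segments, fail_at=None):
    """Process every segment not yet recorded in the log.

    Returns the number of segments processed in this run. If fail_at is
    given, a disruption is simulated just before that segment.
    """
    done = replay(log)  # continue from the last good state
    processed_now = 0
    for i in range(total_segments):
        if i in done:
            continue  # already completed before the restart
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated disruption")
        # The expensive media processing would happen here.
        log.publish(SegmentProcessed(i))
        processed_now += 1
    return processed_now


log = EventLog()
try:
    run_task(log, total_segments=10, fail_at=6)  # first run is interrupted
except RuntimeError:
    pass
redone = run_task(log, total_segments=10)  # restarted run skips finished work
```

In this sketch the restarted run only processes the segments that were not recorded before the disruption, instead of repeating all ten. A real deployment would need a log that survives the failure of the task itself, which the in-memory list here does not provide.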