Interactivity for Data Quality: A Design Science Approach for Interactive Data Collection Systems Increasing Data Quality
Speaker: Stefan Morana (Uni Saarland)
Abstract: High-quality data is essential for the development of contemporary information systems that, for example, leverage supervised machine learning techniques. This data often needs to be provided by human data contributors, for example, by labeling objects in images for subsequent machine learning. Despite the importance of such data, there is surprisingly little research on the design of data collection systems for the contributors who label or provide it, even though these contributors have a strong impact on the resulting data quality. Therefore, drawing on the theory of interactive media effects, we derive a nascent design theory for interactive data collection systems that positively impact data quality. We evaluate our design in three evaluation episodes and demonstrate the positive effects that instantiating the proposed design has on data quality in two different data collection contexts. Our work contributes a nascent design theory for interactive data collection systems that increase data quality. The proposed design addresses one of the major causes of poor data quality: poorly designed data collection systems that do not enable data contributors to deliver high-quality data. Thereby, we contribute to research and practice by providing a solution to the critical challenge of collecting high-quality data for the development of contemporary information systems.