RP-Mod & RP-Crowd: Moderator- and Crowd-Annotated German News Comment Datasets
Talk title: RP-Mod & RP-Crowd: Moderator- and Crowd-Annotated German News Comment Datasets
Speaker affiliation: Dennis Assenmacher, Chair of Data Science: Statistics and Optimizations, University of Münster
Talk abstract: Abuse and hate are spreading across social media and the comment sections of many news media companies. These platform providers invest considerable effort in moderating user-generated contributions to avoid losing readers who are appalled by inappropriate texts. Legislation adds further pressure by making failure to remove such comments a punishable offense. While (semi-)automated solutions using Natural Language Processing and advanced Machine Learning techniques are becoming increasingly sophisticated, the field of abusive language detection still struggles because large, well-curated non-English datasets are scarce or not publicly available. In this talk, I present the largest annotated German abusive language comment datasets to date, published as part of a research project between the University of Münster and one of the largest newspaper outlets in Germany, the Rheinische Post. In contrast to existing datasets, we achieve a high labeling standard by conducting a thorough crowd-based annotation study that complements professional moderators' decisions, which are also included in the dataset. We compare and cross-evaluate the performance of baseline algorithms and state-of-the-art transformer-based language models, which are fine-tuned on our datasets and on an existing alternative, demonstrating the datasets' usefulness for the community.
Short bio: Dennis Assenmacher is a computational social scientist at the GESIS Leibniz Institute for the Social Sciences. He is part of the Data Science (DS) team at the Department of Computational Social Science (CSS). The team's mission is to develop and evaluate models based on behavioral data to analyze socio-cultural phenomena. His current research addresses harmful communication in online media, focusing on social bots and abusive language detection. He holds an MSc in Information Systems from the University of Münster and recently submitted his dissertation at the Chair of Data Science: Statistics and Optimizations, University of Münster.