A Review of Abusive Language Detection Research in Online Media

Background

In recent years, the menace of abusive language has been spreading online [1], poisoning the discussion climate in various outlets [2,3]. Facing pressure from their communities, advertising partners, and legal bodies, news media publishers in particular have shut down their discussion fora and comment functions [4,5]. Since this inhibits democratic discussion, many journalists try to keep public contributions available [6] and seek technology to support their work [7].

Striving to better understand the domain and to find solutions, academia has produced a vast stream of publications in recent years, ranging from seminal works [8-10] to smaller or more specific contributions [11-13].


Goal

The goal of the proposed thesis is a multi-criterion analysis of the state of the art and research landscape of abusive language research (including, but not limited to, hate speech, cyberbullying, profanity, …) in online outlets (the focus can be placed on social media as well as on more traditional outlets such as newspapers). The aspects of the analysis can be chosen by the candidate and might include one or more of the following:

  • social network analysis of abusive language researchers (a minimal sketch follows this list)
  • descriptive analysis of publications on certain types of abusive language
  • distribution of publications across outlets (identification of major journals and conferences)
  • analysis of languages that have been subjected to abusive language research (identify covered and uncovered languages)
  • assessment of applied techniques (machine learning, feature generation, …)
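
To make the first aspect concrete: the social network analysis could, for instance, start from a co-authorship graph built from the collected bibliography. The following Python sketch (using networkx) is only an illustration under assumptions; the record format is hypothetical, and the author names are borrowed from the literature list below.

    # Minimal sketch of a co-authorship network, assuming bibliography
    # records of the form (title, list of authors); format is hypothetical.
    from itertools import combinations

    import networkx as nx

    publications = [
        ("Abusive Language Detection in Online User Content",
         ["Nobata", "Tetreault", "Thomas", "Mehdad", "Chang"]),
        ("Hate Speech Detection with Comment Embeddings",
         ["Djuric", "Zhou", "Morris", "Grbovic", "Radosavljevic", "Bhamidipati"]),
    ]

    G = nx.Graph()
    for _title, authors in publications:
        # Every pair of co-authors on a publication shares an edge whose
        # weight counts their joint publications.
        for a, b in combinations(authors, 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)

    # Degree centrality gives a first hint at well-connected researchers.
    for author, score in sorted(nx.degree_centrality(G).items(),
                                key=lambda kv: -kv[1])[:5]:
        print(f"{author}: {score:.3f}")

On a real bibliography, the same graph could also feed community detection or be exported to a tool such as Gephi for interactive exploration.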

Given that the thesis will presumably contain a large number of (descriptive) statistics, the candidate is encouraged to make use of state-of-the-art visualization techniques to present his/her findings.
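
As one possible illustration of such a visualization, the sketch below plots publication counts per year with matplotlib; the counts are hypothetical placeholders, and any comparable plotting toolkit would serve equally well.

    # Minimal sketch: publications per year as a bar chart.
    # The counts are hypothetical placeholders, not survey results.
    import matplotlib.pyplot as plt

    years = [2013, 2014, 2015, 2016, 2017]
    counts = [4, 7, 12, 25, 40]  # illustrative only

    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(years, counts, color="steelblue")
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of publications")
    ax.set_title("Abusive language publications per year (illustrative)")
    fig.tight_layout()
    fig.savefig("publications_per_year.png", dpi=150)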


Literature

[1] Köffer, S., Riehle, D. M., Höhenberger, S., and Becker, J. 2018. “Discussing the Value of Automatic Hate Speech Detection in Online Debates,” in Tagungsband Multikonferenz Wirtschaftsinformatik 2018, MKWI 2018, P. Drews, B. Funk, P. Niemeyer, and L. Xie (eds.), Lüneburg, Germany: Leuphana Universität.
[2] Gardiner, B., Mansfield, M., Anderson, I., Holder, J., Louter, D., and Ulmanu, M. 2016. “The Dark Side of Guardian Comments,” The Guardian. (https://www.theguardian.com/technology/2016/apr/12/the-dark-side-of-guar..., accessed November 29, 2017).
[3] Chatzakou, D., Kourtellis, N., and Blackburn, J. 2017. “Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying,” in Proceedings of the 26th International Conference on World Wide Web Companion, WWW ’17 Companion, Perth, Australia: International World Wide Web Conferences Steering Committee, pp. 1285–1290.
[4] Leurs, R. 2015. “Warum Wir Die Kommentarfunktion Teilweise Sperren,” Zeitgeist. (https://zeitgeist.rp-online.de/debatte/warum-wir-die-kommentarfunktion-t..., accessed December 1, 2017).
[5] Plöchinger, S. 2016. “Über Den Hass,” ploechinger.tumblr.com. (http://ploechinger.tumblr.com/post/140370770262/über-den-hass, accessed September 29, 2017).
[6] Diakopoulos, N. 2015. “Picking the NYT Picks: Editorial Criteria and Automation in the Curation of Online News Comments,” #ISOJ, the Official Research Journal of ISOJ (5:1), pp. 147–166.
[7] Bilton, R. 2014. “Why Some Publishers Are Killing Their Comment Sections,” Digiday UK. (https://digiday.com/media/comments-sections/, accessed November 29, 2017).
[8] Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., and Chang, Y. 2016. “Abusive Language Detection in Online User Content,” in Proceedings of the 25th International Conference on World Wide Web, WWW ’16, Montreal, Canada: ACM Press, pp. 145–153.
[9] Wulczyn, E., Thain, N., and Dixon, L. 2017. “Ex Machina: Personal Attacks Seen at Scale,” in Proceedings of the 26th International Conference on World Wide Web, WWW ’17, Perth, Australia: ACM Press, pp. 1391–1399.
[10] Burnap, P., and Williams, M. L. 2015. “Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making,” Policy & Internet (7:2), pp. 223–242.
[11] Ross, B., Rist, M., Carbonell, G., Cabrera, B., Kurowsky, N., and Wojatzki, M. 2016. “Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis,” in Proceedings of the 3rd Workshop on Natural Language Processing for Computer-Mediated Communication, NLP4CMC III, M. Beißwenger, M. Wojatzki, and T. Zesch (eds.), Bochum, Germany: Stefanie Dipper, Sprachwissenschaftliches Institut, Ruhr-Universität Bochum, pp. 6–9.
[12] Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., and Bhamidipati, N. 2015. “Hate Speech Detection with Comment Embeddings,” in Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion, Florence, Italy: ACM Press, pp. 29–30.
[13] Fišer, D., Erjavec, T., and Ljubešić, N. 2017. “Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene,” in Proceedings of the First Workshop on Abusive Language Online, ALW1, Z. Waseem, W. H. K. Chung, D. Hovy, and J. Tetreault (eds.), Vancouver, Canada: Association for Computational Linguistics, pp. 46–51.