Benchmarking Sentence Embeddings in Textual Stream Clustering with Applications to Campaign Detection

Stampe, Lucas; Lütke-Stockdiek, Janina; Grimme, Britta; Grimme, Christian

Abstract

Motivated by the emergence of large language models, we benchmark sentence embeddings as representations of short texts in textual stream clustering. By adapting a non-textual stream clustering algorithm to use sentence embeddings, we achieve results comparable to those of textual stream clustering approaches that rely on other text representation mechanisms. The benchmark uses datasets with differing degrees of preprocessing. The results suggest that the sentence-embedding approach does not perform as well as previous approaches on preprocessed datasets but shows greater potential on less preprocessed ones. This highlights the need for new, more application-oriented benchmarking datasets for stream clustering. Further, we conduct a case study on social media campaign detection and show that the approaches can find traces of orchestrated activities.

Keywords

stream clustering; embeddings; benchmark

Cite as

Stampe, L., Lütke-Stockdiek, J., Grimme, B., & Grimme, C. (2024). Benchmarking Sentence Embeddings in Textual Stream Clustering with Applications to Campaign Detection. In Hirose, A., Ishibuchi, H., Jayne, C., & , (Eds.), Proceedings of the IEEE World Congress on Computational Intelligence (WCCI) — International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). New Jersey: Wiley-IEEE Press.

Details

Publication type

Research article in an edited volume (conference)

Peer-reviewed

Yes

Publication status

Published

Year

2024

Conference

IEEE World Congress on Computational Intelligence

Conference location

Yokohama

Book title

Proceedings of the IEEE World Congress on Computational Intelligence (WCCI) - International Joint Conference on Neural Networks (IJCNN)

Editors

Hirose, Akira; Ishibuchi, Hisao; Jayne, Chrisina;

First page

1

Last page

8

Publisher

Wiley-IEEE Press

Place of publication

New Jersey

Language

English

ISSN

2161-4407

ISBN

979-8-3503-5931-2

DOI

https://doi.org/10.1109/IJCNN60899.2024.10650595

Full text

https://ieeexplore.ieee.org/