Unifying Privacy Policy Detection

Hosseini, Henry; Degeling, Martin; Utz, Christine; Hupperich, Thomas

Abstract

Privacy policies have become a focal point of privacy research. With their goal to reflect the privacy practices of a website, service, or app, they are often the starting point for researchers who analyze the accuracy of claimed data practices, user understanding of practices, or control mechanisms for users. Due to vast differences in structure, presentation, and content, it is often challenging to extract privacy policies from online resources like websites for analysis. In the past, researchers have relied on scrapers tailored to the specific analysis or task, which complicates comparing results across different studies.

To unify future research in this field, we developed a toolchain to process website privacy policies and prepare them for research purposes. The core part of this chain is a detector module for English and German, using natural language processing and machine learning to automatically determine whether given texts are privacy or cookie policies. We leverage multiple existing data sets to refine our approach, evaluate it on a recently published longitudinal corpus, and show that this corpus contains a number of misclassified documents. We believe that unifying data preparation for the analysis of privacy policies can help make different studies more comparable and is a step towards more thorough analyses. In addition, we provide insights into common pitfalls that may lead to invalid analyses.

Keywords

Privacy policy; Data handling; Policy detector; Natural language processing

Cite as

Hosseini, H., Degeling, M., Utz, C., & Hupperich, T. (2021). Unifying Privacy Policy Detection. Proceedings on Privacy Enhancing Technologies (PoPETs), 2021(4), 480–499.

Details

Publication type

Research article (journal)

Peer reviewed

Yes

Publication status

Published

Year

2021

Conference

Privacy Enhancing Technologies Symposium (PETS)

Venue

The Internet

Journal

Proceedings on Privacy Enhancing Technologies

Volume

2021

Issue

4

Book title

Proceedings on Privacy Enhancing Technologies

Editor

Johnson, Aaron; Kerschbaum, Florian

Start page

480

End page

499

Publisher

Sciendo / De Gruyter

Place

Online

Language

English

ISSN

2299-0984

DOI

https://doi.org/10.2478/popets-2021-0081

Full text

https://doi.org/