PrivacyScore: Analyse von Webseiten auf Sicherheits- und Privatheitsprobleme -- Konzept und rechtliche Zulässigkeit

PrivacyScore is a public web portal for automatically checking whether websites correctly implement common mechanisms for protecting security and privacy. In contrast to existing services, PrivacyScore makes it possible to compare multiple websites with one another in benchmarks, to analyse the results in a differentiated way and over time, and to define custom criteria for the evaluation. PrivacyScore thereby not only improves transparency for end users but also eases the work of the data-protection supervisory authorities. In this article we present the concept of the service and discuss under which circumstances the automated scanning and public "shaming" ("Anprangern") of weaknesses is legally permissible. This German article describes the technical and legal considerations surrounding PrivacyScore, a public web portal that allows automatic scans of websites for privacy and security problems. For an English article discussing the same system in more technical detail, but lacking the legal interpretation, see arXiv:1705.05139.


💡 Research Summary

The paper presents PrivacyScore, a publicly accessible web portal designed to automatically assess and benchmark the security and privacy characteristics of websites. Unlike existing scanners that focus mainly on technical security aspects such as SSL/TLS configuration, PrivacyScore also evaluates privacy‑relevant factors—including the presence of tracking scripts, cookies, fingerprinting techniques, CDN usage, and the geographical location of third‑party services. Users can create “benchmarks” that group a set of URLs (e.g., all sites of a particular industry or all sites under a data‑protection authority’s supervision) and assign custom attributes (country, organization size, etc.) and weighting schemes to reflect their specific priorities. The platform then runs scans on a pool of virtual machines using open‑source tools such as OpenWPM and testssl.sh, collecting data on TLS versions, cipher suites, HSTS, Perfect Forward Secrecy, DNSSEC, server software versions, and the inclusion of external resources. Detected external resources are cross‑referenced with a curated list of known advertising and analytics services to identify privacy‑impacting elements. Results are stored in a time‑series database, enabling longitudinal visualisations that show how a site’s security posture evolves over time.
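The cross-referencing step described above can be illustrated with a minimal sketch. It is not PrivacyScore's actual code: the function names, the result layout, and the two-entry tracker list are hypothetical stand-ins for the curated list the summary mentions, and the first-party check is deliberately naive (an exact host comparison).

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for a curated list of tracking/advertising domains.
KNOWN_TRACKERS = {"google-analytics.com", "doubleclick.net"}


def host_of(url: str) -> str:
    """Return the lower-cased hostname of a URL, or '' for relative URLs."""
    return (urlsplit(url).hostname or "").lower()


def matches_domain(host: str, domain: str) -> bool:
    """True if host equals the listed domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def classify_resources(page_url, resource_urls, trackers=KNOWN_TRACKERS):
    """Split the resources loaded by a page into third-party hosts and tracker hits.

    Assumption: a resource is first-party only if its host is identical to the
    page's host; a real scanner would compare registrable domains instead.
    """
    page_host = host_of(page_url)
    third_party = set()
    tracker_hits = set()
    for url in resource_urls:
        host = host_of(url)
        if not host or host == page_host:
            continue  # relative URL or same host: treated as first-party
        third_party.add(host)
        if any(matches_domain(host, domain) for domain in trackers):
            tracker_hits.add(host)
    return {"third_party": sorted(third_party), "trackers": sorted(tracker_hits)}


if __name__ == "__main__":
    report = classify_resources(
        "https://example.org/",
        [
            "https://example.org/app.js",
            "https://www.google-analytics.com/analytics.js",
            "https://cdn.example.net/lib.js",
            "/local.css",
        ],
    )
    print(report)
```

Matching on domain suffixes rather than on exact hosts means that a single list entry also covers subdomains such as `www.google-analytics.com`.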

PrivacyScore offers both public and private benchmarks. Public benchmarks are displayed on the website and exposed via a REST API in both human‑readable and machine‑readable formats, fostering transparency and enabling researchers to conduct large‑scale analyses. Private benchmarks keep URLs and raw scan data confined to the provider’s infrastructure, ensuring that sensitive information is not inadvertently disclosed. The entire codebase is released under a GPL‑compatible license, allowing organizations—especially data‑protection authorities—to deploy isolated instances that meet stricter confidentiality requirements.

The authors address ethical concerns by implementing rate‑limiting (a new scan of the same site may only occur after 30 minutes) and by deliberately avoiding the publication of overly detailed exploit‑level data that could aid attackers. Instead, they present high‑level risk scores and remediation guidance (e.g., configuration hints for web servers, recommendations for privacy‑preserving analytics tools like Piwik).
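The 30-minute rescan limit could be enforced with a simple per-site cooldown, sketched below. The class and method names are hypothetical and the in-memory dictionary is an assumption made for illustration; the summary does not say how PrivacyScore implements the limit.

```python
from datetime import datetime, timedelta

# Cooldown taken from the summary: a site may be rescanned only after 30 minutes.
SCAN_COOLDOWN = timedelta(minutes=30)


class ScanScheduler:
    """Hypothetical in-memory gatekeeper that rejects scans requested too soon."""

    def __init__(self, cooldown: timedelta = SCAN_COOLDOWN):
        self.cooldown = cooldown
        self._last_scan = {}  # site -> time of the last accepted scan

    def try_schedule(self, site: str, now: datetime) -> bool:
        """Accept the scan and record its time, or return False if the
        previous accepted scan of this site is younger than the cooldown."""
        last = self._last_scan.get(site)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_scan[site] = now
        return True


if __name__ == "__main__":
    scheduler = ScanScheduler()
    start = datetime(2017, 5, 1, 12, 0)
    print(scheduler.try_schedule("example.org", start))
    print(scheduler.try_schedule("example.org", start + timedelta(minutes=10)))
    print(scheduler.try_schedule("example.org", start + timedelta(minutes=30)))
```

Passing `now` explicitly keeps the sketch deterministic and easy to test. Rejected requests do not reset the timer, so repeated attempts cannot postpone the next permitted scan.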

A substantial portion of the paper is devoted to the legal analysis of automated website scanning in the German and EU context. The authors argue that, under German civil law (§ 903 BGB), ownership rights apply only to tangible objects, so the content of a website is not “owned” in a way that would prohibit its examination. However, they acknowledge that many sites’ terms of service expressly forbid automated crawling, which could constitute a contractual breach. Regarding copyright, the paper notes that while HTML, scripts, and images are protected works, the act of extracting only metadata and headers typically falls within permissible use, especially when the original files are not stored. From a data‑protection perspective (GDPR and ePrivacy), the scanning process does not directly process personal data, but the identification of tracking services may involve personal identifiers; therefore, the system minimizes data collection, anonymizes results, and aggregates findings to stay compliant. Potential tort liability under § 823 BGB (unlawful interference with a business’s operations) is mitigated through the aforementioned rate‑limiting and low‑impact scanning methodology.

In conclusion, PrivacyScore combines technical rigor, user‑customizable benchmarking, transparent result dissemination, and a thorough legal‑ethical framework to provide a novel service for end‑users, researchers, and supervisory authorities. Future work includes expanding the tracking‑service database, adding real‑time alerting, collaborating with EU data‑protection agencies to establish standards, and integrating AI‑driven risk prediction to further reduce the chance of misuse.

