Challenges in Android Data Disclosure: An Empirical Study

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Current legal frameworks require Android developers to accurately report the data their apps collect, but large codebases can make this reporting challenging. This paper takes an empirical approach to understanding developers’ experience with the Google Play Store’s Data Safety Section (DSS) form. We first survey 41 Android developers to understand how they categorize privacy-related data into DSS categories and how confident they feel when completing the DSS form. To gain a broader and more detailed view of the challenges developers encounter, we complement the survey with an analysis of 172 online developer discussions, capturing the perspectives of 642 additional developers; together, these two data sources represent insights from 683 developers. Our findings reveal that developers often manually classify the privacy-related data their apps collect into the data categories defined by Google (or, in some cases, omit classification entirely) and rely heavily on existing online resources when completing the form. Moreover, developers are generally confident in recognizing the data their apps collect, yet they lack confidence in translating this knowledge into DSS-compliant disclosures. Key challenges include difficulty identifying privacy-relevant data, limited understanding of the form itself, and concern about app rejection due to discrepancies with Google’s privacy requirements. These results underscore the need for clearer guidance and more accessible tooling to support developers in meeting privacy-aware reporting obligations.


💡 Research Summary

This paper investigates the practical difficulties Android developers face when completing Google Play’s Data Safety Section (DSS) form, a mandatory privacy‑labeling mechanism introduced in 2022. The authors aim to understand why many apps still misreport data collection despite regulatory pressure from GDPR and Google’s enforcement actions. To answer this, they adopt a mixed‑methods approach that combines (1) an online survey of 41 Android developers and (2) a qualitative analysis of 172 public developer discussions (covering 642 unique participants) drawn from Stack Overflow, Reddit, Discord, GitHub, and Hacker News.

The survey was conducted in three waves between August and December 2024, using broad calls on X, LinkedIn, and Reddit, followed by direct outreach to over 150 developers on LinkedIn. Respondents had an average of 5.5 years of Android experience; 90% had previously filled out a DSS form and 56% had published apps in the European Union. The questionnaire comprised Likert‑scale items, multiple‑choice questions, and open‑ended questions aligned with three research questions (RQ1–RQ3): (RQ1) methods and resources used for mapping app‑collected data to DSS categories, (RQ2) developers’ confidence in correctly completing the DSS, and (RQ3) challenges encountered during the process.

For the online‑discussion component, the authors systematically searched the five platforms using the query “DSS OR ‘data safety section’ OR ‘data safety form’”. After extracting 2,351 posts, they applied a two‑stage filtering process that retained only posts authored by developers and directly related to DSS completion, yielding 172 posts. The authors then performed open coding, followed by axial coding, resulting in a codebook of 24 codes grouped into five sub‑themes. Coding was performed independently by two authors, and discrepancies were resolved through discussion.

Key findings:

  1. Data Classification Practices (RQ1) – The majority of developers (≈68%) manually map their app’s data collection to Google’s predefined categories and types. Only a small minority (≈12%) use any form of automated tooling (e.g., static analysis), citing concerns about accuracy and coverage. About 20% either omit classification entirely (marking “no data”) or over‑report by including data that is not actually collected, reflecting a “play‑it‑safe” strategy.

  2. Confidence Levels (RQ2) – While developers feel confident (average 4.2/5) about recognizing what data their apps collect, their confidence drops sharply (average 2.8/5) when translating this knowledge into the DSS form. Understanding of GDPR versus Google’s specific definitions is especially weak, and the purpose‑selection dropdown is perceived as ambiguous by 62% of respondents.

  3. Challenges (RQ3) – Three recurring problem areas emerged:

    • Identifying privacy‑relevant data – Implicit data such as crash logs, device identifiers, or telemetry are hard to classify as personal or non‑personal.
    • Form ambiguity – The purpose options, optional/mandatory flags, and “ephemeral” handling fields lack clear explanations, leading to inconsistent entries.
    • Fear of rejection – Developers report anxiety about Google’s review process, which is not transparent. Some have experienced app removal or rejection due to “excessive data collection” or “unclear purpose”, prompting either overly conservative or overly permissive disclosures.

The online discussions provide concrete anecdotes: one developer admitted marking “unknown” for device IDs in crash reports, while another described a two‑week turnaround after a rejection that cited vague purpose statements without actionable guidance.

Based on these insights, the authors argue that the DSS system suffers from systemic design shortcomings rather than isolated developer errors. They propose three concrete improvements: (1) Automation support – static analysis tools that extract data types from source code and pre‑populate the DSS fields; (2) Clearer documentation – side‑by‑side mapping of GDPR concepts to Google’s categories, with concrete examples for each purpose; and (3) Transparent review criteria – public, standardized guidelines on what triggers a rejection, enabling developers to anticipate compliance issues.
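To make the first proposal concrete, the idea of pre‑populating DSS fields from code can be sketched as a small static check over an app’s manifest. This is an illustrative example only, not the authors’ tool: the permission‑to‑category mapping below is a hypothetical, simplified correspondence between Android permissions and DSS data categories, and a real tool would also need to analyze API usage and third‑party SDKs.

```python
import re

# Hypothetical, simplified mapping from Android permissions to
# (DSS category, DSS data type) pairs. A production tool would need
# a far more complete mapping plus API- and SDK-level analysis.
PERMISSION_TO_DSS = {
    "android.permission.ACCESS_FINE_LOCATION": ("Location", "Precise location"),
    "android.permission.READ_CONTACTS": ("Contacts", "Contacts"),
    "android.permission.CAMERA": ("Photos and videos", "Photos"),
    "android.permission.RECORD_AUDIO": ("Audio files", "Voice or sound recordings"),
}

def suggest_dss_entries(manifest_xml: str):
    """Return sorted (category, data_type) suggestions for permissions
    declared in an AndroidManifest.xml string. The developer would still
    have to confirm whether the data is actually collected, and why."""
    perms = re.findall(r'android:name="([\w.]+)"', manifest_xml)
    return sorted({PERMISSION_TO_DSS[p] for p in perms if p in PERMISSION_TO_DSS})

manifest = '''
<manifest>
  <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION"/>
  <uses-permission android:name="android.permission.CAMERA"/>
</manifest>
'''
print(suggest_dss_entries(manifest))
# → [('Location', 'Precise location'), ('Photos and videos', 'Photos')]
```

Even this crude sketch shows why the paper's second proposal (clearer documentation) matters: the mapping table is exactly the GDPR‑to‑Google correspondence that developers currently reconstruct by hand.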

Limitations are acknowledged: the survey sample may be self‑selecting, the online discussion set excludes private corporate channels, and no quantitative validation (e.g., measuring actual code‑form alignment) was performed.

In conclusion, the study provides the first empirical evidence of the specific pain points developers encounter with Google’s DSS, highlighting a gap between regulatory intent and practical implementation. By exposing manual classification reliance, low confidence in form translation, and fear of opaque enforcement, the paper makes a strong case for tooling, better guidance, and policy transparency to improve privacy‑aware reporting in the Android ecosystem.

