Age Matters: Analyzing Age-Related Discussions in App Reviews

In recent years, mobile applications have become indispensable tools for managing various aspects of life. From enhancing productivity to providing personalized entertainment, mobile apps have revolutionized people’s daily routines. Despite this rapid growth and popularity, gaps remain in how these apps address the needs of users from different age groups. Users of varying ages face distinct challenges when interacting with mobile apps, from younger users dealing with inappropriate content to older users having difficulty with usability due to age-related vision and cognition impairments. Although there have been initiatives to create age-inclusive apps, a limited understanding of user perspectives on age-related issues may hinder developers from recognizing specific challenges and implementing effective solutions. In this study, we explore age discussions in app reviews to gain insights into how mobile apps should cater to users across different age groups. We manually curated a dataset of 4,163 app reviews from the Google Play Store and identified 1,429 age-related reviews and 2,734 non-age-related reviews. We employed eight machine learning, deep learning, and large language models to automatically detect age discussions, with RoBERTa performing the best, achieving a precision of 92.46%. Additionally, a qualitative analysis of the 1,429 age-related reviews uncovers six dominant themes reflecting user concerns.

💡 Research Summary

This paper investigates how age‑related concerns are expressed in mobile app reviews and whether such discussions can be automatically identified with high accuracy. The authors begin by highlighting the growing importance of age‑inclusive design as mobile applications become integral to daily life for users ranging from children to seniors. While prior work has examined usability and accessibility issues for specific age groups, no study has systematically mined large‑scale user reviews to uncover the breadth of age‑related feedback.

To fill this gap, the authors formulate two research questions: (RQ1) Can age‑related reviews be detected automatically and accurately? (RQ2) What types of age discussions exist in mobile app reviews? They build a labeled dataset by starting from a publicly available corpus of 7 million Google Play reviews collected in earlier work (Shahin et al.). From 70 popular Android apps they randomly sample 4,163 reviews. Using a curated list of 29 age‑related n‑grams (e.g., “children”, “teenager”, “senior”, “grandparent”), they filter candidate reviews and then have multiple annotators manually label each as age‑related or not, resulting in 1,429 age‑related and 2,734 non‑age‑related instances.
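
The keyword-filtering step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the summary quotes only four of the 29 n-grams, so the `AGE_NGRAMS` list, the plural-tolerant regular expression, and the sample reviews are all assumptions made for the example.

```python
import re

# Hypothetical keyword list: only these four terms are quoted in the summary;
# the full list of 29 n-grams is not reproduced here.
AGE_NGRAMS = ["children", "teenager", "senior", "grandparent"]

# Case-insensitive whole-word match that also tolerates a trailing plural "s".
AGE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in AGE_NGRAMS) + r")s?\b",
    re.IGNORECASE,
)


def is_candidate(review: str) -> bool:
    """Return True if the review mentions any age-related keyword."""
    return AGE_PATTERN.search(review) is not None


# Made-up reviews, used only to exercise the filter.
reviews = [
    "My grandparents can't read the tiny font.",
    "Crashes every time I open it.",
    "Too many ads for an app aimed at children.",
]
candidates = [r for r in reviews if is_candidate(r)]
```

A filter like this only nominates candidates. A review can mention "children" without discussing an age-related concern, which is why the manual labeling step is still needed.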

For automatic detection, eight classification approaches are evaluated: traditional machine‑learning models (Support Vector Machine, Random Forest, Logistic Regression), deep‑learning architectures (CNN, LSTM), and four pre‑trained transformer‑based language models (BERT, RoBERTa, DistilBERT, GPT‑2). The dataset is split 80%/20% for training and testing, and performance is measured with precision, recall, F1‑score, and accuracy. RoBERTa achieves the best results (precision 92.46%, recall 92.39%, F1 92.45%, accuracy 92.39%), demonstrating that a fine‑tuned transformer can reliably capture the subtle, often short, informal language of app reviews.
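
To make the evaluation setup concrete, here is a small sketch of an 80/20 split and the four metrics. The summary does not say how the per-class scores were averaged. The identical recall and accuracy figures (both 92.39%) are what support-weighted averaging produces, so the sketch assumes that scheme. The function names, the random seed, and the toy labels are illustrative only.

```python
import random
from collections import Counter


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # weight each class by its share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels (1 = age-related, 0 = not age-related), not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = weighted_metrics(y_true, y_pred)

# Splitting 4,163 items 80/20 gives the train and test sizes.
train, test = train_test_split(range(4163))
```

Under support weighting, recall always equals accuracy, because each class's recall is multiplied by its share of the data and the class supports cancel out.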

Having established a high‑performing classifier, the authors turn to the 1,429 reviews labeled as age‑related in the dataset and analyze them qualitatively. Through a combination of topic modeling and manual coding, six dominant themes emerge:

Content Age Appropriateness – Users, especially parents, report exposure to inappropriate ads, images, or language in apps marketed to children.
Language and Recommendations – Feedback highlights the need for age‑tailored wording, notifications, and recommendation algorithms that respect developmental stages.
Age Verification and Access Barriers – Many complain about overly strict, error‑prone, or invasive age‑verification mechanisms that hinder usability, particularly for seniors.
Usability and Accessibility Across Ages – Issues such as small fonts, low contrast, complex navigation, and lack of assistive features disproportionately affect older adults and users with visual or cognitive decline.
Privacy and Safety Concerns – Age‑specific sensitivities around data collection, location tracking, and sharing of personal information are raised, with heightened anxiety for children and teenagers.
Interactions, Relationships, and Feature Requests – Users suggest parent‑child sharing tools, family‑mode settings, and new functionalities that accommodate inter‑generational interaction.

Based on these findings, the authors propose actionable recommendations for developers: implement flexible, gradient‑based age restrictions rather than binary blocks; prioritize safety features (e.g., robust parental controls, content filters) in apps targeting younger audiences; improve age‑verification flows to reduce friction for older users; adopt accessibility best practices (larger touch targets, high‑contrast UI, voice guidance); and design privacy settings that are transparent and age‑appropriate.
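
One way to read "gradient‑based" restrictions is as a set of graduated tiers in place of a single allow/deny cutoff. The sketch below illustrates that reading only. The `Tier` type, the age thresholds, and the feature names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    min_age: int
    features: frozenset


# Hypothetical tiers and feature names, chosen only for illustration.
TIERS = [
    Tier(0, frozenset({"browse"})),
    Tier(13, frozenset({"browse", "comment"})),
    Tier(18, frozenset({"browse", "comment", "purchase"})),
]


def allowed_features(age: int) -> frozenset:
    """Return the feature set of the highest tier the user qualifies for."""
    eligible = [tier for tier in TIERS if age >= tier.min_age]
    if not eligible:
        raise ValueError("age must be non-negative")
    return max(eligible, key=lambda tier: tier.min_age).features
```

Compared with a binary block, a younger user here keeps access to a reduced feature set and is not locked out entirely.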

The paper’s contributions are fourfold: (1) it is the first systematic study of age‑related discourse in mobile app reviews; (2) it provides a benchmark dataset and a comparative evaluation of eight classification models, identifying RoBERTa as the most effective; (3) it offers a nuanced taxonomy of six age‑related concern categories derived from real user feedback; and (4) it delivers concrete design and policy guidelines to help create more inclusive apps.

Limitations are acknowledged: the data are confined to English‑language reviews from Google Play (Android), potentially missing cultural or platform‑specific nuances; manual annotation may involve subjective bias; and the study does not compare against the latest generation of large language models such as GPT‑4. Future work is suggested to expand to iOS and multilingual corpora, explore intersectional analyses (age × gender × culture), and integrate state‑of‑the‑art generative models for real‑time monitoring of age‑related feedback.

In conclusion, by demonstrating that age‑related discussions can be automatically detected with high precision and by uncovering the diverse concerns users express, this research equips developers, product managers, and platform moderators with evidence‑based insights to build mobile applications that are safer, more usable, and truly inclusive across the lifespan.
