A methodology for analyzing financial needs hierarchy from social discussions using LLM

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This study examines the hierarchical structure of financial needs as articulated in social media discourse, employing generative AI techniques to analyze large-scale textual data. While human needs encompass a broad spectrum from fundamental survival to psychological fulfillment, financial needs are particularly critical, influencing both individual well-being and day-to-day decision-making. Our research advances the understanding of financial behavior by utilizing large language models (LLMs) to extract and analyze expressions of financial needs from social media posts. We hypothesize that financial needs are organized hierarchically, progressing from short-term essentials to long-term aspirations, consistent with theoretical frameworks established in the behavioral sciences. Through computational analysis, we demonstrate the feasibility of identifying these needs and validate the presence of a hierarchical structure within them. In addition to confirming this structure, our findings provide novel insights into the content and themes of financial discussions online. By inferring underlying needs from naturally occurring language, this approach offers a scalable and data-driven alternative to conventional survey methodologies, enabling a more dynamic and nuanced understanding of financial behavior in real-world contexts.


💡 Research Summary

This paper investigates whether financial needs, like other human motivations, are organized in a hierarchical fashion, and it does so by mining large‑scale, naturally occurring text from Reddit. The authors begin by reviewing two theoretical frameworks that have been used to conceptualize need hierarchies: (1) a Maslow‑derived Needs Hierarchy Framework (NHF) that distinguishes basic, safety, love/belonging, esteem, self‑transcendence and self‑actualization levels, and (2) a Financial Needs Prioritization Framework (NPF) that groups needs into consumption/immediate, emergency savings, and retirement/wealth/lifestyle improvement. Both frameworks predict that as individuals acquire more resources, they move from lower‑order to higher‑order financial concerns.

Data were collected via the Pushshift API from four finance‑related subreddits (r/personalfinance, r/FinancialPlanning, r/investing, r/EstatePlanning) covering the period 1 January 2020 to 31 December 2023. From more than 860 000 posts, the authors filtered for users who explicitly mentioned both age and income in at least one post, resulting in a final sample of 334 users and 6 709 posts (average 21 posts per user). This filtering ensures that each user can be linked to demographic variables, but it also introduces a selection bias toward users who are willing to disclose personal details publicly.
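The demographic filter described above can be sketched as follows. This is a minimal illustration only: the regular expressions, the function names, and the reading that age and income must appear in the same post are assumptions, since the summary does not state the authors' actual matching rules.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the paper's real extraction rules are not given.
AGE_RE = re.compile(
    r"\b(?:i am|i'm|im)\s+(\d{2})\b|\b(\d{2})\s*(?:yo|y/o|years? old)\b",
    re.IGNORECASE,
)
INCOME_RE = re.compile(
    r"\b(?:make|earn|salary(?: is| of)?|income(?: is| of)?)\s+"
    r"(?:about\s+|around\s+)?\$?\s?(\d[\d,]*)(k)?\b",
    re.IGNORECASE,
)


def extract_age(text):
    """Return a two-digit self-reported age, or None if none is found."""
    match = AGE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


def extract_income(text):
    """Return a self-reported income figure, or None if none is found."""
    match = INCOME_RE.search(text)
    if match is None:
        return None
    value = int(match.group(1).replace(",", ""))
    return value * 1000 if match.group(2) else value


def users_with_demographics(posts):
    """Keep every post of users who state age and income in one post.

    posts: iterable of (user, text) pairs.
    Returns {user: [text, ...]} for the retained users only.
    """
    by_user = defaultdict(list)
    disclosed = set()
    for user, text in posts:
        by_user[user].append(text)
        if extract_age(text) is not None and extract_income(text) is not None:
            disclosed.add(user)
    return {user: texts for user, texts in by_user.items() if user in disclosed}
```

Once a user passes the filter, all of that user's posts are retained, which matches the summary's description of linking each user (not each post) to demographic variables.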

The methodological pipeline consists of two main components: (i) large language model (LLM) processing and (ii) topic modeling. The authors employ Llama‑family models accessed through the Groq API to (a) generate concise summaries of each post, (b) extract a “core query” and two auxiliary queries, (c) label the financial need expressed (e.g., “paying tuition – saving”, “buying a house – mortgage”), (d) map each need to the NHF and NPF categories using carefully crafted prompts, and (e) derive behavioral attributes such as emotion (via text2emotion), stress level (four‑point scale), and risk propensity (cautious, calculative, chance‑taking). Prompt design was iteratively refined on a validation subset, though the exact prompt text is not disclosed, limiting reproducibility.
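Because the exact prompts are not disclosed, the following is only a sketch of how a structured annotation step of this kind might look. The prompt wording, the JSON keys, the 1 to 4 stress encoding, and the `llm` callable (any function that takes a prompt string and returns the model's text reply) are all assumptions and do not reproduce the authors' pipeline or the Groq client interface.

```python
import json

# Category labels as listed in the summary.
NHF_LEVELS = ["basic", "safety", "love/belonging", "esteem",
              "self-transcendence", "self-actualization"]
NPF_LEVELS = ["consumption/immediate", "emergency savings",
              "retirement/wealth/lifestyle improvement"]
RISK_LEVELS = ["cautious", "calculative", "chance-taking"]
STRESS_LEVELS = [1, 2, 3, 4]  # assumed encoding of the four-point scale

# Hypothetical prompt; the paper's actual prompt text is not published.
PROMPT_TEMPLATE = """You are annotating a personal-finance post.
Return a JSON object with these keys:
  summary, core_query, auxiliary_queries (list of 2), need,
  nhf (one of {nhf}), npf (one of {npf}),
  stress (integer 1-4), risk (one of {risk}).

Post:
{post}
"""


def build_prompt(post_text):
    """Fill the template with the allowed labels and the post body."""
    return PROMPT_TEMPLATE.format(
        nhf=NHF_LEVELS, npf=NPF_LEVELS, risk=RISK_LEVELS, post=post_text
    )


def parse_annotation(raw):
    """Parse the model reply and reject labels outside the allowed sets."""
    data = json.loads(raw)
    allowed = {"nhf": NHF_LEVELS, "npf": NPF_LEVELS,
               "risk": RISK_LEVELS, "stress": STRESS_LEVELS}
    for key, values in allowed.items():
        if data.get(key) not in values:
            raise ValueError(f"unexpected value for {key!r}: {data.get(key)!r}")
    if len(data.get("auxiliary_queries", [])) != 2:
        raise ValueError("expected exactly two auxiliary queries")
    return data


def annotate(post_text, llm):
    """Run one post through the supplied LLM callable and validate the reply."""
    return parse_annotation(llm(build_prompt(post_text)))
```

Validating every label against a closed list is one way to keep free-form model output usable as categorical variables downstream. Emotion is left out of the sketch because the summary attributes it to a separate tool (text2emotion) and not to the LLM.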

For thematic analysis, the authors apply Latent Dirichlet Allocation (LDA) using the MALLET toolkit. Rather than relying on perplexity or coherence, they select the optimal number of topics (k) by minimizing the count of needs with negative skewness in their topic distributions, a novel criterion that balances topic concentration and spread. This yields a set of interpretable topics that align with the hierarchical need categories (e.g., “rent & utilities”, “investment strategies”, “retirement planning”, “travel & hobbies”).
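One possible reading of the skewness criterion is sketched below. It assumes that, for each candidate k, every need already has a vector of k topic weights (for example, aggregated from the document-topic output of the trained LDA model), and that skewness is the standard moment-based sample skewness of that vector. Both points, along with the function names and the tie-breaking rule, are assumptions; the summary does not spell out the exact computation.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0.0 for flat input)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 <= 1e-12:
        return 0.0
    return m3 / m2 ** 1.5


def count_negative_skew(need_topic_weights):
    """Count needs whose topic-weight vector is negatively skewed.

    need_topic_weights: {need_label: [weight for each topic]}.
    """
    return sum(1 for w in need_topic_weights.values() if skewness(w) < 0)


def select_k(candidates):
    """Pick the k with the fewest negatively skewed needs.

    candidates: {k: {need_label: [weight for each of the k topics]}}.
    Ties are broken in favour of the smaller k (an assumption).
    """
    return min(sorted(candidates), key=lambda k: count_negative_skew(candidates[k]))
```

Under this reading, a positively skewed vector means a need loads heavily on a few topics and lightly on the rest, while a negatively skewed one means the need is spread across most topics, so minimizing the negative count favors topic sets in which needs are concentrated.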

Statistical results support the hypothesized hierarchy. In the NHF panel, average monthly income rises from $6 536 for basic needs to $7 568 for love/belonging, and further to $8 410 for self‑actualization, consistent with income increasing with need level across the reported categories. In the NPF panel, consumption/immediate needs average $6 774, emergency savings $6 952, and retirement/wealth/lifestyle improvement $7 232, again showing higher income among users expressing higher‑order financial concerns. The relationship between age and income follows a classic life‑cycle pattern: income increases through the 21‑30, 31‑40, and 41‑50 cohorts, peaks in the 51‑60 group ($9 158), then declines for users over 60.
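The income comparison across need levels amounts to a grouped mean followed by an ordering check, which can be sketched as below. The record layout and function names are hypothetical; the only figures taken from the text are the three NPF averages used in the final check.

```python
from collections import defaultdict
from statistics import mean

# NPF levels in the low-to-high order given in the summary.
NPF_ORDER = [
    "consumption/immediate",
    "emergency savings",
    "retirement/wealth/lifestyle improvement",
]


def mean_income_by_level(records, order):
    """Average monthly income per need level, returned in hierarchy order.

    records: iterable of (level, monthly_income) pairs.
    Levels with no records are skipped.
    """
    buckets = defaultdict(list)
    for level, income in records:
        buckets[level].append(income)
    return [(level, mean(buckets[level])) for level in order if level in buckets]


def is_non_decreasing(values):
    """True when each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# NPF averages reported in the summary, lowest to highest level.
reported_npf_means = [6774, 6952, 7232]
npf_is_ordered = is_non_decreasing(reported_npf_means)
```

A check of this kind only confirms the ordering of the group means; it says nothing about whether the differences are statistically significant, which would need the per-user data.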

Behavioral attribute analysis reveals systematic patterns. Users in higher‑income, older brackets display more positive emotions, lower stress, and a shift toward “chance‑taking” risk propensity, especially when mapped to self‑actualization needs. Conversely, younger, lower‑income users cluster in basic and safety needs, exhibit higher stress levels, and adopt a more cautious risk stance. These findings echo prior finance‑behavior literature linking resource availability to risk tolerance and emotional wellbeing.

The authors acknowledge several limitations. First, the reliance on self‑disclosed age and income creates a non‑random sample that may over‑represent financially literate or socially open individuals. Second, the lack of detailed prompt specifications hampers replication and external validation of the LLM extraction pipeline. Third, the analysis excludes comment threads, thereby missing the interactive dimension where needs may be addressed, negotiated, or resolved by the community. Future work could integrate multimodal signals (comments, images, links) and longitudinal modeling to capture dynamic transitions between need levels over time.

In sum, this study demonstrates that large‑scale, unstructured social media data can be systematically transformed into structured variables reflecting financial need hierarchies. By coupling state‑of‑the‑art LLM extraction with classic topic modeling, the authors provide empirical support for the existence of a tiered financial‑need structure that aligns with income, age, and behavioral traits. The work offers a scalable alternative to traditional survey‑based approaches, opening avenues for real‑time monitoring of financial well‑being and for more nuanced, data‑driven financial product design.

