SMART: A Social Movement Analysis & Reasoning Tool with Case Studies on #MeToo and #BlackLivesMatter
Social movements supporting the UN’s Sustainable Development Goals (SDGs) play a vital role in improving human lives. If journalists were aware of the relationship between social movements and external events, they could provide more precise, time-sensitive reporting about movement issues and SDGs. Our SMART system achieves this goal by collecting data from multiple sources, extracting emotions on various themes, and then using a transformer-based forecasting engine (DEEP) to predict the quantity and intensity of emotions in future posts. This paper demonstrates the retrospective capabilities of SMART that journalists require, via case studies analyzing social media discussions of the #MeToo and #BlackLivesMatter movements before and after the 2024 U.S. election. We create a novel 1-year dataset, which we will release upon publication. It contains over 2.7M Reddit posts and over 1M news articles. We show that SMART enables early detection of discourse shifts around key political events, providing journalists with actionable insights to inform editorial planning. SMART was developed through multiple interactions with a panel of over 20 journalists from a variety of news organizations over a 2-year period, including an author of this paper.
💡 Research Summary
The paper introduces SMART (Social Movement Analysis & Reasoning Tool), a comprehensive system designed to help journalists monitor and forecast social movements that support the United Nations Sustainable Development Goals (SDGs). Developed over two years in close collaboration with more than 20 journalists from outlets such as The Wall Street Journal, the American Press Institute, and The Washington Post, SMART addresses four core editorial needs: (i) detecting rising movement popularity, (ii) tracking sentiment and emotion evolution, (iii) linking discourse to key political events (KPEs), and (iv) providing early warnings for proactive story planning.
SMART’s pipeline consists of four main components. First, a daily acquisition module queries Reddit and news sources via the World News API using a curated list of SDG‑specific keywords. Over 50 000 documents per day are harvested, yielding a novel one‑year dataset (September 2024 – August 2025) that contains 2.7 million Reddit posts and 1 million news articles related to #MeToo and Black Lives Matter (BLM). Second, an NLP stage enriches each document: KeyBERT and Amazon Comprehend expand the initial keyword set, a RoBERTa‑base model fine‑tuned on GoEmotions extracts 28‑dimensional emotion scores, and MiniLM‑L6‑v2 generates semantic embeddings stored in ChromaDB. Metadata are kept in MongoDB and a relational database for fast filtering.
Third, a multi‑layer filtering process (L0–L8) isolates movement‑relevant content. L0 captures explicit hashtag mentions; subsequent layers progressively relax the relevance threshold based on a high‑salience vocabulary (top 1 % co‑occurring terms). The authors select layer L5 (≥20 % of high‑salience terms present) for analysis, balancing precision with sufficient sample size.
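The layered relevance check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "≥20 % of high‑salience terms present" means the share of the high‑salience vocabulary found in a document, and the function names, the tokenizer, and the example vocabulary are all hypothetical.

```python
import re

def salience_fraction(text: str, salient_terms: set[str]) -> float:
    """Share of the high-salience vocabulary that appears in the text (assumed definition)."""
    if not salient_terms:
        return 0.0
    # Simple lowercase tokenizer; a real pipeline would likely use something richer.
    tokens = set(re.findall(r"[a-z0-9#']+", text.lower()))
    return len(tokens & salient_terms) / len(salient_terms)

def passes_layer(text: str, hashtags: list[str],
                 salient_terms: set[str], threshold: float) -> bool:
    """Keep a document if it names the movement explicitly (L0-style)
    or if enough high-salience terms are present (later layers)."""
    lowered = text.lower()
    if any(tag.lower() in lowered for tag in hashtags):
        return True
    return salience_fraction(text, salient_terms) >= threshold

# Hypothetical vocabulary and documents, for illustration only.
salient = {"harassment", "survivor", "consent", "assault", "accountability"}
print(passes_layer("A survivor spoke about consent at the rally.", ["#MeToo"], salient, 0.20))
print(passes_layer("The weather is nice today.", ["#MeToo"], salient, 0.20))
```

Under this reading, lowering the threshold from one layer to the next admits more documents at the cost of precision, which is the trade-off the summary attributes to the choice of L5.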
Fourth, SMART offers two analytic engines. The previously published DEEP (Discourse Evolution Engine Prediction) is a transformer‑based forecasting model that ingests historical discourse states (volume, emotion intensity, thematic distribution) and user‑specified KPEs to produce probabilistic forecasts of future states, expressed as Student‑t distributions with uncertainty estimates. The new REAR (Retrospective Engine for Analysis & Reasoning) supports investigative, backward‑looking studies. REAR computes median differences between event windows and matched control periods, effect sizes (Cohen’s d), bootstrap confidence intervals, and significance via permutation testing (10 000 permutations) with Benjamini‑Hochberg FDR correction.
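The statistics attributed to REAR above are standard, and a compact sketch may make them concrete. The code below is an assumption-laden illustration using only the Python standard library: the function names, the choice of the median difference as the permutation statistic, the percentile bootstrap, and the toy daily counts are all hypothetical and are not taken from the paper.

```python
import random
import statistics

def median_diff(event, control):
    """Median of the event window minus median of the control period."""
    return statistics.median(event) - statistics.median(control)

def cohens_d(event, control):
    """Cohen's d with a pooled sample standard deviation."""
    n1, n2 = len(event), len(control)
    v1, v2 = statistics.variance(event), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(event) - statistics.mean(control)) / pooled_sd

def permutation_p(event, control, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the median difference."""
    rng = random.Random(seed)
    observed = abs(median_diff(event, control))
    pooled = list(event) + list(control)
    k = len(event)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(median_diff(pooled[:k], pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(event, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median difference."""
    rng = random.Random(seed)
    event, control = list(event), list(control)
    diffs = sorted(
        median_diff(rng.choices(event, k=len(event)),
                    rng.choices(control, k=len(control)))
        for _ in range(n_boot)
    )
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy daily article counts, invented for illustration.
event_days = [10.0, 11.0, 12.0, 13.0, 14.0]
control_days = [1.0, 2.0, 3.0, 4.0, 5.0]
print(median_diff(event_days, control_days), cohens_d(event_days, control_days))
print(permutation_p(event_days, control_days), bootstrap_ci(event_days, control_days))
```

In a multi-event analysis, one p-value per KPE and window size would be collected and passed through `benjamini_hochberg` together, so that the false discovery rate is controlled across the whole family of tests rather than per comparison.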
The authors evaluate SMART through case studies of #MeToo and BLM surrounding 36 KPEs identified by journalists (election debates, primary results, policy announcements). They test five hypotheses concerning volume spikes, anticipatory versus reactive patterns, event‑specific heterogeneity, and emotion intensity changes. Results show that news media consistently exhibit significant volume increases during KPE windows across all window sizes (e.g., BLM news: d = 1.17 for ±7 days, p = 0.001, a 28.5 % rise). In contrast, Reddit activity does not rise; for BLM Reddit, longer windows even show significant declines (d ≈ ‑0.49, p < 0.05). Emotion intensity (e.g., anger, sadness) rises only in news articles, not on Reddit. These findings highlight platform‑specific dynamics: traditional news outlets respond promptly to political events, while Reddit’s user‑driven discussions are less event‑sensitive or may reflect delayed grassroots reactions.
Key contributions are: (1) the release of a large, multi‑source, SDG‑focused dataset; (2) a co‑design methodology that directly incorporates journalistic workflow requirements; (3) the first systematic cross‑platform comparison of movement discourse dynamics; and (4) an event‑impact framework that quantifies heterogeneous KPE effects. Limitations include potential Reddit demographic bias, cultural variability in emotion labeling, and the inability of the forecasting model to capture exogenous policy shocks. Future work aims to extend language coverage, build real‑time dashboards, and develop scenario‑based simulations for policymakers.
Overall, SMART demonstrates that integrating large‑scale data collection, sophisticated NLP, and transformer‑based analytics can equip journalists with actionable, timely insights into how social movements evolve around political milestones, thereby enhancing the quality and relevance of SDG‑related reporting.