DimStance: Multilingual Datasets for Dimensional Stance Analysis
Stance detection is an established task that classifies an author’s attitude toward a specific target into categories such as Favor, Neutral, and Against. Beyond categorical stance labels, we leverage a long-established affective science framework to model stance along real-valued dimensions of valence (negative-positive) and arousal (calm-active). This dimensional approach captures nuanced affective states underlying stance expressions, enabling fine-grained stance analysis. To this end, we introduce DimStance, the first dimensional stance resource with valence-arousal (VA) annotations. This resource comprises 11,746 target aspects in 7,365 texts across five languages (English, German, Chinese, Nigerian Pidgin, and Swahili) and two domains (politics and environmental protection). To facilitate the evaluation of stance VA prediction, we formulate the dimensional stance regression task, analyze cross-lingual VA patterns, and benchmark pretrained and large language models under regression and prompting settings. Results show competitive performance of fine-tuned LLM regressors, persistent challenges in low-resource languages, and limitations of token-based generation. DimStance provides a foundation for multilingual, emotion-aware stance analysis and benchmarking.
💡 Research Summary
The paper “DimStance: Multilingual Datasets for Dimensional Stance Analysis” introduces a novel resource that moves stance detection beyond the traditional categorical labels (Favor, Neutral, Against) by adopting the well‑established Valence‑Arousal (VA) model from affective science. VA captures two continuous dimensions: Valence (how positive or negative an expression is) and Arousal (how calm or excited it is). By annotating stance on these dimensions, the authors aim to represent the subtle affective states that underlie stance expressions, enabling a finer‑grained analysis of opinionated language.
Dataset Construction
DimStance comprises 7,365 texts drawn from two domains—politics and environmental protection—across five languages: English, German, Chinese, Nigerian Pidgin, and Swahili. Within these texts, 11,746 target aspects (specific claims or sub‑topics) were identified. Each aspect was annotated by multilingual annotators using a 9‑point VA scale (or a comparable continuous range), yielding a pair of real‑valued labels per aspect. Inter‑annotator agreement was high (average Cohen’s κ ≈ 0.78), indicating reliable cross‑lingual labeling. The dataset therefore provides a rich, multilingual benchmark for studying how affective intensity and polarity interact with stance across cultural and topical contexts.
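To make the annotation scheme concrete, here is a minimal sketch of how one VA-annotated record might be represented, together with the common preprocessing step of rescaling a 9-point rating to [-1, 1] for regression. The field names, language codes, and example values are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass

@dataclass
class AspectAnnotation:
    """One VA-annotated target aspect (hypothetical schema, not the released format)."""
    text: str       # full source text
    aspect: str     # target aspect (claim or sub-topic) within the text
    language: str   # e.g. "en", "de", "zh", plus codes for Pidgin and Swahili
    domain: str     # "politics" or "environment"
    valence: float  # 9-point scale: 1 = very negative, 9 = very positive
    arousal: float  # 9-point scale: 1 = very calm, 9 = very active

def normalize_va(score: float, low: float = 1.0, high: float = 9.0) -> float:
    """Map a 9-point rating onto [-1, 1]."""
    return 2.0 * (score - low) / (high - low) - 1.0

example = AspectAnnotation(
    text="The new climate bill is a long-overdue step forward.",
    aspect="climate bill",
    language="en",
    domain="environment",
    valence=7.5,
    arousal=4.0,
)
print(normalize_va(example.valence))  # 0.625
```

The midpoint of the scale (5) maps to 0, so the sign of the normalized valence directly encodes polarity.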
Statistical Insights
Exploratory analysis reveals systematic differences across languages and domains. For example, English and German texts tend to exhibit high Valence and moderate Arousal for environmental protection aspects, reflecting generally positive and measured discourse about green policies. In contrast, Nigerian Pidgin and Swahili show more negative Valence and lower Arousal for political topics, suggesting a more subdued or skeptical tone in political debates. These patterns illustrate how cultural and linguistic factors shape the affective profile of stance, a phenomenon that was previously invisible in categorical stance corpora.
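The kind of cross-lingual comparison described above reduces to aggregating mean valence and arousal per (language, domain) group. A minimal sketch of that aggregation, over invented toy records rather than the actual dataset:

```python
from collections import defaultdict

# Toy records: (language, domain, valence, arousal) on a 9-point scale (invented values).
records = [
    ("en", "environment", 7.0, 5.5),
    ("en", "environment", 6.5, 5.0),
    ("de", "environment", 6.8, 5.2),
    ("pcm", "politics", 3.5, 3.0),
    ("sw", "politics", 3.8, 3.2),
]

def mean_va_by_group(rows):
    """Average valence and arousal per (language, domain) pair."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lang, dom, v, a in rows:
        s = sums[(lang, dom)]
        s[0] += v
        s[1] += a
        s[2] += 1
    return {k: (s[0] / s[2], s[1] / s[2]) for k, s in sums.items()}

for (lang, dom), (v, a) in mean_va_by_group(records).items():
    print(f"{lang}/{dom}: valence={v:.2f}, arousal={a:.2f}")
```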
Dimensional Stance Regression
The authors formalize a new task: Dimensional Stance Regression, where a model must predict both Valence and Arousal simultaneously for each target aspect. To benchmark this task, they evaluate two families of models:
- Fine‑tuned multilingual Transformers – BERT‑base‑multilingual, XLM‑R, RoBERTa‑large‑multilingual, among others. These models are fine‑tuned on the DimStance training split using a joint regression loss (e.g., mean squared error on both dimensions). XLM‑R achieves the best overall Pearson correlation (r ≈ 0.71), with language‑specific scores ranging from r ≈ 0.78 for English/German to r ≈ 0.55 for the low‑resource languages (Pidgin, Swahili).
- Prompt‑based Large Language Models (LLMs) – GPT‑4o, LLaMA‑3.3‑70B‑Instruct, Gemini‑2.0‑Lite. The authors test both zero‑shot and few‑shot (3–5 exemplars) prompting strategies. GPT‑4o in few‑shot mode reaches r ≈ 0.65, showing that LLMs can approximate the regression task without parameter updates. However, because LLMs generate token sequences, they struggle to output precise floating‑point numbers, often rounding or truncating values, which limits fine‑grained VA prediction. LLaMA‑3.3‑70B, despite its size, underperforms on the low‑resource languages (r ≈ 0.48), highlighting the impact of pre‑training data bias.
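The joint regression loss mentioned above can be illustrated with a small NumPy sketch: a linear head with two outputs (valence, arousal) trained by gradient descent on a single MSE averaged over both dimensions. This is not the authors’ training code; in practice the head would sit on top of encoder embeddings (e.g., from XLM‑R), and the features and targets below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for sentence embeddings and gold VA targets scaled to [-1, 1].
X = rng.normal(size=(64, 16))                      # 64 examples, 16-dim features
true_W = rng.normal(size=(16, 2))
Y = X @ true_W + 0.05 * rng.normal(size=(64, 2))   # columns: valence, arousal

# A linear regression head with two outputs, trained with a joint MSE loss.
W = np.zeros((16, 2))
lr = 0.05
for step in range(500):
    pred = X @ W
    err = pred - Y                 # shape (64, 2)
    loss = np.mean(err ** 2)       # single MSE averaged over both dimensions
    grad = 2 * X.T @ err / len(X)  # gradient of the joint loss w.r.t. W
    W -= lr * grad

print(f"final joint MSE: {loss:.4f}")
```

Because both dimensions share one loss, gradients from valence and arousal errors are weighted equally; a weighted sum of per-dimension MSEs is a straightforward variant if one dimension (e.g., arousal) proves harder.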
Error Analysis & Limitations
The study identifies several challenges:
- Low‑resource language gap – Both fine‑tuned models and LLMs suffer from reduced performance on Pidgin and Swahili, attributable to fewer training examples and limited representation in the pre‑training corpora.
- Arousal prediction difficulty – Across all languages, Arousal correlations are consistently lower than Valence. This suggests that textual cues for “excitement” or “calmness” are less explicit than polarity cues, especially in political discourse where arguments may be emotionally neutral but still strongly opinionated.
- Token‑based generation constraints – Current LLMs are not designed to output exact real‑valued numbers; they treat VA as a textual token sequence, leading to rounding errors and reduced precision.
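A common practical workaround for the token-generation constraint is to parse the model’s free-form output into floats and clip them to the annotation scale. The sketch below assumes the prompt asks for labeled "Valence"/"Arousal" values; that output format is an assumption, not something specified in the paper.

```python
import re

def parse_va(text: str, low: float = 1.0, high: float = 9.0):
    """Extract valence/arousal floats from free-form LLM output, clipped to the scale.

    Returns None if either value is missing, so callers can retry or fall back.
    """
    def grab(label):
        m = re.search(rf"{label}\s*[:=]?\s*(-?\d+(?:\.\d+)?)", text, re.IGNORECASE)
        return None if m is None else min(max(float(m.group(1)), low), high)

    v, a = grab("valence"), grab("arousal")
    return None if v is None or a is None else (v, a)

print(parse_va("Valence: 6.5, Arousal: 3"))   # (6.5, 3.0)
print(parse_va("valence = 12, arousal = 0"))  # clipped to (9.0, 1.0)
print(parse_va("I cannot rate this."))        # None
```

Parsing recovers a usable number but does not fix the underlying precision problem: if the model only ever emits integers or halves, the effective output resolution stays coarse regardless of post-processing.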
Contributions & Future Directions
DimStance makes three primary contributions:
- A first‑of‑its‑kind multilingual, multi‑domain VA‑annotated stance dataset, publicly released on GitHub, providing a benchmark for affect‑aware stance analysis.
- A formal definition of Dimensional Stance Regression, together with a comprehensive evaluation protocol (MSE, Pearson r, language‑wise breakdown).
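The metrics in that evaluation protocol are standard and easy to reproduce; a minimal pure-Python sketch of MSE and Pearson r with a per-language breakdown follows. The gold/prediction values are invented toy data, not results from the paper.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between gold and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between gold and predicted scores."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    norm_p = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (norm_t * norm_p)

# Toy per-language breakdown (invented values on a 9-point scale).
by_language = {
    "en": ([7.0, 3.0, 5.0, 8.0], [6.5, 3.5, 5.0, 7.5]),
    "sw": ([7.0, 3.0, 5.0, 8.0], [5.0, 4.0, 6.0, 6.0]),
}
for lang, (gold, pred) in by_language.items():
    print(f"{lang}: MSE={mse(gold, pred):.2f}, r={pearson_r(gold, pred):.2f}")
```

Reporting both metrics matters: Pearson r rewards getting the ordering right even with a systematic offset, while MSE penalizes absolute error, so a model can score well on one and poorly on the other.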
- Empirical insights into the capabilities and limits of current multilingual Transformers and LLMs for fine‑grained affective stance prediction, especially highlighting the need for better handling of low‑resource languages and continuous output generation.
The authors outline several avenues for future work: extending the annotation scheme to include additional affective dimensions such as Dominance, developing data‑augmentation or cross‑lingual transfer techniques to boost low‑resource performance, and designing model architectures that can directly output continuous values (e.g., regression heads or specialized decoders) rather than relying on token generation.
In summary, DimStance bridges the gap between affective computing and stance detection, offering a valuable resource for researchers interested in the nuanced interplay of emotion, opinion, and language across cultures. Its release is expected to stimulate advances in multilingual sentiment‑aware stance modeling, policy analysis, and socially aware AI systems.