LogSigma at SemEval-2026 Task 3: Uncertainty-Weighted Multitask Learning for Dimensional Aspect-Based Sentiment Analysis
This paper describes LogSigma, our system for SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA). Unlike traditional Aspect-Based Sentiment Analysis (ABSA), which predicts discrete sentiment labels, DimABSA requires predicting continuous Valence and Arousal (VA) scores on a 1-9 scale. A central challenge is that Valence and Arousal differ in prediction difficulty across languages and domains. We address this using learned homoscedastic uncertainty, where the model learns task-specific log-variance parameters that automatically balance the two regression objectives during training. Combined with language-specific encoders and multi-seed ensembling, LogSigma achieves 1st place on five datasets across both tracks. The learned variance weights vary substantially across languages owing to differing Valence-Arousal difficulty profiles, ranging from 0.66x for German to 2.18x for English, demonstrating that optimal task balancing is language-dependent and cannot be determined a priori.
💡 Research Summary
LogSigma is a system designed for the SemEval‑2026 Dimensional Aspect‑Based Sentiment Analysis (DimABSA) task, which requires predicting continuous Valence (V) and Arousal (A) scores on a 1‑9 scale for each aspect in a review. Unlike traditional ABSA, which outputs discrete polarity labels, DimABSA treats sentiment as a regression problem, demanding fine‑grained affect modeling. The authors identify a core difficulty: V and A have different prediction complexities that vary across languages and domains, with Arousal consistently harder (average Pearson correlation gap of 0.29). Manually tuning loss weights for the two regression objectives is impractical because the optimal V/A weight ratio can vary by up to a factor of three between languages and cannot be estimated without labeled data.
To address this, LogSigma adopts learned homoscedastic uncertainty (Cipolla et al., 2018). For each regression head a log‑variance parameter s = log σ² is introduced. The training loss becomes
L = ½ e^(−s_V) L_V + ½ e^(−s_A) L_A + s_V + s_A,
where L_V and L_A are the mean‑squared errors for Valence and Arousal, respectively. The exponential terms act as precision‑based weights: a task with higher learned variance (larger σ²) contributes less to the total loss, automatically balancing the two objectives, while the additive s terms prevent the trivial solution of driving both variances to infinity. The log‑variance parameters are initialized from a uniform distribution (0.2–1.0) and trained with a relatively high learning rate (5 × 10⁻²) so they adapt quickly, while the main model parameters use a standard rate (2 × 10⁻⁵). This approach requires only two extra scalars per language, making it far simpler than gradient‑norm‑based dynamic weighting.
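The uncertainty-weighted loss and the two-learning-rate setup can be sketched as follows. This is a minimal PyTorch illustration, not the authors' released code; the module and variable names (`UncertaintyWeightedLoss`, `model`) are hypothetical, while the loss formula, initialization range, and learning rates follow the description above.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learned homoscedastic uncertainty weighting for the Valence
    and Arousal regression heads (after Kendall/Cipolla et al., 2018)."""

    def __init__(self):
        super().__init__()
        # s = log(sigma^2), one scalar per task, initialized U(0.2, 1.0)
        self.s_v = nn.Parameter(torch.empty(1).uniform_(0.2, 1.0))
        self.s_a = nn.Parameter(torch.empty(1).uniform_(0.2, 1.0))
        self.mse = nn.MSELoss()

    def forward(self, pred_v, target_v, pred_a, target_a):
        loss_v = self.mse(pred_v, target_v)
        loss_a = self.mse(pred_a, target_a)
        # Precision-weighted sum plus the regularizer terms:
        # L = 0.5*exp(-s_V)*L_V + 0.5*exp(-s_A)*L_A + s_V + s_A
        return (0.5 * torch.exp(-self.s_v) * loss_v
                + 0.5 * torch.exp(-self.s_a) * loss_a
                + self.s_v + self.s_a)
```

The separate learning rates would then be set via optimizer parameter groups, e.g. `torch.optim.AdamW([{"params": model.parameters(), "lr": 2e-5}, {"params": criterion.parameters(), "lr": 5e-2}])`, where `model` is the (hypothetical) encoder and `criterion` an `UncertaintyWeightedLoss` instance.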
The architecture consists of three stages. First, the input text and aspect term are concatenated as “