DementiaBank-Emotion: A Multi-Rater Emotion Annotation Corpus for Alzheimer's Disease Speech (Version 1.0)
We present DementiaBank-Emotion, the first multi-rater emotion annotation corpus for Alzheimer’s disease (AD) speech. Annotating 1,492 utterances from 108 speakers for Ekman’s six basic emotions and neutral, we find that AD patients express significantly more non-neutral emotions (16.9%) than healthy controls (5.7%; p < .001). Exploratory acoustic analysis suggests a possible dissociation: control speakers showed substantial F0 modulation for sadness (Δ = -3.45 semitones from baseline), whereas AD speakers showed minimal change (Δ = +0.11 semitones; interaction p = .023), though this finding is based on limited samples (sadness: n=5 control, n=15 AD) and requires replication. Within AD speech, loudness differentiates emotion categories, indicating partially preserved emotion-prosody mappings. We release the corpus, annotation guidelines, and calibration workshop materials to support research on emotion recognition in clinical populations.
💡 Research Summary
The paper introduces DementiaBank‑Emotion (v1.0), the first publicly available multi‑rater emotion‑annotated speech corpus specifically designed for Alzheimer’s disease (AD) research. Building on the well‑established DementiaBank Pitt Corpus, the authors selected 108 speakers (54 AD patients and 54 age‑ and gender‑matched healthy controls) from the ADReSS 2020 Challenge training set, each of whom performed the Cookie‑Theft picture‑description task. From these recordings, 1,492 participant (PAR) utterances were extracted (615 from AD, 731 from controls) and annotated for Ekman’s six basic emotions (joy, sadness, fear, anger, surprise, disgust) plus a neutral category.
Annotation involved 11 raters from diverse backgrounds (clinical nursing researchers, a nurse practitioner, a psychiatry professor, computer‑science researchers, and a business professor). The process unfolded in three rounds: an initial clinical‑expert labeling phase, a calibration workshop phase, and a final technical‑expert labeling phase. Calibration workshops were central to improving consistency; they addressed issues such as distinguishing different types of laughter (genuinely joyful, helpless, sarcastic), applying a “Default to Neutral” principle when prosody was flat despite emotionally laden lexical items, and balancing rule‑based versus perceptual judgments. After calibration, inter‑rater reliability (Fleiss’ κ) rose from 0.094 (pre‑calibration) to 0.313 (post‑calibration) for patient data, while control data (three raters after exclusion) achieved κ = 0.254.
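Fleiss’ κ values like those above can be reproduced with standard tooling; the sketch below uses statsmodels on an illustrative rater-by-utterance label matrix (not the released annotations).

```python
# Minimal sketch of a Fleiss' kappa computation for multi-rater agreement;
# the label matrix here is illustrative, not corpus data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = utterances, columns = raters, values = emotion label ids
# (0 = neutral, 1 = joy, 2 = sadness, ...)
labels = np.array([
    [0, 0, 1],
    [1, 1, 1],
    [0, 2, 0],
    [0, 0, 0],
])

# aggregate_raters converts per-rater labels into per-category counts per utterance
counts, _categories = aggregate_raters(labels)
kappa = fleiss_kappa(counts, method="fleiss")
print(f"Fleiss' kappa = {kappa:.3f}")
```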
A hierarchical adjudication algorithm was used to derive a gold‑standard label per utterance, preferring majority votes, then non‑neutral over neutral in ties, and finally weighted confidence scores. Utterances that remained unresolved (137 AD, 9 control) were marked “ambiguous” and retained for acoustic analysis but excluded from emotion‑distribution statistics.
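The adjudication cascade can be sketched roughly as follows; the function name, confidence format, and tie-breaking details are illustrative assumptions, since the summary does not spell out the exact weighting scheme.

```python
# Rough sketch of the hierarchical adjudication rule described above:
# majority vote -> prefer non-neutral in ties -> confidence-weighted vote
# -> otherwise mark the utterance "ambiguous". Names are illustrative.
from collections import Counter

def adjudicate(labels, confidences):
    """labels: per-rater emotion strings; confidences: parallel confidence floats."""
    votes = Counter(labels)
    top_count = max(votes.values())
    top_labels = [lab for lab, c in votes.items() if c == top_count]

    # Step 1: strict majority wins
    if len(top_labels) == 1 and top_count > len(labels) / 2:
        return top_labels[0]

    # Step 2: in a tie, prefer a non-neutral label if exactly one remains
    non_neutral = [lab for lab in top_labels if lab != "neutral"]
    if len(non_neutral) == 1:
        return non_neutral[0]

    # Step 3: weight each label by its raters' confidence scores
    weighted = Counter()
    for lab, conf in zip(labels, confidences):
        weighted[lab] += conf
    ranked = weighted.most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]

    # Step 4: unresolved utterances are retained but flagged
    return "ambiguous"

# e.g. adjudicate(["joy", "neutral", "joy", "surprise"], [0.9, 0.5, 0.7, 0.6]) -> "joy"
```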
Statistical analysis revealed that AD speakers expressed non‑neutral emotions in 16.9% of their utterances, roughly three times the control rate (5.7%; χ² = 38.45, p < .001). Joy was the most frequent non‑neutral emotion in both groups (AD 7.6%, control 3.3%), followed by surprise (AD 4.2%, control 0.8%). The remaining emotions (sadness, anger, disgust, fear) were rare but slightly more prevalent among AD speakers.
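The group comparison can be illustrated by reconstructing approximate counts from the reported percentages and utterance totals (with ambiguous utterances excluded); this yields a χ² close to, but not exactly, the value reported in the paper.

```python
# Illustrative recomputation of the group difference in non-neutral emotion
# rates. Counts are reconstructed from the reported percentages and utterance
# totals (ambiguous utterances excluded), so the statistic only approximates
# the reported chi-square value.
from scipy.stats import chi2_contingency

ad_total, ctrl_total = 615 - 137, 731 - 9            # utterances after exclusion
ad_nonneutral = round(0.169 * ad_total)              # ~81
ctrl_nonneutral = round(0.057 * ctrl_total)          # ~41

table = [
    [ad_nonneutral, ad_total - ad_nonneutral],       # AD: non-neutral vs. neutral
    [ctrl_nonneutral, ctrl_total - ctrl_nonneutral], # controls
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, dof = {dof}")
```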
Exploratory acoustic analysis focused on prosodic cues. For sadness, control speakers showed a substantial F0 drop (‑3.45 semitones from baseline), whereas AD speakers exhibited virtually no change (+0.11 semitones). The interaction was significant (p = .023), suggesting a possible attenuation of the sadness‑related pitch modulation in AD. However, the sample size was limited (sadness: n = 5 controls, n = 15 AD), so the result must be interpreted cautiously.
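The semitone deltas follow the standard conversion of an F0 ratio relative to a speaker baseline; a minimal helper with illustrative values:

```python
# Standard conversion of an F0 value to semitones relative to a speaker
# baseline: 12 * log2(f0 / baseline). Values below are illustrative.
import math

def f0_delta_semitones(f0_hz: float, baseline_hz: float) -> float:
    """Signed pitch change in semitones relative to the speaker's baseline F0."""
    return 12.0 * math.log2(f0_hz / baseline_hz)

# e.g., dropping from a 180 Hz baseline to 148 Hz is about -3.4 semitones,
# comparable to the sadness-related lowering seen in control speakers
print(f0_delta_semitones(148.0, 180.0))   # ≈ -3.39
```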
Within the AD cohort, loudness emerged as a discriminative feature across emotion categories. Joy and surprise utterances tended to be louder (≈0.3 sones higher) than neutral or sad utterances, indicating that AD speakers may rely more on intensity than on pitch variation to convey affect. Ambiguous utterances were characterized by lower F0, reduced F0 variance, lower loudness, lower harmonics‑to‑noise ratio, and fewer words, underscoring the difficulty of labeling when prosodic cues are weak.
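The summary does not name the feature-extraction toolkit; one common choice for F0, loudness, and HNR functionals is the eGeMAPS set via the opensmile Python package, sketched below under that assumption.

```python
# Sketch of extracting utterance-level prosodic functionals (F0, loudness, HNR)
# with the eGeMAPS feature set. The choice of openSMILE/eGeMAPS is an
# assumption; the paper's actual toolkit is not stated in this summary.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Returns one row of functionals per file, including F0 (semitone-scaled),
# loudness, and HNR statistics like those analyzed above.
features = smile.process_file("utterance_0001.wav")  # hypothetical path
print(features.filter(regex="F0semitone|loudness|HNR").T)
```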
The authors acknowledge several limitations: (1) the acoustic findings are based on small subsamples, limiting generalizability; (2) control‑speaker annotation did not benefit from the same calibration workshops, potentially introducing bias; (3) the corpus is currently limited to audio and transcripts, lacking visual or physiological modalities that could enrich multimodal emotion analysis.
Despite these constraints, the paper makes four key contributions: (1) release of a novel, multi‑rater emotion‑annotated AD speech corpus together with all annotation materials; (2) a detailed, reproducible calibration workflow that demonstrably improves inter‑rater agreement for clinical speech; (3) empirical evidence that AD patients produce a higher proportion of non‑neutral emotional expressions and that certain prosodic mappings (e.g., sadness‑related pitch lowering) are attenuated, while loudness remains a viable cue; (4) provision of baseline analyses that can serve as benchmarks for future automatic speech emotion recognition (SER) systems targeting clinical populations.
The released resources enable a range of downstream research: building SER models that account for label uncertainty, investigating whether emotional expression patterns correlate with disease progression, developing emotion‑aware assistive technologies for dementia care, and exploring multimodal extensions (e.g., facial expression, eye‑tracking) to capture the full affective profile of AD patients. Suggested future work includes expanding the corpus to longitudinal data, increasing sample sizes for each emotion, integrating multimodal signals, and designing deep learning architectures that explicitly model the partial preservation of prosodic cues observed in AD speech.
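As one concrete example of modeling label uncertainty, a classifier can be trained against the normalized distribution of rater votes rather than a single adjudicated label; a minimal PyTorch sketch with placeholder features and data:

```python
# Minimal sketch of training a speech emotion classifier on soft labels, i.e.
# the per-utterance distribution of rater votes instead of one gold label.
# The encoder, feature dimensionality, and batch data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EMOTIONS = 7  # Ekman's six basic emotions + neutral

class EmotionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 88):  # e.g. 88 eGeMAPS functionals
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, NUM_EMOTIONS)
        )

    def forward(self, x):
        return self.net(x)

def soft_label_loss(logits, rater_dist):
    """Cross-entropy against the normalized rater vote distribution."""
    return -(rater_dist * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

model = EmotionClassifier()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch: 8 utterances of acoustic functionals and their rater vote
# distributions (each row sums to 1).
feats = torch.randn(8, 88)
votes = torch.softmax(torch.randn(8, NUM_EMOTIONS), dim=-1)

loss = soft_label_loss(model(feats), votes)
loss.backward()
optim.step()
```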