Affect Recognition in Ads with Application to Computational Advertising


Advertisements (ads) often include strongly emotional content to leave a lasting impression on the viewer. This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, observing through extensive experimentation that CNN features outperform low-level audio-visual emotion descriptors; and (iii) demonstrates, via a study involving 17 users, how enhanced affect prediction facilitates computational advertising and leads to a better viewing experience while watching an online video stream embedded with ads. We model ad emotions based on subjective human opinions as well as objective multimodal features, and show how effectively modeling ad emotions can positively impact a real-life application.


💡 Research Summary

The paper presents a comprehensive study on affect recognition in commercial advertisements and demonstrates its practical utility for computational advertising. First, the authors curated a dataset of 100 one‑minute ads that are uniformly distributed across the arousal‑valence space. Five domain experts assigned high/low labels for valence and arousal, and 14 novice annotators subsequently rated each ad on a 5‑point scale for valence, arousal, and engagement. Agreement among the novice annotators was quantified with Krippendorff’s α (0.60 for valence, 0.37 for arousal), and agreement between expert and novice labels with Cohen’s κ (0.94 for valence, 0.67 for arousal); the strong expert‑novice consistency confirms that the ads constitute a reliable control stimulus set.
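
Both reliability statistics are standard and straightforward to compute. Below is a minimal Python sketch using the `krippendorff` and scikit-learn packages; the rating matrices are randomly generated stand-ins, and the mean-rating binarization of novice scores is an illustrative assumption, not the paper's exact protocol.

```python
# Minimal sketch of the two agreement statistics reported in the paper.
# The data here are random placeholders, not the paper's annotations.
import numpy as np
import krippendorff                      # pip install krippendorff
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical novice ratings: 14 annotators x 100 ads on a 5-point scale.
novice_valence = rng.integers(1, 6, size=(14, 100)).astype(float)

# Krippendorff's alpha across all annotators (ordinal level, since the
# 5-point scale is ordered); np.nan entries would mark missing ratings.
alpha = krippendorff.alpha(reliability_data=novice_valence,
                           level_of_measurement="ordinal")

# Cohen's kappa between expert high/low labels and novice ratings
# binarized via their mean (> 3 treated as "high").
expert_labels = rng.integers(0, 2, size=100)
novice_labels = (novice_valence.mean(axis=0) > 3).astype(int)
kappa = cohen_kappa_score(expert_labels, novice_labels)

print(f"Krippendorff alpha: {alpha:.2f}, Cohen kappa: {kappa:.2f}")
```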

Given the modest size of the ad collection, the authors employed domain adaptation to leverage deep visual representations. They fine‑tuned a pre‑trained Places205 convolutional neural network (CNN) on the LIRIS‑ACCEDE movie dataset, which provides arousal and valence annotations for ~10‑second clips. The resulting model, named AdAffectNet (AAN), was used to extract 4096‑dimensional fc7 features from two modalities: visual key‑frames sampled every three seconds and spectrograms generated from 10‑second audio segments. Binary high/low classifiers for arousal and valence were trained separately on the visual and audio streams (four classifiers in total), and multi‑task learning was applied to exploit cross‑modal similarities.
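
The fc7 feature-extraction step for the visual stream can be approximated as follows. This is a PyTorch sketch, not the authors' code: a stock AlexNet stands in for the Places205-initialized AdAffectNet weights (the checkpoint name `adaffectnet.pth` and the helper `fc7_features` are hypothetical).

```python
# Sketch: extract fc7-style 4096-d features from ad keyframes.
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.alexnet(weights="DEFAULT")
# model.load_state_dict(torch.load("adaffectnet.pth"))  # hypothetical fine-tuned weights
model.eval()

# Truncate the classifier after fc7 (second 4096-d fully connected layer
# plus its ReLU), so a forward pass returns fc7 activations directly.
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:6])

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def fc7_features(keyframes):
    """Map a list of PIL keyframes (one every ~3 s) to fc7 vectors."""
    batch = torch.stack([preprocess(f) for f in keyframes])
    with torch.no_grad():
        return model(batch)              # shape: (num_frames, 4096)
```

The audio stream would follow the same pattern, with spectrogram images of 10-second segments fed through the network in place of keyframes.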

Extensive experiments compared AAN‑derived features against the classic low‑level audio‑visual descriptors proposed by Hanjalic and Xu (2005). Across multiple classifiers (SVM, Random Forest) and cross‑validation folds, AAN consistently outperformed the baseline, especially for valence prediction where accuracy improved from ~62 % to ~78 %. Arousal prediction also saw gains (from ~58 % to ~71 %). Multi‑task learning further boosted performance, yielding the highest F1‑scores (Valence = 0.81, Arousal = 0.74).
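
The evaluation protocol can be mimicked with scikit-learn. The sketch below uses placeholder feature matrices (`X_aan`, `X_baseline`) and random labels; the paper's exact classifier hyperparameters and fold counts are not reproduced here.

```python
# Illustrative re-creation of the classifier comparison protocol.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_ads = 100
X_aan = rng.normal(size=(n_ads, 4096))        # stand-in AAN fc7 features
X_baseline = rng.normal(size=(n_ads, 32))     # stand-in low-level descriptors
y_valence = rng.integers(0, 2, size=n_ads)    # high/low valence labels

# Compare both feature sets under both classifiers with 10-fold CV.
for name, X in [("AAN fc7", X_aan), ("Hanjalic-Xu", X_baseline)]:
    for clf in (SVC(kernel="linear"), RandomForestClassifier(n_estimators=100)):
        scores = cross_val_score(clf, X, y_valence, cv=10, scoring="f1")
        print(f"{name} / {clf.__class__.__name__}: F1 = {scores.mean():.2f}")
```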

To illustrate real‑world impact, the authors integrated the improved affect predictions into a computational advertising framework (CAVVA). Ads were matched to video content by minimizing the Euclidean distance between predicted arousal‑valence vectors, thereby selecting insertion points that align emotionally with the surrounding program. A user study with 17 participants compared this “emotion‑matched” insertion strategy against a random insertion baseline. Participants reported significantly higher immersion (4.2 vs 3.5), greater ad satisfaction (4.0 vs 3.1), and lower intent to skip ads (2.1 vs 3.4) under the emotion‑matched condition (p < 0.01).
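
As described above, the matching rule reduces to a nearest-neighbour search in arousal-valence space. A toy sketch follows; the function `match_ads` and the coordinates are illustrative only, not CAVVA's published formulation.

```python
# Toy version of emotion-matched ad insertion: each insertion point gets
# the ad whose predicted arousal-valence vector is nearest in Euclidean
# distance. All values below are made up for illustration.
import numpy as np

def match_ads(scene_av, ad_av):
    """scene_av: (num_scenes, 2) arousal-valence predictions for scenes;
    ad_av: (num_ads, 2) predictions for candidate ads.
    Returns the index of the best-matching ad per insertion point."""
    # Pairwise Euclidean distances, shape (num_scenes, num_ads).
    d = np.linalg.norm(scene_av[:, None, :] - ad_av[None, :, :], axis=-1)
    return d.argmin(axis=1)

scenes = np.array([[0.8, 0.6], [0.2, -0.4]])   # hypothetical (arousal, valence)
ads = np.array([[0.7, 0.5], [0.1, -0.5], [0.9, -0.9]])
print(match_ads(scenes, ads))                  # -> [0 1]
```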

The paper’s contributions are threefold: (1) it provides one of the few affective ad datasets with validated expert and novice consensus; (2) it demonstrates that CNN‑based multimodal features, obtained via domain adaptation, surpass traditional low‑level descriptors for affect recognition in ads; and (3) it shows that accurate affect modeling can be directly leveraged to improve ad‑in‑video insertion strategies, enhancing viewer experience and potentially increasing advertising revenue. The work bridges affective computing and advertising technology, offering a solid foundation for future research on personalized, emotion‑aware ad delivery systems.

