Talk to Me, Not the Slides: A Real-Time Wearable Assistant for Improving Eye Contact in Presentations
Effective eye contact is a cornerstone of successful public speaking. It strengthens the speaker's credibility and fosters audience engagement. Yet maintaining effective eye contact is a skill that demands extensive training and practice, often posing a significant challenge for novice speakers. In this paper, we present SpeakAssis, the first real-time, in-situ wearable system designed to actively assist speakers in maintaining effective eye contact during live presentations. Leveraging a head-mounted eye tracker for gaze and scene-view capture, SpeakAssis continuously monitors and analyzes the speaker's gaze distribution across audience and non-audience regions. When ineffective eye-contact patterns are detected, such as insufficient eye contact or neglect of certain audience segments, SpeakAssis provides timely, context-aware audio prompts via an earphone to guide the speaker's gaze behavior. We evaluate SpeakAssis through a user study involving eight speakers and 24 audience members. Quantitative results show that SpeakAssis increases speakers' eye-contact duration by 62.5% on average and promotes a more balanced distribution of visual attention. Additionally, statistical analysis of audience surveys reveals that improvements in the speaker's eye-contact behavior significantly enhance the audience's perceived engagement and interactivity during presentations.
💡 Research Summary
The paper introduces SpeakAssis, the first real-time, in-situ wearable assistant designed to help speakers maintain effective eye contact during live presentations. The system uses a commercially available head-mounted eye tracker (Pupil Core) to capture both the speaker's gaze coordinates (120 Hz) and a front-facing scene video (1280 × 720). SpeakAssis operates in two phases. In a pre-presentation "audience registration" phase, the speaker briefly scans the room while the system records a short video, detects all faces, and assigns each a unique identifier and facial template based on left-to-right spatial ordering. During the presentation, each incoming frame is processed to detect faces, and a novel "anchor-face" strategy infers the identity of the gaze target from its position relative to a previously identified anchor face. This approach reduces computational load and is robust to lighting changes, occlusions, and frequent camera shifts.
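The paper itself contains no code, but the two-phase pipeline can be sketched in a few lines of Python. Everything below is a hypothetical reconstruction: face detection is abstracted into `Face` records, the function names are invented, and the offset-counting interpretation of the anchor-face strategy is one possible reading of "relative position to a previously identified anchor", not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Face:
    """A detected face, reduced to its bounding-box center in the scene frame."""
    x: float
    y: float
    face_id: Optional[int] = None

def register_audience(faces: List[Face]) -> Dict[int, Face]:
    """Registration phase: assign IDs by left-to-right spatial ordering."""
    registry: Dict[int, Face] = {}
    for i, face in enumerate(sorted(faces, key=lambda f: f.x)):
        face.face_id = i
        registry[i] = face
    return registry

def identify_gaze_target(frame_faces: List[Face], anchor: Face,
                         gaze_x: float, gaze_y: float) -> int:
    """Anchor-face strategy (one possible reading): pick the detected face
    nearest the gaze point, then offset the anchor's known ID by the number
    of faces lying between the anchor and the target, assuming the audience
    keeps its left-to-right ordering between frames."""
    assert anchor.face_id is not None, "anchor must already be identified"
    target = min(frame_faces,
                 key=lambda f: (f.x - gaze_x) ** 2 + (f.y - gaze_y) ** 2)
    if target.x >= anchor.x:
        offset = sum(1 for f in frame_faces if anchor.x < f.x <= target.x)
        return anchor.face_id + offset
    offset = sum(1 for f in frame_faces if target.x <= f.x < anchor.x)
    return anchor.face_id - offset
```

The appeal of the anchor idea, as described, is that only one face per frame needs a full identity match; every other face's ID follows from spatial ordering, which keeps per-frame cost low.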
The system continuously classifies the speaker's gaze into audience versus non-audience regions (e.g., laptop, ceiling). Over a sliding time window, it computes the proportion of gazes directed at the audience and their distribution across predefined audience zones (left, right, front, back). Two ineffective eye-contact patterns trigger feedback: (1) insufficient eye contact: if the proportion of audience-directed gazes falls below a configurable threshold (e.g., 20%), a short audio prompt such as "Please look at the audience" is sent; (2) imbalanced attention: if a zone receives significantly fewer gazes than the overall average, a zone-specific prompt like "Look at the audience on the right" is delivered. Prompts are transmitted via a Bluetooth earphone and last less than a second to avoid disrupting the speaker's flow.
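Here is a minimal sketch of how the two trigger rules might be wired up over a sliding window, assuming each gaze sample has already been classified as a zone name or "non-audience". Only the 20% audience threshold and the prompt wordings come from the summary above; the window size, the zone-imbalance ratio, and the function name are assumptions.

```python
from collections import Counter, deque
from typing import Optional

WINDOW_SIZE = 300      # gaze samples per sliding window -- hypothetical
AUDIENCE_MIN = 0.20    # from the paper: prompt if < 20% of gazes hit the audience
ZONE_RATIO = 0.5       # hypothetical: a zone is "neglected" below half the mean share
ZONES = ("left", "right", "front", "back")

window: deque = deque(maxlen=WINDOW_SIZE)  # each sample: a zone name or "non-audience"

def on_gaze_sample(label: str) -> Optional[str]:
    """Ingest one classified gaze sample; return an audio prompt string
    when an ineffective eye-contact pattern is detected, else None."""
    window.append(label)
    if len(window) < WINDOW_SIZE:          # wait until the window fills
        return None
    audience = [s for s in window if s in ZONES]
    # Pattern 1: insufficient eye contact overall.
    if len(audience) / len(window) < AUDIENCE_MIN:
        return "Please look at the audience"
    # Pattern 2: imbalanced attention -- one zone well below the mean zone share.
    counts = Counter(audience)
    mean_share = len(audience) / len(ZONES)
    for zone in ZONES:
        if counts.get(zone, 0) < ZONE_RATIO * mean_share:
            return f"Look at the audience on the {zone}"
    return None
```

In a live system this would be fed by the gaze classifier at 120 Hz, presumably with a cooldown between prompts so the speaker is not overwhelmed by back-to-back cues.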
A user study with eight speakers and twenty-four audience members (three per presentation) compared a control condition (no feedback) with the SpeakAssis condition. Quantitative metrics included total eye-contact duration, gaze-distribution balance (standard deviation across zones), and post-presentation audience surveys measuring perceived engagement and interactivity. Results showed a 62.5% average increase in eye-contact time, a 35% reduction in gaze-distribution variance, and statistically significant improvements in audience-reported engagement (p < 0.01). These findings demonstrate that real-time, context-aware auditory cues can effectively steer a speaker's visual attention without imposing excessive cognitive load.
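For readers who want to reproduce the analysis, both quantitative metrics are straightforward to compute from labeled gaze samples. The sketch below assumes the 120 Hz sampling rate given earlier; interpreting "gaze-distribution balance" as the population standard deviation of per-zone gaze shares is our assumption, and the function name is invented.

```python
import statistics
from typing import Sequence, Tuple

ZONES = ("left", "right", "front", "back")

def eye_contact_metrics(samples: Sequence[str],
                        sample_hz: int = 120) -> Tuple[float, float]:
    """Compute the study's two quantitative measures from labeled gaze samples:
    total eye-contact duration (seconds) and gaze-distribution balance,
    taken here as the population std dev of per-zone gaze shares."""
    audience = [s for s in samples if s in ZONES]
    duration_s = len(audience) / sample_hz           # gaze sampled at 120 Hz
    shares = [audience.count(z) / max(len(audience), 1) for z in ZONES]
    balance = statistics.pstdev(shares)              # lower = more even attention
    return duration_s, balance
```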
The authors acknowledge limitations: facial identification accuracy degrades with larger audiences, dynamic audience movement requires re‑registration, and overly frequent prompts could increase speaker workload. Future work aims to integrate gaze‑driven automatic slide advancement, visual feedback via AR glasses, multimodal cues (speech, gestures), and adaptive re‑registration algorithms to handle moving audiences.
In summary, SpeakAssis provides a practical, low‑cost solution that bridges the gap between offline eye‑tracking analysis and real‑time presentation support. By delivering discreet, actionable feedback, it enhances non‑verbal communication, boosts audience perception of credibility, and opens new avenues for wearable assistance in public speaking and related domains.