Examining The CoVCues Dataset: Supporting COVID Infodemic Research Through A Novel User Assessment Study
Public confidence and trust in online healthcare information have been greatly dented by the COVID-19 pandemic, which triggered a significant rise in online health misinformation. The existing literature shows that several datasets have been created to aid in detecting false information associated with this COVID infodemic. However, most of these datasets are unimodal, comprising primarily textual cues rather than visual cues such as images, infographics, and other graphic components. Prior work notes that only a handful of multimodal datasets support COVID misinformation identification, and that they lack an organized, processed, and analyzed repository of visual cues. The novel CoVCues dataset, which represents a varied set of image artifacts, addresses this gap and advocates for the use of visual cues in detecting online health misinformation. To validate the contents and utility of CoVCues, we conducted a preliminary user assessment study in which participants were surveyed through a set of questionnaires to determine how effectively the dataset images contribute to user-perceived information reliability. The survey responses provide early insights into how different stakeholder groups interpret visual cues in the context of online health information and communication. The findings of this user assessment study offer valuable feedback for refining CoVCues and support our claim that visual cues are underutilized yet useful in combating the COVID infodemic. To our knowledge, the user assessment study described in this paper is the first of its kind involving COVID visual cues, and it demonstrates the important role that CoVCues can potentially play in future COVID-infodemic research.
💡 Research Summary
The paper addresses the growing problem of health‑related misinformation that surged during the COVID‑19 pandemic, emphasizing that most existing resources focus on textual cues while largely ignoring visual elements such as images, infographics, and other graphic artifacts. To fill this gap, the authors introduce CoVCues, a novel multimodal dataset that systematically collects, cleans, categorizes, and annotates a large corpus of COVID‑related visual cues.
Data acquisition began by extracting image URLs from four well‑known COVID‑related datasets (CoAID, ReCOVery, MM‑COVID, and MM‑CoVaR). Using a Scrapy‑based crawler, the team downloaded millions of images, then applied a multi‑stage cleaning pipeline: duplicate removal via hash comparison, size‑based filtering to discard icons, emojis, and logos, and quality checks with OpenCV to eliminate blurry or face‑containing pictures. Approximately 2,500 noisy files were removed, resulting in a curated collection of roughly 12,000 high‑quality images. Each image was placed into a “reliable” or “unreliable” folder based on its source label, and further sub‑categorized by visual type (e.g., charts, maps, photographs, icons). This hierarchical taxonomy provides both visual semantics and reliability metadata, enabling downstream multimodal learning.
Technical validation involved fine‑tuning Vision Transformer (ViT) and ResNet models on the image set, both alone and combined with textual features from the original datasets. Experiments showed that image‑only classifiers achieved about 68 % accuracy, while the multimodal configuration (image + text) rose to 81 % accuracy—a 7 % absolute gain over text‑only baselines. Notably, chart‑type images contributed the most to performance improvements, suggesting that quantitative visual cues carry strong signals for misinformation detection.
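The paper does not detail the fusion architecture used in the multimodal configuration; a minimal sketch of the image + text late-fusion idea, using random placeholder vectors in place of actual ViT and text-encoder embeddings, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(image_feat, text_feat):
    """Late fusion by concatenation: [image ; text] -> one joint feature vector."""
    return np.concatenate([image_feat, text_feat])

def predict_unreliable(joint, W, b=0.0):
    """Linear head plus sigmoid over the fused features (stand-in for a trained classifier)."""
    return 1.0 / (1.0 + np.exp(-(joint @ W + b)))

# Placeholder embeddings: 768-d mimics a ViT image feature; the 300-d text
# feature and the untrained weights W are purely illustrative assumptions.
image_feat = rng.normal(size=768)
text_feat = rng.normal(size=300)
W = rng.normal(size=768 + 300) * 0.01

joint = fuse(image_feat, text_feat)
score = predict_unreliable(joint, W)
```

In practice the weights would be learned jointly with (or on top of) the fine-tuned ViT/ResNet backbone rather than drawn at random.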
Beyond algorithmic evaluation, the authors conducted a user assessment study to gauge real‑world perception of visual cues. Over 200 participants—including lay users, health professionals, and students—were shown a random selection of 30 images (balanced between reliable and unreliable) and asked to judge each as trustworthy or not. Participants also rated the influence of textual, visual, and combined cues on a 5‑point Likert scale. Results indicated that visual cues alone enabled an average 62 % correct trust judgment, which increased to 81 % when combined with text. Health experts displayed heightened sensitivity to chart and infographic cues, whereas general participants relied more on overall visual appeal.
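The per-group trust-judgment accuracies reported above can be computed from the raw responses in a straightforward way. The record layout below is hypothetical, as the paper's response schema is not given:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (participant_group, judged_reliable, truly_reliable)
responses = [
    ("health_expert", True, True), ("health_expert", False, False),
    ("lay_user", True, False),     ("lay_user", True, True),
]

def accuracy(records):
    """Fraction of judgments matching the ground-truth reliability label."""
    return mean(1.0 if judged == truth else 0.0 for judged, truth in records)

# Group responses by participant type, then score each group.
by_group = defaultdict(list)
for group, judged, truth in responses:
    by_group[group].append((judged, truth))

scores = {g: accuracy(r) for g, r in by_group.items()}
```

The same aggregation, run per cue condition (text-only, image-only, combined), would yield the 62 % and 81 % figures described above.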
The paper’s contributions are threefold: (1) the creation of CoVCues, the first publicly released COVID‑19 visual‑cue repository with organized taxonomy and reliability labels; (2) empirical evidence that incorporating visual information improves automated misinformation detection; and (3) user‑centered validation that visual cues significantly affect perceived credibility, especially when paired with textual information.
Limitations are acknowledged: the source URLs are predominantly English‑language web pages, the labeling process relied heavily on automated scripts (potentially introducing misclassifications), and the participant pool may reflect cultural or linguistic biases. Future work is suggested to expand the dataset across languages and cultures, involve expert annotators for higher label fidelity, and integrate real‑time social‑media streams to test CoVCues in dynamic environments.
Overall, CoVCues offers a valuable resource for researchers and practitioners aiming to develop more robust, multimodal misinformation detection systems and to design user‑centric interventions that leverage visual cues to combat the ongoing COVID‑infodemic.