Message Distortion in Information Cascades

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Information diffusion is usually modeled as a process in which immutable pieces of information propagate over a network. In reality, however, messages are not immutable, but may be morphed with every step, potentially entailing large cumulative distortions. This process may lead to misinformation even in the absence of malevolent actors, and understanding it is crucial for modeling and improving online information systems. Here, we perform a controlled, crowdsourced experiment in which we simulate the propagation of information from medical research papers. Starting from the original abstracts, crowd workers iteratively shorten previously produced summaries to increasingly smaller lengths. We also collect control summaries where the original abstract is compressed directly to the final target length. Comparing cascades to controls allows us to separate the effect of the length constraint from that of accumulated distortion. Via careful manual coding, we annotate lexical and semantic units in the medical abstracts and track them along cascades. We find that iterative summarization has a negative impact due to the accumulation of error, but that high-quality intermediate summaries result in less distorted messages than in the control case. Different types of information behave differently; in particular, the conclusion of a medical abstract (i.e., its key message) is distorted most. Finally, we compare abstractive with extractive summaries, finding that the latter are less prone to semantic distortion. Overall, this work is a first step in studying information cascades without the assumption that disseminated content is immutable, with implications on our understanding of the role of word-of-mouth effects on the misreporting of science.

💡 Research Summary

Information diffusion models traditionally treat content as immutable, but in real‑world communication messages are repeatedly rephrased, compressed, and thus distorted. This paper introduces a controlled, crowdsourced experiment to quantify how much distortion arises from (i) the “telephone effect” – cumulative errors introduced as a message passes through multiple people – and (ii) the “summary effect” – information loss caused by length constraints. The authors selected 16 medical abstracts from the New England Journal of Medicine covering vaccination, breast cancer, cardiovascular disease, and nutrition. For each abstract they created five target lengths (≈1000, 500, 250, 125, and 64 characters) and asked crowd workers to produce summaries at each step.

Two experimental conditions were defined. In the cascading condition, each summary at length ℓk was generated from the summary produced at length ℓk‑1, thereby exposing it to both telephone and summary effects. In the control condition, every target length was obtained by summarizing the original abstract directly, exposing it only to the summary effect. For each abstract‑length pair the authors collected eight independent chains, resulting in 256 summaries per length. Workers were prevented from copying text and were limited to one summary per hop to avoid leakage and bias.

To measure distortion, the researchers manually annotated lexical and semantic units (key phrases and factual statements) in the original abstracts. For every summary they recorded whether each unit was preserved, omitted, or semantically altered. This fine‑grained annotation allowed them to compute “information persistence” – the probability that a unit survives k summarization steps – and to compare the two conditions.

Key findings:

Telephone effect is measurable – cascading summaries contain on average 10–15 percentage points more errors than control summaries, confirming that errors accumulate across hops.
Quality of intermediate summaries matters – when an intermediate summary retains at least 80 % of the essential units, subsequent hops show a marked reduction in additional distortion. High‑quality early compression can act as a “buffer” against later loss.
Content type matters – the conclusion section of medical abstracts is the most vulnerable; its correct transmission drops by about 25 percentage points in cascading versus control conditions at the same final length. Background, methods, and results are comparatively more stable.
Extractive vs. abstractive summarization – extractive summaries (directly copying key phrases) lead to significantly fewer semantic distortions than abstractive ones, which involve paraphrasing and are prone to meaning shifts, especially for statistical findings.

The authors discuss practical implications. In real‑world health communication, ensuring that early intermediaries (press releases, news articles) produce high‑quality, extractive‑style summaries could limit the spread of misinformation. Special attention should be given to preserving conclusions, perhaps by requiring explicit verification steps before dissemination.

Limitations include the exclusive focus on text, the artificial nature of the crowdsourcing task compared to organic social‑media sharing, and the potential influence of workers’ domain knowledge. Future work should explore multimodal content, network‑level cascade structures, and the role of expert versus lay summarizers.

In sum, this study provides the first experimental quantification of message distortion in information cascades, separates the telephone and summary effects, and demonstrates that both the fidelity of intermediate summaries and the choice of summarization strategy critically shape the integrity of information as it propagates. These insights can inform the design of more robust communication pipelines, fact‑checking tools, and policies aimed at curbing the inadvertent spread of scientific misinformation.

Message Distortion in Information Cascades

💡 Research Summary

Comments & Academic Discussion

Leave a Comment