Clean Up the Mess: Addressing Data Pollution in Cryptocurrency Abuse Reporting Services
Cryptocurrency abuse reporting services are a valuable data source about abusive blockchain addresses, prevalent types of cryptocurrency abuse, and their financial impact on victims. However, due to their crowd-sourced nature, they may suffer from data pollution. This work analyzes the extent and impact of data pollution in cryptocurrency abuse reporting services and proposes a novel LLM-based defense to address it. We collect 289K abuse reports submitted over 6 years to two popular services and use them to answer three research questions. RQ1 analyzes the extent and impact of pollution. We show that spam reports eventually flood unchecked abuse reporting services: spam reached 75% of the reports BitcoinAbuse received before it ceased operations. We build a public dataset of 19,443 abuse reports labeled with 19 popular abuse types and use it to reveal the inaccuracy of user-reported abuse types. We also identify 91 reported addresses (0.1%) that are actually benign, yet are responsible for 60% of all received funds. RQ2 examines whether identifying valid reports and classifying them into abuse types can be automated. We propose an unsupervised LLM-based classifier that achieves an F1 score of 0.95 when classifying reports, an F1 of 0.89 on out-of-distribution data, and an F1 of 0.99 when identifying spam reports. Our unsupervised LLM-based classifier clearly outperforms two baselines: a supervised classifier and a naive use of the LLM. Finally, RQ3 demonstrates the usefulness of our LLM-based classifier for quantifying the financial impact of different cryptocurrency abuse types. We show that victim-reported losses heavily underestimate cybercriminal revenue: estimates based on deposit transactions are 29 times higher. We find that investment scams have the highest financial impact and that extortions have lower conversion rates but compensate with massive email campaigns.
💡 Research Summary
The paper investigates the pervasive problem of data pollution in crowd‑sourced cryptocurrency abuse reporting services and proposes a novel, unsupervised large language model (LLM) based classifier to mitigate it. The authors collected 289 000 abuse reports spanning six years from two widely used services—BitcoinAbuse (287 823 reports covering 92 151 Bitcoin addresses) and Scam Tracker (2 321 reports).
RQ1 – Extent and Impact of Pollution
Analysis of BitcoinAbuse, which lacks any pre‑filtering, reveals that at least 10.6 % of all submissions are spam, a proportion that surged to 75 % just before the service ceased operations. Spam falls into two categories: (1) advertisements for “fund‑recovery” scams and (2) false accusations that cryptocurrency exchanges are involved in terrorism. In contrast, Scam Tracker’s manual validation keeps spam below 0.5 %. The authors also built a ground‑truth (GT) dataset of 19 443 report descriptions manually labeled into 19 fine‑grained abuse types using a two‑step process (unsupervised clustering followed by expert annotation). Comparing these labels with the six predefined abuse types BitcoinAbuse offers to reporters (“BA types”) shows that reports filed under three of them (Bitcoin‑Tumbler, Darknet‑Market, Ransomware) are almost entirely spam, while only the “Sextortion” option is reasonably accurate. Even after removing spam, 22 % of reports labeled as ransomware actually describe sextortion. Moreover, 91 Bitcoin addresses (0.1 % of the total) are benign (mostly exchange wallets) yet account for 60 % of all received funds, illustrating that a handful of mis‑reported legitimate addresses can dominate financial‑impact estimates.
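The first step of the GT-labeling process (grouping similar report descriptions before experts annotate each group) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the toy reports, the TF-IDF features, and the cluster count are all assumptions.

```python
# Illustrative sketch of the unsupervised-clustering step used to bootstrap
# a ground-truth dataset of abuse reports. Toy data and parameters are
# assumptions; the paper clusters 19,443 real descriptions into groups that
# experts then annotate with 19 fine-grained abuse types.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reports = [
    "I was promised 10x returns if I sent BTC to this address",
    "Guaranteed crypto doubling, send 0.1 BTC and get 1 BTC back",
    "Email claims they recorded me via webcam and demands BTC",
    "Threatening email says pay bitcoin or a video will be shared",
]

# Represent free-form descriptions as TF-IDF vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform(reports)

# Cluster similar descriptions; an expert then labels each cluster
# (here, e.g., investment scam vs. sextortion) instead of every report.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
cluster_labels = kmeans.labels_
```

Clustering first means annotators label a few coherent groups rather than tens of thousands of individual reports, which is what makes manual GT construction tractable at this scale.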
RQ2 – Automated Identification and Classification
To address the need for scalable validation, the authors design an unsupervised LLM‑based classifier. The key idea is to provide natural‑language definitions for each of the 19 abuse types as part of the prompt, allowing the LLM to match the free‑form description to the most appropriate definition. Six LLMs (GPT‑4, GPT‑4o, GPT‑4o‑mini, GPT‑3.5, Llama 3, Llama 3.1) and three query strategies (zero‑shot, few‑shot, chain‑of‑thought) were evaluated. The best configuration achieves an F1‑score of 0.95 on in‑distribution data, 0.89 on out‑of‑distribution data, and 0.99 for spam detection—far surpassing a supervised baseline (F1≈0.42) and a naïve LLM usage (F1≈0.42). The classifier requires no training data, can be extended simply by adding new type definitions, and also outputs a natural‑language rationale for each decision, enhancing explainability.
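The definition-driven idea can be sketched as a prompt builder that embeds one natural-language definition per abuse type. The definitions, type names, and prompt wording below are illustrative assumptions; the paper's actual prompt and its 19 definitions may differ.

```python
# Sketch of a definition-driven zero-shot prompt for classifying an abuse
# report. The definitions and wording are hypothetical illustrations, not
# the exact prompt used in the paper.

ABUSE_TYPE_DEFINITIONS = {
    "sextortion": (
        "An email threatens to release compromising material of the victim "
        "unless a cryptocurrency payment is made."
    ),
    "investment-scam": (
        "The victim is lured into sending cryptocurrency with promises of "
        "guaranteed or unrealistically high returns."
    ),
    "spam": "The report advertises a service or makes unrelated accusations.",
}

def build_prompt(report_description: str) -> str:
    """Embed the type definitions in the prompt so the LLM matches the
    free-form description against them (zero-shot, no training data)."""
    lines = ["Classify the abuse report into exactly one of these types:", ""]
    for name, definition in ABUSE_TYPE_DEFINITIONS.items():
        lines.append(f"- {name}: {definition}")
    lines += ["", "Report:", report_description.strip(), "",
              "Answer with the type name and a one-sentence rationale."]
    return "\n".join(lines)

prompt = build_prompt("They promised to double any BTC I deposited.")
```

Because the taxonomy lives entirely in the prompt, adding a new abuse type only requires adding a new definition to the dictionary, and asking for a rationale yields the explainability the authors highlight.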
RQ3 – Financial Impact of Abuse Types
Using the refined labels, the authors quantify the monetary impact of each abuse category. Victim‑reported losses dramatically underestimate actual criminal revenue: deposit‑based estimates are 29 × larger. Investment scams generate the highest revenue share (44 % of the $453 M total deposited to the 1 930 classified addresses), followed by fund‑recovery scams and romance scams. Extortion campaigns, despite low conversion rates, still generate nearly $10 M thanks to massive email outreach. The study demonstrates that accurate classification is essential both for victim‑centric reporting and for blockchain‑analytics‑based revenue estimation.
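The gap between the two revenue views comes down to a simple aggregation: sum on-chain deposits over the classified addresses per abuse type, then compare against what victims reported. The addresses, amounts, and losses below are made-up placeholders, not the paper's data.

```python
# Sketch of deposit-based revenue estimation per abuse type. All figures are
# placeholder assumptions; in the paper, deposits come from on-chain
# transactions to the 1,930 classified addresses.
from collections import defaultdict

# (address, classified abuse type, total on-chain deposits in USD)
classified_addresses = [
    ("addr1", "investment-scam", 2_000_000.0),
    ("addr2", "investment-scam", 1_500_000.0),
    ("addr3", "extortion", 120_000.0),
]

# Losses stated by victims in their reports. These are typically far lower,
# since most victims who paid an address never file a report.
victim_reported_losses = {"addr1": 50_000.0, "addr2": 60_000.0, "addr3": 15_000.0}

revenue_by_type = defaultdict(float)
for address, abuse_type, deposits in classified_addresses:
    revenue_by_type[abuse_type] += deposits

total_deposits = sum(revenue_by_type.values())
total_reported = sum(victim_reported_losses.values())
# Ratio of on-chain revenue to victim-reported losses (the paper finds ~29x).
underestimation_factor = total_deposits / total_reported
```

This also makes clear why the 91 mis-reported benign addresses matter so much: any exchange wallet left in `classified_addresses` would inflate some abuse type's revenue with legitimate deposit volume.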
Contributions and Implications
- Empirical evidence that data pollution (spam, mis‑labeling, benign address misreporting) can severely distort abuse statistics and financial impact assessments.
- A definition‑driven, unsupervised LLM classifier that outperforms traditional supervised models and can be readily adapted to new abuse taxonomies.
- A revised, more realistic picture of cryptocurrency crime revenue, highlighting the under‑reporting bias in official victim‑loss statistics.
The authors suggest future work on real‑time LLM‑based filtering pipelines, extension to multi‑chain ecosystems, and tighter integration of explainable AI for continuous taxonomy refinement. Their findings are highly relevant for regulators, exchange compliance teams, and security researchers seeking reliable, scalable tools to clean and interpret crowd‑sourced abuse data.