False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems
Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. CTI involves collecting, processing, and analysing threat data to provide a more accurate and rapid understanding of cyber threats. Given the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. These automated systems leverage Open Source Intelligence (OSINT) from sources such as social networks, forums, and blogs to identify Indicators of Compromise (IoCs). Whereas prior research has focused on adversarial attacks against specific ML models, this study expands the scope by investigating vulnerabilities across the components of the entire CTI pipeline and their susceptibility to adversarial attacks. These vulnerabilities arise because CTI pipelines ingest textual input from open sources containing both genuine and potentially fake content. We analyse three types of attacks against CTI pipelines (evasion, flooding, and poisoning) and assess their impact on the system’s information selection capabilities. In particular, the work demonstrates how adversarial text generation techniques can produce fake cybersecurity and cybersecurity-like text that misleads classifiers, degrades performance, and disrupts system functionality. The focus is primarily on the evasion attack, as it precedes and enables flooding and poisoning attacks within the CTI pipeline.
💡 Research Summary
The paper investigates the vulnerability of modern Cyber Threat Intelligence (CTI) pipelines to adversarial attacks that exploit large language models (LLMs) for generating deceptive, cybersecurity‑like text. While prior work has largely focused on attacking individual machine‑learning classifiers, this study adopts a system‑level perspective, modeling a typical CTI workflow as five stages: (1) data collection from open‑source intelligence (OSINT) sources, (2) AI‑based analysis (classification, NER, relation extraction, etc.), (3) monitoring and validation (dashboards, human analyst verification), (4) threat scoring and prioritisation, and (5) actionable reporting. The authors argue that the ingestion of raw textual data without robust verification creates a cascade of weaknesses that can be weaponised by an adversary.
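Viewed as code, the five stages form a linear dataflow. The following is a minimal sketch of that workflow; all function and class names are hypothetical, since the paper describes the architecture rather than an implementation:

```python
from dataclasses import dataclass

@dataclass
class RawPost:
    source: str   # e.g. forum, blog, social network
    text: str

@dataclass
class Alert:
    text: str
    label: str          # output of the AI-based analysis stage
    score: float = 0.0  # threat score assigned during prioritisation
    validated: bool = False

def collect(sources):                      # stage 1: OSINT collection
    return [RawPost(s, t) for s, t in sources]

def analyse(posts):                        # stage 2: AI-based analysis (stub classifier)
    return [Alert(p.text, "cti" if "CVE-" in p.text else "other") for p in posts]

def validate(alerts):                      # stage 3: monitoring / analyst validation
    for a in alerts:
        a.validated = a.label == "cti"
    return alerts

def prioritise(alerts):                    # stage 4: threat scoring and ranking
    for a in alerts:
        a.score = 1.0 if a.validated else 0.0
    return sorted(alerts, key=lambda a: -a.score)

def report(alerts):                        # stage 5: actionable reporting
    return [a.text for a in alerts if a.score > 0]

def pipeline(sources):
    return report(prioritise(validate(analyse(collect(sources)))))
```

Note that in this sketch, as in the pipelines the paper critiques, nothing between collection and analysis checks whether the ingested text is trustworthy; that gap is exactly what the attacks below exploit.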
Three attack categories are defined:
- Evasion – Using LLMs to craft fake cybersecurity posts that closely mimic the lexical and structural patterns of genuine CTI (e.g., including CVE identifiers, CVSS scores, and technical jargon) while conveying false information. The authors develop a prompt‑optimisation and topic‑reinforcement methodology to maximise the likelihood that existing classifiers will label the fabricated text as legitimate.
- Flooding – Injecting large volumes of evasion‑crafted messages into the data stream, overwhelming monitoring dashboards and causing legitimate alerts to be missed or deprioritised.
- Poisoning – Persistently storing evasion/flooding texts in the pipeline’s knowledge base so that future model retraining incorporates the malicious samples, effectively teaching the classifier to accept false patterns as normal.
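The evasion idea, iteratively refining a fake post until a classifier accepts it, can be sketched as a simple optimisation loop. The generator and classifier below are illustrative stand-ins: a real attack would use the paper's prompt-optimisation and topic-reinforcement steps against the actual target model or a local surrogate.

```python
CTI_MARKERS = ["CVE-", "CVSS", "exploit", "patch", "vulnerability"]

def surrogate_classifier(text: str) -> float:
    # Stand-in for the target model: scores how "CTI-like" a text looks
    # by counting characteristic tokens.
    return sum(m.lower() in text.lower() for m in CTI_MARKERS) / len(CTI_MARKERS)

def reinforce(text: str, step: int) -> str:
    # Stand-in for topic reinforcement: each iteration injects another
    # cybersecurity marker into the fabricated post.
    return text + f" {CTI_MARKERS[step % len(CTI_MARKERS)]}9999"

def evade(fake_text: str, threshold: float = 0.6, max_steps: int = 10) -> str:
    # Refine the fake post until the surrogate labels it as genuine CTI
    # or the step budget runs out.
    step = 0
    while surrogate_classifier(fake_text) < threshold and step < max_steps:
        fake_text = reinforce(fake_text, step)
        step += 1
    return fake_text
```

Texts produced this way then feed the flooding and poisoning stages: the same loop, run at volume, saturates dashboards, and its outputs persist in the knowledge base as future training data.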
Experimental evaluation targets two classifiers: a specialised machine‑learning model trained on CTI data, and the conversational model ChatGPT‑4o used as a zero‑shot classifier. Under evasion attacks, the specialised model exhibits a false‑positive rate (FPR) of 97 %, while ChatGPT‑4o reaches an FPR of 75 %. These numbers demonstrate that even state‑of‑the‑art LLMs are highly susceptible to carefully engineered adversarial text. The flooding scenario shows that dashboards become saturated, leading to delayed or missed responses. Poisoning experiments reveal that after a modest amount of malicious data is incorporated, the retrained models begin to systematically misclassify similar fake texts, confirming the cumulative nature of the threat.
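For reference, the reported FPRs follow the standard definition: fabricated posts incorrectly accepted as genuine, divided by all fabricated posts. The counts below are illustrative choices that reproduce the two reported rates, not the paper's actual sample sizes.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    # fp: fake posts the classifier accepted as genuine CTI
    # tn: fake posts it correctly rejected
    return fp / (fp + tn)

print(false_positive_rate(97, 3))   # specialised model:  0.97
print(false_positive_rate(75, 25))  # ChatGPT-4o zero-shot: 0.75
```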
The paper’s key contribution is the identification of a systemic gap: current CTI pipelines lack an early‑stage verification component capable of assessing source credibility, metadata consistency, and factual correctness. The authors propose a multi‑layer defence architecture that includes (i) source‑reputation scoring, (ii) metadata‑based trust metrics, (iii) semantic consistency checks (e.g., fact‑checking APIs), (iv) human analyst triage for high‑risk alerts, and (v) continual adversarial training to harden models against LLM‑generated noise. They also argue that traditional anomaly detection based on word embeddings is insufficient against high‑quality LLM output, necessitating richer semantic and provenance analyses.
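One way the proposed layers might compose at ingestion time is as a weighted trust score with a triage band; the weights and thresholds below are illustrative assumptions, not values specified by the authors.

```python
def trust_score(source_reputation: float,
                metadata_consistency: float,
                semantic_consistency: float,
                weights=(0.4, 0.3, 0.3)) -> float:
    # Layers (i)-(iii): weighted combination of per-layer scores in [0, 1].
    parts = (source_reputation, metadata_consistency, semantic_consistency)
    return sum(w * p for w, p in zip(weights, parts))

def triage(score: float, accept: float = 0.7, reject: float = 0.3) -> str:
    # Layer (iv): only the uncertain middle band is escalated to a human
    # analyst; clear cases are ingested or dropped automatically.
    if score >= accept:
        return "ingest"
    if score <= reject:
        return "drop"
    return "analyst-review"
```

Layer (v), continual adversarial training, would sit outside this scoring path: texts flagged at triage become labelled adversarial examples for the next retraining cycle.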
In conclusion, the study demonstrates that the integration of LLM‑generated fake text into OSINT streams can severely degrade CTI pipeline reliability, leading to massive false positives, missed detections, and long‑term model drift. Mitigating these risks requires redesigning CTI architectures to embed verification at the ingestion point, adopting multi‑modal trust assessments, and continuously updating defensive models with adversarial examples. This work highlights the emerging need for “data‑trust” as a core security control in automated threat‑intelligence ecosystems.