From Threat Intelligence to Firewall Rules: Semantic Relations in Hybrid AI Agent and Expert System Architectures

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv paper.

Web security demands rapid response capabilities to evolving cyber threats. Agentic Artificial Intelligence (AI) promises automation, but the need for trustworthy security responses is of the utmost importance. This work investigates the role of semantic relations in extracting information for sensitive operational tasks, such as configuring security controls for mitigating threats. To this end, it proposes to leverage hypernym-hyponym textual relations to extract relevant information from Cyber Threat Intelligence (CTI) reports. By leveraging a neuro-symbolic approach, the multi-agent system automatically generates CLIPS code for an expert system creating firewall rules to block malicious network traffic. Experimental results show the superior performance of the hypernym-hyponym retrieval strategy compared to various baselines and the higher effectiveness of the agentic approach in mitigating threats.


💡 Research Summary

The paper presents a hybrid artificial‑intelligence system that automatically converts cyber‑threat‑intelligence (CTI) reports into firewall filtering rules. The core idea is to exploit taxonomic (hypernym‑hyponym) relations in the natural‑language text of CTI documents to obtain a richer semantic representation of the threat. A three‑stage prompting pipeline is built on a large language model (LLM), specifically Qwen‑2.5‑Coder‑14B, to (1) extract concrete domain entities (hyponyms), (2) abstract them into higher‑level categories (hypernyms), and (3) use the resulting hyponym‑hypernym pairs to drive downstream retrieval and code generation.
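The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt templates, the function names, and the `fake_llm` stand-in are all assumptions made for illustration, and a real deployment would replace `fake_llm` with a call to the actual model.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates; the paper's actual prompts are not reproduced here.
HYPONYM_PROMPT = "List the concrete security entities mentioned in this report, one per line:\n{report}"
HYPERNYM_PROMPT = "Give one general category (hypernym) for the entity: {entity}"


def extract_pairs(report: str, llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Stages 1 and 2: extract hyponyms, then abstract each one into a hypernym."""
    raw = llm(HYPONYM_PROMPT.format(report=report))
    hyponyms = [line.strip() for line in raw.splitlines() if line.strip()]
    pairs = []
    for entity in hyponyms:
        category = llm(HYPERNYM_PROMPT.format(entity=entity)).strip()
        pairs.append({"hyponym": entity, "hypernym": category})
    return pairs


def group_by_hypernym(pairs: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Stage 3 (illustrative): index hyponyms by category for downstream use."""
    index: Dict[str, List[str]] = {}
    for pair in pairs:
        index.setdefault(pair["hypernym"], []).append(pair["hyponym"])
    return index


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call, so the sketch runs offline."""
    if prompt.startswith("List the concrete"):
        return "Emotet\n203.0.113.7\n"
    if "Emotet" in prompt:
        return "malware family"
    return "IP address"


if __name__ == "__main__":
    pairs = extract_pairs("Emotet beacons to 203.0.113.7", fake_llm)
    print(group_by_hypernym(pairs))
```

Passing the model in as a plain callable keeps the extraction logic testable without a GPU and makes it easy to swap models.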

The extracted hypernyms are inserted into a graph‑based knowledge base managed by an “Enhanced CoALA” multi‑agent system. This system maintains a dynamic concept network, allowing incremental updates as new reports arrive. The graph is then mapped to CLIPS expert‑system constructs (defclass, deftemplate, defrule). A second module, “Expert System B,” runs the CLIPS forward‑chaining inference engine, validates the generated code, and filters out hallucinations or syntactic errors that may have survived the LLM guardrails. Deterministic execution is enforced through fixed random seeds, disabled cuDNN benchmarking, greedy decoding, and explicit device placement, which supports reproducibility across GPU platforms.
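As a rough illustration of the mapping to CLIPS constructs, the Python sketch below renders a small CLIPS program from a list of malicious IP addresses and applies a cheap structural pre-check. The template and rule names (`malicious-ip`, `firewall-rule`, `block-malicious-ip`) are hypothetical, since the summary does not show the generated code. The parenthesis check is only a simplified stand-in for the validation described above, which runs the actual CLIPS engine, and it ignores CLIPS `;` comments.

```python
import ipaddress
from typing import Iterable

# Hypothetical constructs; the paper's generated CLIPS code is not shown in this summary.
TEMPLATES = """(deftemplate malicious-ip (slot address))
(deftemplate firewall-rule (slot action) (slot source))"""

RULE = """(defrule block-malicious-ip
  (malicious-ip (address ?ip))
  =>
  (assert (firewall-rule (action drop) (source ?ip))))"""


def render_clips(ips: Iterable[str]) -> str:
    """Render templates, one rule, and a deffacts block for the given IPs.

    ipaddress.ip_address raises ValueError on malformed input, which keeps
    arbitrary LLM output from being spliced into the generated code.
    """
    facts = "\n".join(
        f'  (malicious-ip (address "{ipaddress.ip_address(ip)}"))' for ip in ips
    )
    deffacts = f"(deffacts cti-indicators\n{facts})"
    return "\n".join([TEMPLATES, RULE, deffacts])


def parens_balanced(code: str) -> bool:
    """Return True if parentheses balance, skipping double-quoted strings."""
    depth = 0
    in_string = False
    escaped = False
    for ch in code:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and not in_string


if __name__ == "__main__":
    program = render_clips(["203.0.113.7"])
    print(program)
    print("balanced:", parens_balanced(program))
```

Keeping the rule text fixed and letting only validated facts vary is one way to limit how much of the final program depends on free-form model output.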

Two experimental tasks evaluate the approach. Task A assesses the semantic extraction and multilabel classification performance on a human‑annotated CTI dataset (CTI‑HAL) containing 81 reports and 116 MITRE ATT&CK technique labels. The hypernym‑based prompting outperforms static embeddings (Word2Vec, GloVe), contextual embeddings (SecureBERT), and traditional classifiers (Naïve Bayes, SVM, Random Forest). It achieves a weighted F1 of 0.329, Top‑10 accuracy of 0.968, BERTScore of 0.858, and ROUGE‑L of 0.444, with a notable gain on minority classes, addressing the well‑known class‑imbalance problem in cyber‑security datasets.
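For readers unfamiliar with the Task A metrics, the sketch below computes a support-weighted F1 and a Top‑k accuracy for multilabel predictions in plain Python. The toy labels are illustrative and are not taken from CTI‑HAL. The Top‑k definition used here (a report counts as a hit if at least one true label appears among the k highest-ranked predictions) is an assumption, since the summary does not state which variant the paper uses.

```python
from typing import Sequence, Set


def weighted_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Per-label F1 averaged with weights equal to each label's support."""
    labels = set().union(*y_true, *y_pred)
    total_support = 0
    weighted_sum = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        support = tp + fn  # number of reports that truly carry this label
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


def top_k_accuracy(y_true: Sequence[Set[str]], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of reports with at least one true label in the top-k ranking (assumed definition)."""
    if not y_true:
        return 0.0
    hits = sum(1 for t, r in zip(y_true, ranked) if t & set(r[:k]))
    return hits / len(y_true)


if __name__ == "__main__":
    # Toy example with three reports and three ATT&CK technique IDs.
    truth = [{"T1059", "T1566"}, {"T1566"}, {"T1027"}]
    preds = [{"T1059"}, {"T1566"}, {"T1566"}]
    ranking = [
        ["T1566", "T1059", "T1027"],
        ["T1027", "T1566", "T1059"],
        ["T1566", "T1059", "T1027"],
    ]
    print(weighted_f1(truth, preds))
    print(top_k_accuracy(truth, ranking, k=2))
```

Because the weights follow label frequency, frequent techniques dominate the weighted F1, which is why a modest overall score can coexist with a high Top‑10 accuracy on an imbalanced label set.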

Task B runs the full pipeline on a second corpus (CTI‑B) of 66 malware entries, generating iptables‑style firewall rules. Five cybersecurity experts evaluate the output on three dimensions: technical correctness (syntactic validity), fidelity to the original CTI, and scope calibration (appropriateness of the rule’s coverage). Inter‑rater agreement is moderate rather than strong: Krippendorff’s α is 0.5768 for technical correctness, 0.5215 for fidelity, and 0.5030 for scope, with Spearman ρ values ranging from 0.4595 to 0.7143. These figures suggest broadly consistent, though not unanimous, expert judgments.
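To make the agreement figures more concrete, the sketch below shows how a Spearman ρ between two raters can be computed: both score lists are converted to ranks (ties share their average rank) and the Pearson correlation of the ranks is taken. The scores in the example are invented for illustration and are not the paper's data. The summary does not say how the pairwise ρ values were computed or aggregated, so treat this as one plausible reading.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long score lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one rater gives constant scores")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical 1-5 scores from two raters on six generated rules.
    rater_a = [5, 4, 4, 2, 3, 1]
    rater_b = [4, 5, 3, 2, 3, 1]
    print(round(spearman_rho(rater_a, rater_b), 4))
```

Spearman ρ captures whether two experts order the rules similarly, whereas Krippendorff’s α also accounts for chance agreement across all five raters, so the two statistics need not move together.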

Key contributions are: (1) a novel hypernym‑hyponym extraction method that leverages LLM prompting to enrich semantic search in CTI; (2) a neuro‑symbolic multi‑agent architecture that bridges the extracted semantics with a rule‑based expert system, enabling deterministic code generation; (3) empirical evidence that the approach outperforms baseline methods on imbalanced multilabel classification and produces high‑quality firewall rules validated by domain experts.

Limitations include reliance on CLIPS, which restricts the expressiveness of generated policies compared with modern programmable data‑plane technologies (e.g., OpenFlow, eBPF). The hypernym lexicon is domain‑specific, so extending the method to other cyber‑security standards (e.g., STIX/TAXII) will require additional mapping work. Finally, while deterministic inference settings reduce variability, complete elimination of LLM hallucinations still depends on post‑generation verification.

In conclusion, the study demonstrates that semantic relation extraction combined with a neuro‑symbolic, multi‑agent framework can close the gap between threat intelligence and actionable defense measures, delivering rapid, trustworthy firewall rule generation. Future work will explore integration with richer policy languages, real‑time streaming CTI ingestion, and stronger guarantees of deterministic LLM behavior.

