Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To address this, we present Agentic Knowledge Distillation, in which a powerful LLM acts as an autonomous teacher that fine-tunes a smaller student SLM, deployable for security tasks, without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines the smaller on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially with the choice of teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against a Direct Preference Optimisation (DPO) baseline that uses the same synthetic knowledge and LoRA setup but lacks iterative feedback and targeted refinement; agentic knowledge distillation substantially outperforms it (86-94% vs 50-80% accuracy), showing that closed-loop feedback and targeted refinement are critical. These findings demonstrate that agentic knowledge distillation can rapidly yield effective security classifiers for edge deployment, but outcomes depend strongly on which teacher LLM is used.


💡 Research Summary

The paper introduces “Agentic Knowledge Distillation,” a fully autonomous framework that uses a large language model (LLM) as a self‑directed teacher to train a small language model (SLM) for on‑device SMS spam and smishing detection. Traditional knowledge distillation requires human engineers to design data pipelines, label data, and iteratively refine models. In contrast, the proposed system gives the teacher LLM a task description (“SMS Threat Detection”) and a set of evaluation criteria, then lets it generate synthetic training data, fine‑tune the student SLM with Low‑Rank Adaptation (LoRA), evaluate performance on a synthetic validation set, hypothesize failure patterns, generate targeted additional data, and repeat until the validation metrics plateau—all without any human‑labeled data or external feedback.
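The generate-train-evaluate-refine loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stub functions stand in for teacher LLM calls and LoRA fine-tuning, and all names, metric values, and thresholds are hypothetical.

```python
def generate_data(focus=None, n=100):
    """Stand-in for the teacher LLM writing synthetic SMS training examples."""
    return [{"text": f"msg-{i}", "label": "spam" if i % 2 else "ham", "focus": focus}
            for i in range(n)]

def fine_tune(student, data):
    """Stand-in for one LoRA fine-tuning pass over the synthetic data."""
    return {"rounds": student["rounds"] + 1}

def evaluate(student):
    """Stand-in for validation on the fixed synthetic set (toy numbers)."""
    acc = min(0.94, 0.70 + 0.08 * student["rounds"])
    return {"accuracy": acc, "fp_rate": 0.10, "fn_rate": 0.05}

def agentic_distillation(max_iters=10, eps=1e-3):
    student, best, focus = {"rounds": 0}, 0.0, None
    for _ in range(max_iters):
        student = fine_tune(student, generate_data(focus))
        metrics = evaluate(student)
        if metrics["accuracy"] - best < eps:      # plateau -> converged, stop
            break
        best = metrics["accuracy"]
        # teacher hypothesises the dominant failure mode and targets new data
        focus = ("benign-like" if metrics["fp_rate"] > metrics["fn_rate"]
                 else "subtle-phishing")
    return student, best
```

The key structural point is that the stopping decision and the data-generation focus both come from the teacher's reading of the metric vector, with no human in the loop.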

Four state‑of‑the‑art LLMs—Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2—serve as teachers. Two instruction‑tuned SLMs—Qwen2.5‑0.5B‑Instruct (≈494 M parameters) and SmolLM2‑135M‑Instruct (≈135 M parameters)—serve as students. For each teacher‑student pairing, the teacher first creates a balanced synthetic corpus of spam and benign SMS messages covering modern attack tactics (phishing links, URL shorteners, homoglyphs, crypto scams, etc.). The student is fine‑tuned on this data using LoRA, which injects small trainable matrices into selected layers while keeping the bulk of the model frozen, enabling efficient training on consumer hardware.
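The LoRA mechanism mentioned above can be illustrated with a toy forward pass: the frozen weight matrix W is never updated, and only two small factors A and B are trained, giving an effective weight W' = W + (alpha / r) * B A. The pure-Python code and tiny dimensions below are illustrative, not the paper's configuration.

```python
def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha/r) * B (A x); W is frozen, only A (r x d_in)
    and B (d_out x r) are trainable, with r << min(d_out, d_in)."""
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]   # frozen path: W x
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]     # A x (r-dimensional)
    BAx = [sum(b * ai for b, ai in zip(row, Ax)) for row in B]   # B (A x)
    return [y + (alpha / r) * d for y, d in zip(base, BAx)]

def lora_trainable_params(d_out, d_in, r):
    """LoRA trains r * (d_out + d_in) values instead of d_out * d_in."""
    return r * (d_out + d_in)
```

For a 1024 by 1024 projection at rank 8, this is 16,384 trainable values instead of over a million, which is what makes fine-tuning feasible on consumer hardware.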

After each fine‑tuning iteration, the teacher evaluates the student on a fixed synthetic validation set and receives a metric vector (accuracy, precision, recall, false‑positive and false‑negative rates). Based solely on these aggregated metrics, the teacher decides whether performance has converged. If not, it performs error analysis: high false‑positives trigger generation of more benign‑looking messages, high false‑negatives trigger additional subtle phishing examples, etc. This targeted data augmentation mimics hard‑negative mining but is driven entirely by the teacher’s domain knowledge rather than observed misclassifications. The loop repeats until metrics cease improving, at which point the final student model is evaluated on the real, human‑labeled SMS Spam Collection dataset (held out from the entire loop) for a “sanity check.”
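The teacher's error-analysis step amounts to a policy mapping the aggregated metric vector to a data-generation action. A minimal sketch, with hypothetical threshold values not taken from the paper:

```python
def refinement_plan(metrics, prev_acc=0.0, fp_thresh=0.08, fn_thresh=0.05,
                    plateau_eps=1e-3):
    """Map the aggregated metric vector to the teacher's next action."""
    if metrics["accuracy"] - prev_acc < plateau_eps:
        return ["stop: metrics have plateaued"]
    actions = []
    if metrics["fp_rate"] > fp_thresh:       # over-flagging legitimate SMS
        actions.append("generate more benign-looking messages")
    if metrics["fn_rate"] > fn_thresh:       # missing disguised threats
        actions.append("generate more subtle phishing examples")
    return actions or ["general augmentation"]
```

Note that, as the summary says, the policy sees only aggregate rates, so it resembles hard-negative mining without ever inspecting individual misclassified messages.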

Results show a strong dependence on teacher LLM quality. The best configuration (Claude Opus 4.5 teaching Qwen2.5‑0.5B) achieved 94.31 % accuracy and 96.25 % recall, while the weakest (DeepSeek V3.2 teaching SmolLM2) reached only about 86 % accuracy. To isolate the contribution of the closed‑loop, agentic process, the authors also implemented a non‑agentic baseline using Direct Preference Optimisation (DPO). The DPO baseline uses the same synthetic data and LoRA settings but performs a single‑stage fine‑tuning without iterative feedback or targeted data generation. Across all teacher models, DPO yields substantially lower performance (50–80 % accuracy), confirming that the iterative, feedback‑driven refinement is the key driver of success.
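For reference, the standard per-pair DPO objective that such a single-stage baseline optimises can be written as a small function. The beta value and log-probabilities below are placeholders; the paper's exact DPO hyperparameters are not reproduced here.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)),
    where each margin is log p(chosen) - log p(rejected)."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model the loss is log 2; it falls as the policy learns to prefer the chosen (correctly labelled) completion. Crucially, this objective is computed once over a static preference dataset, with none of the iterative re-targeting that the agentic loop performs.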

The paper makes four main contributions: (1) a novel autonomous knowledge‑distillation pipeline in which the LLM acts as a self‑sufficient ML engineer; (2) an empirical study demonstrating that teacher LLM selection critically impacts student SLM effectiveness; (3) evidence that closed‑loop feedback and targeted synthetic data generation dramatically outperform static preference‑based fine‑tuning; and (4) validation that LoRA‑based fine‑tuning enables deployment of sub‑billion‑parameter models on edge devices for real‑time SMS threat filtering.

In conclusion, Agentic Knowledge Distillation eliminates the need for costly human labeling and repeated manual model redesign in the face of evolving SMS‑based attacks. By leveraging the generative and reasoning capabilities of modern LLMs, it creates a self‑optimising training loop that can quickly adapt to new threat patterns while producing lightweight models suitable for on‑device deployment. The study also highlights the importance of choosing high‑quality teacher LLMs and suggests future work on quantifying teacher selection criteria, extending the approach to other security domains, and exploring robustness against adversarially crafted synthetic data.

