Lightweight GenAI for Network Traffic Synthesis: Fidelity, Augmentation, and Classification

Giampaolo Bovenzi, Domenico Ciuonzo, Jonatan Krolikowski, Antonio Montieri, Alfredo Nascita, Antonio Pescapè, Dario Rossi

Abstract: Accurate Network Traffic Classification (NTC) is increasingly constrained by limited labeled data and strict privacy requirements. While Network Traffic Generation (NTG) provides an effective means to mitigate data scarcity, conventional generative methods struggle to model the complex temporal dynamics of modern traffic and/or often incur significant computational cost. In this article, we address the NTG task using lightweight Generative Artificial Intelligence (GenAI) architectures, including transformer-based, state-space, and diffusion models designed for practical deployment. We conduct a systematic evaluation along four axes: (i) synthetic traffic fidelity, (ii) synthetic-only training, (iii) data augmentation under low-data regimes, and (iv) computational efficiency. Experiments on two heterogeneous datasets show that lightweight GenAI models preserve both static and temporal traffic characteristics, with transformer and state-space models closely matching real distributions across a complete set of fidelity metrics. Classifiers trained solely on synthetic traffic achieve up to 87% F1-score on real data. In low-data settings, GenAI-driven augmentation improves NTC performance by up to +40%, substantially reducing the gap with full-data training. Overall, transformer-based models provide the best trade-off between fidelity and efficiency, enabling high-quality, privacy-aware traffic synthesis with modest computational overhead.

INTRODUCTION

The rapid evolution of Generative Artificial Intelligence (GenAI) is reshaping communications, accelerating the vision of autonomous and self-evolving networks [1].
Large generative models have demonstrated remarkable potential across telecom domains, exploiting massive datasets to learn complex patterns and generate new content. Applications already span physical-layer optimization and channel modeling [2], semantic communications, and network planning [1]. While their adoption for data-centric network monitoring and management [3] is also blooming, it remains an open frontier. Within this landscape, Network Traffic Classification (NTC) remains pivotal for network management and security across mobile, wired, and capillary Internet of Things (IoT) environments. While accurate NTC enables various critical functions, such as anomaly detection, quality-of-service provisioning, and security policy enforcement, traditional data-driven solutions often struggle to meet practical desiderata. Despite the advancements fueled by Machine Learning (ML), practical constraints like limited labeled data, imbalanced classes, and strict privacy regulations act as severe roadblocks. These constraints hinder scalability and reduce robustness in real-world settings [4].

G. Bovenzi, D. Ciuonzo, A. Montieri, A. Nascita, and A. Pescapè are with the DIETI, University of Naples Federico II, Italy. J. Krolikowski and D. Rossi are with Huawei Technologies France SASU.

To address these limitations, Network Traffic Generation (NTG) has been widely explored to synthesize realistic traffic data, enrich training sets, and strengthen classification robustness. Nevertheless, conventional approaches fall short. Classical data augmentation and traditional ML-based NTC rely on handcrafted statistical features, limiting adaptability to evolving traffic patterns and failing to capture rich temporal dependencies. Meanwhile, conventional NTG solutions struggle with a core trade-off: producing high-fidelity synthetic traffic while maintaining the computational efficiency required by modern networks [3].
In this article, we advocate for a shift toward lightweight GenAI architectures for practical NTG. As depicted in Fig. 1, we propose a modular NTG pipeline specifically tailored for GenAI. Rather than synthesizing fine-grained payload bytes using massive foundation models, our approach generates a compact traffic representation derived from header fields of the first packets in each network flow. This choice enables training and operating advanced GenAI models, namely Transformers, State Space Models (SSMs), and Diffusion Models (DMs), under a budget of 1–2 million parameters, ensuring computational efficiency without sacrificing generation fidelity.

To validate this lightweight paradigm, we formulate four Research Questions (RQs) aligned with real-world networking challenges (Fig. 1, bottom left). We assess both generated data quality from different viewpoints (Traffic Evaluation – RQ1 to RQ3) and GenAI model efficiency (Model Evaluation – RQ4):
RQ1: Can lightweight GenAI faithfully reproduce real traffic patterns?
RQ2: Can GenAI synthetic traffic enable privacy-preserving NTC without degrading performance?
RQ3: Can GenAI synthetic data mitigate training scarcity in low-data regimes?
RQ4: Are lightweight GenAI models computationally efficient for deployment?

To answer these RQs, our work provides an extensive evaluation of synthetic traffic generated via lightweight GenAI models by (i) defining a comprehensive fidelity assessment procedure, and tackling two pivotal downstream NTC tasks, namely (ii) synthetic-only training and (iii) data augmentation for low-data regimes. Complementarily, we deliver (iv) a usability assessment in terms of space and time complexity.
Fig. 1. Overview of the proposed lightweight GenAI-based network traffic generation pipeline and real-world challenges linked with our Research Questions. Training phase workflow: real network traces are segmented into biflows and mapped into canonical image- or token-based representations; these serve as inputs to train diffusion, transformer, or state-space generative models. Generation phase workflow: trained GenAI models are conditioned to generate image- or token-based representations for a given traffic class; these are then converted into traffic matrices to feed downstream network traffic classifiers; generation efficacy is assessed in terms of both model and traffic evaluation. [Figure panels: Training Phase Workflow; Generation Phase Workflow; Real-World Challenges (generation reliability, privacy concerns, data scarcity, computational overhead); Traffic Evaluation (fidelity, synthetic training, data augmentation) and Model Evaluation (efficiency).]

Experiments on two public datasets, Mirage-2019 (40 mobile apps) and CESNET-TLS22-80 (80 network services), demonstrate that lightweight GenAI models achieve strong fidelity, preserving both static and temporal traffic patterns. Compared to traditional NTG baselines (e.g., CVAE, SMOTE, and domain-expert transformations) and DMs, Transformers and SSMs (i.e., LLaMA and Mamba, respectively) exhibit higher performance.
Classifiers trained exclusively on synthetic data reach up to 87% F1-score on real traffic, while GenAI-driven augmentation improves F1-scores by up to +40% in low-data regimes. Last but not least, resource analysis reveals a clear trade-off between architectural complexity and computational cost, with Transformer-based models offering the most favorable balance for practical deployment, combining a moderate memory footprint with the lowest generation latency among the GenAI models.

The remainder of the article explores the background and related work shaping the current NTG landscape and presents the proposed lightweight GenAI-based pipeline. It then details the experimental setup and discusses the results addressing the four RQs, before concluding with directions for future work.

THE GENAI PARADIGM SHIFT IN NETWORK TRAFFIC GENERATION

This section frames NTG evolution, categorizing existing work into three phases: early statistical methods and conventional ML models, current large-scale GenAI models, and the emerging need for lightweight solutions addressing our research gap.

Traditional Generative Methods. Early NTG relied on statistical generative models (e.g., Markov models) to capture sequential dependencies through state transitions. Though intuitive and lightweight, they do not scale well to mid- or long-range dependencies. Conventional ML models improved expressive power [5]. Variational Autoencoders learn probabilistic latent representations that enhance reconstruction accuracy, while Normalizing Flows provide exact likelihood estimation at the cost of higher computational overhead. Generative Adversarial Networks, in contrast, focus on high-fidelity sampling but often suffer from training instability and mode collapse.
Although traditional methods support downstream tasks like intrusion detection and traffic classification [3, 5], they struggle to balance training stability, computational efficiency, and fidelity in modeling complex, evolving traffic patterns.

Recent GenAI Advancements. Since 2021, NTG has been impacted by the growing popularity of Transformers, DMs, and SSMs, each employing distinct strategies for traffic representation. Originating from Natural Language Processing (NLP), Transformer-based models (e.g., GPTs and T5) [6–10] and SSMs (i.e., Mamba) [11] treat network traffic as token sequences, learning a networking "grammar" to model flow dynamics. Input sequences range from raw packet bytes [8, 11] and header fields (e.g., packet sizes, inter-arrival times, and directions [6]) to tcpdump/tshark packet summaries [7, 10]. To further bridge the modality gap between natural language and network data, the authors of [9] combine specialized traffic-domain tokenization and multimodal learning to understand expert instructions and learn task-specific traffic representations simultaneously. Crucially, generation strategies either operate iteratively, constructing flows "token-by-token" akin to sentences in NLP [8], or natively synthesize complete sequences [6, 11]. A notable exception is the work in [7], which generates Python code interacting with the Scapy library, acting more as traffic replay than GenAI synthesis. Conversely, DMs, originating from computer vision, have recently gained traction across the entire networking stack, from physical-layer channel generation and resource management [2] to NTG [12, 13]. DMs synthesize traffic by iteratively denoising random data until structured patterns emerge. This requires encoding traffic into image-like representations [12] or directly modeling raw byte streams [13] to capture high-fidelity details.
Although these "large" models achieve high fidelity, their byte-level processing and iterative generation often make them impractical for network deployment.

Positioning. Our work diverges from the emerging trend of massive network foundation models [10] to investigate the feasibility of lightweight GenAI. Indeed, while foundation models offer generalization and fine-tuning capabilities, they impose prohibitive computational costs. Instead, we explore NTG solutions based on Transformers, SSMs, and DMs under strict resource constraints (i.e., 1–2M parameters), prioritizing deployability and training/inference speed. Unlike related works focused on generating raw packet bytes [11, 13] (which increases complexity and risks payload data leakage), we synthesize lightweight traffic features constituting the network fingerprint of applications. Specifically, we model the time series of payload lengths and packet directions (PL × DIR), akin to [6, 8, 12], but explicitly discard inter-arrival times, as these depend on network conditions rather than application logic [14]. Furthermore, we address the lack of rigorous validation in prior studies by introducing advanced fidelity metrics, including PL × DIR n-grams and Markov transition matrices, to assess temporal integrity. Finally, targeting practical downstream tasks for NTG, our comprehensive evaluation demonstrates that lightweight GenAI enables effective classifier training even in (a) synthetic-only or (b) low-data regimes, ensuring efficiency for real-world deployment.

LIGHTWEIGHT GENAI-BASED NTG PIPELINE AT WORK

Figure 1 depicts the modular NTG pipeline powered by lightweight GenAI models, structured around two distinct phases. The training phase pre-processes real network traces to train GenAI models, while the generation phase leverages them to synthesize high-fidelity, application-conditioned traffic data.

Training Phase Workflow.
The training phase ingests real network traffic traces (e.g., from mobile apps or network services). To bypass the computational overhead and privacy risks associated with raw payload utilization, the data undergo traffic segmentation to group packets into bidirectional flows (biflows)¹, followed by feature extraction. This produces a highly efficient traffic matrix representation, where rows are packets and columns are packet fields. Specifically, we extract the Payload Length (PL) and Packet Direction (DIR) of the first 10 packets of each biflow. Depending on the GenAI model, this matrix undergoes a specific modality mapping before training:
• TrafficMatrix2Image: Image-based models, such as DMs, interpret the (potentially preprocessed) traffic matrix as a structured 2D image. This enables DMs to learn complex 2D patterns that encode both the underlying features and the temporal dynamics of the flows, generating the entire traffic representation jointly rather than autoregressively.
• TrafficMatrix2Token: Sequence-based models, such as Transformers and SSMs, treat the matrix as a multivariate time series, serializing it into a sequence of discrete tokens, where each token encodes the vector of field values of one sequence step (i.e., one packet). This enables autoregressive generation that explicitly captures complex temporal dependencies across packets and fields.

¹ A biflow is a network flow consisting of packets flowing bidirectionally between the same network and transport endpoints (IPs, ports, and L4 protocol). It represents both directions of communication as a single entity.

Given this mapped data, the goal of the selected GenAI models is to learn a distribution that faithfully captures the underlying structure of the real network traces.

Generation Phase Workflow. Once the GenAI models are trained, they are deployed to synthesize new traffic samples.
Generation is conditioned via a token prompt, which dictates the target network class (e.g., a certain mobile app like YouTube, or a network service). The GenAI architectures generate synthetic samples in their native formats: 2D images for DMs or token sequences for Transformers and SSMs. To leverage these outputs for downstream NTC, an inverse mapping step is needed. Specifically, the Image2TrafficMatrix and Token2TrafficMatrix steps convert the generated samples back into the original traffic matrix format, recovering the corresponding synthetic PLs and DIRs. Generation efficacy is evaluated from two complementary viewpoints:
• Traffic Evaluation: To assess generation fidelity, we quantify the divergence between real and synthetic traffic distributions using distance metrics, such as the Jensen-Shannon Divergence (JSD). More precisely, we compute these distances across the traffic properties illustrated in Fig. 1: packet count histograms for session-level behavior, 1-gram histograms for marginal probabilities of PL and DIR, 2-gram histograms for temporal dependencies across consecutive packet pairs, and Markov transition matrices for first-order transition dynamics. Beyond fidelity, we evaluate the utility of generated traffic in two practical downstream NTC scenarios: synthetic-only training, where classifiers are trained exclusively on synthetic data and tested on real samples, and data augmentation, assessing whether enriching a few real samples with synthetic ones boosts classification performance.
• Model Evaluation: To validate the deployment feasibility of our lightweight GenAI models, we also profile their computational efficiency during training and inference, evaluating training time, generation latency, GPU memory utilization, and on-disk model footprint. Additionally, we investigate post-training quantization to assess whether these architectures can be further optimized for resource-constrained environments.
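As an illustration of the feature extraction and TrafficMatrix2Token/Token2TrafficMatrix steps described above, the following Python sketch groups packets into biflows, keeps the (PL, DIR) rows of the first 10 packets, and maps them to and from a token sequence with a class prompt and right padding. It is a minimal sketch under stated assumptions, not the authors' implementation: the packet tuple layout, the bound PL_MAX = 1500, the token layout (upstream PL tokens, then downstream PL tokens, then a padding token, then one token per class), and the rule that the biflow initiator defines the upstream direction are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

N_PACKETS = 10        # first packets kept per biflow (from the text)
PL_MAX = 1500         # hypothetical bound: PLs assumed to lie in [0, PL_MAX)
PAD = 2 * PL_MAX      # hypothetical padding token id (after all packet tokens)
CLASS_BASE = PAD + 1  # hypothetical: class tokens follow the padding token

# Hypothetical packet record: (src_ip, src_port, dst_ip, dst_port, l4_proto, payload_len)
Packet = Tuple[str, int, str, int, str, int]
Row = Tuple[int, int]  # (PL, DIR) with DIR = +1 upstream, -1 downstream


def biflow_key(p: Packet) -> tuple:
    """Direction-agnostic key: same endpoints and L4 protocol in either direction."""
    a, b = (p[0], p[1]), (p[2], p[3])
    return (min(a, b), max(a, b), p[4])


def segment_biflows(packets: List[Packet]) -> Dict[tuple, List[Row]]:
    """Traffic segmentation + feature extraction: (PL, DIR) of the first N_PACKETS.

    Assumption: packets sent by the biflow initiator are upstream (+1)."""
    flows: Dict[tuple, List[Row]] = {}
    initiator: Dict[tuple, Tuple[str, int]] = {}
    for p in packets:
        key = biflow_key(p)
        initiator.setdefault(key, (p[0], p[1]))
        rows = flows.setdefault(key, [])
        if len(rows) < N_PACKETS:
            direction = 1 if (p[0], p[1]) == initiator[key] else -1
            rows.append((p[5], direction))
    return flows


def matrix_to_tokens(rows: List[Row], class_id: int) -> List[int]:
    """TrafficMatrix2Token: class prompt, one token per packet, right padding."""
    tokens = [CLASS_BASE + class_id]
    for pl, direction in rows:
        if not 0 <= pl < PL_MAX:
            raise ValueError(f"payload length {pl} outside [0, {PL_MAX})")
        tokens.append(pl if direction > 0 else PL_MAX + pl)
    tokens += [PAD] * (N_PACKETS - len(rows))
    return tokens


def tokens_to_matrix(tokens: List[int]) -> List[Row]:
    """Token2TrafficMatrix: skip the class prompt, stop at the first special token."""
    rows: List[Row] = []
    for t in tokens[1:]:
        if t >= PAD:  # padding (or a stray class token) ends the biflow
            break
        rows.append((t, 1) if t < PL_MAX else (t - PL_MAX, -1))
    return rows


# Usage: a toy TCP exchange between a client and a server.
pkts = [
    ("10.0.0.1", 5555, "1.2.3.4", 443, "TCP", 0),
    ("1.2.3.4", 443, "10.0.0.1", 5555, "TCP", 0),
    ("10.0.0.1", 5555, "1.2.3.4", 443, "TCP", 517),
    ("1.2.3.4", 443, "10.0.0.1", 5555, "TCP", 1400),
]
rows = next(iter(segment_biflows(pkts).values()))
toks = matrix_to_tokens(rows, class_id=3)
assert tokens_to_matrix(toks) == rows
```

Keeping (PL, DIR) pairs rather than plain signed integers avoids losing the direction of zero-payload packets, for which +0 and -0 coincide; with this layout the vocabulary holds exactly 2 × PL_MAX packet tokens plus the special tokens.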
Lightweight GenAI Models. To implement our NTG pipeline under strict computational constraints, we leverage lightweight GenAI models (≈1–2M parameters) belonging to different families:
• Diffusion Models (DMs): DMs iteratively reverse a noising process to reconstruct realistic samples. We use NetDiffus-NR [12], a refined architecture operating on 2D Gramian Angular Summation Field (GASF) images. A post-generation refinement step accurately maps the 2D GASF images back to traffic sequences, minimizing reconstruction errors and improving the quality of the synthesized traffic traces.²
• Transformer-based Models: Transformers excel at sequence modeling by capturing temporal dependencies through self-attention. We adopt LLaMA [15], a causal decoder-only architecture designed to model long-range dependencies efficiently. It leverages optimized attention mechanisms to autoregressively synthesize high-fidelity sequences while maintaining computational scalability.
• Structured State-Space Models (SSMs): To complement Transformers, we explore SSMs designed for efficient sequence processing. We employ Mamba [11], which replaces standard attention with a selective state-space formulation. Operating causally, it achieves linear-time scalability, well suited for modeling per-biflow traffic sequences.

TABLE I
CONFIGURATION OF THE GENAI MODELS, GROUPED BY MAIN ATTRIBUTES. THE LAST COLUMN PROVIDES THE REPOSITORY LINK.

Model           HS         IS     #L    #AH    Repo
CVAE            500/250    20     6     –      –
LLaMA           160        320    4     8      [link]
Mamba           72         144    4     –      [link]
NetDiffus-NR    32         –      4     4      [link]

Legend: HS – Hidden Size; IS – Intermediate Size; #L – Number of Layers; #AH – Number of Attention Heads; Repo – Repository (clickable icon).

EXPERIMENTAL EVALUATION

This section evaluates our lightweight GenAI-based NTG pipeline to answer the four RQs formulated in the Introduction.
First, we outline the experimental setup, encompassing the employed datasets and generation configurations. Then, we provide the corresponding Research Answers (RAs), covering the traffic evaluation for generation fidelity (RA1) and downstream NTC utility (RA2 and RA3), followed by the model evaluation profiling computational efficiency and deployment feasibility (RA4).

Experimental Setup. Our evaluation relies on two public network traffic datasets: Mirage-2019³, containing 40 Android apps with ≈100k biflows, and CESNET-TLS22-80, a CESNET-TLS22 subset⁴ downsampled to cover the top 80 services and obtain a sample size comparable to Mirage-2019. Raw traffic data are pre-processed into sequences of the first 10 signed PLs (±PLs), where negative and positive values encode downstream and upstream DIRs, respectively. For a fair cross-architecture comparison, all lightweight GenAI models are bounded to 1–2M trainable parameters. Alongside these models, we employ a Conditional Variational Autoencoder (CVAE) baseline for class-aware traffic generation. Table I summarizes their key hyperparameters.

For sequence-based models, the vocabulary assigns a unique token to each signed PL value, resulting in 2 × PL_max possible tokens, augmented with a padding token and N class tokens (N = 40 for Mirage-2019 and N = 80 for CESNET-TLS22-80). To ensure fixed-length inputs during training, biflows with fewer than 10 packets are right-padded with padding tokens.

² We exclude heavier byte-level alternatives like NetDiffusion [13] from our evaluation, as its massive scale (hundreds of millions of parameters) and simplified direction modeling contradict our lightweight, time-series focus.
³ https://traffic.comics.unina.it/mirage/mirage-2019.html
⁴ https://www.liberouter.org/datasets/cesnet-tls22

Fig. 2. Radar plots of 6 fidelity metrics comparing real and synthetic traffic data across generative models (CVAE, NetDiffus-NR, LLaMA, Mamba) for Mirage-2019 (left) and CESNET-TLS22-80 (right). Axes: JSD NumPackets, JSD 1-gram, JSD 2-gram, JSD Markov, Leakage, UniqAlign. For all considered metrics, lower values indicate better performance (with 0 being optimal). Note that the axes are scaled with 0 at the outer edge, meaning that models producing larger polygon areas exhibit higher generative fidelity.

RA1 – Fidelity Evaluation. To answer RQ1, we quantitatively assess the fidelity of generated traffic through the six metrics reported in the radar plots of Fig. 2, which summarizes the performance of the generative models. First, we translate the visual properties depicted in Fig. 1 into numerical metrics by computing the macro-averaged JSD between real and synthetic distributions for packet count histograms (JSD_NumPackets), 1-gram histograms (JSD_1-gram), 2-gram histograms (JSD_2-gram), and Markov transition matrices (JSD_Markov). Furthermore, we assess the realism and privacy of generated biflows via two additional metrics [5]: (i) UniqAlign evaluates data realism by computing the uniqueness score (i.e., the proportion of distinct sequences) independently for the real and synthetic datasets, and then measuring their absolute difference; a lower score indicates that synthetic traffic accurately replicates the repetition patterns of real data. (ii) Leakage directly quantifies the exact sequence overlap between real and synthetic datasets via Jaccard similarity; a lower score indicates novel biflow generation rather than mere memorization of training data, thereby mitigating privacy leakage risks.
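The fidelity metrics defined above can be sketched in plain Python as follows, treating each biflow as a tuple of signed PLs. This is a minimal illustration, not the evaluation code behind Fig. 2: the base-2 logarithm (which bounds JSD in [0, 1]), the handling of empty histograms, and the aggregation of the Markov JSD (row-wise JSD between next-PL distributions, averaged over the states observed in real traffic, with a state never left in synthetic traffic counted as maximally divergent) are our assumptions. Per-class scores are macro-averaged over classes, as in the text.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List

Flow = tuple  # a biflow as a tuple of signed PLs, e.g. (-517, 1400, -60)


def jsd(p: Counter, q: Counter) -> float:
    """Jensen-Shannon divergence (base 2, hence in [0, 1]) between two histograms."""
    tp, tq = sum(p.values()), sum(q.values())
    if tp == 0 or tq == 0:
        return 1.0  # assumption: an empty histogram is maximally divergent
    div = 0.0
    for k in set(p) | set(q):
        pk, qk = p[k] / tp, q[k] / tq
        mk = 0.5 * (pk + qk)
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return max(0.0, div)  # guard against tiny negative rounding errors


def ngram_hist(flows: List[Flow], n: int) -> Counter:
    """Histogram of n-grams of consecutive packets (n = 1: marginal PL x DIR)."""
    return Counter(tuple(f[i:i + n]) for f in flows for i in range(len(f) - n + 1))


def markov_jsd(real: List[Flow], synth: List[Flow]) -> float:
    """Row-wise JSD of first-order transition matrices, averaged over real states."""
    def transitions(flows: List[Flow]) -> Dict[Hashable, Counter]:
        rows: Dict[Hashable, Counter] = {}
        for (a, b), count in ngram_hist(flows, 2).items():
            rows.setdefault(a, Counter())[b] += count
        return rows

    r, s = transitions(real), transitions(synth)
    if not r:
        return 0.0
    return sum(jsd(r[a], s.get(a, Counter())) for a in r) / len(r)


def fidelity_jsds(real: List[Flow], synth: List[Flow]) -> Dict[str, float]:
    return {
        "NumPackets": jsd(Counter(map(len, real)), Counter(map(len, synth))),
        "1-gram": jsd(ngram_hist(real, 1), ngram_hist(synth, 1)),
        "2-gram": jsd(ngram_hist(real, 2), ngram_hist(synth, 2)),
        "Markov": markov_jsd(real, synth),
    }


def uniq_align(real: List[Flow], synth: List[Flow]) -> float:
    """|uniqueness(real) - uniqueness(synth)|, uniqueness = share of distinct flows."""
    def uniqueness(flows: List[Flow]) -> float:
        return len(set(flows)) / len(flows)
    return abs(uniqueness(real) - uniqueness(synth))


def leakage(real: List[Flow], synth: List[Flow]) -> float:
    """Jaccard similarity between the sets of exact real and synthetic flows."""
    r, s = set(real), set(synth)
    return len(r & s) / len(r | s)


def macro(metric: Callable[[List[Flow], List[Flow]], float],
          real_by_class: Dict[str, List[Flow]],
          synth_by_class: Dict[str, List[Flow]]) -> float:
    """Macro-average a per-class metric over the classes of the real data."""
    scores = [metric(real_by_class[c], synth_by_class[c]) for c in real_by_class]
    return sum(scores) / len(scores)


# Usage on toy per-class data.
real = {"app_a": [(-517, 1400), (-517, 1400, -60)], "app_b": [(60, -60)]}
synth = {"app_a": [(-517, 1400), (-517, 1300)], "app_b": [(60, -60, 60)]}
print(macro(lambda r, s: fidelity_jsds(r, s)["2-gram"], real, synth))
print(macro(leakage, real, synth), macro(uniq_align, real, synth))
```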
TABLE II
F1-SCORES OF AN RF CLASSIFIER TRAINED ON SYNTHETIC AND TESTED ON REAL Mirage-2019 (ORANGE) AND CESNET-TLS22-80 (AZURE) TRAFFIC. BEST GENAI MODEL VALUES ARE MARKED WITH *.

(GenAI) Approach    Mirage-2019    CESNET-TLS22-80
Train on Real       85.84%         91.80%
CVAE                66.73%         75.44%
NetDiffus-NR        46.83%         65.84%
LLaMA               78.78%*        87.43%*
Mamba               76.07%         85.09%

Since all metrics follow a "lower-is-better" logic, radar plot axes are inverted, meaning that larger areas correspond to higher generation fidelity. Results are consistent across Mirage-2019 and CESNET-TLS22-80. LLaMA and Mamba outperform all alternatives, achieving near-zero JSD scores across all evaluated traffic properties. They effectively capture both marginal PL × DIR distributions (1-gram) and more complex temporal dependencies (2-gram and Markov). Notably, LLaMA achieves the best JSD_Markov, confirming its ability to model advanced sequential transitions. Also, both sequence-based models exhibit near-optimal UniqAlign and Leakage scores. This demonstrates that their generated samples closely match the real traffic distribution, synthesizing highly diverse and realistic sequences without merely memorizing the training set (with Mamba showing a slight edge over LLaMA in leakage mitigation). Interestingly, the CVAE baseline performs well only on coarse-grained properties, successfully matching the biflow length distribution (JSD_NumPackets ≈ 0). However, its fidelity drops when capturing complex patterns, exposing its structural limitations in modeling fine-grained traffic dynamics, though it still limits data leakage. NetDiffus-NR consistently yields the lowest fidelity. It struggles to capture structural traffic properties, exhibiting the highest JSD scores (i.e., the smallest polygon area) and failing to accurately model even the biflow lengths.
Despite these limitations, it achieves a moderate UniqAlign and successfully minimizes Leakage.

RA2 – NTC: Train on Synthetic Traffic. To address RQ2, we evaluate generated data utility via a train-on-synthetic, test-on-real approach. We train a Random Forest (RF) downstream classifier exclusively on synthetic samples and evaluate its generalization on unseen real traffic from Mirage-2019 and CESNET-TLS22-80. Performance under this setting reflects both the fidelity of the synthetic samples and their alignment with real-world class distributions.

Table II shows that RF models trained on synthetic traffic exhibit an expected performance gap relative to real-data training, reflecting the inherent difficulty of fully reproducing realistic traffic characteristics. Nevertheless, LLaMA- and Mamba-generated data lead to consistently higher classification performance, outperforming the CVAE baseline and NetDiffus-NR. On Mirage-2019, the RF trained on LLaMA and Mamba samples achieves 78.78% and 76.07% F1-scores, respectively, substantially reducing the gap with the real-data upper bound (85.84%). Similarly, on CESNET-TLS22-80, the synthetic-trained RF reaches 87.43% (LLaMA) and 85.09% (Mamba) F1-scores, closely trailing the 91.80% obtained with real traffic. In contrast, NetDiffus-NR severely underperforms across both datasets, confirming that this DM struggles to capture traffic characteristics relevant for downstream classification. To summarize, LLaMA and Mamba offer the best balance of generation fidelity and utility for downstream tasks, enabling synthetic-to-real generalization and privacy-preserving NTC with minimal performance degradation, whereas NetDiffus-NR appears less suitable for realistic traffic synthesis.

Fig. 3. F1-score in data augmentation scenarios under low-data regimes for Mirage-2019 (left) and CESNET-TLS22-80 (right) using an RF classifier. Colors indicate the approach family: orange for sequence-based GenAI, green for other generative models, red for statistical techniques, violet for expert transformations, black for real-only training. [Figure axes: x, Percentage of Real Samples [%] (5, 10, 20, 50); y, F1-Score [%]. Curves: Real Only, LLaMA, Mamba, CVAE, NetDiffus-NR, SMOTE, Fast Retransmit.]

RA3 – NTC: Data Augmentation. We investigate GenAI-driven data augmentation under data-scarcity conditions, where only a limited fraction of real training samples is available to the downstream classifier. The GenAI model is trained on the full labeled dataset and used to generate synthetic samples. The downstream classifier is trained with few real samples plus synthetic data. This reflects a practical use case of a network operator with limited data leveraging a pre-trained GenAI model to augment the training dataset. To establish a more extensive benchmark, we compare GenAI models against two non-AI baselines: (i) Fast Retransmit [14], a domain-expert traffic transformation, and (ii) SMOTE, a statistical synthesis technique. Fast Retransmit probabilistically delays a single packet to mimic a TCP retransmission, while SMOTE generates new samples by replicating or interpolating existing ones. Unlike GenAI models, which are trained offline on the entire dataset, Fast Retransmit and SMOTE operate directly on the limited data available at augmentation time. In detail, we evaluate augmentation utility by training an RF classifier on mixed real and synthetic traffic.
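The two non-AI baselines and the class-balancing step (upsampling every class to a common target size) can be sketched as follows, operating on flows represented as lists of signed PLs. This is a hedged illustration rather than the implementations used in the experiments: fast_retransmit is our simplified reading of "delaying a single packet" as moving one packet to a later position in the sequence, smote_like applies the classic SMOTE interpolation x + λ(x_nn − x) with Euclidean nearest neighbours over zero-padded flows and rounding to integer PLs, and balance_with, along with the probability p and neighbour count k, are hypothetical names and parameters introduced for the example.

```python
import random
from typing import Callable, Dict, List

Flow = List[int]  # signed PLs of the first packets of a biflow


def fast_retransmit(flow: Flow, rng: random.Random, p: float = 0.5) -> Flow:
    """With probability p, delay one packet by moving it to a later position.

    Simplified reading of the Fast Retransmit transformation: the delayed packet
    keeps its PL and direction; only the packet order changes."""
    out = list(flow)
    if len(out) < 2 or rng.random() >= p:
        return out
    i = rng.randrange(len(out) - 1)     # packet to delay (never the last one)
    j = rng.randrange(i + 1, len(out))  # strictly later position
    out.insert(j, out.pop(i))
    return out


def smote_like(flows: List[Flow], k: int, rng: random.Random) -> Flow:
    """One SMOTE-style sample: x + lam * (x_nn - x) with a random near neighbour.

    Flows are zero-padded to a common length and the result is rounded to integer
    PLs; a class with a single flow falls back to plain replication."""
    length = max(len(f) for f in flows)
    padded = [f + [0] * (length - len(f)) for f in flows]
    x = rng.choice(padded)
    others = [f for f in padded if f is not x]
    if not others:
        return list(x)

    def sq_dist(f: Flow) -> int:
        return sum((a - b) ** 2 for a, b in zip(f, x))

    nn = rng.choice(sorted(others, key=sq_dist)[:k])
    lam = rng.random()
    return [round(a + lam * (b - a)) for a, b in zip(x, nn)]


def balance_with(train: Dict[str, List[Flow]],
                 generate: Callable[[str, int], List[Flow]],
                 target: int) -> Dict[str, List[Flow]]:
    """Upsample every class with generated flows up to `target` samples
    (e.g., the majority-class size of the original training set)."""
    return {c: flows + generate(c, max(0, target - len(flows)))
            for c, flows in train.items()}


# Usage with the SMOTE-like generator on a tiny low-data training set.
rng = random.Random(0)
few_real = {"app_a": [[-100, 200, -1400], [-100, 250, -1300]], "app_b": [[60, -60]]}


def gen(c: str, n: int) -> List[Flow]:
    return [smote_like(few_real[c], k=5, rng=rng) for _ in range(n)]


balanced = balance_with(few_real, gen, target=4)
```

In this sketch, a GenAI model would simply replace gen with a call that samples n class-conditioned flows, leaving the balancing logic unchanged.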
To this end, we upsample all classes with synthetic samples to match the size of the majority class from the original training set, yielding a perfectly balanced set.

In the low-data regime (i.e., 5–20% of the original training set), Fig. 3 shows that GenAI-based augmentation produces substantial F1-score improvements. On Mirage-2019, LLaMA and Mamba samples significantly boost the classification performance compared to real-only training, rapidly approaching the F1-scores achieved with abundant real data. On CESNET-TLS22-80, the gains remain consistent, albeit more moderate (10–15% F1-score improvement over training without augmentation). Conversely, traditional statistical augmentation methods exhibit limited impact: SMOTE typically matches, or even slightly underperforms, real-only training. Fast Retransmit exhibits more favorable behavior, yielding moderate gains in the low-data regime on CESNET-TLS22-80. Nonetheless, its impact diminishes on Mirage-2019 and remains strictly below that of LLaMA and Mamba. The other generative methods exhibit markedly different behaviors across datasets. CVAE consistently improves upon real-only training, confirming its ability to model relevant traffic characteristics. NetDiffus-NR, instead, shows limited effectiveness, yielding marginal gains on Mirage-2019 and systematically underperforming on CESNET-TLS22-80. Overall, sequence-based GenAI augmentation consistently achieves the best performance across both datasets, highlighting its clear superiority over traditional statistical, expert-driven, and other generative alternatives.

TABLE III
TRAINING AND GENERATION RESOURCE USAGE (TIME, GPU UTILIZATION, MEMORY, AND ON-DISK SIZE) FOR EACH MODEL ON A HIGH-END DATACENTER GPU (48 GB), MEASURED OVER 10 RUNS AND REPORTED AS MEDIANS. TRAINING: 10 EPOCHS, 5 CLASSES, 500 SAMPLES/CLASS, BATCH SIZE 1. GENERATION: 100 SAMPLES/CLASS, BATCH SIZE 1.

Model               Train Time    Train GPU   Train Mem   Gen Time       Gen GPU   Gen Mem   Size on Disk
                    [s/epoch]     [%]         [MB]        [ms/sample]    [%]       [MB]      [MB]
CVAE                22.568        18          379         0.50           1         291       3.9
NetDiffus-NR        117.029       21          387         860.67         19        355       4.2
LLaMA               36.810        20          393         31.21          17        353       7.9
Mamba               108.477       15          377         148.52         14        359       15.5
LLaMA-PTQ int8-WO   –             –           –           45.33          16        357       3.5
LLaMA-PTQ int8-DA   –             –           –           990.04         14        347       3.4

RA4 – Computational Efficiency. Addressing RQ4, we analyze computational efficiency by measuring training time, generation latency, GPU utilization, memory consumption, and on-disk footprint. All generative models were trained for 10 epochs with 500 samples per class (5 classes, batch size of 1) using a high-end datacenter GPU (48 GB), and subsequently used to generate 100 synthetic samples per class.

The results reveal a clear trade-off between computational cost and architectural complexity. During training, times range from 22.6 s/epoch for CVAE to ≈108–117 s/epoch for models with more structured generative mechanisms. Notably, LLaMA stands out as an exception, achieving a highly competitive 36.8 s/epoch despite its autoregressive architecture. NetDiffus-NR exhibits the highest GPU utilization during training (21%), followed by LLaMA (20%) and CVAE (18%), whereas Mamba is the most efficient (15%). Memory usage remains uniform across all models, ranging from 377 MB (Mamba) to 393 MB (LLaMA), suggesting that the dominant memory cost is the framework overhead rather than the model itself. Generation latency varies substantially. CVAE is the fastest (0.50 ms/sample), followed by LLaMA (31.21 ms/sample), whereas Mamba and NetDiffus-NR incur considerably higher latencies (148.52 ms/sample and 860.67 ms/sample, respectively).
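To make the latency figures in Table III concrete, the following sketch shows one way to obtain a median per-sample generation latency at batch size 1 over repeated runs. It is an assumption-laden illustration, not the authors' profiling harness: generate_one is a hypothetical stand-in for a model's single-sample generation call, and each run's value is taken as the mean latency over all classes and samples.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(generate_one: Callable[[int], object], n_classes: int,
                      samples_per_class: int, runs: int = 10) -> float:
    """Median over `runs` of the mean per-sample generation latency, in ms/sample.

    `generate_one(class_id)` is a hypothetical callable that produces one
    synthetic sample for the given class (i.e., batch size 1)."""
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for class_id in range(n_classes):
            for _ in range(samples_per_class):
                generate_one(class_id)
        elapsed = time.perf_counter() - start
        per_run.append(1000.0 * elapsed / (n_classes * samples_per_class))
    return statistics.median(per_run)


# Usage with a CPU-bound stand-in for a real generator (5 classes x 100 samples).
def dummy_generate(class_id: int) -> int:
    return sum(range(1000 + class_id))


lat = median_latency_ms(dummy_generate, n_classes=5, samples_per_class=100, runs=3)
print(f"{lat:.4f} ms/sample")
```

When the model runs on a GPU, the clock should be read only after pending device work has completed (in PyTorch, for instance, by calling torch.cuda.synchronize()), since asynchronous kernel launches would otherwise make latencies appear smaller than they are.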
Regarding on-disk footprint, Mamba is by far the largest model at 15.5 MB, while the others range from 3.9 MB (CVAE) to 7.9 MB (LLaMA).

Lastly, we explore Post-Training Quantization (PTQ) to assess whether these models can be further optimized for resource-constrained environments. We apply PTQ exclusively to LLaMA. Its favorable trade-off between architectural footprint (substantially smaller than Mamba) and generation fidelity makes it the ideal candidate for a tiny-footprint generative architecture. Specifically, we investigate two int8 PTQ variants: weight-only (LLaMA-PTQ int8-WO) and dynamic activation (LLaMA-PTQ int8-DA). The weight-only variant leaves GPU memory consumption essentially unchanged while more than halving the model size (from 7.9 MB to 3.5 MB). It also reduces generation-time GPU utilization from 17% to 16%. These savings come at the cost of a moderate increase in generation latency (from 31.21 to 45.33 ms/sample). Conversely, the dynamic activation variant further reduces the model size (3.4 MB) and GPU utilization (14%). However, it incurs a substantial latency penalty (≈+960 ms/sample, i.e., 990.04 vs. 31.21 ms/sample) due to the overhead of on-the-fly activation quantization. Notably, both quantized LLaMA variants achieve fidelity metrics and real-traffic classification performance consistent with the non-quantized model up to two significant figures. This confirms that quantization introduces no meaningful degradation. Taken together, these findings demonstrate that lightweight GenAI architectures, particularly LLaMA coupled with quantization, can achieve the computational efficiency required for practical deployment.

CONCLUSION

This work presented a comprehensive study on generative approaches for network traffic synthesis, focusing on fidelity, downstream classification, and deployment feasibility.
Overall, our results highlight LLaMA and Mamba as the most promising models for realistic traffic synthesis, privacy-preserving classification, and effective data augmentation. In fidelity assessment (RQ1), both models achieve near-zero JSD scores across all properties; LLaMA excels on Markov transition matrices, while Mamba shows a slight edge in leakage mitigation. In synthetic-only training (RQ2), LLaMA and Mamba reach 78.78% and 76.07% F1-score on Mirage-2019, and 87.43% and 85.09% on CESNET-TLS22-80, narrowing the gap with real-data training to ≈9% and ≈13%, respectively. In data augmentation (RQ3), sequence-based GenAI improves classification by up to +40% F1-score in the low-data regime (5–20% of the real training set) on Mirage-2019, with consistent +10–15% gains on CESNET-TLS22-80. Regarding computational efficiency (RQ4), LLaMA offers the best trade-off with 36.8 s/epoch training time, 31.21 ms/sample generation latency, and a 7.9 MB on-disk footprint. This footprint is further reducible to 3.5 MB via int8 weight-only post-training quantization, without meaningful degradation in generation fidelity or classification performance. Conversely, diffusion models (NetDiffus-NR) and baselines (CVAE) proved less effective, either incurring prohibitive generation latencies or failing to capture fine-grained temporal dynamics.

Future work will explore adaptive generation strategies and hybrid pipelines combining generative models with domain-specific transformations. Furthermore, we aim to evaluate these lightweight architectures in real-world deployment scenarios, such as integrating quantized models into edge-based intrusion detection systems for on-the-fly, privacy-preserving traffic analysis. Lastly, extending this paradigm toward Agentic AI powered by lightweight models represents a promising frontier to enable autonomous, closed-loop network simulation and proactive defense mechanisms.
REFERENCES
[1] L. Bariah et al., “Large generative AI models for telecom: The next big thing?” IEEE Commun. Mag., vol. 62, no. 11, pp. 84–90, 2024.
[2] X. Xu et al., “Generative artificial intelligence for mobile communications: A diffusion model perspective,” IEEE Commun. Mag., 2024.
[3] G. Bovenzi et al., “Mapping the landscape of generative AI in network monitoring and management,” IEEE Trans. Netw. Service Manag., 2025.
[4] G. Aceto et al., “AI-powered internet traffic classification: Past, present, and future,” IEEE Commun. Mag., vol. 62, no. 9, pp. 168–175, 2023.
[5] G. Aceto et al., “Synthetic and privacy-preserving traffic trace generation using generative AI models for training network intrusion detection systems,” Journal of Network and Computer Applications, p. 103926, 2024.
[6] R. F. Bikmukhamedov and A. F. Nadeev, “Multi-class network traffic generators and classifiers based on neural networks,” in Systems of Signals Generating and Proc. in the Field of on Board Comm., 2021.
[7] D. K. Kholgh and P. Kostakos, “PAC-GPT: A novel approach to generating synthetic network traffic with GPT-3,” IEEE Access, vol. 11, pp. 114936–114951, 2023.
[8] J. Qu et al., “TrafficGPT: Breaking the token barrier for efficient long traffic analysis and generation,” arXiv preprint, 2024.
[9] T. Cui et al., “TrafficLLM: Enhancing large language models for network traffic analysis with generic traffic representation,” arXiv preprint arXiv:2504.04222, 2025.
[10] S. Mayhoub et al., “Talk like a packet: Rethinking network traffic analysis with transformer foundation models,” IEEE Commun. Mag., 2026, in press.
[11] A. Chu et al., “Feasibility of state space models for network traffic generation,” in Proc. of the SIGCOMM Workshop on Networks for AI Computing, 2024, pp. 9–17.
[12] N. Sivaroopan et al., “NetDiffus: Network traffic generation by diffusion models through time-series imaging,” Computer Networks, vol. 251, p. 110616, 2024.
[13] X. Jiang et al., “NetDiffusion: Network data augmentation through protocol-constrained traffic generation,” Proc. of the ACM on Measurement and Analysis of Computing Systems, vol. 8, no. 1, pp. 1–32, 2024.
[14] C. Wang et al., “Data augmentation for traffic classification,” in Int. Conf. on Passive and Active Network Measurement, 2024, pp. 159–186.
[15] H. Touvron et al., “LLaMA: Open and efficient foundation language models,” arXiv preprint, 2023.
Giampaolo Bovenzi (giampaolo.bovenzi@unina.it) is an Assistant Professor at the University of Napoli Federico II. His research concerns (anonymized and encrypted) traffic classification and network security.
Domenico Ciuonzo [SM] (domenico.ciuonzo@unina.it) is an Associate Professor at the University of Napoli Federico II. His research concerns data fusion, network analytics, IoT, signal processing, and AI.
Jonatan Krolikowski (jonatan.krolikowski@huawei.com) is a senior research engineer at the DataCom Lab of Huawei’s Paris Research Center. His research interests include ML- and operations research-driven optimization of real-world networks and the modeling and analysis of network-related problems, with a recent focus on time series modeling.
Antonio Montieri (antonio.montieri@unina.it) is an Assistant Professor at the University of Napoli Federico II. His research concerns network measurements, traffic classification, modeling and prediction, and AI for networks.
Alfredo Nascita (alfredo.nascita@unina.it) is an Assistant Professor at the University of Napoli Federico II. His research interests include Internet network traffic analysis, machine and deep learning, and explainable artificial intelligence.
Antonio Pescapè [SM] (pescape@unina.it) is a Full Professor at the University of Napoli Federico II. His work focuses on measurement, monitoring, and analysis of the Internet.
Dario Rossi [SM] (dario.rossi@huawei.com) is network AI CTO and director of the DataCom Lab at Huawei Technologies, France. He has coauthored 15+ patents and 200+ papers in leading conferences and journals. He has received 9 best paper awards, a Google Faculty Research Award (2015), and an IRTF Applied Network Research Prize (2016).
