An Empirical Analysis of Zero-Day Vulnerabilities Disclosed by the Zero Day Initiative
Zero-day vulnerabilities represent some of the most critical threats in cybersecurity, as they correspond to previously unknown flaws in software or hardware that are actively exploited before vendors can develop and deploy patches. During this exposure window, affected systems remain defenseless, making zero-day attacks particularly damaging and difficult to mitigate. This study analyzes the Zero Day Initiative (ZDI) vulnerability disclosures reported between January and April 2024 [Cole, 2025], comprising a total of 415 vulnerabilities. The dataset includes vulnerability identifiers, Common Vulnerability Scoring System (CVSS) v3.0 scores, publication dates, and short textual descriptions. The primary objectives of this work are to identify trends in zero-day vulnerability disclosures, examine severity distributions across vendors, and investigate which vulnerability characteristics are most indicative of high severity. In addition, this study explores predictive modeling approaches for severity classification, comparing classical machine learning techniques with deep learning models using both structured metadata and unstructured textual descriptions. The findings aim to support improved patch prioritization strategies, more effective vulnerability management, and enhanced organizational preparedness against emerging zero-day threats.
💡 Research Summary
The paper presents a comprehensive empirical study of zero‑day vulnerability disclosures reported to the Zero Day Initiative (ZDI) between January and April 2024, encompassing 415 distinct findings. The authors aim to (1) uncover temporal and vendor‑specific trends, (2) identify which structured (vendor, CVE‑assignment flag, publication date) and unstructured (textual description) features most strongly predict high severity (CVSS v3.0 ≥ 7.0), and (3) evaluate a suite of predictive models ranging from classical machine‑learning algorithms to modern deep‑learning architectures.
The dataset includes CVE identifiers where available, CVSS base scores, disclosure dates, and short free‑text descriptions. After standard preprocessing (tokenization, stop‑word removal, lemmatization), textual data are vectorized using TF‑IDF and subsequently reduced in dimensionality via Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). Feature importance is assessed with chi‑square and mutual information scores, revealing that vulnerabilities affecting widely deployed vendors (Microsoft, Adobe, Oracle) and containing high‑risk keywords such as “remote code execution”, “privilege escalation”, and “buffer overflow” have the strongest correlation with elevated CVSS scores. Dimensionality reduction shrinks the feature space from roughly 300 TF‑IDF dimensions to about 50 while preserving >95 % of variance, which improves both training speed and interpretability.
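The text pipeline described above can be sketched with scikit-learn: TF-IDF vectorization followed by SVD-based dimensionality reduction. The sample descriptions and the tiny component count are illustrative only, not the paper's actual corpus or its ~300 → ~50 reduction.

```python
# Illustrative TF-IDF + Truncated SVD pipeline (assumed setup, not the paper's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline

# Toy stand-ins for ZDI-style short free-text descriptions
descriptions = [
    "remote code execution via heap buffer overflow in PDF parser",
    "privilege escalation through improper access control",
    "information disclosure due to out-of-bounds read",
    "denial of service caused by uncontrolled resource consumption",
]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", lowercase=True)),
    # The paper reduces ~300 TF-IDF dimensions to ~50; here 3 components suffice
    ("svd", TruncatedSVD(n_components=3, random_state=0)),
])

reduced = pipeline.fit_transform(descriptions)
print(reduced.shape)  # one low-dimensional row per description: (4, 3)
```

The same pipeline object can then feed any downstream classifier, so the vectorizer and SVD are fit only on training data and reused at inference time.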
Three research questions (RQ1‑RQ3) guide the experimental work. RQ1 explores feature predictiveness and the impact of dimensionality reduction on classification performance. RQ2 compares classical models (Logistic Regression, Decision Trees, Random Forest, XGBoost) against deep‑learning models (1‑D Convolutional Neural Networks, LSTMs, and Transformer‑based BERT‑tiny). RQ3 investigates evaluation metrics suitable for the inherent class imbalance (high‑severity ≈ 18 %, medium ≈ 42 %, low ≈ 40 %).
Results show that a Random Forest trained on the reduced TF‑IDF + structured metadata achieves 92 % overall accuracy and a macro‑F1 of 0.86. Adding a lightweight BERT‑tiny model to process the raw description raises macro‑F1 to 0.89 while keeping inference latency low enough for real‑time integration. Hybrid multimodal models that fuse text embeddings with structured features reach the highest performance (≈ 94 % accuracy, macro‑F1 ≈ 0.91). However, all models exhibit lower recall for the high‑severity class (≈ 0.71–0.74), underscoring the need for metrics beyond plain accuracy. ROC‑AUC scores of 0.94 and PR‑AUC of 0.88 confirm that the classifiers can discriminate high‑risk cases despite the imbalance.
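The imbalance-aware metrics cited above (macro-F1, ROC-AUC, PR-AUC) can be computed with scikit-learn as follows; the toy labels and scores are invented for illustration and do not reproduce the paper's numbers.

```python
# Computing imbalance-aware metrics on toy predictions (illustrative values only).
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score

y_true  = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 1 = high severity (minority class)
y_pred  = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]   # hard labels from a classifier
y_score = [0.9, 0.2, 0.1, 0.45, 0.3, 0.2, 0.6, 0.8, 0.15, 0.05]  # probabilities

macro_f1 = f1_score(y_true, y_pred, average="macro")      # treats classes equally
roc_auc  = roc_auc_score(y_true, y_score)                 # ranking quality
pr_auc   = average_precision_score(y_true, y_score)       # PR-AUC, imbalance-sensitive
print(round(macro_f1, 2), round(roc_auc, 2), round(pr_auc, 2))  # 0.76 0.95 0.92
```

Note that ROC-AUC and PR-AUC operate on continuous scores rather than hard labels, so they capture ranking quality independently of the chosen decision threshold.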
The authors discuss practical implications: (i) simple rule‑based filters using vendor and keyword cues can provide rapid early triage; (ii) dimensionality‑reduced Random Forests offer a strong accuracy‑efficiency trade‑off suitable for integration into existing vulnerability‑management pipelines; (iii) lightweight Transformers such as BERT‑tiny are viable for environments where textual nuance matters but computational resources are limited.
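The rule-based early-triage idea in point (i) can be sketched as a simple vendor-and-keyword filter. The function name and the exact cue lists are hypothetical, drawn from the vendors and high-risk keywords the paper's feature analysis highlights.

```python
# Minimal rule-based triage filter (assumed implementation of the paper's idea (i)).
# Cue lists mirror the high-importance vendors and keywords from the feature analysis.
HIGH_RISK_VENDORS = {"microsoft", "adobe", "oracle"}
HIGH_RISK_KEYWORDS = ("remote code execution", "privilege escalation", "buffer overflow")

def flag_for_triage(vendor: str, description: str) -> bool:
    """Return True if a disclosure matches a vendor or keyword cue."""
    text = description.lower()
    return (vendor.lower() in HIGH_RISK_VENDORS
            or any(kw in text for kw in HIGH_RISK_KEYWORDS))

print(flag_for_triage("Adobe", "out-of-bounds write in font parser"))   # True (vendor cue)
print(flag_for_triage("ExampleCorp", "heap buffer overflow in codec"))  # True (keyword cue)
print(flag_for_triage("ExampleCorp", "minor information disclosure"))   # False
```

Such a filter is deliberately high-recall and low-cost: it cannot replace the trained classifiers, but it can surface obvious candidates before the full pipeline runs.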
Limitations include the exclusive reliance on ZDI data, which may over‑represent certain software categories and under‑represent others, and the subjectivity inherent in CVSS scoring. The short, non‑standardized descriptions also limit the richness of textual features. Future work is proposed to merge ZDI with the broader NVD corpus for longitudinal trend analysis, incorporate multimodal inputs (code snippets, binary static analysis), and explore semi‑supervised or few‑shot learning techniques to better handle emerging, sparsely‑documented zero‑day exploits.
In conclusion, the study demonstrates that combining structured metadata with carefully engineered textual representations yields robust severity‑prediction models for zero‑day disclosures. By emphasizing appropriate evaluation metrics for imbalanced data and showcasing the trade‑offs between classical and deep learning approaches, the paper provides actionable insights for security teams seeking to prioritize patching efforts and improve overall cyber‑risk posture.