Towards Sensitivity-Aware Language Models
With LLMs increasingly deployed in corporate data management, it is crucial to ensure that these models do not leak sensitive information. To this end, the concept of sensitivity awareness has been introduced, enabling LLMs to adhere to predefined access-rights rules. However, it remains unclear how sensitivity awareness relates to established notions of privacy, such as differential privacy (DP), which makes it difficult to deploy sensitivity awareness meaningfully in real-world applications. In this work, we formalize the notion of sensitivity awareness and theoretically establish its connection to DP. Additionally, we develop a supervised fine-tuning recipe to make existing, four-bit quantized LLMs more sensitivity-aware. With a performance boost of up to 21.7%, the fine-tuned LLMs not only substantially improve over their baselines but also outperform other full-precision open-source and commercial models of similar size in achieving sensitivity awareness, demonstrating the effectiveness of our proposed approach. At the same time, our method also largely preserves the models’ performance on other tasks, such as general instruction-following, mathematical, and common-sense reasoning.
💡 Research Summary
The paper tackles the pressing problem of preventing large language models (LLMs) from leaking confidential corporate data. It introduces a formal notion of Sensitivity Awareness (SA), which requires a model to obey predefined access‑right rules: it must not reveal sensitive information to unauthorized users, must not hallucinate inaccurate data, and must respect any non‑SA output constraints while still providing correct answers to authorized queries. The authors first extend the earlier informal definition of SA by embedding it in a rigorous privacy‑game framework that draws on role‑based access control (RBAC). In this game, an adversary receives the non‑sensitive context φ(z) and black‑box access to the model’s retrieval system, then attempts to infer the hidden sensitive attribute π(z).
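The privacy game described above can be pictured with a toy simulation: an adversary sees only the non-sensitive context φ(z) plus a black-box oracle that denies sensitive queries, yet still gains a non-trivial advantage from the statistical correlation between φ(z) and π(z). All names (the record distribution, the oracle, the adversary strategy) are illustrative stand-ins, not the paper's actual implementation:

```python
import random

# Toy instantiation of the SA privacy game. The record distribution,
# oracle, and adversary strategy are hypothetical examples.
random.seed(0)

SECRETS = ["salary_high", "salary_low"]

def sample_record():
    """A record z with sensitive attribute pi(z) and correlated context phi(z)."""
    pi_z = random.choice(SECRETS)
    # phi(z) is weakly correlated with pi(z): seniority leaks some signal.
    senior = (pi_z == "salary_high") == (random.random() < 0.7)
    return pi_z, {"dept": "eng", "senior": senior}

def sa_oracle(query: str) -> str:
    """Black-box model access: an ideal sensitivity-aware model refuses
    to reveal pi(z) to an unauthorized caller."""
    return "ACCESS DENIED"

def adversary(phi_z, oracle):
    _ = oracle("What is this employee's salary band?")  # always denied here
    # Best remaining strategy: exploit the correlation present in phi(z).
    return "salary_high" if phi_z["senior"] else "salary_low"

trials = 10_000
wins = sum(
    adversary(phi_z, sa_oracle) == pi_z
    for pi_z, phi_z in (sample_record() for _ in range(trials))
)
baseline = 1 / len(SECRETS)            # random-guess probability
advantage = wins / trials - baseline   # positive despite every query being denied
print(f"empirical SA advantage: {advantage:.3f}")
```

Even with a perfectly denying oracle, the adversary's advantage stays well above zero, which is exactly the irreducible, distribution-inherent leakage that Theorem 2 (discussed below) lower-bounds.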
Through Lemma 1 they prove that the advantage an adversary gains in the SA game is bounded by its advantage in a standard attribute‑inference (AI) game, establishing SA as a stricter variant of AI. Definition 3.1 quantifies SA advantage as the probability of correctly guessing π(z) minus the baseline random‑guess probability. Theorem 2 provides a general lower bound on this advantage based solely on statistical correlations between φ(z) and π(z), showing that no mechanism can completely eliminate leakage that is inherent in the data distribution. Theorem 3 then links SA to differential privacy (DP): if the model is trained with an (ε, δ)‑DP algorithm, any adversary’s SA advantage is upper‑bounded by (e^ε − 1 + 2δ)/(e^ε + 1). This bridges the gap between the well‑studied DP guarantees (originally for training‑data privacy) and the inference‑time access‑control guarantees required for corporate deployments.
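In symbols, the two central quantities can be written as follows (the notation here is reconstructed from the summary, not copied verbatim from the paper; the paper's own symbols may differ):

```latex
% Definition 3.1: SA advantage of an adversary A, where V is the set of
% possible values of the sensitive attribute pi(z).
\[
  \mathrm{Adv}_{\mathrm{SA}}(\mathcal{A})
    \;=\; \Pr\bigl[\mathcal{A}(\varphi(z)) = \pi(z)\bigr] \;-\; \frac{1}{|\mathcal{V}|}
\]

% Theorem 3: bound under (epsilon, delta)-differentially private training.
\[
  \mathrm{Adv}_{\mathrm{SA}}(\mathcal{A})
    \;\le\; \frac{e^{\varepsilon} - 1 + 2\delta}{e^{\varepsilon} + 1}
\]
```

Note that for ε = δ = 0 the bound collapses to 0, matching the intuition that a fully private mechanism admits no advantage beyond random guessing, while the bound grows toward 1 as ε increases.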
On the practical side, the authors develop a supervised fine‑tuning (SFT) recipe that works on four‑bit quantized LLMs. They use Low‑Rank Adaptation (LoRA) to inject a small set of trainable parameters that learn to apply a “guard” function enforcing RBAC policies on the model’s raw outputs. Training data consist of the Access Denied Inc (ADI) benchmark—designed to test SA—and standard instruction‑following, math, and common‑sense tasks. After fine‑tuning, the quantized models achieve up to a 21.7% increase in SA scores, surpassing similarly sized full‑precision open‑source and commercial models. Importantly, performance on the non‑SA benchmarks degrades by less than 1%, demonstrating that SA can be improved without sacrificing general utility.
The authors also explore the privacy‑utility trade‑off by varying the DP parameter ε. Experiments show that ε≈1.5–2 provides a good balance: SA improves markedly while overall task accuracy remains high. This suggests that modest DP noise is sufficient to curb excess leakage beyond the unavoidable statistical baseline identified in Theorem 2.
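Assuming Theorem 3's bound takes the closed form (e^ε − 1 + 2δ)/(e^ε + 1), the privacy-utility trade-off can be made concrete by evaluating the worst-case advantage cap at a few ε values (the δ default below is an illustrative choice, not the paper's):

```python
import math

def sa_advantage_bound(eps: float, delta: float = 1e-5) -> float:
    """Upper bound on SA advantage under (eps, delta)-DP training,
    assuming the closed form (e^eps - 1 + 2*delta) / (e^eps + 1)."""
    return (math.exp(eps) - 1 + 2 * delta) / (math.exp(eps) + 1)

for eps in (0.5, 1.0, 1.5, 2.0):
    print(f"eps = {eps:>3}: adversary advantage <= {sa_advantage_bound(eps):.3f}")
```

The bound tightens rapidly as ε shrinks, which is consistent with the observation that the moderate regime ε ≈ 1.5–2 already curbs excess leakage while leaving enough signal for high task accuracy.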
Limitations are acknowledged: the current RBAC model is static, the privacy game assumes a simplified retrieval oracle, and dynamic policy changes or multi‑tenant scenarios are not covered. Future work is proposed to extend the framework to adaptive access controls, federated learning across organizations, and more realistic retrieval pipelines.
All code, data, and evaluation scripts are released publicly, enabling reproducibility and encouraging further research on privacy‑preserving LLMs for enterprise use. In sum, the paper provides both a solid theoretical connection between sensitivity awareness and differential privacy, and a practical, low‑cost method to make quantized LLMs safer for corporate data management while retaining their broad reasoning capabilities.