Good-Enough LLM Obfuscation (GELO)
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the TEE samples a random matrix $A$, forms $U = AH$, offloads $U$ and the weights $W$ to the accelerator, and then applies $A^{-1}$ on return, so that $A^{-1}((AH)W) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyse information leakage and introduce two practical defences: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy “shield” vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20–30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.
💡 Research Summary
The paper introduces Good‑Enough LLM Obfuscation (GELO), a lightweight protocol designed to protect user privacy during inference with large language models (LLMs) when the computation is performed on shared accelerators that may be observed by an honest‑but‑curious adversary. The threat model assumes the attacker has full read access to the GPU’s memory and can see all data transferred to and from the accelerator, while the Trusted Execution Environment (TEE) is assumed to provide strong hardware isolation. GELO’s core idea is to generate a fresh, random invertible matrix A inside the TEE for each inference batch, multiply the hidden‑state matrix H on the left (U = A H), and then offload the mixed data U together with the model’s weight matrix W to the untrusted accelerator. The accelerator computes the projection Y = U W and returns Y to the TEE, which applies A⁻¹ to recover the exact result Q = A⁻¹ Y = H W. Because A is never reused, the attacker faces only a single‑batch blind source separation (BSS) problem, which eliminates cross‑batch statistical leakage.
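The mixing round trip can be sketched with NumPy. This is a minimal illustration of the protocol, not the paper's implementation: the shapes (n tokens, model width d, projection width d_out) are hypothetical, and an orthogonal A is used here so that unmixing is just a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n tokens in the batch, d model width, d_out projection width.
n, d, d_out = 8, 16, 4

H = rng.standard_normal((n, d))      # hidden states (conceptually held in the TEE)
W = rng.standard_normal((d, d_out))  # public model weights

# TEE side: sample a fresh random orthogonal mixing matrix A per batch
# (orthogonal A is the performance-friendly choice, since A^{-1} = A^T).
A, _ = np.linalg.qr(rng.standard_normal((n, n)))

U = A @ H    # mixed hidden states, safe to offload
Y = U @ W    # projection computed on the untrusted accelerator
Q = A.T @ Y  # TEE unmixes with A^{-1} = A^T

# The round trip is exact: A^{-1}((AH)W) = HW.
assert np.allclose(Q, H @ W)
```

Because A is resampled per batch, the observable U for any one batch carries no cross-batch correlation for an attacker to exploit.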
The authors identify that if A is orthogonal (preferred for performance because A⁻¹ = Aᵀ), the attacker can recover the hidden‑state covariance HᵀH from UᵀU and the eigen‑spectrum from U Uᵀ, which may be undesirable. To mitigate this, two defenses are proposed: (1) use a non‑orthogonal, well‑conditioned random matrix A, which masks Gram matrices at the cost of an O(n³) inversion per batch; (2) retain an orthogonal A but pad the batch with k high‑energy “shield” vectors S, forming H_full = [Hᵀ Sᵀ]ᵀ before mixing, so that the shield rows pollute the higher‑order statistics visible to the attacker while the rows corresponding to H are recovered exactly after unmixing.
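The shield-vector defense can be sketched as follows. This is an assumption-laden illustration: the number of shield rows k and the energy scale (10×) are made up for the example, and the shield rows are drawn i.i.d. Gaussian rather than by whatever construction the paper uses.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, d_out, k = 8, 16, 4, 2  # hypothetical batch, width, projection, shield sizes

H = rng.standard_normal((n, d))
W = rng.standard_normal((d, d_out))

# Pad the batch with k high-energy "shield" rows (illustrative 10x energy scale)
# so they dominate the higher-order statistics an attacker could estimate from U.
S = 10.0 * rng.standard_normal((k, d))
H_full = np.vstack([H, S])  # (n + k) x d

# Fresh orthogonal mixing over the padded batch; unmixing is still a transpose.
A, _ = np.linalg.qr(rng.standard_normal((n + k, n + k)))
U = A @ H_full
Y = U @ W        # computed on the untrusted accelerator
Q_full = A.T @ Y  # unmix in the TEE
Q = Q_full[:n]    # discard the shield rows; real outputs are untouched

assert np.allclose(Q, H @ W)
```

Note that for orthogonal A the Gram matrix UᵀU = H_fullᵀ H_full still leaks, but it is now dominated by the shield rows' contribution SᵀS rather than by HᵀH alone.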