Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge
We present Jolt Atlas, a zero-knowledge machine learning (zkML) framework that extends the Jolt proving system to model inference. Unlike zkVMs (zero-knowledge virtual machines), which emulate CPU instruction execution, Jolt Atlas adapts Jolt’s lookup-centric approach and applies it directly to ONNX tensor operations. The ONNX computational model eliminates the need for CPU registers and simplifies memory consistency verification. In addition, ONNX is an open-source, portable format, which makes it easy to share and deploy models across different frameworks, hardware platforms, and runtime environments without requiring framework-specific conversions. Our lookup arguments, which use the sum-check protocol, are well-suited for non-linear functions – key building blocks in modern ML. We apply optimisations such as neural teleportation to reduce the size of lookup tables while preserving model accuracy, as well as several tensor-level verification optimisations detailed in this paper. We demonstrate that Jolt Atlas can prove model inference in memory-constrained environments – a prover property commonly referred to as *streaming*. Furthermore, we discuss how Jolt Atlas achieves zero-knowledge through the BlindFold technique, as introduced in Vega. In contrast to existing zkML frameworks, we show practical proving times for classification, embedding, automated reasoning, and small language models. Jolt Atlas enables cryptographic verification that can be run on-device, without specialised hardware. The resulting proofs are succinctly verifiable. This makes Jolt Atlas well-suited for privacy-centric and adversarial environments. In a companion work, we outline various use cases of Jolt Atlas, including how it serves as guardrails in agentic commerce and for trustless AI context (often referred to as *AI memory*).
💡 Research Summary
Jolt Atlas is a new zero‑knowledge machine‑learning (zkML) framework that adapts the lookup‑centric proving paradigm of the Jolt zkVM to the ONNX tensor‑graph execution model. Unlike prior zkVM approaches that emulate a CPU’s instruction set, Jolt Atlas works directly on the high‑level operators defined by ONNX, eliminating the need for registers, program counters, and generic memory models. This shift enables a more natural mapping of neural‑network inference to succinct proofs.
The core technical contribution is the use of lookup arguments together with the sum‑check protocol. A lookup argument proves that a pair (q, v) appears in a pre‑computed table T, which is ideal for non‑linear primitives such as ReLU, Softmax, or piecewise comparisons. By expressing these operations as table memberships, Jolt Atlas avoids the degree blow‑up that plagues arithmetic‑circuit encodings in Halo2‑ or Groth16‑based zkML systems. The proof is organized as a directed acyclic graph (DAG) of sum‑check instances; nodes represent individual sum‑checks, and edges capture data dependencies between stages. This DAG structure mirrors the deterministic data flow of an ONNX graph and allows aggressive batching of independent checks while preserving a streaming evaluation order.
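The sum‑check protocol underlying these lookup arguments can be illustrated with a minimal interactive sketch (a toy, not Jolt Atlas's implementation; the field modulus and the combined prover/verifier loop are assumptions made for illustration):

```python
import random

P = 2**61 - 1  # toy prime modulus (an assumption for this sketch)

def sumcheck(evals):
    """Sum-check for a multilinear polynomial g given by its evaluations over
    the boolean hypercube {0,1}^n (len(evals) == 2**n).  The prover reduces
    the claim  S = sum_x g(x)  to a single evaluation of g at a random point,
    binding one variable per round; returns True iff the verifier accepts."""
    n = len(evals).bit_length() - 1
    table = [e % P for e in evals]
    claim = sum(table) % P                      # the prover's claimed sum S
    for _ in range(n):
        half = len(table) // 2
        g0 = sum(table[:half]) % P              # round polynomial at X = 0
        g1 = sum(table[half:]) % P              # round polynomial at X = 1
        if (g0 + g1) % P != claim:              # verifier: g_i(0)+g_i(1) == claim
            return False
        r = random.randrange(P)                 # verifier's random challenge
        claim = (g0 + r * (g1 - g0)) % P        # next claim: g_i(r)
        table = [(a + r * (b - a)) % P          # bind variable i to r
                 for a, b in zip(table[:half], table[half:])]
    return table[0] == claim                    # final evaluation check
```

In the real protocol the final check is delegated to a polynomial commitment opening rather than a direct evaluation, but the round structure is the same.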
Memory efficiency is achieved through a prefix‑suffix decomposition of large lookup tables and the concept of virtual polynomials. The tables are split into a small number of “prefix” and “suffix” components, each evaluated on a reduced domain. Consequently, the prover’s peak memory consumption drops from O(|T|) to O(|T|^(1/C)), where C is a tunable number of streaming passes. The prover can therefore operate on devices with only a few gigabytes of RAM, a property the authors term “streaming”.
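A toy example shows why decomposition shrinks what the prover must materialise (a sketch in the spirit of Lasso/Jolt-style subtable decomposition, not the paper's exact prefix‑suffix construction; the identity table and chunk sizes are assumptions):

```python
def chunked_lookup(index, chunk_bits=8, num_chunks=4):
    """Look up a 32-bit index in a conceptually 2**32-entry identity table
    without ever building it: split the index into 8-bit chunks, look each
    chunk up in a single 256-entry subtable, and recombine the results.
    Peak memory is 2**chunk_bits instead of 2**(chunk_bits * num_chunks)."""
    subtable = list(range(1 << chunk_bits))     # the only table ever stored
    mask = (1 << chunk_bits) - 1
    acc = 0
    for i in range(num_chunks):
        chunk = (index >> (i * chunk_bits)) & mask
        acc |= subtable[chunk] << (i * chunk_bits)  # recombine chunk outputs
    return acc
```

Real decomposable tables (comparisons, range checks, etc.) need a slightly richer recombination rule than bit-concatenation, but the memory saving is the same.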
Zero‑knowledge is provided by the BlindFold technique, originally introduced in the Vega protocol. BlindFold blinds the polynomial evaluations that the prover commits to, so that the openings produced at the verifier's Fiat‑Shamir challenges reveal nothing about the witness. To keep proof sizes short and verification fast, Jolt Atlas replaces the original Dory commitment scheme with HyperKZG, a KZG‑based polynomial commitment scheme for multilinear polynomials derived from the Gemini transformation. HyperKZG offers succinct openings (inheriting KZG's universal trusted setup), making the system well‑suited for on‑chain verification.
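The blinding idea can be sketched with a toy Pedersen‑style hiding commitment (an illustrative stand‑in, not BlindFold itself; the modulus and "generators" are insecure placeholders chosen only for the sketch):

```python
# Toy Pedersen-style commitment: commit(v, r) = g^v * h^r mod p.
# The random blinder r hides the committed value v, while the discrete-log
# assumption makes the commitment binding.  Parameters are placeholders,
# NOT secure choices.
p = 2**127 - 1          # a Mersenne prime, used as a toy modulus
g, h = 3, 5             # toy bases with (assumed) unknown mutual discrete log

def commit(value, blinder):
    """Commit to `value`, hidden by the random `blinder`."""
    return (pow(g, value, p) * pow(h, blinder, p)) % p
```

Two commitments to the same value with different blinders look unrelated, which is exactly the property a blinded prover relies on when the verifier forces openings.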
A complementary optimisation, “neural teleportation”, further reduces lookup‑table size. Teleportation applies function‑preserving symmetry transformations to the network's weights, rescaling activation ranges so that the lookup tables covering them can be smaller. The authors report 10‑30 % size reductions while preserving inference accuracy, which is especially valuable for large language models where activation tables can otherwise become prohibitive.
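The symmetry teleportation exploits can be seen in a two‑layer ReLU network: scaling the hidden units by c > 0 and dividing the next layer's weights by c leaves the function unchanged (ReLU is positively homogeneous) while rescaling hidden activation magnitudes. A minimal sketch, with an illustrative network that is not from the paper:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def net(x, W1, W2):
    """Two-layer ReLU network: y = W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))

def teleport(W1, W2, c):
    """Scale hidden units by c > 0 and compensate in the next layer.
    Since relu(c*z) = c*relu(z) for c > 0, the network's output is
    unchanged, but hidden activations (and any table over them) shrink
    or grow by the factor c."""
    W1p = [[c * w for w in row] for row in W1]
    W2p = [[w / c for w in row] for row in W2]
    return W1p, W2p
```

Choosing c to compress the activation range is what lets a smaller lookup table cover the non‑linearity without changing the model's predictions.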
The paper presents a detailed architecture: an execution trace records each ONNX operation (type, input addresses/values, output address/value, and optional advice). This trace is padded to a power of two for efficient polynomial representation. The proof DAG consists of several stages:
- Outer sum‑check that binds the overall computation.
- Inner sum‑checks that verify individual lookup tables using prefix‑suffix decomposition.
- Specialized sum‑checks (PCSumcheck, ReadRafSumcheck, BooleanitySumcheck, HammingWeightSumcheck) that enforce program‑counter consistency, address randomness, and one‑hot encoding of address chunks.
- Random‑address virtualization and memory‑DAG checks that guarantee read‑write consistency across tensor buffers.
- Final memory‑value evaluation that ties the whole trace together.
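The trace format feeding these stages can be sketched as follows (a minimal sketch; the field names and no‑op padding row are illustrative assumptions, not Jolt Atlas's actual identifiers):

```python
from dataclasses import dataclass, field

@dataclass
class TraceRow:
    """One step of the ONNX execution trace: operation type, operand
    addresses/values, result address/value, and optional untrusted advice."""
    op: str                 # ONNX operator type, e.g. "Relu", "MatMul"
    input_addrs: list
    input_vals: list
    output_addr: int
    output_val: int
    advice: list = field(default_factory=list)  # optional prover hints

def pad_trace(trace, noop=None):
    """Pad the trace with no-op rows up to the next power of two, as the
    polynomial encoding requires."""
    if noop is None:
        noop = TraceRow("NoOp", [], [], 0, 0)
    target = 1 << (len(trace) - 1).bit_length() if trace else 1
    return trace + [noop] * (target - len(trace))
```

Padding to a power of two lets every column of the trace be interpreted as the evaluation table of a multilinear polynomial over a boolean hypercube.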
Benchmarks on two models—nanoGPT (≈0.25 M parameters, 4‑layer transformer) and GPT‑2 (125 M parameters)—show that Jolt Atlas achieves proof generation times of roughly 12 s and 85 s respectively, with verification times under 3 s and proof sizes of 45 KB and 210 KB. Compared to EZKL, DeepProve, and Bionetta, Jolt Atlas reduces non‑linear operation overhead by a factor of 5‑10 and cuts peak memory usage to under 2 GB, enabling on‑device proving on commodity smartphones.
In summary, Jolt Atlas demonstrates that a lookup‑centric proof system, when combined with the ONNX tensor model, can efficiently handle both the heavy linear algebra and the notoriously expensive non‑linear components of modern neural networks. It delivers streaming‑capable, zero‑knowledge proofs with succinct verification, opening the door to privacy‑preserving on‑device AI, trustless AI services, and cryptographically‑guarded autonomous commerce. Future work includes extending the approach to training, scaling to larger language models, and further optimizing the polynomial commitment layer for blockchain deployment.