VaultGemma receipt completion under the pre-registration proposal

James Dreben · 2026-05-19 · revised 2026-05-21

Companion note to the pre-registration proposal and to “Two numbers from VaultGemma.”

This note asks whether the proposed disclosure structure tells us anything useful when applied to a real DP-trained model release.

What the exercise tests

Two questions matter.

First, is the receipt idea tractable? Can an outside reader take VaultGemma’s public materials and fill in the claim, mechanism, accounting, artifact, and empirical-evaluation fields without guessing?

Second, does the pre-registered-protocol section add useful signal? In particular, does it make clear which empirical results are leakage evidence, which are formal DP audits, and which operating points are not yet covered by mature public protocols?

Short answer: yes, with important limits. VaultGemma’s formal claim is readable, but several fields needed for independent accounting review or stable probe comparison are not public.

What is clear from public materials

VaultGemma’s core privacy claim is unusually explicit for a public model release. The technical report describes a 1B-parameter language model trained from scratch with DP-SGD. In summary, the public claim is:

ε ≤ 2.0;
δ = 1.1e-10;
sequence-level privacy for 1024-token sequences;
zeroing-out adjacency;
full-training boundary;
DP-SGD with privacy-loss-distribution accounting.

That is meaningful disclosure. It also matters that the claim is sequence-level, not user-level. A reader should not infer “person-level privacy at ε=2” from this release.
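The claim fields listed above can be written down as a small structured record. The following is a minimal sketch, not the proposal's actual schema: the class name, field names, and the `is_user_level` helper are illustrative assumptions, and only the values come from the public claim as summarized in this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalClaim:
    """Formal-claim portion of a receipt. Field names are hypothetical."""
    epsilon_upper: float          # claimed upper bound on epsilon
    delta: float
    privacy_unit: str             # "sequence" or "user"
    sequence_length_tokens: int
    adjacency: str
    boundary: str
    mechanism: str
    accounting: str

    def is_user_level(self) -> bool:
        # A sequence-level claim must not be read as a person-level one.
        return self.privacy_unit == "user"


# Values as summarized in this note from VaultGemma's public materials.
VAULTGEMMA_CLAIM = FormalClaim(
    epsilon_upper=2.0,
    delta=1.1e-10,
    privacy_unit="sequence",
    sequence_length_tokens=1024,
    adjacency="zeroing-out",
    boundary="full-training",
    mechanism="DP-SGD",
    accounting="privacy-loss-distribution",
)

print(VAULTGEMMA_CLAIM.is_user_level())
```

Making the privacy unit an explicit field is a design choice: a reader, or a script, can check the unit directly instead of inferring it from the ε value.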

What is missing

Six disclosure-grade fields remain hard to recover from public materials.

Clipping norm. Needed to check the relationship between clipping and noise calibration.
Gradient-normalization convention. Needed because DP-SGD accounting can depend on whether the noisy clipped gradients are normalized by the expected batch size or by the sampled batch size.
Sampling rate. Derivable in principle, but only if the final sequence count after packing and splitting is known.
Accountant version. Needed to reproduce privacy-loss-distribution numerics against the same library implementation.
Pinned model revision or artifact digest. Needed to bind later probe results to immutable model bytes.
Pinned probe surface. Needed so a probe receipt names exactly what was tested.
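Two of the fields above reduce to short computations once their inputs are public. This is a minimal sketch under stated assumptions. It assumes Poisson-style subsampling, where the sampling rate is the expected batch size divided by the final sequence count (a common DP-SGD convention, not something this note confirms for VaultGemma). It also assumes SHA-256 as the digest. The function names are hypothetical.

```python
import hashlib


def sampling_rate(expected_batch_size: int, num_sequences: int) -> float:
    """q = expected batch size / number of sequences after packing and splitting.

    Assumes Poisson-style subsampling; other sampling schemes need other formulas.
    """
    if num_sequences <= 0:
        raise ValueError("num_sequences must be positive")
    if not 0 < expected_batch_size <= num_sequences:
        raise ValueError("expected_batch_size must be in (0, num_sequences]")
    return expected_batch_size / num_sequences


def bytes_digest(data: bytes) -> str:
    """Digest of in-memory bytes, prefixed with the algorithm name."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Digest of a file on disk, read in chunks so large weight files fit."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()
```

The sampling rate cannot be computed without the final sequence count, which is why that count is effectively part of the disclosure. A digest recorded in the receipt lets a later probe result name the exact bytes it was run against.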

These are not exotic requests. They are the facts an outside expert would need before checking the accounting or comparing probe results cleanly. None of them requires changing how the model was trained.

What the empirical protocol says

Google’s discoverable-extraction test is a good candidate for a declared empirical protocol. It asks whether the model continues a 50-token training prefix with the next 50 tokens from the same training document. Google reported no detectable memorization under that test.
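The test described above can be sketched in a few lines. This is an illustration under assumptions, not Google's implementation: `generate` is a hypothetical callable standing in for whatever decoding setup the probe uses, the prefix is taken from the start of the token sequence, and a hit requires an exact match on all 50 continuation tokens.

```python
from typing import Callable, Sequence

PREFIX_LEN = 50   # tokens given to the model
SUFFIX_LEN = 50   # tokens the model must reproduce

# Hypothetical interface: (prefix tokens, number of tokens to generate) -> tokens
Generate = Callable[[Sequence[int], int], Sequence[int]]


def is_discoverably_extracted(tokens: Sequence[int], generate: Generate) -> bool:
    """True if the model continues the prefix with the true next tokens."""
    if len(tokens) < PREFIX_LEN + SUFFIX_LEN:
        raise ValueError("document too short for a 50/50 prefix-suffix split")
    prefix = list(tokens[:PREFIX_LEN])
    target = list(tokens[PREFIX_LEN:PREFIX_LEN + SUFFIX_LEN])
    continuation = list(generate(prefix, SUFFIX_LEN))
    return continuation == target  # exact-match criterion (an assumption)


def extraction_rate(documents: Sequence[Sequence[int]], generate: Generate) -> float:
    """Fraction of sufficiently long documents that are discoverably extracted."""
    eligible = [d for d in documents if len(d) >= PREFIX_LEN + SUFFIX_LEN]
    if not eligible:
        raise ValueError("no document is long enough to test")
    hits = sum(is_discoverably_extracted(d, generate) for d in eligible)
    return hits / len(eligible)
```

Choices such as the decoding strategy, exact versus approximate matching, and how training documents are sampled all change what a reported rate means, so a declared protocol would need to fix them.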

In the proposed receipt vocabulary, that result should be classified as leakage-evidence, not formal-audit-lower-bound. That distinction is the main point. A memorization probe can reveal serious leakage, but it does not by itself produce a formal lower bound on ε.
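One way to keep the two classes from blurring is to make the record type refuse an ε lower bound on a leakage-evidence result. The sketch below is hypothetical: the two class labels come from the receipt vocabulary above, while the type names, field names, and the validation rule are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvidenceClass(Enum):
    LEAKAGE_EVIDENCE = "leakage-evidence"
    FORMAL_AUDIT_LOWER_BOUND = "formal-audit-lower-bound"


@dataclass(frozen=True)
class EmpiricalResult:
    protocol: str
    evidence_class: EvidenceClass
    epsilon_lower_bound: Optional[float] = None

    def __post_init__(self) -> None:
        # A memorization probe alone does not yield a formal bound on epsilon.
        if (self.evidence_class is EvidenceClass.LEAKAGE_EVIDENCE
                and self.epsilon_lower_bound is not None):
            raise ValueError("leakage evidence cannot carry an epsilon lower bound")


probe = EmpiricalResult("discoverable-extraction", EvidenceClass.LEAKAGE_EVIDENCE)
print(probe.evidence_class.value)
```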

The external audit gap

The strongest formal auditing methods for DP-SGD still generally need training-time cooperation. Steinke, Nasr, and Jagielski’s one-run audit and Panda, Tang, Nasr, Choquette-Choo, and Mittal’s LLM canary work are important, but both require controlled audit examples to be placed in the data before training or fine-tuning. An outside auditor with only released weights cannot run those protocols after the fact.

Zero-run and post-hoc observational audits are promising, but not yet a stable public standard for from-scratch DP pretraining. That is not a defect in VaultGemma. It is a limitation of the current audit surface, and it should be visible in the receipt rather than hidden.

What this changes about the VaultGemma comparison

Google and the Illinois workshop paper measured different things. Google reported no detectable memorization under a discoverable-extraction protocol. Diwan, Wang, and Alabi reported memorization under a more adversarial extraction protocol.

Both results may be true. The difficulty is that no adversarial protocol was declared in advance. If one had been, the later result would be easier to interpret: either as evidence under a protocol the publisher had already said mattered, or as exploratory evidence under a new protocol.

That is the value of pre-registration here. It does not decide the scientific disagreement by itself. It makes the protocol choice visible before results are known.
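The interpretive rule described above is simple enough to state as code. This is a minimal sketch with hypothetical names, and the dates in the usage lines are placeholders rather than real declaration or publication dates. A result counts as pre-registered only if its protocol was declared strictly before the result was produced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class ProtocolDeclaration:
    protocol: str
    declared_on: date


def result_status(protocol: str, result_date: date,
                  declarations: Iterable[ProtocolDeclaration]) -> str:
    """Label a result by whether its protocol was declared before it existed."""
    for d in declarations:
        if d.protocol == protocol and d.declared_on < result_date:
            return "pre-registered"
    return "exploratory"


# Placeholder dates for illustration only.
declared = [ProtocolDeclaration("discoverable-extraction", date(2000, 1, 1))]
print(result_status("discoverable-extraction", date(2000, 6, 1), declared))
print(result_status("adversarial-extraction", date(2000, 6, 1), declared))
```

The label says nothing about whether a result is correct. It only records whether the protocol choice was fixed before the outcome was known.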

Bottom line

The receipt exercise is useful because it keeps the layers separate. VaultGemma’s formal DP claim is readable. Several reproducibility and pinning fields are missing. The discoverable-extraction result is useful leakage evidence, not a formal DP audit. External post-hoc auditing for this kind of model remains thin.

That is enough to justify the proposal’s core idea: public DP model claims should bind formal privacy claims, implementation disclosures, and empirical protocols in one inspectable record.

Source material

Sinha et al., “VaultGemma: A Differentially Private Gemma Model” (arXiv:2510.15001).
Google Research, “VaultGemma: The world’s most capable differentially private LLM” (blog post, 2025).
Hugging Face google/vaultgemma-1b model card.
Diwan, Wang, Alabi, “Extractable Memorization of Differentially Private Large Language Model” (TPDP 2026 workshop note).
Steinke, Nasr, Jagielski, “Privacy Auditing with One (1) Training Run” (arXiv:2305.08846).
Panda, Tang, Nasr, Choquette-Choo, Mittal, “Privacy Auditing of Large Language Models” (arXiv:2503.06808).
Cebere, Even, Bleistein, Bellet, “Privacy Auditing with Zero (0) Training Run” (arXiv:2605.14591).
Wang et al., “Rethinking the Security of DP-SGD: A Corrected Analysis of Differentially Private Machine Learning” (arXiv:2605.15648).