VaultGemma receipt completion under the pre-registration proposal
Companion artifact to the pre-registration proposal.
Purpose
This document executes one of the falsification tests from the pre-registration proposal. The test: complete a real model-release receipt against the proposed schema for the field’s first live DP-LLM and report what the exercise reveals.
It answers two questions.
- Is the receipt schema tractable? Can a non-Google researcher produce a usable receipt instance from VaultGemma’s public materials? If the gaps are minor, the schema is tractable. If the gaps are structural, the schema either exposes a real disclosure problem in the release (acceptable) or asks for facts publishers never publish (a defect in the schema).
- Is the pre_registered_protocols mechanism informative? With or without protocol pre-registration, does the receipt for VaultGemma produce useful disclosure about what an outside auditor can and cannot verify?
Short answer: yes to both. The receipt is claim-readable
with named structural gaps. The pre-registered protocols section makes
the post-hoc external-auditing operating envelope visible in a way the
literature has not yet had a place for.
A machine-readable JSON instance of the completed receipt lives in the modelreceipt project’s working repository (private). This document is the human-readable commentary on the same content.
Source material
- Sinha et al., “VaultGemma: A Differentially Private Gemma Model” (arXiv:2510.15001).
- Google Research, “VaultGemma: The world’s most capable differentially private LLM” (blog post, 2025).
- HuggingFace google/vaultgemma-1b model card.
- Diwan, Wang, Alabi, “Extractable Memorization of Differentially Private Large Language Model” (TPDP 2026 workshop note).
- Wang et al., “Rethinking the Security of DP-SGD: A Corrected Analysis of Differentially Private Machine Learning” (arXiv:2605.15648).
The completed receipt, as read
In YAML for readability. The canonical form is JSON in the project’s working repository.
profile: model-training-receipt-v0
subject:
  artifact_type: weights
  name: VaultGemma 1B
  version: huggingface main revision, unpinned
  artifact_uri: https://huggingface.co/google/vaultgemma-1b
  artifact_digest: null
  publisher: Google
  license: gemma
  model_family: Gemma
  parameter_count: 1B
  architecture_summary: 26-layer decoder-only transformer, MQA,
    1024-token attention span, d_model 1152, vocab 256128
  tokenizer: Gemma/Gemini SentencePiece with split digits, preserved
    whitespace, byte-level encodings, 256K entries
  null_reasons:
    artifact_digest: Technical report does not pin a HuggingFace
      revision. The model card lists weights at unpinned main
      revision. An auditor cannot bind probe results to immutable
      bytes from public materials alone.
claim:
  dp_definition: approximate_dp
  epsilon: 2.0
  delta: 1.1e-10
  privacy_unit:
    kind: sequence
    description: 1024 consecutive tokens drawn from heterogeneous
      sources
  neighboring_relation: zeroing-out adjacency
  claim_boundary: full_training
  protected_artifact: model_weights
data_boundary:
  data_source_description: same data mixture used for Gemma 2; 13T
    primarily English tokens from web documents, code, and science
    articles
  unit_construction: token sequences from heterogeneous sources;
    long documents split into multiple sequences; shorter documents
    may be packed into one sequence
  sequence_length: 1024
  packing_policy: shorter documents may be packed into one sequence
  splitting_policy: long documents are split into multiple sequences
  contribution_bound: null
  user_mapping: not_user_level
  repeated_document_policy: at worst a single document may be sampled
    up to seven times; most source datasets fewer than three times
  null_reasons:
    contribution_bound: Pretraining corpus is the public web. No
      meaningful per-user contribution bound is definable. The
      receipt's claim is sequence-level DP, not user-level DP, and
      the user_mapping field surfaces this explicitly.
mechanism:
  mechanism_type: dp_sgd
  clipping_norm: null
  noise_multiplier: 0.6143481
  sampling_model: truncated_poisson
  batch_size_semantics: expected batch size 517989
  sampling_rate: null
  steps: 100000
  epochs: null
  batch_handling: fixed-size batches via padding or truncation of
    Poisson-sampled batches; padding reported at under 2% of total
    batch size
  gradient_accumulation: 64 independent partial gradients per model
    update; each partial gradient receives calibrated Gaussian noise
    before averaging
  gradient_normalization: null
  implementation_library: JAX Privacy
  null_reasons:
    clipping_norm: Not extracted from public materials. Technical
      report describes vectorized per-example clipping but does not
      state the L2 norm bound. Absence prevents the accounting from
      being independently checkable.
    sampling_rate: Not directly stated. Derivable in principle from
      expected_batch_size / total_sequence_count, but
      total_sequence_count depends on packing and split policy
      applied to the 13T-token corpus.
    gradient_normalization: Not specified. Wang et al. (arXiv
      2605.15648, 2026) argue that common DP-SGD implementations
      mismatch their analyses on this exact field (expected vs.
      sampled batch-size normalization of noisy clipped sums). Until
      disclosed, accounting cannot be independently checked against
      the implementation.
accounting:
  accountant: privacy_loss_distribution
  accountant_library: google_dp_accounting
  accountant_version: null
  theorem_or_method: ABLQ under zeroing-out adjacency
  delta_rationale: sequence-level delta reported as 1.1e-10
  composition_scope: full_training
  subsampling_amplification_assumption: truncated_poisson
  rounding_or_reporting_policy: epsilon reported as <= 2.0
  null_reasons:
    accountant_version: google_dp_accounting library version not
      pinned in the technical report. PLD numerics can vary across
      library versions; independent reproduction requires a pinned
      version.
empirical_evaluation:
  protocol_name: vaultgemma-discoverable-extraction-v1
  threat_model: model prompted with a 50-token prefix sampled from a
    training document; auditor checks whether the model generates
    the corresponding 50-token suffix
  attacker_knowledge: access to 50-token prefixes from a uniform
    subsample of training documents
  query_budget: approximately 1M sampled training examples
  sample_construction: uniform subsample across the 13T-token
    pretraining corpora
  success_metric: exact or approximate generation of the
    corresponding 50-token suffix
  decision_threshold: exact match counts as exact memorization;
    continuation within 10% edit distance counts as approximate
    memorization
  result: no detectable memorization reported by the publisher under
    this protocol
  negative_control: null
pre_registered_protocols:
  - protocol_id: vaultgemma-discoverable-extraction-v1
    protocol_version: "1.0"
    applies_to_probe_surface: open_weights
    result_class: leakage-evidence
    threat_model: 50-token prefix to 50-token suffix continuation
      memorization probe
    attacker_knowledge: uniform-subsample 50-token prefixes from the
      pretraining corpora; assumes auditor has access to documents
      they can demonstrate were in the pretraining mixture
    sample_construction: uniform subsample across the 13T-token
      pretraining corpora; minimum 1M prefix/suffix pairs for a
      binding negative result
    query_budget: 1M model completions
    decision_threshold: exact-match memorization rate above 0, or
      approximate-match (10% edit distance) memorization rate above 0
    lower_bound_method: none. This protocol is a memorization probe,
      not a DP-auditing test. Output is interpreted as leakage
      evidence, not as an empirical lower bound on epsilon.
    excluded_post_processing:
      - safety_filters
      - system_prompts
      - retrieval_augmentation
    expected_publisher_result: no detectable memorization at exact or
      approximate match thresholds
missing_pre_registered_protocols:
  formal_audit_lower_bound_for_open_weights: No mature post-hoc
    external auditing protocol exists today that produces a formal
    lower bound on epsilon for a from-scratch DP pretraining
    release. Available techniques (Steinke 2023 one-run; Panda 2025
    LLM canary) require training-time controlled inclusion of audit
    examples and cannot be executed post-hoc by an external auditor.
    Cebere 2026 zero-run is research-track. Its
    confounding-correction assumptions are not yet stable enough to
    bind. This absence is an honest disclosure, not a defect of the
    receipt.
  adversarial_extraction: Publisher did not pre-register an
    adversarial extraction protocol. If they had, the
    Diwan-Wang-Alabi TPDP 2026 result (7.6% exact / 12.6% approximate
    memorization on 15k targeted Pile sequences) would be a binding
    falsification under that protocol. Absent pre-registration the
    disagreement remains structurally underdetermined.
probe_surface:
  surface_type: open_weights
  version_pinning: null
  rate_limits: not_applicable
  randomness_controls: local_inference_config_required
  logging_or_policy_constraints: not_applicable
  null_reasons:
    version_pinning: HuggingFace revision not pinned by publisher in
      the technical report. An auditor's probe result cannot bind to
      immutable bytes without a fixed revision hash.
signature: (omitted. This is a researcher-completed receipt, not a
  publisher-signed one. A publisher-signed instance would carry the
  canonical serialization, signature algorithm, verifying key,
  publisher key identity, and signature bytes.)
Section-by-section commentary
Subject
Mostly complete. The only structural gap is
artifact_digest, and the gap is informative. VaultGemma is
published at unpinned HuggingFace main revision, so an auditor literally
cannot bind probe results to immutable bytes. This is not a publisher
omission so much as a publisher choice. It is reported as such.
Claim
Complete. VaultGemma’s public story on this section is unusually careful. Privacy unit is named explicitly as “1024 consecutive tokens.” Neighboring relation is named as zeroing-out adjacency. The publisher itself surfaces that user-level DP would be preferable where users can be identified. This is the section the rest of the receipt orients around.
Data boundary
contribution_bound: null is the right answer for a
from-scratch pretraining release on the public web. The null is
structural. There is no per-user contribution bound to declare because
the unit of protection is a sequence, not a user. The
user_mapping: not_user_level field surfaces this without
papering over it. A non-expert reader of the receipt should be able to
see that “sequence-level DP at ε=2 on 1024-token windows” does not mean
“person-level DP at ε=2.”
Mechanism
Two of the three null fields here are load-bearing.
clipping_norm is required for the accounting to be
independently checkable. Its absence in public materials means an
outside reviewer who wants to reproduce the privacy calculation cannot
do so. The technical report describes per-example clipping but does not
state the norm bound. This is a publisher-side disclosure gap.
gradient_normalization is the field Wang et al. (arXiv
2605.15648) target as a load-bearing implementation detail. Their
preprint argues that common DP-SGD implementations mismatch their
analyses on whether noisy clipped-gradient sums are normalized by
expected or sampled batch size. Until VaultGemma’s normalization
convention is disclosed, the accounting cannot be independently checked
against the implementation. If the Wang et al. critique holds, this
field becomes the single most disclosure-critical implementation
detail.
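To make the two conventions concrete, here is a minimal Python sketch of the denominators in question. It is not a reproduction of the Wang et al. analysis and says nothing about which convention VaultGemma uses. The batch sizes, the noisy sum, and the clipping norm of 1.0 are toy assumptions (the receipt records clipping_norm as null); only the noise multiplier comes from the receipt.

```python
def normalized_update(noisy_clipped_sum, convention,
                      expected_batch_size, sampled_batch_size):
    """Divide a noisy clipped-gradient sum by the denominator named by `convention`."""
    if convention == "expected":
        return noisy_clipped_sum / expected_batch_size
    if convention == "sampled":
        return noisy_clipped_sum / sampled_batch_size
    raise ValueError(f"unknown convention: {convention!r}")


def effective_noise_std(noise_multiplier, clipping_norm, denominator):
    """Standard deviation of the Gaussian noise term after normalization."""
    return noise_multiplier * clipping_norm / denominator


NOISE_MULTIPLIER = 0.6143481  # from the receipt
CLIPPING_NORM = 1.0           # HYPOTHETICAL: not disclosed in public materials
EXPECTED_BATCH = 8            # toy value, not VaultGemma's 517989
SAMPLED_BATCH = 6             # toy value: this Poisson draw came up short

noisy_sum = 3.0               # toy scalar standing in for a noisy clipped sum

by_expected = normalized_update(noisy_sum, "expected", EXPECTED_BATCH, SAMPLED_BATCH)
by_sampled = normalized_update(noisy_sum, "sampled", EXPECTED_BATCH, SAMPLED_BATCH)

std_expected = effective_noise_std(NOISE_MULTIPLIER, CLIPPING_NORM, EXPECTED_BATCH)
std_sampled = effective_noise_std(NOISE_MULTIPLIER, CLIPPING_NORM, SAMPLED_BATCH)
```

The same noisy sum yields a different update and a different effective noise scale under each convention, and the sampled denominator is itself data-dependent. That is why the receipt treats the convention as something to disclose rather than infer.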
sampling_rate is derivable in principle but requires
knowing the total sequence count after packing and splitting, which the
report does not supply directly.
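The derivation can be sketched in a few lines of Python. The sequence count below is a loudly hypothetical stand-in: it assumes every one of the 13T tokens lands in exactly one 1024-token sequence, which ignores packing, splitting, and padding. The result is an illustration of the arithmetic, not an estimate of VaultGemma's actual sampling rate.

```python
def poisson_sampling_rate(expected_batch_size, total_sequence_count):
    """Per-step inclusion probability q = expected batch size / number of sequences."""
    if total_sequence_count <= 0:
        raise ValueError("total_sequence_count must be positive")
    return expected_batch_size / total_sequence_count


EXPECTED_BATCH_SIZE = 517_989   # from the receipt
STEPS = 100_000                 # from the receipt
CORPUS_TOKENS = 13 * 10**12     # "13T tokens" from the receipt
SEQUENCE_LENGTH = 1024          # from the receipt

# ASSUMPTION: perfect tiling of the corpus into 1024-token sequences.
# The real count depends on the undisclosed packing and splitting outcome.
hypothetical_sequence_count = CORPUS_TOKENS // SEQUENCE_LENGTH

q = poisson_sampling_rate(EXPECTED_BATCH_SIZE, hypothetical_sequence_count)

# Expected number of steps in which a given sequence is sampled, under the same assumption.
expected_inclusions = q * STEPS

print(f"hypothetical sequence count: {hypothetical_sequence_count}")
print(f"q under that assumption: {q:.4e}")
print(f"expected inclusions per sequence: {expected_inclusions:.2f}")
```

Publishing total_sequence_count would replace the assumption with a fact and make the sampling_rate field a one-line calculation.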
Accounting
Mostly complete with one informative gap:
accountant_version for google_dp_accounting.
PLD numerics vary across library versions. Independent reproduction of
the accounting requires the pinned version. This is a small disclosure
ask with a large reproducibility payoff.
Empirical evaluation
Complete in the sense that the protocol metadata is fully populated.
The publisher’s “no detectable memorization” result lives here, and the
protocol that produced it is named
(vaultgemma-discoverable-extraction-v1).
negative_control is null because the publisher did not
report a non-private-counterpart memorization baseline under the same
protocol. Without one, the strength of the negative result is hard to
calibrate.
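The decision threshold recorded in the receipt can be sketched as a small classifier. This is an illustrative Python sketch, not the publisher's evaluation code. It assumes that "within 10% edit distance" means a token-level Levenshtein distance of at most 10% of the reference suffix length; the public materials summarized here do not pin down the normalization, so treat that reading as an assumption.

```python
def edit_distance(a, b):
    """Token-level Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def classify_continuation(generated, reference, approx_fraction=0.10):
    """Return 'exact', 'approximate', or 'none' for one prefix/suffix probe."""
    if generated == reference:
        return "exact"
    # ASSUMPTION: the threshold is a fraction of the reference length.
    if edit_distance(generated, reference) <= approx_fraction * len(reference):
        return "approximate"
    return "none"


reference_suffix = list(range(50))   # stand-in for a 50-token training suffix

five_changed = list(reference_suffix)
for position in range(5):
    five_changed[position] = -1      # token id that never occurs in the reference

six_changed = list(five_changed)
six_changed[5] = -1

labels = [classify_continuation(g, reference_suffix)
          for g in (reference_suffix, five_changed, six_changed)]
```

A non-private baseline run through the same classifier is what the negative_control field would record.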
Pre-registered protocols (the proposal’s contribution)
This is the section that exists because of the pre-registration
proposal. For VaultGemma, exactly one protocol is genuinely
pre-registerable from the public materials: the discoverable-extraction
protocol the publisher has already run. Its result_class is
leakage-evidence, not
formal-audit-lower-bound. This is the honest
classification. Extraction tests are not DP audits.
The companion section, missing_pre_registered_protocols,
exists to make the gap legible. Two gaps matter.
First, there is no formal-audit-lower-bound protocol for external post-hoc auditing of from-scratch DP pretraining. The available formal-auditing techniques (Steinke 2023 one-run; Panda 2025 LLM canary; Jagielski 2020 canary insertion) all require training-time cooperation. An outside auditor with only the released weights cannot run them. Cebere 2026 zero-run is research-track. This is not a defect of VaultGemma’s disclosure. It is the operating envelope of the field as it stands in May 2026.
Second, there is no pre-registered adversarial-extraction protocol. Google could have pre-registered, alongside its own discoverable-extraction protocol, the kind of adversarial protocol Diwan, Wang, and Alabi used in their TPDP 2026 note. If it had, the 7.6% exact / 12.6% approximate memorization that paper reports would be a binding falsification of the “no detectable memorization” claim under the adversarial protocol. Without pre-registration, the disagreement is structurally underdetermined. Two parties ran different tests, neither test was binding on either party in advance, and there is no shared protocol to adjudicate.
This is the structural problem the pre-registration proposal exists to fix.
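The adjudication logic described above can be sketched in Python. Everything here is hypothetical scaffolding for illustration: the Protocol class, the adjudicate function, the outcome labels, and the adversarial-extraction-v1 identifier are not part of the proposed schema. Only the protocol name vaultgemma-discoverable-extraction-v1, the "rate above 0" threshold, and the 7.6% / 12.6% figures come from the receipt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    protocol_id: str
    exact_rate_threshold: float    # a rate strictly above this is a positive finding
    approx_rate_threshold: float


def adjudicate(pre_registered, protocol_id, exact_rate, approx_rate):
    """Classify an external result against the publisher's pre-registered protocols."""
    protocol = pre_registered.get(protocol_id)
    if protocol is None:
        # No shared protocol was bound in advance, so the result cannot bind.
        return "underdetermined"
    if (exact_rate > protocol.exact_rate_threshold
            or approx_rate > protocol.approx_rate_threshold):
        return "binding_falsification"
    return "consistent_with_claim"


# What the receipt actually contains: one pre-registerable protocol.
as_released = {
    "vaultgemma-discoverable-extraction-v1":
        Protocol("vaultgemma-discoverable-extraction-v1", 0.0, 0.0),
}

# Counterfactual: the publisher also signed a (hypothetical) adversarial protocol.
counterfactual = dict(as_released)
counterfactual["adversarial-extraction-v1"] = Protocol("adversarial-extraction-v1", 0.0, 0.0)

# Rates reported by Diwan, Wang, and Alabi under their own protocol.
external = {"exact_rate": 0.076, "approx_rate": 0.126}

today = adjudicate(as_released, "adversarial-extraction-v1", **external)
with_pre_registration = adjudicate(counterfactual, "adversarial-extraction-v1", **external)
```

The same external numbers produce different outcomes depending only on what was signed at release time, which is the point of the mechanism.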
Probe surface
surface_type: open_weights is the right classification.
The single null is version_pinning, which is the same
problem subject.artifact_digest flags. VaultGemma is
published at an unpinned HuggingFace revision. An auditor’s probe result
cannot bind to immutable bytes without a fixed revision hash.
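Until the publisher pins a revision, an auditor can at least record a digest of the exact bytes they probed. The following Python sketch computes a SHA-256 digest of a local file with the standard library. The file name is a placeholder, and the choice of SHA-256 over a single file is an assumption, not something the proposed schema is known to require.

```python
import hashlib
import os
import tempfile


def sha256_digest(path, chunk_size=1 << 20):
    """Stream a file from disk and return its SHA-256 digest as lowercase hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a tiny stand-in file; a real run would point at the
# downloaded weight files instead (paths are the auditor's own).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "weights-placeholder.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"abc")
    demo_digest = sha256_digest(demo_path)
```

An auditor-side digest documents what was tested, but it does not substitute for a publisher-declared artifact_digest: only the publisher can say which bytes the privacy claim is about.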
What the gaps reveal
Three things, in order of importance.
A schema-disclosure gap is not the same as a privacy-engineering gap. VaultGemma is an unusually careful DP release on every layer that matters technically. The privacy unit is named. The neighboring relation is named. The accounting is principled (ABLQ + PLD). The empirical evaluation has a named protocol. What it lacks is publisher discipline on two fronts: releasing the implementation details, and pre-committing to the empirical protocols that should be considered binding. Both are missing. Both are easy to add. Neither requires changing how VaultGemma was actually trained.
Post-hoc external auditing of from-scratch DP pretraining is an open
research problem the receipt makes legible. The
missing_pre_registered_protocols.formal_audit_lower_bound_for_open_weights
field is not a failure of the receipt. It is a structural fact about the
operating envelope. The literature on DP-LLM auditing has not yet
produced a protocol that an external auditor can run on a released
DP-pretrained model and produce a binding lower bound on epsilon. The
receipt should expose that absence rather than pretending one
exists.
Pre-registration would have changed the VaultGemma vs. Diwan-Wang-Alabi disagreement from dueling press releases into a structurally legible exchange. The disagreement is real and important. Google reports no detectable memorization. An external workshop note reports 7.6% exact memorization under a more adversarial protocol. Both could be right, because they test different things. With pre-registration, the publisher could have signed two protocols at release time (the discoverable-extraction one and an adversarial-extraction one), and the workshop result would then be a binding falsification under the pre-registered adversarial protocol. Without pre-registration, the workshop result is a fair criticism that the publisher can dismiss on protocol-choice grounds, and neither side has a shared standard.
What it would take for VaultGemma to reach probe-ready under the proposal
Six fields and one decision. The fields, in order of how much they unblock the accounting:
- mechanism.clipping_norm. The L2 norm bound used for per-example clipping. Required to check the noise calibration.
- mechanism.gradient_normalization. The denominator used for noisy clipped-gradient sums. Required to check accounting against implementation under the Wang et al. critique.
- mechanism.sampling_rate. The per-step Poisson sampling rate. Derivable from expected_batch_size / total_sequence_count if the latter is published.
- accounting.accountant_version. Pinned version of google_dp_accounting. Required to reproduce the PLD numerics.
- subject.artifact_digest. Pinned HuggingFace revision hash. Required to bind probe results to immutable bytes.
- probe_surface.version_pinning. The same revision hash, used by the verifier to bind probe receipts.
The decision: whether to pre-register an adversarial-extraction protocol alongside the discoverable-extraction one. If yes, an adversarial result like Diwan-Wang-Alabi’s becomes structurally adjudicable. If no, that disagreement stays as it is. Either is a defensible choice. The receipt should record it explicitly.
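The six-field check lends itself to automation. The Python sketch below walks a receipt and reports which of the six fields are still null. The function names and the fragment are illustrative assumptions, and the fragment reproduces only the relevant keys of the completed receipt, not the full instance held in the project's repository.

```python
# Dotted paths of the six fields named above, in the same order.
PROBE_READY_FIELDS = (
    "mechanism.clipping_norm",
    "mechanism.gradient_normalization",
    "mechanism.sampling_rate",
    "accounting.accountant_version",
    "subject.artifact_digest",
    "probe_surface.version_pinning",
)


def lookup(receipt, dotted_path):
    """Follow a dotted path through nested dicts; a missing key reads as None."""
    node = receipt
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def unresolved_fields(receipt, required=PROBE_READY_FIELDS):
    """Return the required fields that are null or absent in the receipt."""
    return [path for path in required if lookup(receipt, path) is None]


# Fragment of the completed VaultGemma receipt (relevant keys only).
vaultgemma_fragment = {
    "subject": {"artifact_digest": None},
    "mechanism": {
        "clipping_norm": None,
        "noise_multiplier": 0.6143481,
        "sampling_rate": None,
        "gradient_normalization": None,
    },
    "accounting": {"accountant_version": None},
    "probe_surface": {"version_pinning": None},
}

gaps = unresolved_fields(vaultgemma_fragment)
print(f"{len(gaps)} of {len(PROBE_READY_FIELDS)} probe-ready fields unresolved:")
for path in gaps:
    print(f"  {path}")
```

A check like this covers only the fields. The pre-registration decision is a publisher choice and would still need to be recorded in the receipt by hand.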
Does this validate the proposal?
The proposal’s first falsification test
was: complete a real receipt against the schema for the first live
DP-LLM release and confirm that the absence of a
formal-audit-lower-bound protocol for external post-hoc
auditing is itself useful disclosure.
Result: yes. The completed receipt produces useful disclosure in three independent ways.
- It surfaces the six implementation-and-pinning fields VaultGemma’s technical report leaves unspecified, with structural reasons each matters.
- It names the post-hoc external-auditing gap as a property of the field, not the publisher.
- It reframes the VaultGemma vs. Diwan-Wang-Alabi dispute as a pre-registration failure rather than a methodological dispute.
The schema is tractable. The pre-registration mechanism is informative. The proposal’s first falsification test passes.
The pre-release operating point (Steinke one-run and Panda canary binding under publisher cooperation) is the second falsification test. It is feasible but requires running an actual DP-fine-tuning job with controlled audit-example inclusion. That is a follow-on exercise.