
Pre-registered empirical privacy for DP model releases

James Dreben  ·  Draft for community review

This is the proposal behind “Two numbers from VaultGemma.” It is intentionally smaller than a full specification. The point is to separate three things that are easy to blur together:

  1. the formal differential-privacy claim;
  2. the empirical tests the publisher specified before seeing results;
  3. later audits or probes run by outside parties.

The claim is not that pre-registration proves privacy. It does not. The claim is that pre-registration makes empirical privacy evidence easier to interpret.

The disclosure gap

Differentially private model releases usually make two kinds of claim.

The first is formal: under stated assumptions, the training mechanism satisfies some version of (ε, δ)-DP. That claim needs the privacy unit, neighboring relation, mechanism, accountant, and composition boundary. Without those details, an epsilon value cannot be checked or compared with another release's.
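For reference, the standard (ε, δ)-DP condition is written out below. The symbols M (the training mechanism), D and D′ (neighboring datasets), and S (a set of possible outputs) are notation introduced here, not taken from any particular release. The privacy unit and neighboring relation decide which pairs D, D′ count as neighbors, which is why the inequality says little without them.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all measurable } S.
```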

The second is empirical: the publisher ran some memorization, extraction, or membership-inference test and reports what happened. Those results can be useful. They can also be hard to compare if the protocol was chosen after seeing the model.

VaultGemma is the motivating example. Google reported no detectable memorization under its discoverable-extraction test. Diwan, Wang, and Alabi later reported memorization under a more adversarial extraction protocol. Both results may be true. The missing piece is a public record of which empirical protocols the publisher considered binding in advance.

The proposal

At release time, the publisher signs a structured receipt with three sections.

1. Formal claim. The receipt records the DP definition, epsilon, delta, privacy unit, neighboring relation, mechanism, accountant, and claim boundary.

2. Reproducibility details. For DP-SGD releases, this includes implementation details such as clipping norm, sampling model, sampling rate, gradient-normalization convention, accountant library, and pinned artifact version. These are not exotic requests. They are the facts an outside expert needs before checking the accounting.

3. Pre-specified empirical protocols. The publisher declares one or more protocols it considers meaningful evidence for named privacy claims. Each protocol should state the threat model, sample construction, query budget, decision threshold, and, where applicable, the statistical method that maps audit output to an empirical privacy bound.
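The three sections above could be sketched as a data structure along the following lines. This is a minimal illustration, not a proposed schema: every class name, field name, and example value is a hypothetical placeholder, and the actual signature step is left out. The sketch only computes a SHA-256 digest over a canonical JSON encoding, which is the kind of value a publisher might sign.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FormalClaim:
    dp_definition: str         # e.g. "(epsilon, delta)-DP"
    epsilon: float
    delta: float
    privacy_unit: str          # what one "record" is for the guarantee
    neighboring_relation: str  # e.g. "add/remove" or "replace-one"
    mechanism: str
    accountant: str
    claim_boundary: str        # which stages and artifacts the claim covers


@dataclass(frozen=True)
class ReproducibilityDetails:
    clipping_norm: float
    sampling_model: str
    sampling_rate: float
    gradient_normalization: str
    accountant_library: str
    artifact_version: str


@dataclass(frozen=True)
class EmpiricalProtocol:
    name: str
    privacy_claim: str         # the named claim this protocol is evidence for
    threat_model: str
    sample_construction: str
    query_budget: int
    decision_threshold: float
    bound_method: Optional[str] = None  # only where a bound is derived


@dataclass(frozen=True)
class PublisherReceipt:
    formal_claim: FormalClaim
    reproducibility: ReproducibilityDetails
    protocols: Tuple[EmpiricalProtocol, ...]

    def digest(self) -> str:
        """SHA-256 over canonical JSON; the value a publisher would sign."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values only; they describe no real model release.
example = PublisherReceipt(
    formal_claim=FormalClaim(
        dp_definition="(epsilon, delta)-DP", epsilon=8.0, delta=1e-6,
        privacy_unit="example-unit", neighboring_relation="add/remove",
        mechanism="DP-SGD", accountant="example-accountant",
        claim_boundary="pretraining only",
    ),
    reproducibility=ReproducibilityDetails(
        clipping_norm=1.0, sampling_model="poisson", sampling_rate=0.001,
        gradient_normalization="per-example clip, then sum",
        accountant_library="example-lib==0.0.0", artifact_version="example-v1",
    ),
    protocols=(
        EmpiricalProtocol(
            name="example-extraction-test", privacy_claim="no verbatim extraction",
            threat_model="black-box prompting", sample_construction="prefix/suffix split",
            query_budget=1000, decision_threshold=1.0,
        ),
    ),
)

# Changing any declared field changes the digest, so the receipt binds the claim.
altered = replace(example, formal_claim=replace(example.formal_claim, epsilon=4.0))
print(example.digest() != altered.digest())
```

Hashing a canonical encoding, rather than signing free-form text, means a later change to any declared field, such as the decision threshold, is detectable.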

An auditor who runs a declared protocol emits a second signed document: a probe receipt. It identifies the publisher receipt, states the protocol version, records deviations, and reports the result class: formal-audit-lower-bound, leakage-evidence, inconclusive, or not-applicable.
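A probe receipt could be sketched as below. The four result classes come from the text above; the class and field names, and the placeholder values, are assumptions made for illustration. The result class is modeled as a closed enumeration so that a value such as "passed" cannot be recorded at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ResultClass(Enum):
    FORMAL_AUDIT_LOWER_BOUND = "formal-audit-lower-bound"
    LEAKAGE_EVIDENCE = "leakage-evidence"
    INCONCLUSIVE = "inconclusive"
    NOT_APPLICABLE = "not-applicable"


@dataclass(frozen=True)
class ProbeReceipt:
    publisher_receipt_digest: str  # identifies the publisher receipt being probed
    protocol_name: str
    protocol_version: str
    deviations: Tuple[str, ...]    # every departure from the declared protocol
    result_class: ResultClass


# Placeholder values only; the digest is a dummy string, not a real receipt.
probe = ProbeReceipt(
    publisher_receipt_digest="0" * 64,
    protocol_name="example-extraction-test",
    protocol_version="1.0",
    deviations=("query budget reduced from 1000 to 500",),
    result_class=ResultClass("leakage-evidence"),
)
print(probe.result_class.value)
```

Constructing `ResultClass` from an unknown string raises `ValueError`, so a receipt with an undeclared result class fails at creation time rather than at display time.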

The verifier displays the chain. It should not say “passed.” It should say what was claimed, what protocol was run, what evidence was attached, and what assumptions limit the result.
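As a rough sketch of that display rule, the function below prints the four items and nothing else. The function name, dictionary keys, and example values are hypothetical, and a real verifier would check signatures before rendering anything.

```python
from typing import Dict, List


def render_chain(claim: Dict[str, str], probe: Dict[str, str], limits: List[str]) -> str:
    """Render claim, protocol, evidence, and limits, with no verdict line."""
    lines = [
        "Claimed:      {definition}, epsilon={epsilon}, delta={delta}, "
        "unit={privacy_unit}".format(**claim),
        "Protocol run: {protocol} (version {version})".format(**probe),
        "Evidence:     {result_class}; deviations: {deviations}".format(**probe),
        "Limits:       " + "; ".join(limits),
    ]
    return "\n".join(lines)


# Placeholder values only; they describe no real model release.
report = render_chain(
    claim={"definition": "(epsilon, delta)-DP", "epsilon": "8.0",
           "delta": "1e-6", "privacy_unit": "example-unit"},
    probe={"protocol": "example-extraction-test", "version": "1.0",
           "result_class": "inconclusive", "deviations": "none"},
    limits=["black-box access only", "query budget of 1000 prompts"],
)
print(report)
```

The output has no field for an overall verdict, so a reader has to look at the claim, the protocol, and the limits together.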

How this relates to current auditing work

This proposal does not compete with privacy-auditing research. It gives that research a disclosure instrument.

The important distinction is result class. A memorization finding may matter a lot, but it is not automatically a formal DP violation.

What this is not

This is not a certification scheme. A signed receipt records what the publisher claimed; it does not make the claim true.

It is not a privacy proof. Empirical audits can produce evidence and, for some protocols, empirical lower-bound estimates. They cannot produce the publisher’s formal upper bound.
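One common way to turn audit output into an empirical lower bound uses the hypothesis-testing view of DP: if a mechanism is (ε, δ)-DP, any attack that tries to distinguish two neighboring datasets must satisfy FPR + e^ε · FNR ≥ 1 − δ and FNR + e^ε · FPR ≥ 1 − δ. Rearranging gives a lower bound on ε from the attack's error rates. The sketch below computes only the point estimate; the function name is hypothetical. Audits in the literature typically replace the observed rates with confidence bounds (for example Clopper-Pearson intervals) before making a statistical claim. The bound is also only meaningful when the attack distinguishes datasets that are neighbors under the declared privacy unit.

```python
import math


def empirical_epsilon_lower_bound(fpr: float, fnr: float, delta: float) -> float:
    """Point-estimate lower bound on epsilon implied by attack error rates.

    Uses e^eps >= (1 - delta - FPR) / FNR and e^eps >= (1 - delta - FNR) / FPR.
    Ignores sampling error, so it is an illustration, not an audit result.
    """
    best = 0.0  # epsilon is never negative, so 0 is the trivial bound
    for a, b in ((fpr, fnr), (fnr, fpr)):
        numerator = 1.0 - delta - a
        if numerator <= 0:
            continue  # this inequality gives no information
        if b == 0:
            return math.inf  # observed rates are incompatible with any finite epsilon
        best = max(best, math.log(numerator / b))
    return best


# An attack with 1% false positives and 50% false negatives (made-up numbers).
strong = empirical_epsilon_lower_bound(fpr=0.01, fnr=0.5, delta=1e-5)
# An attack no better than coin flipping yields the trivial bound.
chance = empirical_epsilon_lower_bound(fpr=0.5, fnr=0.5, delta=1e-5)
print(round(strong, 3), chance)
```

A bound of this kind can contradict a claimed ε if it comes out larger, but a small value never confirms the claim. It only shows that this particular attack did not do better.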

It is not a privacy score. A coverage summary may be useful, but it should remain a list of surfaces and protocols, not a single number.

It is not a substitute for accounting review. The receipt makes the inputs legible. The review still has to be done.

Open questions

Several questions remain open.

Name. “Pre-registration” is clear to readers familiar with clinical trials or experimental psychology. It may sound wrong to engineers. “Protocol binding” or “declared-protocol disclosure” may be better.

Post-hoc auditing. The strongest formal audits for DP-SGD still need training-time cooperation. For from-scratch DP pretraining where only the released weights are available, an external post-hoc auditor has far fewer protocols to choose from, and the evidence they can yield is weaker.

Adoption. The mechanism is useful if even one publisher signs a receipt. Broader uptake would require pressure from deployers, reviewers, regulators, or peer convention.

Synthetic-data chains. DP-SGD generator → synthetic dataset → downstream model is a chain of claims. Each stage likely needs its own receipt and its own assumptions.

Sources