modelreceipt

Differential privacy for model releases

Make private-model claims inspectable outside the building.

DP-trained models and synthetic-data pipelines need more than a sentence in a model card. They need signed receipts that say exactly what was claimed, and probes that can falsify weak claims under a stated threat model.

release: DP-trained model or synthetic-data pipeline
receipt: signed claim with mechanism, unit, accounting, and scope
probe: empirical evidence that can falsify weak claims
outside: read the claim, check the signature, inspect the assumptions

privacy unit: person, document, session, or record
claim boundary: training, synthetic data, downstream model
testable surface: canary audit, extraction probe, leakage test

The bad binary

Powerful data-driven systems are now routine. Without a rigorous privacy layer, deploying them often means shipping a black box and accepting that it may leak details about people in the training data. Refusing to deploy avoids that risk, but forfeits the capability.

Differential privacy is the third option: a quantified bound on how much one person's data can influence the released artifact. But the bound only helps the outside world if the claim is specific enough to inspect.
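
For reference, the bound in question is usually stated as the textbook (ε, δ) definition below. The symbols are the standard ones and are not taken from this project: M is the randomized training or release mechanism, D and D' are any two datasets that differ in one privacy unit, and S is any set of possible outputs.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta
\quad \text{for all measurable } S \text{ and all neighboring } D, D'.
```

A claim is only inspectable once it fixes each of these pieces: what M covers, what "neighboring" means, and which ε and δ are asserted.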

How this relates to dpdap

dpdap is the working sibling project for differentially private aggregate releases: census-like tables; scalar statistics protected with Laplace, Gaussian, or discrete-Laplace noise; and Distributed Aggregation Protocol (DAP) telemetry. DAP is the IETF-track system that Mozilla and Cloudflare use to gather device metrics without any single server seeing an individual contribution.

modelreceipt carries the same public-verifiability thesis to model releases. The receipt still records a claim and the probe still tries to falsify weak claims, but the released artifacts change: DP-SGD training runs, model weights, hosted model APIs, synthetic datasets, and downstream models trained on those synthetic datasets. The math is still differential privacy, but the receipt fields and the probe techniques differ. Canary-extraction and membership-inference audits replace Kolmogorov-Smirnov (KS) tests against the Laplace distribution, and the OpenDP deployment card needs new fields for the synthetic-data chain.

What modelreceipt is trying to define

Receipt

A signed, machine-readable record of the model artifact, privacy unit, neighboring relation, DP mechanism, accounting method, synthetic-data chain, and empirical evaluation protocol.
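
As a rough illustration of what such a record could look like, here is a minimal Python sketch. Every field name, the choice of JSON, and the explicit `epsilon` and `delta` fields are assumptions made for the example; the actual receipt schema is one of the things the research plan is meant to settle. The sketch only shows deterministic serialization, so that whatever signature scheme is eventually chosen has a stable byte string to sign.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt shape; field names are illustrative, not a spec."""
    artifact_sha256: str          # hash of the released weights or dataset
    privacy_unit: str             # e.g. "person", "document", "session", "record"
    neighboring_relation: str     # e.g. "add-or-remove-one-unit"
    mechanism: str                # e.g. "DP-SGD"
    accounting_method: str        # e.g. "RDP" or "PLD"
    epsilon: float                # assumed field: claimed epsilon
    delta: float                  # assumed field: claimed delta
    synthetic_data_chain: tuple = ()   # ordered steps from private data to release
    evaluation_protocol: str = ""      # which probes were run, and how


def canonical_bytes(receipt: Receipt) -> bytes:
    """Serialize deterministically so a signature covers a stable byte string."""
    return json.dumps(
        asdict(receipt), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


def receipt_digest(receipt: Receipt) -> str:
    """SHA-256 over the canonical form; this is what would be signed."""
    return hashlib.sha256(canonical_bytes(receipt)).hexdigest()


if __name__ == "__main__":
    # All values below are placeholders, not a real model's parameters.
    example = Receipt(
        artifact_sha256="0" * 64,
        privacy_unit="document",
        neighboring_relation="add-or-remove-one-unit",
        mechanism="DP-SGD",
        accounting_method="PLD",
        epsilon=2.0,
        delta=1e-10,
    )
    print(receipt_digest(example))
```

Sorting keys and fixing separators matters more than the field list here: two publishers who serialize the same claim differently would otherwise produce different signatures for identical content.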

Probe

An empirical test that can falsify an exposed claim: canary membership audit, extraction-oriented memorization probe, or synthetic-data leakage test, depending on the release.
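
To make "falsify" concrete for the canary case, the sketch below turns membership-test counts into a statistical lower bound on ε. It relies on the standard consequence of (ε, δ)-DP that any membership test satisfies TPR ≤ e^ε · FPR + δ and TNR ≤ e^ε · FNR + δ. The function names are hypothetical, the code assumes the canary trials are independent (a real audit has to justify that), and the exact binomial sum is only practical for a few hundred trials per group.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p). Intended for small n only."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target: float, increasing: bool) -> float:
    """Solve f(p) = target on [0, 1] for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lower_bound(k: int, n: int, alpha: float) -> float:
    """One-sided Clopper-Pearson lower bound on a binomial proportion."""
    if k == 0:
        return 0.0
    return _bisect(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha, increasing=True)


def upper_bound(k: int, n: int, alpha: float) -> float:
    """One-sided Clopper-Pearson upper bound on a binomial proportion."""
    if k == n:
        return 1.0
    return _bisect(lambda p: binom_cdf(k, n, p), alpha, increasing=False)


def epsilon_lower_bound(tp: int, n_members: int, fp: int, n_nonmembers: int,
                        delta: float, alpha: float = 0.05) -> float:
    """Lower bound on epsilon holding with confidence >= 1 - alpha.

    tp: inserted canaries the attack flagged as members.
    fp: held-out canaries the attack wrongly flagged as members.
    """
    # Split alpha across the two one-sided bounds (union bound).
    tpr_lo = lower_bound(tp, n_members, alpha / 2)
    fpr_hi = upper_bound(fp, n_nonmembers, alpha / 2)
    candidates = [0.0]
    if tpr_lo > delta and fpr_hi > 0.0:
        candidates.append(math.log((tpr_lo - delta) / fpr_hi))
    # Symmetric bound from the true-negative / false-negative rates.
    tnr_lo, fnr_hi = 1.0 - fpr_hi, 1.0 - tpr_lo
    if tnr_lo > delta and fnr_hi > 0.0:
        candidates.append(math.log((tnr_lo - delta) / fnr_hi))
    return max(candidates)
```

If the returned value exceeds the ε in the receipt, the claim is contradicted at the chosen confidence level under the stated assumptions. If it does not, nothing has been proven: the attack may simply be weak.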

Verifier

A small consumer-side tool that checks signatures, parses the receipt, names the assumptions, and reports whether probe evidence is consistent with the claim.
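
The reporting logic of such a tool might look like the sketch below. It is deliberately incomplete: the signature check is passed in as a boolean because the source does not fix a signature scheme, the receipt is a plain dict, and the field names and verdict strings are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical field names; the real schema is still to be defined.
REQUIRED_FIELDS = (
    "privacy_unit", "neighboring_relation", "mechanism",
    "accounting_method", "epsilon", "delta",
)


@dataclass
class Report:
    verdict: str
    assumptions: dict = field(default_factory=dict)
    missing: list = field(default_factory=list)


def check(receipt: dict, signature_ok: bool,
          eps_lower: Optional[float] = None) -> Report:
    """Summarize what a consumer can say about a receipt.

    signature_ok: result of a signature check done elsewhere.
    eps_lower: empirical lower bound on epsilon from a probe, if one was run.
    """
    if not signature_ok:
        return Report(verdict="unsigned-or-invalid")
    missing = [name for name in REQUIRED_FIELDS if name not in receipt]
    if missing:
        return Report(verdict="incomplete", missing=missing)
    # Name the assumptions the claim depends on, rather than hiding them.
    assumptions = {name: receipt[name] for name in REQUIRED_FIELDS[:4]}
    if eps_lower is None:
        verdict = "no-probe-evidence"
    elif eps_lower > receipt["epsilon"]:
        verdict = "inconsistent"           # probe contradicts the claimed epsilon
    else:
        verdict = "consistent-not-proven"  # absence of contradiction only
    return Report(verdict=verdict, assumptions=assumptions)
```

Note that there is no "verified" verdict. The strongest positive outcome is that the probe evidence did not contradict the claim.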

First-month research plan

  1. Receipt field map

    Use VaultGemma as the first concrete target. Extract the fields needed to inspect a DP model claim and compare them with the OpenDP deployment-card structure.

  2. Probe operating envelope

    Separate canary audits, extraction probes, and synthetic-data leakage tests. Decide what the first external probe can actually falsify.

  3. Synthetic-data chain profile

    Define the receipt shape for generated datasets and downstream models trained on them, including the assumptions needed for post-processing to carry the privacy guarantee.

  4. Scope-of-validity

    Write the critique before writing code: what the project can verify, what it can falsify, and which claims would be overreach.
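
For item 3, the post-processing property that the synthetic-data chain depends on can be written as follows. This is the standard statement rather than anything specific to this project; M is the DP mechanism run on the private dataset D, and g is any later step, possibly randomized.

```latex
M \text{ is } (\varepsilon,\delta)\text{-DP and } g \text{ does not access } D
\;\Longrightarrow\;
g \circ M \text{ is } (\varepsilon,\delta)\text{-DP}.
```

The condition on g is the assumption a receipt would have to record. Sampling a synthetic dataset from a DP-trained generator and training a downstream model on those samples both count as g only if neither step touches the private data again, for example through filtering, evaluation, or hyperparameter selection against D.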

What this will not claim

A receipt does not make a privacy claim true; it binds a publisher to a claim. A probe does not prove differential privacy; it can only falsify bad claims under stated assumptions. A DP guarantee does not mean a model can never emit memorized text; it bounds individual influence under a neighboring-dataset definition.

The work starts with research because the hard part is not serializing JSON. The hard part is saying exactly which claims are externally meaningful for DP-trained models and private synthetic-data pipelines.