For publishers
A one-line addition to an existing publishing workflow that attaches a signed receipt to a release.
No pipeline rewrite. No new infrastructure.
The U.S. Census Bureau, Wikipedia, the IRS, and the UN’s refugee agency publish public statistics under a mathematical guarantee called differential privacy. The math is real. The catch is that you have to take it on faith. dpdap turns “trust us” into a signed receipt anyone can verify — and, for releases that publish a fresh draw each time, a probe that tests the claim against the data.
You read in the news that “1 in 5 people in your county has diabetes.” That number probably came from a public release. The agency that published it likely added a tiny amount of statistical noise — the true count might have been 18,213 cases and the published number 18,247 — so no individual person could be singled out from the data. That’s differential privacy.
Today, you have no way to check whether the protection was applied correctly, or how strong it was. The institution publishes a number; you trust them.
dpdap attaches a small, signed “privacy receipt” to every release — recording who published it, what protection was used, and how strong it was. Anyone can verify the signature in seconds; for releases that publish a fresh draw every reporting interval — streaming telemetry, daily-refresh statistics — the receipt’s noise claim can also be tested against the data. dpdap reads disclosures from the OpenDP Deployment Registry schema — the open-source library and registry maintained by Harvard’s OpenDP project and used in production by the IRS, the Wikimedia Foundation, and UNHCR — and aligns with the federal evaluation guidance in NIST SP 800-226.[1]
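To make the idea of a receipt concrete, here is a minimal sketch of the three facts the paragraph above says a receipt records. The class name, field names, and canonical-JSON encoding are all assumptions made for illustration; they are not dpdap's actual receipt format. The signature step itself is left out because the text does not say which signature scheme is used, so the sketch stops at the digest a signature would cover.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrivacyReceipt:
    """Hypothetical receipt: who published, which mechanism, how strong."""
    publisher: str
    mechanism: str
    epsilon: float


def canonical_bytes(receipt: PrivacyReceipt) -> bytes:
    """Serialize with sorted keys and no extra whitespace so that every
    party hashes exactly the same bytes for the same receipt."""
    return json.dumps(
        asdict(receipt), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


def receipt_digest(receipt: PrivacyReceipt) -> str:
    """SHA-256 digest of the canonical form; a real signature would
    be computed over these bytes (or this digest)."""
    return hashlib.sha256(canonical_bytes(receipt)).hexdigest()


receipt = PrivacyReceipt(publisher="Example Agency", mechanism="laplace", epsilon=1.0)
print(receipt_digest(receipt))
```

Any change to a field, such as quietly raising epsilon after publication, changes the digest, which is what lets a signature over it bind the publisher to the stated claim.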
Differential privacy was formalized in 2006 by Cynthia Dwork and collaborators.[2] The idea is direct: when you publish a statistic, add a small amount of carefully calibrated random noise to it. The noise comes from a known distribution (typically Laplace or Gaussian, both symmetric curves centered on zero), so the published count might be a few units higher or lower than the true count, with the typical size of the wobble pinned to a known number. Done right, the noise is enough to hide any one person’s contribution while leaving the overall pattern intact.
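The noise-adding step can be sketched in a few lines. This is an illustration only, using the earlier example count of 18,213 and an assumed noise scale of 20; the function names are hypothetical. Plain floating-point sampling like this is known to be unsafe for real releases, so production libraries use hardened samplers instead.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws, each with
    mean `scale`, is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_count(true_count: int, scale: float, rng: random.Random) -> int:
    """Return the true count plus Laplace noise, rounded to a whole number."""
    return true_count + round(laplace_noise(scale, rng))


rng = random.Random(2006)  # fixed seed only so the sketch is repeatable
print(publish_count(18_213, scale=20.0, rng=rng))
```

The scale is the "known number" that pins the size of the wobble: for Laplace noise, the average absolute error equals the scale.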
The strength of the protection is measured by a small Greek letter, ε (epsilon). Smaller ε means more noise and stronger privacy; larger ε means less noise and weaker privacy. It is called a privacy budget because every query an institution publishes from a dataset spends some of it; once the total is exhausted the institution must stop publishing or accept that further releases erode the guarantee. There is no universal “right” value, but the literature has rough reference points: academic work that calls a release “strongly private” usually means ε ≤ 1; mainstream production deployments sit somewhere between ε ≈ 1 and ε ≈ 10; the 2020 U.S. Census release went above that range, at ε ≈ 19.61, which several academics criticized as too weak.[3] Apple’s on-device keyboard analytics report per-event ε in a roughly comparable range, with the budget accounted per day.[4]
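The link between ε and the amount of noise, and the way a budget gets spent, can be sketched as follows. This assumes the Laplace mechanism, where the noise scale is the query's sensitivity divided by ε, and basic sequential composition, where the ε values of successive releases simply add up. Real deployments often use tighter accounting methods, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass


def laplace_scale(epsilon: float, sensitivity: float = 1.0) -> float:
    """Noise scale for the Laplace mechanism: smaller epsilon, more noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


@dataclass
class PrivacyBudget:
    """Tracks spending under basic sequential composition (epsilons add)."""
    total_epsilon: float
    spent: float = 0.0

    @property
    def remaining(self) -> float:
        return max(0.0, self.total_epsilon - self.spent)

    def charge(self, epsilon: float) -> None:
        """Record one release; refuse it if the budget would be exceeded."""
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)  # first release
budget.charge(0.4)  # second release
print(laplace_scale(0.4), budget.remaining)
```

A third release at ε = 0.4 would raise an error here, which mirrors the choice described above: stop publishing, or knowingly weaken the guarantee.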
Differential privacy is rigorous in a way that older approaches — like “remove the names and date of birth” — are not. It comes with a mathematical proof, not a hopeful intuition. That proof is the thing dpdap is built to let you verify.
The premise is straightforward. An institution’s release arrives accompanied by a signed receipt that says, for example, “I added Laplace noise calibrated to ε = 1.0.” dpdap’s job is to check whether the noise actually applied matches that claim.
It does this by running the underlying release process many independent times (in synthetic mode, against either the real aggregator or a controlled simulation), collecting the noise samples, and running standard statistical tests on them.
Every probe run also publishes a sample-adequacy note: the smallest noise-scale drift the run could have caught with 80% power at this sample size. If the test could only have caught a 2× under-noising and the data came back consistent with the claim, the report says so, and nobody is left with more confidence than the sample size supports.
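One way to compute such an adequacy figure is sketched below. It is an illustration under stated assumptions, not dpdap's actual calculation: it assumes Laplace noise, estimates the scale as the mean absolute value of the samples, and uses a normal approximation for that estimator. The function name and the default significance level of 0.05 are hypothetical.

```python
from statistics import NormalDist


def min_detectable_ratio(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Largest ratio (true scale / claimed scale) below 1 that a two-sided
    test at level `alpha` would flag with the requested power.

    For Laplace noise the mean absolute value estimates the scale with a
    relative standard error of about 1/sqrt(n). The claim is rejected when
    the estimate falls below (1 - z_alpha/sqrt(n)) times the claimed scale.
    """
    if n < 1:
        raise ValueError("need at least one sample")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    root_n = n ** 0.5
    numerator = 1.0 - z_alpha / root_n
    if numerator <= 0.0:
        raise ValueError("sample too small for this approximation")
    return numerator / (1.0 + z_power / root_n)


print(min_detectable_ratio(400))
```

Under these assumptions, 400 samples give a ratio of roughly 0.87: the run could be expected to catch noise about 13% smaller than claimed, and anything subtler could slip through unnoticed.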
What the probe tests is a parametric claim: “this output is a draw from a known noise distribution at a stated scale.” The probe does not certify that the underlying mechanism is differentially private. It tests whether the data looks like the declaration. Anything more would be an overstatement of what a black-box check can do.
The probe never reports “Pass.” It reports Inconclusive (the data is consistent with the claim), Failed (the data is not), or Skipped (an output is declared as a public invariant, or a post-processing step has altered the noise distribution so the tests don’t apply). This is not pedantic: a probe can falsify a privacy claim, but cannot prove one. Anyone telling you their tool proves DP correctness is selling you something.
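The three-outcome logic can be sketched as follows. The text does not name the specific tests dpdap runs, so this example uses a Kolmogorov-Smirnov goodness-of-fit test against a claimed Laplace distribution as one standard choice; the function names, the `testable` flag, and the significance level are assumptions for illustration.

```python
import math
import random
from enum import Enum


class Verdict(Enum):
    INCONCLUSIVE = "inconclusive"  # data is consistent with the claim
    FAILED = "failed"              # data contradicts the claim
    SKIPPED = "skipped"            # the tests do not apply to this output


def laplace_cdf(x: float, scale: float) -> float:
    """CDF of a zero-centred Laplace distribution with the given scale."""
    if x < 0:
        return 0.5 * math.exp(x / scale)
    return 1.0 - 0.5 * math.exp(-x / scale)


def ks_statistic(samples: list[float], scale: float) -> float:
    """Largest gap between the empirical CDF and the claimed Laplace CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = laplace_cdf(x, scale)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def run_probe(samples: list[float], claimed_scale: float,
              testable: bool = True, alpha: float = 0.01) -> Verdict:
    """Compare noise samples with the claimed scale. There is no 'pass'."""
    if not testable:
        return Verdict.SKIPPED
    if not samples:
        raise ValueError("need at least one noise sample")
    # Asymptotic critical value for the one-sample KS test.
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0) / len(samples))
    if ks_statistic(samples, claimed_scale) > critical:
        return Verdict.FAILED
    return Verdict.INCONCLUSIVE


def draw_laplace(scale: float, n: int, seed: int) -> list[float]:
    """Simulated release noise, standing in for samples from a real pipeline."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for _ in range(n)]


honest = draw_laplace(scale=1.0, n=2000, seed=7)
under_noised = draw_laplace(scale=0.5, n=2000, seed=7)
print(run_probe(honest, claimed_scale=1.0))
print(run_probe(under_noised, claimed_scale=1.0))
```

The absence of a passing verdict is deliberate in the sketch as well: failing to reject the claim only means the samples did not contradict it at this sample size.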
In practice, the probe is naturally suited to streaming aggregates and telemetry — DAP/Prio3 pipelines, daily-refresh sketches, anything that publishes a fresh draw every reporting interval. For one-shot statistical releases such as a decennial census file, only the signed receipt applies; the probe needs samples the release does not provide.
400 batches of synthetic reports run through dpdap’s probe — the same routine an auditor would run against a live release.
This stopped being academic a while ago. In the last two years:
The math is being deployed. The receipts are missing.
In 1995, every website was http://. There was no lock icon. You had no way to know if your bank’s login page was actually your bank’s login page. People knew the encryption math worked, but it was invisible at the boundary where a normal person made a decision.
Today every site you visit shows you a small lock. You probably haven’t thought about it in years. That happened because TLS got standardized at the IETF, certificate authorities like Let’s Encrypt made the certificates free, and browsers wired the result into the address bar. The math, the protocol, and the user-visible signal all had to ship together.
Differential privacy is in 1995’s position. dpdap is the lock icon.
A one-line check (or a click in your browser) that validates a published statistic against its receipt.
Same check, in a web page, for non-technical readers.
Differential privacy is in production. The U.S. Census Bureau built its 2020 release on it and committed to using it for the next decade. The Wikimedia Foundation publishes reader analytics under it. The IRS uses it for Statistics of Income releases. The UN refugee agency is piloting it on microdata. Apple, Google, Mozilla, and Cloudflare all use it for usage telemetry — the measurements an app or browser sends back to its maker about how it is being used. Most of these institutions build on the open-source OpenDP toolkit, which absorbed Tumult Analytics and Tumult Core in October 2025.[5] The mathematics layer is solved.
Several active research and standards efforts are converging on the missing disclosure layer.
dpdap is the signature, verification, and empirical conformance layer on top of these. The receipt format borrows from the disclosure-label work. The probe makes those claims testable. A consumer-side verifier — eventually compiled to WebAssembly — makes the testing accessible to anyone with a browser.
dpdap v0.2 is a Rust workspace of seven crates with 122 tests and continuous integration on Linux and macOS. Licensed Apache-2.0.
JanusAdapter speaks draft-ietf-ppm-dap-17 with HPKE and Prio3Sum, enabling probing against real DAP deployments over HTTP.
abi3-py39): probe_mock(), verify_receipt(), generate_keypair(), sign_receipt().
Three draft IETF issue write-ups — underspecified receipt format, absent conformance-test guidance, budget-binding ambiguity in draft‑thomson‑ppm‑dap‑dp‑ext — are ready to file with working test cases attached.
dpdap is about differentially-private aggregate releases: public statistics, DAP-style measurements, census-like tables, and other scalar or tabular outputs where the receipt names a noise mechanism and the probe can test the distribution of repeated releases.
modelreceipt carries the same public-verifiability idea to DP model releases: DP-SGD trained LLMs, synthetic datasets sampled from those models, and downstream models trained on the synthetic data. The core relationship is the same — signed receipts plus empirical probes — but the technical surface is different: privacy units, accounting assumptions, model artifacts, canary audits, extraction probes, and synthetic-data composition.
If you cover privacy, AI, or civic technology — or you work on a release pipeline that publishes differentially private statistics and have an opinion about what a verifiable receipt should look like — I’d like to talk. The differential-privacy era is here, the field is still running on faith, and the design choices in front of us will outlast a lot of louder news cycles.
For readers who want to go deeper. None of these are required to use dpdap; they are the intellectual ground the project stands on.
I am James Dreben. I studied computer science and machine learning at Harvard. In 2017, my senior year, I took Professor Cynthia Dwork’s graduate seminar in cryptography and privacy — she co-invented differential privacy — and I wrote my final paper on a public-data version of differentially private mobility modeling. I’ve spent the years since as a software engineer across AI, site reliability, and full-stack web work.
The framing matters more after the generative-AI boom than it did when I first encountered it. Powerful data-driven systems are now routine, and the realistic options for deploying them have collapsed to two unattractive ones: ship the system as a black box and accept that it will quietly leak details about the people in its training data, or refuse to ship it and forgo the capability. Differential privacy is the third option. Receipts are what make the third option legible to everyone outside the building.