The 2020 US Decennial Census is more private than you (might) think
Pith reviewed 2026-05-23 18:55 UTC · model grok-4.3
The pith
The 2020 Census delivers stronger privacy protections than its published guarantees at all eight geographical levels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The 2020 U.S. Census disclosure avoidance system achieves stronger privacy than its nominal differential privacy guarantees at each of the eight geographical levels because the composition of the private queries can be tracked more tightly with f-differential privacy than the published bounds assume. Consequently the noise variances injected into the tabulations could be reduced by 15.08 percent to 24.82 percent while maintaining nearly the same level of privacy protection for every level.
What carries the argument
f-differential privacy composition of the private queries run across the eight geographical levels
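A minimal sketch of why tighter composition tracking matters, under loud simplifications: it uses the continuous Gaussian mechanism as a stand-in for the discrete Gaussian, and the eight noise scales in `TOY_SIGMAS` are hypothetical values, not the Bureau's. It composes the mechanisms once with zCDP (rho adds up) and once with Gaussian differential privacy (mu composes in quadrature), then compares the delta each accounting reports at the same epsilon. This is not the paper's accountant, only an illustration of the gap it exploits.

```python
import math

def phi_cdf(x: float) -> float:
    """Standard normal CDF via erfc, which keeps small lower tails accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def composed_mu(sigmas, sensitivity=1.0):
    """GDP composition: mu_total = sqrt(sum_i (sensitivity / sigma_i)^2)."""
    return math.sqrt(sum((sensitivity / s) ** 2 for s in sigmas))

def composed_rho(sigmas, sensitivity=1.0):
    """zCDP composition: rho_total = sum_i sensitivity^2 / (2 * sigma_i^2)."""
    return sum(sensitivity ** 2 / (2.0 * s ** 2) for s in sigmas)

def delta_exact(eps: float, mu: float) -> float:
    """Exact delta(eps) of a mu-GDP mechanism (continuous Gaussian)."""
    return phi_cdf(-eps / mu + mu / 2.0) - math.exp(eps) * phi_cdf(-eps / mu - mu / 2.0)

def delta_zcdp(eps: float, rho: float) -> float:
    """delta from inverting eps = rho + 2*sqrt(rho*log(1/delta)), valid for eps > rho."""
    if eps <= rho:
        return 1.0
    return math.exp(-((eps - rho) ** 2) / (4.0 * rho))

# Hypothetical noise scales, one per geographic level (NOT the Bureau's values).
TOY_SIGMAS = [4.0, 3.0, 3.0, 2.5, 2.5, 2.0, 2.0, 2.0]
mu = composed_mu(TOY_SIGMAS)
rho = composed_rho(TOY_SIGMAS)

for eps in (3.0, 5.0, 7.0):
    print(f"eps={eps}: zCDP delta={delta_zcdp(eps, rho):.3e}, "
          f"exact delta={delta_exact(eps, mu):.3e}")
```

For the same mechanisms the exact profile reports a smaller delta at every epsilon, which is the kind of slack the core claim refers to.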
If this is right
- Noise variances in the privatized census tabulations can be reduced by 15.08% to 24.82% at each geographical level while preserving the nominal privacy level.
- The accuracy of published census statistics increases as a direct result of the lower noise.
- Downstream applications that use the privatized data, such as regression studies of earnings and education, experience measurably less distortion from privacy noise.
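To put the variance figures on the scale of typical error, the following sketch converts a relative reduction in noise variance into the matching reduction in noise standard deviation. It assumes zero-mean additive noise and ignores any effect of post-processing; the helper name `std_reduction` is illustrative only.

```python
import math

def std_reduction(variance_reduction: float) -> float:
    """Relative drop in standard deviation implied by a relative drop in variance."""
    return 1.0 - math.sqrt(1.0 - variance_reduction)

# The two endpoints quoted in the claim above.
for r in (0.1508, 0.2482):
    print(f"variance -{r:.2%} -> standard deviation -{std_reduction(r):.2%}")
```

Under these assumptions the quoted range corresponds to roughly 8% to 13% less noise standard deviation in the raw noisy measurements.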
Where Pith is reading between the lines
- Census designs for future decades could allocate their total privacy budget more efficiently by adopting the same tight composition analysis at the design stage.
- Other statistical agencies that release multi-level geographic data under differential privacy may be over-injecting noise for the same reason.
- Policymakers who rely on census counts for apportionment or funding formulas would receive higher-fidelity inputs if the identified noise reductions were applied.
Load-bearing premise
The composition of the private queries across the eight geographical levels can be tracked precisely with f-differential privacy without unaccounted dependencies or extra privacy costs arising from the specific structure of the Census Bureau's disclosure avoidance system.
What would settle it
A re-computation of the total privacy loss using the exact sequence of queries and the f-DP accountant that shows the realized privacy parameter is no stronger than the nominal published value would falsify the central claim.
Original abstract
The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could stronger privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the privacy budgets been fully utilized? In this paper, we address this question affirmatively by demonstrating that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest at each of the eight geographical levels, from the national level down to the block level. This finding is enabled by our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across these geographical levels. Our analysis reveals that the Census Bureau introduced unnecessarily high levels of noise to meet the specified privacy guarantees for the 2020 Census. Consequently, we show that noise variances could be reduced by $15.08\%$ to $24.82\%$ while maintaining nearly the same level of privacy protection for each geographical level, thereby improving the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper tracks the composition of private queries in the 2020 Census Disclosure Avoidance System (DAS) with f-differential privacy across eight geographical levels, from national to block, and claims that the actual privacy loss is lower than the nominal guarantees. On that basis it argues that noise variances could be reduced by 15.08% to 24.82% while preserving nearly equivalent privacy at each level, and it demonstrates empirically that lower noise improves a downstream earnings-education regression.
Significance. If the f-DP accounting is shown to be tight for the deployed top-down DAS, the result would indicate that the Census Bureau could have released more accurate tabulations without exceeding its privacy budget, with measurable utility gains in policy-relevant analyses. The work directly addresses an open question posed by the Census Bureau and supplies concrete percentage reductions and a downstream case study.
major comments (2)
- [f-DP composition analysis (likely §4 or §5)] The central claim rests on the assertion that standard f-DP composition across the eight levels exactly bounds the privacy loss of the deployed DAS. However, the manuscript does not provide a formal argument or theorem showing that the top-down algorithm's consistency constraints, shared randomness, and post-processing steps introduce no additional privacy loss beyond the independent composition of the per-level mechanisms. If such terms exist, the privacy loss after the reported 15-24% noise reduction could exceed the nominal budget.
- [Empirical evaluation section] The empirical demonstration in the earnings-education study uses the reduced-noise data but does not report the exact privacy parameters (f-DP curves or effective ε) achieved after the proposed variance reduction at each geographic level, making it impossible to verify that the privacy guarantee remains 'nearly the same' as the nominal one.
minor comments (2)
- Notation for the eight geographical levels and the mapping from queries to levels should be introduced earlier and used consistently.
- The abstract states precise tracking was performed but the main text should include a table or appendix listing the per-level query sets and their individual f-DP parameters before composition.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our f-DP analysis of the 2020 Census DAS. We address each major point below and will revise the manuscript accordingly to strengthen the formal justification and empirical reporting.
-
Referee: [f-DP composition analysis (likely §4 or §5)] The central claim rests on the assertion that standard f-DP composition across the eight levels exactly bounds the privacy loss of the deployed DAS. However, the manuscript does not provide a formal argument or theorem showing that the top-down algorithm's consistency constraints, shared randomness, and post-processing steps introduce no additional privacy loss beyond the independent composition of the per-level mechanisms. If such terms exist, the privacy loss after the reported 15-24% noise reduction could exceed the nominal budget.
Authors: The DAS applies independent discrete Gaussian mechanisms at each geographic level, and their privacy losses are tracked via f-DP composition. Consistency constraints are enforced by post-processing the noisy counts, which cannot increase privacy loss by the post-processing property. Shared randomness across levels is already folded into the per-level f-DP parameters before composition. We will add a short lemma (with proof) in §4 establishing that these implementation details introduce no extra loss beyond the composed mechanisms, thereby confirming that the reported variance reductions remain within the nominal budget. revision: yes
-
Referee: [Empirical evaluation section] The empirical demonstration in the earnings-education study uses the reduced-noise data but does not report the exact privacy parameters (f-DP curves or effective ε) achieved after the proposed variance reduction at each geographic level, making it impossible to verify that the privacy guarantee remains 'nearly the same' as the nominal one.
Authors: We agree that explicit verification of the post-reduction privacy parameters is necessary. In the revised manuscript we will add a table in the empirical section listing the f-DP curves (or equivalent effective ε at δ=10^{-10}) at each of the eight levels both before and after the 15.08–24.82% variance reductions, confirming that the new parameters remain at or below the nominal guarantees. revision: yes
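The before-and-after check requested above can be sketched numerically. The code below is a toy under stated assumptions: a continuous Gaussian stand-in for the discrete Gaussian, a single hypothetical GDP parameter `mu` for the composed mechanism, and a uniform rescaling of all noise variances. It finds the effective epsilon at δ=10^{-10} by bisection on the exact Gaussian privacy profile, then searches for the largest variance cut whose exact epsilon still sits at or below the nominal zCDP-derived epsilon. All function names are illustrative.

```python
import math

def phi_cdf(x: float) -> float:
    """Standard normal CDF via erfc, accurate for very small lower tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def delta_exact(eps: float, mu: float) -> float:
    """Exact delta(eps) of a mu-GDP mechanism (continuous Gaussian)."""
    return phi_cdf(-eps / mu + mu / 2.0) - math.exp(eps) * phi_cdf(-eps / mu - mu / 2.0)

def eps_exact(mu: float, delta: float, upper: float = 500.0) -> float:
    """Smallest eps with delta_exact(eps, mu) <= delta, found by bisection."""
    lo, hi = 0.0, upper
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if delta_exact(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi

def eps_zcdp(rho: float, delta: float) -> float:
    """Standard conversion from rho-zCDP to (eps, delta)-DP."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

def max_variance_reduction(mu: float, delta: float) -> float:
    """Largest uniform variance cut keeping exact eps <= nominal zCDP eps."""
    target = eps_zcdp(mu ** 2 / 2.0, delta)
    lo, hi = mu, 2.0 * mu  # search range for the larger GDP parameter after the cut
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if eps_exact(mid, delta) <= target:
            lo = mid
        else:
            hi = mid
    return 1.0 - (mu / lo) ** 2  # variance scales as 1 / mu^2

DELTA = 1e-10
TOY_MU = 1.0  # hypothetical composed GDP parameter, not a Census value
print("nominal eps (zCDP):", eps_zcdp(TOY_MU ** 2 / 2.0, DELTA))
print("exact eps before cut:", eps_exact(TOY_MU, DELTA))
print("allowed variance cut:", max_variance_reduction(TOY_MU, DELTA))
```

A per-level table of the kind promised in the response would repeat this comparison with each level's actual parameters and the discrete Gaussian profile.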
Circularity Check
No circularity: f-DP composition applied to external query structure
The paper computes composed f-DP privacy loss from the Census Bureau's published query structure and noise scales across eight geographic levels, then compares the resulting effective privacy to the nominal published budgets. This uses standard external composition theorems for f-DP; no parameter is fitted to the target privacy loss, no self-citation supplies a uniqueness result, and no equation defines the output privacy loss in terms of itself. The 15-24% noise reduction claim follows directly from the computed gap between nominal and tracked loss. The derivation is therefore self-contained against external benchmarks and receives score 0.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: f-differential privacy composition rules apply directly to the sequence of queries released by the 2020 Census disclosure avoidance system
Reference graph
Works this paper leans on
-
[1]
MSE between the non-privatized 2010 Census Summary Files and the simulated privacy-protected Summary Files after non-negative postprocessing, across nine racial queries; 2) the relationship between earnings and education level using the 2020 ACS 5-year estimates (27, 45). D.1. Impact on racial group counts. We analyze the MSE between the non-privatized 2010 ...
work page 2010
-
[2]
Discussion In this paper, we have analyzed the privacy guarantees of the 2020 U.S. Census in comparison with the privacy levels published by the Census Bureau. Our analysis demonstrates that the actual privacy guarantee is significantly stronger than that provided by the Bureau’s existing approach, as evidenced by our uniformly smaller ϵ value for any δ. Thi...
work page 2020
-
[3]
Hotchkiss M, Phelan J (2017) Uses of Census Bureau data in federal funds distribution: A new design for the 21st century. (United States Census Bureau)
work page 2017
-
[4]
US Census Bureau (2023) Census bureau data guide more than $2.8 trillion in federal funding in fiscal year 2021
work page 2023
-
[5]
Kenny CT, et al. (2023) Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System. Harvard Data Science Review (Special Issue 2)
work page 2023
-
[6]
Cohen A, Duchin M, Matthews J, Suwal B (2021) Census TopDown: The impacts of differential privacy on redistricting in 2nd Symposium on Foundations of Responsible Computing, FORC 2021, June 9-11, 2021, Virtual Conference, LIPIcs. (Schloss Dagstuhl - Leibniz-Zentrum für Informatik), Vol. 192, pp. 5:1–5:22
work page 2021
-
[7]
Autor DH, Duggan MG (2003) The rise in the disability rolls and the decline in unemployment. The Quarterly Journal of Economics 118(1):157–206
work page 2003
-
[8]
US Census Bureau (2021) Guidance for labor force statistics data users (https://www.census.gov/topics/employment/labor-force/guidance.html)
work page 2021
-
[9]
Eckman SJ (2021) Apportionment and redistricting following the 2020 census (https://sgp.fas.org/crs/misc/IN11360.pdf)
work page 2021
-
[10]
US Census Bureau (2021) 2020 census apportionment results (https://www.census.gov/data/tables/2020/dec/2020-apportionment-data.html)
work page 2021
-
[11]
Duncan G, Lambert D (1989) The risk of disclosure for microdata. Journal of Business & Economic Statistics 7(2):207–217
work page 1989
-
[12]
Dick T, et al. (2023) Confidence-ranked reconstruction of census microdata from published statistics. Proceedings of the National Academy of Sciences 120(8):e2218605120
work page 2023
-
[13]
Hawes M (2022) Reconstruction and re-identification of the demographic and housing characteristics file (DHC)
work page 2022
-
[14]
Abowd JM (2019) Staring down the database reconstruction theorem
work page 2019
-
[15]
Abowd J (2021) Declaration of John Abowd in case no. 3:21-cv-00211-RAH-ECM-KCN, The State of Alabama v. United States Department of Commerce
work page 2021
-
[16]
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. Theory of Cryptography, Proceedings 3876:265–284.
work page 2006
-
[17]
Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M (2006) Our data, ourselves: Privacy via distributed noise generation in Advances in Cryptology-EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28-June 1, 2006. Proceedings 25. (Springer), pp. 486–503
work page 2006
-
[18]
Abowd JM, et al. (2022) The 2020 census disclosure avoidance system TopDown algorithm. Harvard Data Science Review (Special Issue 2)
work page 2022
-
[19]
Abowd JM, et al. (2022) Invited lecture: The U.S. Census Bureau adopts differential privacy in KDD ’18: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining
work page 2022
-
[20]
Phillips v. US Census Bureau (2023) Phillips v. U.S. Census Bureau (https://thearp.org/litigation/phillips-v-us-census-bureau/)
work page 2023
-
[21]
Kenny CT, et al. (2021) The use of differential privacy for census data and its impact on redistricting: The case of the 2020 U.S. census. Science Advances 7(41):eabk3283
work page 2021
-
[22]
Kenny CT, McCartan C, Simko T, Imai K (2024) Census officials must constructively engage with independent evaluations. Proceedings of the National Academy of Sciences 121(11):e2321196121
work page 2024
-
[23]
Anderson MJ (2015) The American census: A social history. (Yale University Press)
work page 2015
-
[24]
Boyd D, Sarathy J (2022) Differential Perspectives: Epistemic Disconnects Surrounding the U.S. Census Bureau’s Use of Differential Privacy. Harvard Data Science Review (Special Issue 2)
work page 2022
-
[25]
Kifer D, et al. (2022) Bayesian and frequentist semantics for common variations of differential privacy: Applications to the 2020 census. arXiv preprint arXiv:2209.03310
-
[26]
Dong J, Roth A, Su WJ (2022) Gaussian differential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84(1):3–37
work page 2022
-
[27]
Balle B, Barthe G, Gaboardi M (2020) Privacy profiles and amplification by subsampling. J. Priv. Confidentiality 10(1)
work page 2020
-
[28]
US Census Bureau (2022) Privacy-protected 2010 census demonstration data | IPUMS NHGIS (https://www.nhgis.org/privacy-protected-2010-census-demonstration-data#v20220825-files)
work page 2022
- [29]
-
[30]
Canonne CL, Kamath G, Steinke T (2020) The discrete Gaussian for differential privacy in Advances in Neural Information Processing Systems. (Curran Associates, Inc.), Vol. 33, pp. 15676–15688
work page 2020
-
[31]
Dwork C, Rothblum GN (2016) Concentrated differential privacy. arXiv preprint arXiv:1603.01887
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[32]
Bun M, Steinke T (2016) Concentrated differential privacy: Simplifications, extensions, and lower bounds in Theory of Cryptography Conference. (Springer), pp. 635–658
work page 2016
-
[33]
Bun M, Dwork C, Rothblum GN, Steinke T (2018) Composable and versatile privacy via truncated CDP in Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. pp. 74–86
work page 2018
-
[34]
Micciancio D, Regev O (2007) Worst-case to average-case reductions based on Gaussian measures. SIAM Journal on Computing 37(1):267–302
work page 2007
-
[35]
US Census Bureau (2022) Privacy-loss budget allocation 2022-08-25 (https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/02-Demographic_and_Housing_Characteristics/2022-08-25_Summary_File/2022-08-25_Privacy-Loss_Budget_Allocations.pdf)
work page 2022
-
[36]
McSherry F (2010) Privacy integrated queries: an extensible platform for privacy-preserving data analysis. Commun. ACM 53(9):89–97
work page 2010
-
[37]
Smith J, et al. (2022) Making the most of parallel composition in differential privacy. Proc. Priv. Enhancing Technol. 2022(1):253–273
work page 2022
-
[38]
Kairouz P, Oh S, Viswanath P (2017) The composition theorem for differential privacy. IEEE Trans. Inf. Theory 63(6):4037–4049
work page 2017
-
[39]
Mironov I (2017) Rényi differential privacy in 2017 IEEE 30th computer security foundations symposium (CSF). (IEEE), pp. 263–275
work page 2017
-
[40]
Bu Z, Dong J, Long Q, Su WJ (2020) Deep learning with Gaussian differential privacy. Harvard Data Science Review 2020(23):10–1162
work page 2020
-
[41]
Wang C, Su B, Ye J, Shokri R, Su WJ (2024) Unified enhancement of privacy bounds for mixture mechanisms via f-differential privacy. Advances in Neural Information Processing Systems 36
work page 2024
- [42]
-
[43]
US Census Bureau (2020) Selected social characteristics in the united states (U.S. Census Bureau). Accessed on 4 October 2024
work page 2020
-
[44]
US Census Bureau (2020) Selected economic characteristics (U.S. Census Bureau). Accessed on 4 October 2024
work page 2020
-
[45]
US Census Bureau (2020) Selected housing characteristics (U.S. Census Bureau). Accessed on 4 October 2024
work page 2020
-
[46]
US Census Bureau (2020) Acs demographic and housing estimates (U.S. Census Bureau). Accessed on 4 October 2024
work page 2020
-
[47]
Muller A (2002) Education, income inequality, and mortality: a multiple regression analysis. BMJ 324(7328):23
work page 2002
-
[48]
Kenny CT, McCartan C, Kuriwaki S, Simko T, Imai K (2024) Evaluating bias and noise induced by the U.S. Census Bureau’s privacy protection methods. Science Advances 10(18):eadl2524
work page 2024
-
[49]
Cumings-Menon R (2024) Full-information estimation for hierarchical data. arXiv preprint arXiv:2404.13164
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[50]
Drechsler J, Globus-Harris I, Mcmillan A, Sarathy J, Smith A (2022) Nonparametric differentially private confidence intervals for the median.Journal of Survey Statistics and Methodology 10(3):804–829
work page 2022
- [51]
-
[52]
Sullivan TA (2020) Coming to Our Census: How Social Statistics Underpin Our Democracy (and Republic). Harvard Data Science Review 2(1)
work page 2020
- [53]
-
[54]
Cumings-Menon R, et al. (2024) Geographic spines in the 2020 census disclosure avoidance system. Journal of Privacy and Confidentiality 14(3)
work page 2024
-
[55]
US Census Bureau (2023) DAS-implementation-details (U.S. Census Bureau)
work page 2023
-
[56]
Kairouz P, Liu Z, Steinke T (2021) The distributed discrete Gaussian mechanism for federated learning with secure aggregation in Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, Proceedings of Machine Learning Research. (PMLR), Vol. 139, pp. 5201–5212
work page 2021
-
[57]
Zhu Y, Dong J, Wang YX (2022) Optimal accounting of differential privacy via characteristic function in International Conference on Artificial Intelligence and Statistics. (PMLR), pp. 4782–4817
work page 2022
-
[58]
Koskela A, Jälkö J, Honkela A (2020) Computing tight differential privacy guarantees using FFT in The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], Proceedings of Machine Learning Research. (PMLR), Vol. 108, pp. 2560–2569
work page 2020
-
[59]
Gopi S, Lee YT, Wutschitz L (2021) Numerical composition of differential privacy in Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. pp. 11631–11642
work page 2021
-
[60]
Balle B, Barthe G, Gaboardi M, Hsu J, Sato T (2020) Hypothesis testing interpretations and Rényi differential privacy in The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], Proceedings of Machine Learning Research, eds. Chiappa S, Calandra R. (PMLR), Vol. 108, pp. 2496–2506
work page 2020
-
[61]
Lehmann EL, Romano JP (2005) Testing statistical hypotheses, Springer Texts in Statistics. (Springer, New York), Third edition, pp. xiv+784
work page 2005
-
[62]
Wang H, Gao S, Zhang H, Shen M, Su WJ (2022) Analytical composition of differential privacy via the Edgeworth accountant. arXiv preprint arXiv:2206.04236
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[63]
Kiayias A, Kohlweiss M, Wallden P , Zikas V
Genise N, Micciancio D, Peikert C, Walter M (2020) Improved discrete Gaussian and subgaussian analysis for lattice cryptography in Public-Key Cryptography - PKC 2020 - 23rd IACR International Conference on Practice and Theory of Public-Key Cryptography, Edinburgh, UK, May 4-7, 2020, Proceedings, Part I, Lecture Notes in Computer Science, eds. Kiayias A, ...
work page 2020
-
[64]
Durrett R (2019) Probability: Theory and Examples. (Cambridge University Press), Vol. 49
work page 2019
-
[65]
Sablonnière P, Sbibih D, Tahrichi M (2010) Error estimate and extrapolation of a quadrature formula derived from a quartic spline quasi-interpolant. BIT 50(4):843–862.
Su, Su, and Wang et al. PNAS | September 25, 2025 | vol. XXX | no. XX | 9
A. Technical proofs and details
This section presents our main methodology and key tools for deriving the privacy profil...
work page 2010
-
[66]
by treating the binary categories as the 2-fold composition of two counting queries. According to (28), the discrete Gaussian mechanism is $\rho$-zCDP if we take $\sigma^2 = \Delta_M^2/(2\rho)$. zCDP is currently adopted by the Bureau to count the privacy budget of the 2020 Census. A better zCDP guarantee for the discrete Gaussian is also investigated by (54). The Bureau obtained ...
work page 2020
-
[67]
The results presented here are used to derive the trade-off functions shown in Figure 4 and Figure 12. The technical details of this section are similar to those of Section F. The trade-off function is uniquely determined by the following parametric equation. $\alpha(\zeta) = \mathbb{P}_{X_i \sim N_{\mathbb{Z}}(0,\sigma_i^2)}\Big( \sum_{i=1}^{k} \frac{1}{\sigma_i^2} \sum_{j=1}^{n_i} X_{ij} > \zeta \Big) + c \cdot \mathbb{P}_{X_i \sim N_{\mathbb{Z}}(0,\sigma_i^2)}\Big( \sum_{i=1}^{k} \frac{1}{\sigma_i^2} \sum_{j=1}^{n_i} X_{ij} =$...
work page 2025
-
[68]
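$+\,\nu(\Lambda_2^c) + \Big| \mathbb{P}_{X_{ij} \sim N_{\mathbb{Z}}(0,\sigma_i^2)}\Big[ \sum_{i=1}^{k} a_i \sum_{j=1}^{n_i} X_{ij} > t_\epsilon,\ \Lambda_1 \Big] - \nu\Big( \sum_{i=1}^{k} a_i \bar{X}_i \ge t_\epsilon,\ \Lambda_2 \Big) \Big| = \Omega_6 + \Omega_7 + \Omega_8.$ Upper bound on $\Omega_6$. We have $\mathbb{P}_{X_{ij} \sim N_{\mathbb{Z}}(0,\sigma_i^2)}[\Lambda_1^c] \le \sum_{i=1}^{k} \mathbb{P}_{X_{ij} \sim N_{\mathbb{Z}}(0,\sigma_i^2)}\Big[ \Big| \frac{1}{\sqrt{n_i}} \sum_{j=1}^{n_i} X_{ij} \Big| > 12 \cdot \sigma_i \Big].$ According to Eq. (A.2), $\sum_{j=1}^{n_i} X_{ij}$ is sub-Gaussian with variance proxy $\sqrt{n_i}\,\sigma_i^2$. As a result, it holds $\mathbb{P}_{X_{ij} \sim N_{\mathbb{Z}}(0,\sigma_i^2)}[\,|$...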
work page 2025
-
[69]
Therefore, the error bound is given by $\left| \int_0^{1/100} F(t)\,dt - \sum_{l=1}^{(N-1)/4} \frac{2h}{45} \left( 7F(x_{4l-4}) + 32F(x_{4l-3}) + 12F(x_{4l-2}) + 32F(x_{4l-1}) + 7F(x_{4l}) \right) \right| \le \frac{2}{945} \times 1.2 \times 10^{35} \times \frac{1}{100} \times \left( \frac{1}{100 \times 10^{7}} \right)^{6} < 2.54 \times 10^{-24}$.
E. Supplementary figures
[Figure: probability density plotted against i/Bn; legend entries include "Compositions of DGM" and "Our Appr..."]
work page 2020
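The quadrature bound quoted in entry [69] has the shape of the composite Boole's rule error estimate, and its arithmetic can be checked directly. The sketch below assumes that reading: 1.2×10^35 is taken to be a bound on the sixth derivative of F, 1/100 the interval length, and 1/(100×10^7) the step size h. The function names and the test integrand are illustrative and do not come from the paper.

```python
import math

def composite_boole(f, a: float, b: float, n: int) -> float:
    """Composite Boole's rule on n equal subintervals (n must be a multiple of 4)."""
    if n <= 0 or n % 4 != 0:
        raise ValueError("n must be a positive multiple of 4")
    h = (b - a) / n
    total = 0.0
    for l in range(1, n // 4 + 1):
        # Five nodes x_{4l-4}, ..., x_{4l} of the l-th panel.
        x = [a + (4 * l - 4 + m) * h for m in range(5)]
        total += (2.0 * h / 45.0) * (
            7 * f(x[0]) + 32 * f(x[1]) + 12 * f(x[2]) + 32 * f(x[3]) + 7 * f(x[4])
        )
    return total

def boole_error_bound(length: float, h: float, sixth_derivative_bound: float) -> float:
    """Composite Boole error bound: (2/945) * max|f^(6)| * (b - a) * h^6."""
    return (2.0 / 945.0) * sixth_derivative_bound * length * h ** 6

# Arithmetic of the quoted bound, with the constants read as described above.
quoted = boole_error_bound(1.0 / 100.0, 1.0 / (100.0 * 10 ** 7), 1.2e35)
print(f"quoted bound evaluates to {quoted:.4e}")

# Sanity check on a smooth integrand whose sixth derivative is bounded by e on [0, 1].
approx = composite_boole(math.exp, 0.0, 1.0, 40)
print(abs(approx - (math.e - 1.0)), boole_error_bound(1.0, 1.0 / 40.0, math.e))
```

The rule integrates polynomials up to degree five exactly, which is why the error term involves the sixth derivative and the sixth power of h.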