ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

Danfeng Zhang; Daniel Kifer; Guanlin He; Levent Toksoz; Yingtai Xiao; Zeyu Ding

arxiv: 2305.08175 · v4 · submitted 2023-05-14 · 💻 cs.DB · cs.CR· cs.LG

ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

Yingtai Xiao , Guanlin He , Levent Toksoz , Zeyu Ding , Danfeng Zhang , Daniel Kifer This is my paper

Pith reviewed 2026-05-24 08:10 UTC · model grok-4.3

classification 💻 cs.DB cs.CRcs.LG

keywords matrix mechanismsdifferential privacymarginal queriesGaussian noiseconvex optimizationrange queriesprefix sumsscalable privacy

0 comments

The pith

ResidualPlanner formulates matrix mechanism optimization for marginal queries as a convex program over variances, scaling to 100 attributes without full query matrices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ResidualPlanner as a method that reduces the problem of finding optimal noise for marginal queries under Gaussian noise to a convex optimization over the variances of those marginals. This formulation supports multiple convex loss functions and avoids constructing or storing the full query matrix, which lets it run on datasets with 100 attributes in minutes where prior methods exhaust memory. ResidualPlanner+ extends the approach to mixed workloads that combine marginals with range queries or prefix-sum queries on the same attributes. A reader would care because noisy marginal releases are a standard building block for contingency tables, Bayesian networks, and synthetic data, and current methods hit hard scalability limits on real attribute counts.

Core claim

ResidualPlanner solves the matrix mechanism design problem for marginal queries by optimizing a convex function of the marginal variances directly; the resulting program yields the optimal Gaussian noise allocation for any such loss while remaining computationally tractable even when the underlying query matrix is too large to materialize. ResidualPlanner+ relaxes optimality to handle arbitrary user-specified combinations of marginals and range or prefix-sum queries on overlapping attribute sets, still using the same variance-centric convex program.

What carries the argument

A convex program whose decision variables are the variances (and covariances) of the target marginals; the program encodes the matrix mechanism's noise allocation without ever building the explicit query matrix.

If this is right

Any convex loss written in terms of marginal variances can be optimized, removing the restriction to a single predefined objective.
Variance and covariance values for every marginal can be returned directly without materializing large matrices.
Mixed workloads that include both marginals and range or prefix-sum queries on the same data can be answered from one noise allocation.
The same convex program runs in seconds on marginal workloads that exhaust memory in HDMM and finishes in a few minutes for 100-attribute tables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The variance-centric view may let the same optimizer be reused for other linear query families whose error can be expressed through marginal variances.
Because the program never builds the full matrix, it could be embedded inside systems that release many different marginal sets on the same underlying table without repeated matrix construction.
Extending the convex program to additional query types would require only that their error contributions remain expressible as functions of the same marginal variances.

Load-bearing premise

The desired loss can be expressed as a convex function of the marginal variances alone, and the feasible set of variance vectors remains convex.

What would settle it

A concrete marginal query workload and convex loss for which the variance vector returned by the program produces strictly higher loss than the noise allocation obtained by solving the original matrix mechanism quadratic program on the full query matrix.

read the original abstract

Noisy marginals are a common form of confidentiality protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets). ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal, however it is still extremely scalable and outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ResidualPlanner scales optimal matrix mechanisms for marginals to 100 attributes and adds a flexible extension for mixed workloads, beating HDMM on memory and speed.

read the letter

The core advance is ResidualPlanner, which formulates the matrix mechanism optimization as a convex program over marginal variances rather than the full query matrix. This lets it stay optimal for Gaussian noise on marginal queries while supporting multiple convex loss functions on those variances, and it runs in seconds or minutes on datasets where HDMM exhausts memory. ResidualPlanner+ relaxes optimality to handle mixed workloads such as marginals plus range or prefix-sum queries on the same attributes, and the abstract reports better accuracy and speed than HDMM on prefix sums. Both versions also compute per-marginal variance and covariance values without the memory blowup that hits prior methods. These are practical gains for releasing high-dimensional contingency tables or feeding Bayesian network construction. The main soft spot is that the abstract gives no derivation outline or experimental controls, so it is not yet clear how the optimality carries over when queries have dependencies or when the convex program is solved at scale. The mixed-workload version is explicitly not optimal, which is fine if the performance edge holds, but the paper will need to show that the relaxation does not introduce hidden accuracy costs on realistic combined workloads. This work is aimed at people building privacy tools for databases and statistical releases who already know the matrix mechanism literature. If the full paper supplies the missing derivations and reproducible experiments, it deserves a serious referee.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces ResidualPlanner, a matrix mechanism for releasing noisy answers to marginal queries under Gaussian noise that formulates the problem as a convex optimization over marginal variances to achieve optimality while supporting arbitrary convex loss functions of those variances; it further presents ResidualPlanner+ as a scalable (though not necessarily optimal) extension that handles mixed workloads combining marginals with range or prefix-sum queries on the same or different attributes. The work emphasizes practical scalability, reporting that ResidualPlanner solves large instances (including 100-attribute datasets) in minutes where prior methods such as HDMM exhaust memory, and that it can output per-marginal variance/covariance values efficiently.

Significance. If the optimality derivation and empirical scaling results hold, the contribution would be significant for the differential privacy and private data release literature: it removes a key practical barrier (memory and runtime) for high-dimensional marginal release while broadening the class of supported objectives and workloads, directly benefiting downstream applications such as contingency-table analysis and synthetic data generation.

minor comments (3)

The abstract asserts that ResidualPlanner 'can optimize for many loss functions that can be written as a convex function of marginal variances' yet provides no concrete examples of such loss functions or the corresponding convex programs; a short illustrative list or reference to the relevant section would clarify the scope of the generalization over prior single-objective work.
The claim that ResidualPlanner+ 'outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed' is presented without reference to the specific experimental configuration (workload size, privacy parameter, or number of repetitions); adding a pointer to the relevant table or figure would strengthen the statement.
Notation for the marginal-variance vector and the mapping from query matrix to marginal variances is introduced without an explicit equation or diagram in the abstract; a compact definition early in the introduction would aid readers unfamiliar with the matrix-mechanism literature.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the work's potential significance, and recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper formulates ResidualPlanner as solving a convex program directly over marginal variances to achieve optimality for Gaussian matrix mechanisms on marginal queries, with ResidualPlanner+ as an explicit scalable relaxation for mixed workloads. This is presented as an algorithmic improvement over prior matrix mechanisms (e.g., HDMM) without reducing the central optimality claim to a fitted parameter, self-citation chain, or definitional equivalence. The abstract and described claims contain no load-bearing self-citations, ansatz smuggling, or renaming of known results as derivations; the work is self-contained against external benchmarks in the matrix mechanism literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claims rest on unstated assumptions about query linearity and convexity that are standard in the matrix mechanism literature.

pith-pipeline@v0.9.0 · 5848 in / 1103 out tokens · 24114 ms · 2026-05-24T08:10:28.598260+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Residual matrices RA formed via Kronecker products of Sub|Atti|; reconstruction via Sub† pseudo-inverses; convex program over σ²A parameters (Thm 4.4, 4.7)
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Optimality via symmetry of marginals under regular loss functions (Def 4.1)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 2 internal anchors

[1]

Privacy preserving synthetic data release using d eep learning

Nazmiye Ceren Abay, Y an Zhou, Murat Kantarcioglu, Bhava ni Thuraisingham, and Latanya Sweeney. Privacy preserving synthetic data release using d eep learning. In Machine Learning and Knowledge Discovery in Databases: European Conference , ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I 18 , pages 510–526. Springer, 2019

work page 2018
[2]

John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Sims on Garﬁnkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Lecler c, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, and Pavel Zhuravlev . The 2020 census disclosure avoidance system topdown algorithm. Harvard Data Science Review , forthcoming. Preprint https...

work page 2020
[3]

Differentially Private Release of High-Dimensional Datasets using the Gaussian Copula

Hassan Jameel Asghar, Ming Ding, Thierry Rakotoarivelo , Sirine Mrabet, and Mohamed Ali Kaafar. Differentially private release of high-dimension al datasets using the gaussian copula. arXiv preprint arXiv:1902.01499 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[4]

Differentially private query releas e through adaptive projection

Sergul A ydore, William Brown, Michael Kearns, Krishnar am Kenthapadi, Luca Melis, Aaron Roth, and Ankit A Siva. Differentially private query releas e through adaptive projection. In International Conference on Machine Learning , pages 457–467. PMLR, 2021

work page 2021
[5]

Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising

Borja Balle and Y u-Xiang Wang. Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learn- ing, ICML, 2018

work page 2018
[6]

Privacy, accuracy, and consistency too: a holistic solution to contingency table release

Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen K ale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282, 2007

work page 2007
[7]

Concentrated differential privacy: Simpliﬁcations, extensions, and lower bounds

Mark Bun and Thomas Steinke. Concentrated differential privacy: Simpliﬁcations, extensions, and lower bounds. In Proceedings, Part I, of the 14th International Conference o n Theory of Cryptography - V olume 9985, 2016

work page 2016
[8]

Census Bureau

U.S. Census Bureau. Decennial census: 2010 summary ﬁles . https://www.census.gov/mp/www/cat/decennial_census_2010/

work page 2010
[9]

Census Bureau

U.S. Census Bureau. The current population survey (cps) . https://www.census.gov/programs-surveys/cps.html, 2023

work page 2023
[10]

D ata synthesis via differentially private markov random ﬁelds

Kuntai Cai, Xiaoyu Lei, Jianxin Wei, and Xiaokui Xiao. D ata synthesis via differentially private markov random ﬁelds. Proceedings of the VLDB Endowment, 14(11):2190–2202, 2021

work page 2021
[11]

Differe ntially private high-dimensional data publication via sampling-based inference

Rui Chen, Qian Xiao, Y u Zhang, and Jianliang Xu. Differe ntially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data m ining, pages 129–138, 2015

work page 2015
[12]

CVXPY: A Python-embed ded modeling language for convex optimization

Steven Diamond and Stephen Boyd. CVXPY: A Python-embed ded modeling language for convex optimization. Journal of Machine Learning Research , 17(83):1–5, 2016

work page 2016
[13]

Differentially private data cubes: Optimizing noise sources and consistency

Bolin Ding, Marianne Winslett, Jiawei Han, and Zhenhui Li. Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of the 2011 ACM SIGMOD Inter- national Conference on Management of Data , 2011

work page 2011
[14]

Ecos: A n socp solver for embedded sys- tems

Alexander Domahidi, Eric Chu, and Stephen Boyd. Ecos: A n socp solver for embedded sys- tems. In 2013 European control conference (ECC), pages 3071–3076. IEEE, 2013

work page 2013
[15]

Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian dif ferential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodol ogy), 84(1):3–37, 2022

work page 2022
[16]

UCI machine learning reposi tory, 2019

Dheeru Dua and Casey Graff. UCI machine learning reposi tory, 2019

work page 2019
[17]

Our data, ourselves: Privacy via distributed noise generation

Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry , Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation . In EUROCRYPT, pages 486–503, 2006

work page 2006
[18]

The power of factorization mechanisms in local and central differential privacy

Alexander Edmonds, Aleksandar Nikolov, and Jonathan U llman. The power of factorization mechanisms in local and central differential privacy. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages 425–438, 2020. 11

work page 2020
[19]

Dual query: Practical private query release for high dim ensional data

Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu , Aaron Roth, and Zhiwei Steven Wu. Dual query: Practical private query release for high dim ensional data. In International Conference on Machine Learning , pages 1170–1178. PMLR, 2014

work page 2014
[20]

Differentially private chi-squared hypothesis testing: Goodness of ﬁt and independence testin g

Marco Gaboardi, Hyun Lim, Ryan Rogers, and Salil V adhan. Differentially private chi-squared hypothesis testing: Goodness of ﬁt and independence testin g. In International conference on machine learning, pages 2111–2120. PMLR, 2016

work page 2016
[21]

Gurobi Optimizer Reference Manual, 2023

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023

work page 2023
[22]

A sim ple and practical algorithm for dif- ferentially private data release

Moritz Hardt, Katrina Ligett, and Frank McSherry. A sim ple and practical algorithm for dif- ferentially private data release. Advances in neural information processing systems , 25, 2012

work page 2012
[23]

In International conference on learning representations , 2019

James Jordon, Jinsung Y oon, and Mihaela V an Der Schaar.Pate-gan: Generating synthetic data with differential privacy guarantees. In International conference on learning representations , 2019

work page 2019
[24]

Loan prediction problem dataset

Kaggle. Loan prediction problem dataset. https://www.kaggle.com/altruistdelhite04/loan-predi ction-probl

work page
[25]

Accessed: May 8th, 2023

work page 2023
[26]

Diffe rentially private chi-squared test by unit circle mechanism

Kazuya Kakizaki, Kazuto Fukuchi, and Jun Sakuma. Diffe rentially private chi-squared test by unit circle mechanism. In International Conference on Machine Learning , pages 1761–1770. PMLR, 2017

work page 2017
[27]

A new class of private chi- square tests

Daniel Kifer and Ryan Rogers. A new class of private chi- square tests. In Proceedings of the 20th International Conference on Artiﬁcial Intelligence a nd Statistics, AISTATS , volume 17, pages 991–1000, 2016

work page 2016
[28]

Releas- ing search queries and clicks privately

Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mis hra, and Alexandros Ntoulas. Releas- ing search queries and clicks privately. In Proceedings of the 18th international conference on W orld wide web, pages 171–180, 2009

work page 2009
[29]

Differentially Private Hierarchical Count-of-Counts Histograms

Y u-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay , and Ashwin Machanava- jjhala. Differentially private hierarchical count-of-co unts histograms. arXiv preprint arXiv:1804.00370, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

Differentially private m-estimators

Jing Lei. Differentially private m-estimators. Advances in Neural Information Processing Systems, 24, 2011

work page 2011
[31]

An adaptive mechanism for acc urate query answering under differential privacy

Chao Li and Gerome Miklau. An adaptive mechanism for acc urate query answering under differential privacy. Proceedings of the VLDB Endowment , 5(6), 2012

work page 2012
[32]

Optimal error of query sets un der the differentially-private matrix mechanism

Chao Li and Gerome Miklau. Optimal error of query sets un der the differentially-private matrix mechanism. In Proceedings of the 16th International Conference on Databa se Theory, pages 272–283, 2013

work page 2013
[33]

The matrix mechanism: Optimizing linear counting queries under diffe rential privacy

Chao Li, Gerome Miklau, Michael Hay, Andrew Mcgregor, a nd Vibhor Rastogi. The matrix mechanism: Optimizing linear counting queries under diffe rential privacy. The VLDB Journal, 24(6):757–781, December 2015

work page 2015
[34]

Differentiall y private synthesization of multi- dimensional data using copula functions

Haoran Li, Li Xiong, and Xiaoqian Jiang. Differentiall y private synthesization of multi- dimensional data using copula functions. In Advances in database technology: proceedings. International conference on extending database technology, volume 2014, page 475. NIH Pub- lic Access, 2014

work page 2014
[35]

Leverag- ing public data for practical private query release

Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonath an Ullman, and Steven Wu. Leverag- ing public data for practical private query release. In International Conference on Machine Learning, pages 6968–6977. PMLR, 2021

work page 2021
[36]

Iterati ve methods for private synthetic data: Unifying framework and new methods

Terrance Liu, Giuseppe Vietri, and Steven Z Wu. Iterati ve methods for private synthetic data: Unifying framework and new methods. Advances in Neural Information Processing Systems , 34, 2021

work page 2021
[37]

Private synthetic da ta with hierarchical structure

Terrance Liu and Zhiwei Steven Wu. Private synthetic da ta with hierarchical structure. arXiv preprint arXiv:2206.05942, 2022

work page arXiv 2022
[38]

Optimizing error of high-dimensional statistical queries under diffe rential privacy

Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Optimizing error of high-dimensional statistical queries under diffe rential privacy. Proceedings of the VLDB Endowment, 11(10), 2018. 12

work page 2018
[39]

Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy

Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy. arXiv preprint arXiv:2106.12118, 2021

work page arXiv 2021
[40]

Aim: An adaptive and iterative mechanism for differentially private synthetic data

Ryan McKenna, Brett Mullins, Daniel Sheldon, and Gerom e Miklau. Aim: An adaptive and iterative mechanism for differentially private synthetic data. arXiv preprint arXiv:2201.12677 , 2022

work page arXiv 2022
[41]

Relaxed marginal consistency for differentially private query answering

Ryan McKenna, Siddhant Pradhan, Daniel R Sheldon, and G erome Miklau. Relaxed marginal consistency for differentially private query answering. Advances in Neural Information Pro- cessing Systems, 34:20696–20707, 2021

work page 2021
[42]

Graph ical-model based estimation and inference for differential privacy

Ryan McKenna, Daniel Sheldon, and Gerome Miklau. Graph ical-model based estimation and inference for differential privacy. In International Conference on Machine Learning , pages 4435–4444. PMLR, 2019

work page 2019
[43]

New computational aspects of discrepancy theory

Aleksandar Nikolov. New computational aspects of discrepancy theory . PhD thesis, Rutgers University-Graduate School-New Brunswick, 2014

work page 2014
[44]

Priview: practical differentially private re- lease of marginal contingency tables

Wahbeh Qardaji, Weining Y ang, and Ninghui Li. Priview: practical differentially private re- lease of marginal contingency tables. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data , pages 1435–1446, 2014

work page 2014
[45]

New oracle-efﬁcient algorithms for private synthetic data release

Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, and Steven Wu. New oracle-efﬁcient algorithms for private synthetic data release. In International Conference on Machine Learning, pages 9765–9774. PMLR, 2020

work page 2020
[46]

Differentially private sql wit h bounded user contribution

Royce Wilson, Celia Y uxin Zhang, William Lam, Damien De sfontaines, Daniel Simmons- Marengo, and Bryant Gipson. Differentially private sql wit h bounded user contribution. In Proceedings on Privacy Enhancing T echnologies Symposium, 2020

work page 2020
[47]

Optimizing ﬁtness- for-use of differentially private linear queries

Yingtai Xiao, Zeyu Ding, Y uxin Wang, Danfeng Zhang, and Daniel Kifer. Optimizing ﬁtness- for-use of differentially private linear queries. In VLDB, 2021

work page 2021
[48]

Answering private linear queries adaptively using the common mechanism

Yingtai Xiao, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Answering private linear queries adaptively using the common mechanism. https://arxiv.org/abs/2212.00135, 2022

work page arXiv 2022
[49]

Dp pro: Differentially private high- dimensional data release via random projection

Chugui Xu, Ju Ren, Y aoxue Zhang, Zhan Qin, and Kui Ren. Dp pro: Differentially private high- dimensional data release via random projection. IEEE Transactions on Information F orensics and Security, 12(12):3081–3093, 2017

work page 2017
[50]

Ac- curate and efﬁcient private release of datacubes and contin gency tables

Grigory Y aroslavtsev, Graham Cormode, Cecilia M Proco piuc, and Divesh Srivastava. Ac- curate and efﬁcient private release of datacubes and contin gency tables. In 2013 IEEE 29th International Conference on Data Engineering (ICDE) , pages 745–756. IEEE, 2013

work page 2013
[51]

Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies

Fei Y u, Stephen E Fienberg, Aleksandra B Slavkovi ´c, and Caroline Uhler. Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies. Journal of biomed- ical informatics, 50:133–141, 2014

work page 2014
[52]

Convex optimization for linear query processing under approximate differential privacy

Ganzhao Y uan, Yin Y ang, Zhenjie Zhang, and Zhifeng Hao. Convex optimization for linear query processing under approximate differential privacy. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery an d Data Mining , 2016

work page 2016
[53]

Low-rank mechanism: Optimizing batch queries under differ ential privacy

Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Low-rank mechanism: Optimizing batch queries under differ ential privacy. Proc. VLDB En- dow., 5(11):1352–1363, July 2012

work page 2012
[54]

Optimizing batch linear queries under exact and approximat e differential privacy

Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Optimizing batch linear queries under exact and approximat e differential privacy. ACM Trans- actions on Database Systems (TODS) , 40(2):1–47, 2015

work page 2015
[55]

Procopiuc, Dives h Srivastava, and Xiaokui Xiao

Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Dives h Srivastava, and Xiaokui Xiao. Privbayes: Private data release via bayesian networks. ACM Trans. Database Syst., 42(4), oct 2017

work page 2017
[56]

Differentially private high- dimensional data publication via markov network

Wei Zhang, Jingwen Zhao, Fengqiong Wei, and Y unfang Che n. Differentially private high- dimensional data publication via markov network. EAI Endorsed Transactions on Security and Safety, 6(19), 2019

work page 2019
[57]

Privsyn: Differentially private data synthesis

Zhikun Zhang, Tianhao Wang, Jean Honorio, Ninghui Li, M ichael Backes, Shibo He, Jiming Chen, and Y ang Zhang. Privsyn: Differentially private data synthesis. 2021. 13 Contents 1 Introduction 1 2 Preliminaries 2 2.1 Differential Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.2 Matrix Mechanism . . . . . . . . . . . . . . ....

work page 2021
[58]

residual

For each attribute, we can one-hot encode its attribute values as row vectors. So, for Att1, the attribute value ’a’ is encoded as [1, 0] and ’b’ is encoded as [0, 1]. For Att2, the attribute value ’y’ is encoded as [1, 0] and ’n’ is encoded as [0, 1]. For attribute Att3, the attribute value ’1’ is encoded as [1, 0, 0], the value ’2’ is encoded as [0, 1, ...

work page
[59]

18∗ 12/ 11/c ≈ 4. 8/c etc. C Example Variance Calculations We next illustrate how the σ 2 A parameters affect the variance of different marginals base d on Theo- rem 4.7. We illustrate it with a toy dataset that has 5 attributes, ea ch with 3 possible values. In this discussion, the variance of a marginal is the largest varian ce of its cells (all cells w...

work page
[60]

871∗ 2/ 3∗ 1≈ 8

154∗ 1∗ 1/ 9 + 10. 871∗ 2/ 3∗ 1≈ 8. 154. For the variance on the 5-way marginal, it consists of 1 term f or A′ =∅, 5 terms for when A′ has 1 attribute since there are 5 such A′ (but all the terms have the same value), 10 terms for when A′ has 2 attributes (all have the same value), 10 terms for when A′ has 3 attributes (again these terms all are equal), 5...

work page
[61]

a,y,1” in the dataset, the s econd component corresponds to record “a,y,2

Suppose we have three attributes: Att1 takes values ‘a’ or ‘b’; Att2 takes values ‘y’ or ‘n’; Att3 takes values 1 or 2 or 3. ay1 ay2 ay3 an1 an2 an3 by1 by2 by3 bn1 bn2 bn3 bn1: [0, 1]⊗ [0, 1]⊗ [1, 0, 0] 0 0 0 0 0 0 0 0 0 1 0 0 [1, 1]⊗ [1,− 1]⊗ [1,− 1, 0] 1 -1 0 -1 1 0 1 -1 0 -1 1 0 [1,− 1]⊗ [1, 1]⊗ [1, 0,− 1] 1 0 -1 1 0 -1 -1 0 1 -1 0 1 Table 8: Kron pro...

work page
[62]

Expressing the privacy cost of the optimal mechanism M∗ as a linear combination of the 1/σ 2 A values takes O(∑ A∈ W kload #cells(A)) total time

work page
[63]

Expressing all of the V ar(A;M∗ ), for A∈ W kload, as a linear combinations of the σ 2 A values can be done in O(∑ A∈ W kload #cells(A)) total time

work page
[64]

Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )

Computing all the noisy outputs of the optimal mechanism (i.e., MA(x; σ 2 A) for A ∈ closure(W kload)) takes O ( na ∑ A∈ W kload ∏ Atti∈ A(|Atti|+ 1) ) total time after the true an- swers have been precomputed (Line 1 in Algorithm 1). Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )

work page
[65]

Reconstructing marginals for all A∈ W kload takes O(∑ A∈ W kload|A|#cells(A)2) total time

work page
[66]

Proof of Theorem 4.8

Computing the variance of the cells for all of the marginal s for A∈ W kload can be done in O(∑ A∈ W kload #cells(A)) total time. Proof of Theorem 4.8. First we establish that |closure(W kload)| ≤ ∑ A∈ W kload #cells(A). Given an set A∈ W kload, we note that it has 2|A| subsets, so that |closure(A)| = 2|A|. How- ever, #cells(A) is at least 2|A| (because e...

work page
[67]

The RMSE on the workload that the selection step guarantees is also meas ured

We evaluate the running time and accuracy of the select ion step Table 9 shows the running time for the selection step of HDMM and ResidualPlanner. The RMSE on the workload that the selection step guarantees is also meas ured. Both HDMM and ResidualPlanner have no trouble here. HDMM is nearly optimal in RMSE and Resid ualPlanner is optimal, as shown by ag...

work page
[68]

For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals)

This maximum is always achieved for the sum query (a zer o-dimensional marginal) for the following reason. For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals). The privacy loss budget is split equally among them. Thus, each of the (d 3 ) marginals it measures gets N (0, (...

work page

[1] [1]

Privacy preserving synthetic data release using d eep learning

Nazmiye Ceren Abay, Y an Zhou, Murat Kantarcioglu, Bhava ni Thuraisingham, and Latanya Sweeney. Privacy preserving synthetic data release using d eep learning. In Machine Learning and Knowledge Discovery in Databases: European Conference , ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I 18 , pages 510–526. Springer, 2019

work page 2018

[2] [2]

John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Sims on Garﬁnkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Lecler c, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, and Pavel Zhuravlev . The 2020 census disclosure avoidance system topdown algorithm. Harvard Data Science Review , forthcoming. Preprint https...

work page 2020

[3] [3]

Differentially Private Release of High-Dimensional Datasets using the Gaussian Copula

Hassan Jameel Asghar, Ming Ding, Thierry Rakotoarivelo , Sirine Mrabet, and Mohamed Ali Kaafar. Differentially private release of high-dimension al datasets using the gaussian copula. arXiv preprint arXiv:1902.01499 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902

[4] [4]

Differentially private query releas e through adaptive projection

Sergul A ydore, William Brown, Michael Kearns, Krishnar am Kenthapadi, Luca Melis, Aaron Roth, and Ankit A Siva. Differentially private query releas e through adaptive projection. In International Conference on Machine Learning , pages 457–467. PMLR, 2021

work page 2021

[5] [5]

Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising

Borja Balle and Y u-Xiang Wang. Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learn- ing, ICML, 2018

work page 2018

[6] [6]

Privacy, accuracy, and consistency too: a holistic solution to contingency table release

Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen K ale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282, 2007

work page 2007

[7] [7]

Concentrated differential privacy: Simpliﬁcations, extensions, and lower bounds

Mark Bun and Thomas Steinke. Concentrated differential privacy: Simpliﬁcations, extensions, and lower bounds. In Proceedings, Part I, of the 14th International Conference o n Theory of Cryptography - V olume 9985, 2016

work page 2016

[8] [8]

Census Bureau

U.S. Census Bureau. Decennial census: 2010 summary ﬁles . https://www.census.gov/mp/www/cat/decennial_census_2010/

work page 2010

[9] [9]

Census Bureau

U.S. Census Bureau. The current population survey (cps) . https://www.census.gov/programs-surveys/cps.html, 2023

work page 2023

[10] [10]

D ata synthesis via differentially private markov random ﬁelds

Kuntai Cai, Xiaoyu Lei, Jianxin Wei, and Xiaokui Xiao. D ata synthesis via differentially private markov random ﬁelds. Proceedings of the VLDB Endowment, 14(11):2190–2202, 2021

work page 2021

[11] [11]

Differe ntially private high-dimensional data publication via sampling-based inference

Rui Chen, Qian Xiao, Y u Zhang, and Jianliang Xu. Differe ntially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data m ining, pages 129–138, 2015

work page 2015

[12] [12]

CVXPY: A Python-embed ded modeling language for convex optimization

Steven Diamond and Stephen Boyd. CVXPY: A Python-embed ded modeling language for convex optimization. Journal of Machine Learning Research , 17(83):1–5, 2016

work page 2016

[13] [13]

Differentially private data cubes: Optimizing noise sources and consistency

Bolin Ding, Marianne Winslett, Jiawei Han, and Zhenhui Li. Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of the 2011 ACM SIGMOD Inter- national Conference on Management of Data , 2011

work page 2011

[14] [14]

Ecos: A n socp solver for embedded sys- tems

Alexander Domahidi, Eric Chu, and Stephen Boyd. Ecos: A n socp solver for embedded sys- tems. In 2013 European control conference (ECC), pages 3071–3076. IEEE, 2013

work page 2013

[15] [15]

Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian dif ferential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodol ogy), 84(1):3–37, 2022

work page 2022

[16] [16]

UCI machine learning reposi tory, 2019

Dheeru Dua and Casey Graff. UCI machine learning reposi tory, 2019

work page 2019

[17] [17]

Our data, ourselves: Privacy via distributed noise generation

Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry , Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation . In EUROCRYPT, pages 486–503, 2006

work page 2006

[18] [18]

The power of factorization mechanisms in local and central differential privacy

Alexander Edmonds, Aleksandar Nikolov, and Jonathan U llman. The power of factorization mechanisms in local and central differential privacy. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages 425–438, 2020. 11

work page 2020

[19] [19]

Dual query: Practical private query release for high dim ensional data

Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu , Aaron Roth, and Zhiwei Steven Wu. Dual query: Practical private query release for high dim ensional data. In International Conference on Machine Learning , pages 1170–1178. PMLR, 2014

work page 2014

[20] [20]

Differentially private chi-squared hypothesis testing: Goodness of ﬁt and independence testin g

Marco Gaboardi, Hyun Lim, Ryan Rogers, and Salil V adhan. Differentially private chi-squared hypothesis testing: Goodness of ﬁt and independence testin g. In International conference on machine learning, pages 2111–2120. PMLR, 2016

work page 2016

[21] [21]

Gurobi Optimizer Reference Manual, 2023

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023

work page 2023

[22] [22]

A sim ple and practical algorithm for dif- ferentially private data release

Moritz Hardt, Katrina Ligett, and Frank McSherry. A sim ple and practical algorithm for dif- ferentially private data release. Advances in neural information processing systems , 25, 2012

work page 2012

[23] [23]

In International conference on learning representations , 2019

James Jordon, Jinsung Y oon, and Mihaela V an Der Schaar.Pate-gan: Generating synthetic data with differential privacy guarantees. In International conference on learning representations , 2019

work page 2019

[24] [24]

Loan prediction problem dataset

Kaggle. Loan prediction problem dataset. https://www.kaggle.com/altruistdelhite04/loan-predi ction-probl

work page

[25] [25]

Accessed: May 8th, 2023

work page 2023

[26] [26]

Diffe rentially private chi-squared test by unit circle mechanism

Kazuya Kakizaki, Kazuto Fukuchi, and Jun Sakuma. Diffe rentially private chi-squared test by unit circle mechanism. In International Conference on Machine Learning , pages 1761–1770. PMLR, 2017

work page 2017

[27] [27]

A new class of private chi- square tests

Daniel Kifer and Ryan Rogers. A new class of private chi- square tests. In Proceedings of the 20th International Conference on Artiﬁcial Intelligence a nd Statistics, AISTATS , volume 17, pages 991–1000, 2016

work page 2016

[28] [28]

Releas- ing search queries and clicks privately

Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mis hra, and Alexandros Ntoulas. Releas- ing search queries and clicks privately. In Proceedings of the 18th international conference on W orld wide web, pages 171–180, 2009

work page 2009

[29] [29]

Differentially Private Hierarchical Count-of-Counts Histograms

Y u-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay , and Ashwin Machanava- jjhala. Differentially private hierarchical count-of-co unts histograms. arXiv preprint arXiv:1804.00370, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[30] [30]

Differentially private m-estimators

Jing Lei. Differentially private m-estimators. Advances in Neural Information Processing Systems, 24, 2011

work page 2011

[31] [31]

An adaptive mechanism for acc urate query answering under differential privacy

Chao Li and Gerome Miklau. An adaptive mechanism for acc urate query answering under differential privacy. Proceedings of the VLDB Endowment , 5(6), 2012

work page 2012

[32] [32]

Optimal error of query sets un der the differentially-private matrix mechanism

Chao Li and Gerome Miklau. Optimal error of query sets un der the differentially-private matrix mechanism. In Proceedings of the 16th International Conference on Databa se Theory, pages 272–283, 2013

work page 2013

[33] [33]

The matrix mechanism: Optimizing linear counting queries under diffe rential privacy

Chao Li, Gerome Miklau, Michael Hay, Andrew Mcgregor, a nd Vibhor Rastogi. The matrix mechanism: Optimizing linear counting queries under diffe rential privacy. The VLDB Journal, 24(6):757–781, December 2015

work page 2015

[34] [34]

Differentiall y private synthesization of multi- dimensional data using copula functions

Haoran Li, Li Xiong, and Xiaoqian Jiang. Differentiall y private synthesization of multi- dimensional data using copula functions. In Advances in database technology: proceedings. International conference on extending database technology, volume 2014, page 475. NIH Pub- lic Access, 2014

work page 2014

[35] [35]

Leverag- ing public data for practical private query release

Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonath an Ullman, and Steven Wu. Leverag- ing public data for practical private query release. In International Conference on Machine Learning, pages 6968–6977. PMLR, 2021

work page 2021

[36] [36]

Iterati ve methods for private synthetic data: Unifying framework and new methods

Terrance Liu, Giuseppe Vietri, and Steven Z Wu. Iterati ve methods for private synthetic data: Unifying framework and new methods. Advances in Neural Information Processing Systems , 34, 2021

work page 2021

[37] [37]

Private synthetic da ta with hierarchical structure

Terrance Liu and Zhiwei Steven Wu. Private synthetic da ta with hierarchical structure. arXiv preprint arXiv:2206.05942, 2022

work page arXiv 2022

[38] [38]

Optimizing error of high-dimensional statistical queries under diffe rential privacy

Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Optimizing error of high-dimensional statistical queries under diffe rential privacy. Proceedings of the VLDB Endowment, 11(10), 2018. 12

work page 2018

[39] [39]

Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy

Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy. arXiv preprint arXiv:2106.12118, 2021

work page arXiv 2021

[40] [40]

Aim: An adaptive and iterative mechanism for differentially private synthetic data

Ryan McKenna, Brett Mullins, Daniel Sheldon, and Gerom e Miklau. Aim: An adaptive and iterative mechanism for differentially private synthetic data. arXiv preprint arXiv:2201.12677 , 2022

work page arXiv 2022

[41] [41]

Relaxed marginal consistency for differentially private query answering

Ryan McKenna, Siddhant Pradhan, Daniel R Sheldon, and G erome Miklau. Relaxed marginal consistency for differentially private query answering. Advances in Neural Information Pro- cessing Systems, 34:20696–20707, 2021

work page 2021

[42] [42]

Graph ical-model based estimation and inference for differential privacy

Ryan McKenna, Daniel Sheldon, and Gerome Miklau. Graph ical-model based estimation and inference for differential privacy. In International Conference on Machine Learning , pages 4435–4444. PMLR, 2019

work page 2019

[43] [43]

New computational aspects of discrepancy theory

Aleksandar Nikolov. New computational aspects of discrepancy theory . PhD thesis, Rutgers University-Graduate School-New Brunswick, 2014

work page 2014

[44] [44]

Priview: practical differentially private re- lease of marginal contingency tables

Wahbeh Qardaji, Weining Y ang, and Ninghui Li. Priview: practical differentially private re- lease of marginal contingency tables. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data , pages 1435–1446, 2014

work page 2014

[45] [45]

New oracle-efﬁcient algorithms for private synthetic data release

Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, and Steven Wu. New oracle-efﬁcient algorithms for private synthetic data release. In International Conference on Machine Learning, pages 9765–9774. PMLR, 2020

work page 2020

[46] [46]

Differentially private sql wit h bounded user contribution

Royce Wilson, Celia Y uxin Zhang, William Lam, Damien De sfontaines, Daniel Simmons- Marengo, and Bryant Gipson. Differentially private sql wit h bounded user contribution. In Proceedings on Privacy Enhancing T echnologies Symposium, 2020

work page 2020

[47] [47]

Optimizing ﬁtness- for-use of differentially private linear queries

Yingtai Xiao, Zeyu Ding, Y uxin Wang, Danfeng Zhang, and Daniel Kifer. Optimizing ﬁtness- for-use of differentially private linear queries. In VLDB, 2021

work page 2021

[48] [48]

Answering private linear queries adaptively using the common mechanism

Yingtai Xiao, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Answering private linear queries adaptively using the common mechanism. https://arxiv.org/abs/2212.00135, 2022

work page arXiv 2022

[49] [49]

Dp pro: Differentially private high- dimensional data release via random projection

Chugui Xu, Ju Ren, Y aoxue Zhang, Zhan Qin, and Kui Ren. Dp pro: Differentially private high- dimensional data release via random projection. IEEE Transactions on Information F orensics and Security, 12(12):3081–3093, 2017

work page 2017

[50] [50]

Ac- curate and efﬁcient private release of datacubes and contin gency tables

Grigory Y aroslavtsev, Graham Cormode, Cecilia M Proco piuc, and Divesh Srivastava. Ac- curate and efﬁcient private release of datacubes and contin gency tables. In 2013 IEEE 29th International Conference on Data Engineering (ICDE) , pages 745–756. IEEE, 2013

work page 2013

[51] [51]

Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies

Fei Y u, Stephen E Fienberg, Aleksandra B Slavkovi ´c, and Caroline Uhler. Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies. Journal of biomed- ical informatics, 50:133–141, 2014

work page 2014

[52] [52]

Convex optimization for linear query processing under approximate differential privacy

Ganzhao Y uan, Yin Y ang, Zhenjie Zhang, and Zhifeng Hao. Convex optimization for linear query processing under approximate differential privacy. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery an d Data Mining , 2016

work page 2016

[53] [53]

Low-rank mechanism: Optimizing batch queries under differ ential privacy

Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Low-rank mechanism: Optimizing batch queries under differ ential privacy. Proc. VLDB En- dow., 5(11):1352–1363, July 2012

work page 2012

[54] [54]

Optimizing batch linear queries under exact and approximat e differential privacy

Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Optimizing batch linear queries under exact and approximat e differential privacy. ACM Trans- actions on Database Systems (TODS) , 40(2):1–47, 2015

work page 2015

[55] [55]

Procopiuc, Dives h Srivastava, and Xiaokui Xiao

Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Dives h Srivastava, and Xiaokui Xiao. Privbayes: Private data release via bayesian networks. ACM Trans. Database Syst., 42(4), oct 2017

work page 2017

[56] [56]

Differentially private high- dimensional data publication via markov network

Wei Zhang, Jingwen Zhao, Fengqiong Wei, and Y unfang Che n. Differentially private high- dimensional data publication via markov network. EAI Endorsed Transactions on Security and Safety, 6(19), 2019

work page 2019

[57] [57]

Privsyn: Differentially private data synthesis

Zhikun Zhang, Tianhao Wang, Jean Honorio, Ninghui Li, M ichael Backes, Shibo He, Jiming Chen, and Y ang Zhang. Privsyn: Differentially private data synthesis. 2021. 13 Contents 1 Introduction 1 2 Preliminaries 2 2.1 Differential Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.2 Matrix Mechanism . . . . . . . . . . . . . . ....

work page 2021

[58] [58]

residual

For each attribute, we can one-hot encode its attribute values as row vectors. So, for Att1, the attribute value ’a’ is encoded as [1, 0] and ’b’ is encoded as [0, 1]. For Att2, the attribute value ’y’ is encoded as [1, 0] and ’n’ is encoded as [0, 1]. For attribute Att3, the attribute value ’1’ is encoded as [1, 0, 0], the value ’2’ is encoded as [0, 1, ...

work page

[59] [59]

18∗ 12/ 11/c ≈ 4. 8/c etc. C Example Variance Calculations We next illustrate how the σ 2 A parameters affect the variance of different marginals base d on Theo- rem 4.7. We illustrate it with a toy dataset that has 5 attributes, ea ch with 3 possible values. In this discussion, the variance of a marginal is the largest varian ce of its cells (all cells w...

work page

[60] [60]

871∗ 2/ 3∗ 1≈ 8

154∗ 1∗ 1/ 9 + 10. 871∗ 2/ 3∗ 1≈ 8. 154. For the variance on the 5-way marginal, it consists of 1 term f or A′ =∅, 5 terms for when A′ has 1 attribute since there are 5 such A′ (but all the terms have the same value), 10 terms for when A′ has 2 attributes (all have the same value), 10 terms for when A′ has 3 attributes (again these terms all are equal), 5...

work page

[61] [61]

a,y,1” in the dataset, the s econd component corresponds to record “a,y,2

Suppose we have three attributes: Att1 takes values ‘a’ or ‘b’; Att2 takes values ‘y’ or ‘n’; Att3 takes values 1 or 2 or 3. ay1 ay2 ay3 an1 an2 an3 by1 by2 by3 bn1 bn2 bn3 bn1: [0, 1]⊗ [0, 1]⊗ [1, 0, 0] 0 0 0 0 0 0 0 0 0 1 0 0 [1, 1]⊗ [1,− 1]⊗ [1,− 1, 0] 1 -1 0 -1 1 0 1 -1 0 -1 1 0 [1,− 1]⊗ [1, 1]⊗ [1, 0,− 1] 1 0 -1 1 0 -1 -1 0 1 -1 0 1 Table 8: Kron pro...

work page

[62] [62]

Expressing the privacy cost of the optimal mechanism M∗ as a linear combination of the 1/σ 2 A values takes O(∑ A∈ W kload #cells(A)) total time

work page

[63] [63]

Expressing all of the V ar(A;M∗ ), for A∈ W kload, as a linear combinations of the σ 2 A values can be done in O(∑ A∈ W kload #cells(A)) total time

work page

[64] [64]

Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )

Computing all the noisy outputs of the optimal mechanism (i.e., MA(x; σ 2 A) for A ∈ closure(W kload)) takes O ( na ∑ A∈ W kload ∏ Atti∈ A(|Atti|+ 1) ) total time after the true an- swers have been precomputed (Line 1 in Algorithm 1). Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )

work page

[65] [65]

Reconstructing marginals for all A∈ W kload takes O(∑ A∈ W kload|A|#cells(A)2) total time

work page

[66] [66]

Proof of Theorem 4.8

Computing the variance of the cells for all of the marginal s for A∈ W kload can be done in O(∑ A∈ W kload #cells(A)) total time. Proof of Theorem 4.8. First we establish that |closure(W kload)| ≤ ∑ A∈ W kload #cells(A). Given an set A∈ W kload, we note that it has 2|A| subsets, so that |closure(A)| = 2|A|. How- ever, #cells(A) is at least 2|A| (because e...

work page

[67] [67]

The RMSE on the workload that the selection step guarantees is also meas ured

We evaluate the running time and accuracy of the select ion step Table 9 shows the running time for the selection step of HDMM and ResidualPlanner. The RMSE on the workload that the selection step guarantees is also meas ured. Both HDMM and ResidualPlanner have no trouble here. HDMM is nearly optimal in RMSE and Resid ualPlanner is optimal, as shown by ag...

work page

[68] [68]

For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals)

This maximum is always achieved for the sum query (a zer o-dimensional marginal) for the following reason. For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals). The privacy loss budget is split equally among them. Thus, each of the (d 3 ) marginals it measures gets N (0, (...

work page