ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
Pith reviewed 2026-05-24 08:10 UTC · model grok-4.3
The pith
ResidualPlanner formulates matrix mechanism optimization for marginal queries as a convex program over variances, scaling to 100 attributes without full query matrices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ResidualPlanner solves the matrix mechanism design problem for marginal queries by optimizing a convex function of the marginal variances directly; the resulting program yields the optimal Gaussian noise allocation for any such loss while remaining computationally tractable even when the underlying query matrix is too large to materialize. ResidualPlanner+ relaxes optimality to handle arbitrary user-specified combinations of marginals and range or prefix-sum queries on overlapping attribute sets, still using the same variance-centric convex program.
What carries the argument
A convex program whose decision variables are the variances (and covariances) of the target marginals; the program encodes the matrix mechanism's noise allocation without ever building the explicit query matrix.
If this is right
- Any convex loss written in terms of marginal variances can be optimized, removing the restriction to a single predefined objective.
- Variance and covariance values for every marginal can be returned directly without materializing large matrices.
- Mixed workloads that include both marginals and range or prefix-sum queries on the same data can be answered from one noise allocation.
- The same convex program runs in seconds on marginal workloads that exhaust memory in HDMM and finishes in a few minutes for 100-attribute tables.
Where Pith is reading between the lines
- The variance-centric view may let the same optimizer be reused for other linear query families whose error can be expressed through marginal variances.
- Because the program never builds the full matrix, it could be embedded inside systems that release many different marginal sets on the same underlying table without repeated matrix construction.
- Extending the convex program to additional query types would require only that their error contributions remain expressible as functions of the same marginal variances.
Load-bearing premise
The desired loss can be expressed as a convex function of the marginal variances alone, and the feasible set of variance vectors remains convex.
What would settle it
A concrete marginal query workload and convex loss for which the variance vector returned by the program produces strictly higher loss than the noise allocation obtained by solving the original matrix mechanism quadratic program on the full query matrix.
read the original abstract
Noisy marginals are a common form of confidentiality protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets). ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal, however it is still extremely scalable and outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ResidualPlanner, a matrix mechanism for releasing noisy answers to marginal queries under Gaussian noise that formulates the problem as a convex optimization over marginal variances to achieve optimality while supporting arbitrary convex loss functions of those variances; it further presents ResidualPlanner+ as a scalable (though not necessarily optimal) extension that handles mixed workloads combining marginals with range or prefix-sum queries on the same or different attributes. The work emphasizes practical scalability, reporting that ResidualPlanner solves large instances (including 100-attribute datasets) in minutes where prior methods such as HDMM exhaust memory, and that it can output per-marginal variance/covariance values efficiently.
Significance. If the optimality derivation and empirical scaling results hold, the contribution would be significant for the differential privacy and private data release literature: it removes a key practical barrier (memory and runtime) for high-dimensional marginal release while broadening the class of supported objectives and workloads, directly benefiting downstream applications such as contingency-table analysis and synthetic data generation.
minor comments (3)
- The abstract asserts that ResidualPlanner 'can optimize for many loss functions that can be written as a convex function of marginal variances' yet provides no concrete examples of such loss functions or the corresponding convex programs; a short illustrative list or reference to the relevant section would clarify the scope of the generalization over prior single-objective work.
- The claim that ResidualPlanner+ 'outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed' is presented without reference to the specific experimental configuration (workload size, privacy parameter, or number of repetitions); adding a pointer to the relevant table or figure would strengthen the statement.
- Notation for the marginal-variance vector and the mapping from query matrix to marginal variances is introduced without an explicit equation or diagram in the abstract; a compact definition early in the introduction would aid readers unfamiliar with the matrix-mechanism literature.
Simulated Author's Rebuttal
We thank the referee for the positive summary, recognition of the work's potential significance, and recommendation for minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity detected
full rationale
The paper formulates ResidualPlanner as solving a convex program directly over marginal variances to achieve optimality for Gaussian matrix mechanisms on marginal queries, with ResidualPlanner+ as an explicit scalable relaxation for mixed workloads. This is presented as an algorithmic improvement over prior matrix mechanisms (e.g., HDMM) without reducing the central optimality claim to a fitted parameter, self-citation chain, or definitional equivalence. The abstract and described claims contain no load-bearing self-citations, ansatz smuggling, or renaming of known results as derivations; the work is self-contained against external benchmarks in the matrix mechanism literature.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Residual matrices RA formed via Kronecker products of Sub|Atti|; reconstruction via Sub† pseudo-inverses; convex program over σ²A parameters (Thm 4.4, 4.7)
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Optimality via symmetry of marginals under regular loss functions (Def 4.1)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Privacy preserving synthetic data release using d eep learning
Nazmiye Ceren Abay, Y an Zhou, Murat Kantarcioglu, Bhava ni Thuraisingham, and Latanya Sweeney. Privacy preserving synthetic data release using d eep learning. In Machine Learning and Knowledge Discovery in Databases: European Conference , ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I 18 , pages 510–526. Springer, 2019
work page 2018
-
[2]
John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Sims on Garfinkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Lecler c, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, and Pavel Zhuravlev . The 2020 census disclosure avoidance system topdown algorithm. Harvard Data Science Review , forthcoming. Preprint https...
work page 2020
-
[3]
Differentially Private Release of High-Dimensional Datasets using the Gaussian Copula
Hassan Jameel Asghar, Ming Ding, Thierry Rakotoarivelo , Sirine Mrabet, and Mohamed Ali Kaafar. Differentially private release of high-dimension al datasets using the gaussian copula. arXiv preprint arXiv:1902.01499 , 2019
work page internal anchor Pith review Pith/arXiv arXiv 1902
-
[4]
Differentially private query releas e through adaptive projection
Sergul A ydore, William Brown, Michael Kearns, Krishnar am Kenthapadi, Luca Melis, Aaron Roth, and Ankit A Siva. Differentially private query releas e through adaptive projection. In International Conference on Machine Learning , pages 457–467. PMLR, 2021
work page 2021
-
[5]
Borja Balle and Y u-Xiang Wang. Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learn- ing, ICML, 2018
work page 2018
-
[6]
Privacy, accuracy, and consistency too: a holistic solution to contingency table release
Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen K ale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282, 2007
work page 2007
-
[7]
Concentrated differential privacy: Simplifications, extensions, and lower bounds
Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Proceedings, Part I, of the 14th International Conference o n Theory of Cryptography - V olume 9985, 2016
work page 2016
-
[8]
U.S. Census Bureau. Decennial census: 2010 summary files . https://www.census.gov/mp/www/cat/decennial_census_2010/
work page 2010
-
[9]
U.S. Census Bureau. The current population survey (cps) . https://www.census.gov/programs-surveys/cps.html, 2023
work page 2023
-
[10]
D ata synthesis via differentially private markov random fields
Kuntai Cai, Xiaoyu Lei, Jianxin Wei, and Xiaokui Xiao. D ata synthesis via differentially private markov random fields. Proceedings of the VLDB Endowment, 14(11):2190–2202, 2021
work page 2021
-
[11]
Differe ntially private high-dimensional data publication via sampling-based inference
Rui Chen, Qian Xiao, Y u Zhang, and Jianliang Xu. Differe ntially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data m ining, pages 129–138, 2015
work page 2015
-
[12]
CVXPY: A Python-embed ded modeling language for convex optimization
Steven Diamond and Stephen Boyd. CVXPY: A Python-embed ded modeling language for convex optimization. Journal of Machine Learning Research , 17(83):1–5, 2016
work page 2016
-
[13]
Differentially private data cubes: Optimizing noise sources and consistency
Bolin Ding, Marianne Winslett, Jiawei Han, and Zhenhui Li. Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of the 2011 ACM SIGMOD Inter- national Conference on Management of Data , 2011
work page 2011
-
[14]
Ecos: A n socp solver for embedded sys- tems
Alexander Domahidi, Eric Chu, and Stephen Boyd. Ecos: A n socp solver for embedded sys- tems. In 2013 European control conference (ECC), pages 3071–3076. IEEE, 2013
work page 2013
-
[15]
Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian dif ferential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodol ogy), 84(1):3–37, 2022
work page 2022
-
[16]
UCI machine learning reposi tory, 2019
Dheeru Dua and Casey Graff. UCI machine learning reposi tory, 2019
work page 2019
-
[17]
Our data, ourselves: Privacy via distributed noise generation
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry , Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation . In EUROCRYPT, pages 486–503, 2006
work page 2006
-
[18]
The power of factorization mechanisms in local and central differential privacy
Alexander Edmonds, Aleksandar Nikolov, and Jonathan U llman. The power of factorization mechanisms in local and central differential privacy. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages 425–438, 2020. 11
work page 2020
-
[19]
Dual query: Practical private query release for high dim ensional data
Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu , Aaron Roth, and Zhiwei Steven Wu. Dual query: Practical private query release for high dim ensional data. In International Conference on Machine Learning , pages 1170–1178. PMLR, 2014
work page 2014
-
[20]
Differentially private chi-squared hypothesis testing: Goodness of fit and independence testin g
Marco Gaboardi, Hyun Lim, Ryan Rogers, and Salil V adhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testin g. In International conference on machine learning, pages 2111–2120. PMLR, 2016
work page 2016
-
[21]
Gurobi Optimizer Reference Manual, 2023
Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023
work page 2023
-
[22]
A sim ple and practical algorithm for dif- ferentially private data release
Moritz Hardt, Katrina Ligett, and Frank McSherry. A sim ple and practical algorithm for dif- ferentially private data release. Advances in neural information processing systems , 25, 2012
work page 2012
-
[23]
In International conference on learning representations , 2019
James Jordon, Jinsung Y oon, and Mihaela V an Der Schaar.Pate-gan: Generating synthetic data with differential privacy guarantees. In International conference on learning representations , 2019
work page 2019
-
[24]
Loan prediction problem dataset
Kaggle. Loan prediction problem dataset. https://www.kaggle.com/altruistdelhite04/loan-predi ction-probl
-
[25]
Accessed: May 8th, 2023
work page 2023
-
[26]
Diffe rentially private chi-squared test by unit circle mechanism
Kazuya Kakizaki, Kazuto Fukuchi, and Jun Sakuma. Diffe rentially private chi-squared test by unit circle mechanism. In International Conference on Machine Learning , pages 1761–1770. PMLR, 2017
work page 2017
-
[27]
A new class of private chi- square tests
Daniel Kifer and Ryan Rogers. A new class of private chi- square tests. In Proceedings of the 20th International Conference on Artificial Intelligence a nd Statistics, AISTATS , volume 17, pages 991–1000, 2016
work page 2016
-
[28]
Releas- ing search queries and clicks privately
Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mis hra, and Alexandros Ntoulas. Releas- ing search queries and clicks privately. In Proceedings of the 18th international conference on W orld wide web, pages 171–180, 2009
work page 2009
-
[29]
Differentially Private Hierarchical Count-of-Counts Histograms
Y u-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay , and Ashwin Machanava- jjhala. Differentially private hierarchical count-of-co unts histograms. arXiv preprint arXiv:1804.00370, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[30]
Differentially private m-estimators
Jing Lei. Differentially private m-estimators. Advances in Neural Information Processing Systems, 24, 2011
work page 2011
-
[31]
An adaptive mechanism for acc urate query answering under differential privacy
Chao Li and Gerome Miklau. An adaptive mechanism for acc urate query answering under differential privacy. Proceedings of the VLDB Endowment , 5(6), 2012
work page 2012
-
[32]
Optimal error of query sets un der the differentially-private matrix mechanism
Chao Li and Gerome Miklau. Optimal error of query sets un der the differentially-private matrix mechanism. In Proceedings of the 16th International Conference on Databa se Theory, pages 272–283, 2013
work page 2013
-
[33]
The matrix mechanism: Optimizing linear counting queries under diffe rential privacy
Chao Li, Gerome Miklau, Michael Hay, Andrew Mcgregor, a nd Vibhor Rastogi. The matrix mechanism: Optimizing linear counting queries under diffe rential privacy. The VLDB Journal, 24(6):757–781, December 2015
work page 2015
-
[34]
Differentiall y private synthesization of multi- dimensional data using copula functions
Haoran Li, Li Xiong, and Xiaoqian Jiang. Differentiall y private synthesization of multi- dimensional data using copula functions. In Advances in database technology: proceedings. International conference on extending database technology, volume 2014, page 475. NIH Pub- lic Access, 2014
work page 2014
-
[35]
Leverag- ing public data for practical private query release
Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonath an Ullman, and Steven Wu. Leverag- ing public data for practical private query release. In International Conference on Machine Learning, pages 6968–6977. PMLR, 2021
work page 2021
-
[36]
Iterati ve methods for private synthetic data: Unifying framework and new methods
Terrance Liu, Giuseppe Vietri, and Steven Z Wu. Iterati ve methods for private synthetic data: Unifying framework and new methods. Advances in Neural Information Processing Systems , 34, 2021
work page 2021
-
[37]
Private synthetic da ta with hierarchical structure
Terrance Liu and Zhiwei Steven Wu. Private synthetic da ta with hierarchical structure. arXiv preprint arXiv:2206.05942, 2022
-
[38]
Optimizing error of high-dimensional statistical queries under diffe rential privacy
Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Optimizing error of high-dimensional statistical queries under diffe rential privacy. Proceedings of the VLDB Endowment, 11(10), 2018. 12
work page 2018
-
[39]
Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy
Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy. arXiv preprint arXiv:2106.12118, 2021
-
[40]
Aim: An adaptive and iterative mechanism for differentially private synthetic data
Ryan McKenna, Brett Mullins, Daniel Sheldon, and Gerom e Miklau. Aim: An adaptive and iterative mechanism for differentially private synthetic data. arXiv preprint arXiv:2201.12677 , 2022
-
[41]
Relaxed marginal consistency for differentially private query answering
Ryan McKenna, Siddhant Pradhan, Daniel R Sheldon, and G erome Miklau. Relaxed marginal consistency for differentially private query answering. Advances in Neural Information Pro- cessing Systems, 34:20696–20707, 2021
work page 2021
-
[42]
Graph ical-model based estimation and inference for differential privacy
Ryan McKenna, Daniel Sheldon, and Gerome Miklau. Graph ical-model based estimation and inference for differential privacy. In International Conference on Machine Learning , pages 4435–4444. PMLR, 2019
work page 2019
-
[43]
New computational aspects of discrepancy theory
Aleksandar Nikolov. New computational aspects of discrepancy theory . PhD thesis, Rutgers University-Graduate School-New Brunswick, 2014
work page 2014
-
[44]
Priview: practical differentially private re- lease of marginal contingency tables
Wahbeh Qardaji, Weining Y ang, and Ninghui Li. Priview: practical differentially private re- lease of marginal contingency tables. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data , pages 1435–1446, 2014
work page 2014
-
[45]
New oracle-efficient algorithms for private synthetic data release
Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, and Steven Wu. New oracle-efficient algorithms for private synthetic data release. In International Conference on Machine Learning, pages 9765–9774. PMLR, 2020
work page 2020
-
[46]
Differentially private sql wit h bounded user contribution
Royce Wilson, Celia Y uxin Zhang, William Lam, Damien De sfontaines, Daniel Simmons- Marengo, and Bryant Gipson. Differentially private sql wit h bounded user contribution. In Proceedings on Privacy Enhancing T echnologies Symposium, 2020
work page 2020
-
[47]
Optimizing fitness- for-use of differentially private linear queries
Yingtai Xiao, Zeyu Ding, Y uxin Wang, Danfeng Zhang, and Daniel Kifer. Optimizing fitness- for-use of differentially private linear queries. In VLDB, 2021
work page 2021
-
[48]
Answering private linear queries adaptively using the common mechanism
Yingtai Xiao, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Answering private linear queries adaptively using the common mechanism. https://arxiv.org/abs/2212.00135, 2022
-
[49]
Dp pro: Differentially private high- dimensional data release via random projection
Chugui Xu, Ju Ren, Y aoxue Zhang, Zhan Qin, and Kui Ren. Dp pro: Differentially private high- dimensional data release via random projection. IEEE Transactions on Information F orensics and Security, 12(12):3081–3093, 2017
work page 2017
-
[50]
Ac- curate and efficient private release of datacubes and contin gency tables
Grigory Y aroslavtsev, Graham Cormode, Cecilia M Proco piuc, and Divesh Srivastava. Ac- curate and efficient private release of datacubes and contin gency tables. In 2013 IEEE 29th International Conference on Data Engineering (ICDE) , pages 745–756. IEEE, 2013
work page 2013
-
[51]
Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies
Fei Y u, Stephen E Fienberg, Aleksandra B Slavkovi ´c, and Caroline Uhler. Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies. Journal of biomed- ical informatics, 50:133–141, 2014
work page 2014
-
[52]
Convex optimization for linear query processing under approximate differential privacy
Ganzhao Y uan, Yin Y ang, Zhenjie Zhang, and Zhifeng Hao. Convex optimization for linear query processing under approximate differential privacy. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery an d Data Mining , 2016
work page 2016
-
[53]
Low-rank mechanism: Optimizing batch queries under differ ential privacy
Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Low-rank mechanism: Optimizing batch queries under differ ential privacy. Proc. VLDB En- dow., 5(11):1352–1363, July 2012
work page 2012
-
[54]
Optimizing batch linear queries under exact and approximat e differential privacy
Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Optimizing batch linear queries under exact and approximat e differential privacy. ACM Trans- actions on Database Systems (TODS) , 40(2):1–47, 2015
work page 2015
-
[55]
Procopiuc, Dives h Srivastava, and Xiaokui Xiao
Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Dives h Srivastava, and Xiaokui Xiao. Privbayes: Private data release via bayesian networks. ACM Trans. Database Syst., 42(4), oct 2017
work page 2017
-
[56]
Differentially private high- dimensional data publication via markov network
Wei Zhang, Jingwen Zhao, Fengqiong Wei, and Y unfang Che n. Differentially private high- dimensional data publication via markov network. EAI Endorsed Transactions on Security and Safety, 6(19), 2019
work page 2019
-
[57]
Privsyn: Differentially private data synthesis
Zhikun Zhang, Tianhao Wang, Jean Honorio, Ninghui Li, M ichael Backes, Shibo He, Jiming Chen, and Y ang Zhang. Privsyn: Differentially private data synthesis. 2021. 13 Contents 1 Introduction 1 2 Preliminaries 2 2.1 Differential Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.2 Matrix Mechanism . . . . . . . . . . . . . . ....
work page 2021
-
[58]
For each attribute, we can one-hot encode its attribute values as row vectors. So, for Att1, the attribute value ’a’ is encoded as [1, 0] and ’b’ is encoded as [0, 1]. For Att2, the attribute value ’y’ is encoded as [1, 0] and ’n’ is encoded as [0, 1]. For attribute Att3, the attribute value ’1’ is encoded as [1, 0, 0], the value ’2’ is encoded as [0, 1, ...
-
[59]
18∗ 12/ 11/c ≈ 4. 8/c etc. C Example Variance Calculations We next illustrate how the σ 2 A parameters affect the variance of different marginals base d on Theo- rem 4.7. We illustrate it with a toy dataset that has 5 attributes, ea ch with 3 possible values. In this discussion, the variance of a marginal is the largest varian ce of its cells (all cells w...
-
[60]
154∗ 1∗ 1/ 9 + 10. 871∗ 2/ 3∗ 1≈ 8. 154. For the variance on the 5-way marginal, it consists of 1 term f or A′ =∅, 5 terms for when A′ has 1 attribute since there are 5 such A′ (but all the terms have the same value), 10 terms for when A′ has 2 attributes (all have the same value), 10 terms for when A′ has 3 attributes (again these terms all are equal), 5...
-
[61]
a,y,1” in the dataset, the s econd component corresponds to record “a,y,2
Suppose we have three attributes: Att1 takes values ‘a’ or ‘b’; Att2 takes values ‘y’ or ‘n’; Att3 takes values 1 or 2 or 3. ay1 ay2 ay3 an1 an2 an3 by1 by2 by3 bn1 bn2 bn3 bn1: [0, 1]⊗ [0, 1]⊗ [1, 0, 0] 0 0 0 0 0 0 0 0 0 1 0 0 [1, 1]⊗ [1,− 1]⊗ [1,− 1, 0] 1 -1 0 -1 1 0 1 -1 0 -1 1 0 [1,− 1]⊗ [1, 1]⊗ [1, 0,− 1] 1 0 -1 1 0 -1 -1 0 1 -1 0 1 Table 8: Kron pro...
-
[62]
Expressing the privacy cost of the optimal mechanism M∗ as a linear combination of the 1/σ 2 A values takes O(∑ A∈ W kload #cells(A)) total time
-
[63]
Expressing all of the V ar(A;M∗ ), for A∈ W kload, as a linear combinations of the σ 2 A values can be done in O(∑ A∈ W kload #cells(A)) total time
-
[64]
Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )
Computing all the noisy outputs of the optimal mechanism (i.e., MA(x; σ 2 A) for A ∈ closure(W kload)) takes O ( na ∑ A∈ W kload ∏ Atti∈ A(|Atti|+ 1) ) total time after the true an- swers have been precomputed (Line 1 in Algorithm 1). Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )
-
[65]
Reconstructing marginals for all A∈ W kload takes O(∑ A∈ W kload|A|#cells(A)2) total time
-
[66]
Computing the variance of the cells for all of the marginal s for A∈ W kload can be done in O(∑ A∈ W kload #cells(A)) total time. Proof of Theorem 4.8. First we establish that |closure(W kload)| ≤ ∑ A∈ W kload #cells(A). Given an set A∈ W kload, we note that it has 2|A| subsets, so that |closure(A)| = 2|A|. How- ever, #cells(A) is at least 2|A| (because e...
-
[67]
The RMSE on the workload that the selection step guarantees is also meas ured
We evaluate the running time and accuracy of the select ion step Table 9 shows the running time for the selection step of HDMM and ResidualPlanner. The RMSE on the workload that the selection step guarantees is also meas ured. Both HDMM and ResidualPlanner have no trouble here. HDMM is nearly optimal in RMSE and Resid ualPlanner is optimal, as shown by ag...
-
[68]
This maximum is always achieved for the sum query (a zer o-dimensional marginal) for the following reason. For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals). The privacy loss budget is split equally among them. Thus, each of the (d 3 ) marginals it measures gets N (0, (...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.