pith. sign in

arxiv: 2305.08175 · v4 · submitted 2023-05-14 · 💻 cs.DB · cs.CR· cs.LG

ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

Pith reviewed 2026-05-24 08:10 UTC · model grok-4.3

classification 💻 cs.DB cs.CRcs.LG
keywords matrix mechanismsdifferential privacymarginal queriesGaussian noiseconvex optimizationrange queriesprefix sumsscalable privacy
0
0 comments X

The pith

ResidualPlanner formulates matrix mechanism optimization for marginal queries as a convex program over variances, scaling to 100 attributes without full query matrices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ResidualPlanner as a method that reduces the problem of finding optimal noise for marginal queries under Gaussian noise to a convex optimization over the variances of those marginals. This formulation supports multiple convex loss functions and avoids constructing or storing the full query matrix, which lets it run on datasets with 100 attributes in minutes where prior methods exhaust memory. ResidualPlanner+ extends the approach to mixed workloads that combine marginals with range queries or prefix-sum queries on the same attributes. A reader would care because noisy marginal releases are a standard building block for contingency tables, Bayesian networks, and synthetic data, and current methods hit hard scalability limits on real attribute counts.

Core claim

ResidualPlanner solves the matrix mechanism design problem for marginal queries by optimizing a convex function of the marginal variances directly; the resulting program yields the optimal Gaussian noise allocation for any such loss while remaining computationally tractable even when the underlying query matrix is too large to materialize. ResidualPlanner+ relaxes optimality to handle arbitrary user-specified combinations of marginals and range or prefix-sum queries on overlapping attribute sets, still using the same variance-centric convex program.

What carries the argument

A convex program whose decision variables are the variances (and covariances) of the target marginals; the program encodes the matrix mechanism's noise allocation without ever building the explicit query matrix.

If this is right

  • Any convex loss written in terms of marginal variances can be optimized, removing the restriction to a single predefined objective.
  • Variance and covariance values for every marginal can be returned directly without materializing large matrices.
  • Mixed workloads that include both marginals and range or prefix-sum queries on the same data can be answered from one noise allocation.
  • The same convex program runs in seconds on marginal workloads that exhaust memory in HDMM and finishes in a few minutes for 100-attribute tables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The variance-centric view may let the same optimizer be reused for other linear query families whose error can be expressed through marginal variances.
  • Because the program never builds the full matrix, it could be embedded inside systems that release many different marginal sets on the same underlying table without repeated matrix construction.
  • Extending the convex program to additional query types would require only that their error contributions remain expressible as functions of the same marginal variances.

Load-bearing premise

The desired loss can be expressed as a convex function of the marginal variances alone, and the feasible set of variance vectors remains convex.

What would settle it

A concrete marginal query workload and convex loss for which the variance vector returned by the program produces strictly higher loss than the noise allocation obtained by solving the original matrix mechanism quadratic program on the full query matrix.

read the original abstract

Noisy marginals are a common form of confidentiality protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets). ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal, however it is still extremely scalable and outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces ResidualPlanner, a matrix mechanism for releasing noisy answers to marginal queries under Gaussian noise that formulates the problem as a convex optimization over marginal variances to achieve optimality while supporting arbitrary convex loss functions of those variances; it further presents ResidualPlanner+ as a scalable (though not necessarily optimal) extension that handles mixed workloads combining marginals with range or prefix-sum queries on the same or different attributes. The work emphasizes practical scalability, reporting that ResidualPlanner solves large instances (including 100-attribute datasets) in minutes where prior methods such as HDMM exhaust memory, and that it can output per-marginal variance/covariance values efficiently.

Significance. If the optimality derivation and empirical scaling results hold, the contribution would be significant for the differential privacy and private data release literature: it removes a key practical barrier (memory and runtime) for high-dimensional marginal release while broadening the class of supported objectives and workloads, directly benefiting downstream applications such as contingency-table analysis and synthetic data generation.

minor comments (3)
  1. The abstract asserts that ResidualPlanner 'can optimize for many loss functions that can be written as a convex function of marginal variances' yet provides no concrete examples of such loss functions or the corresponding convex programs; a short illustrative list or reference to the relevant section would clarify the scope of the generalization over prior single-objective work.
  2. The claim that ResidualPlanner+ 'outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed' is presented without reference to the specific experimental configuration (workload size, privacy parameter, or number of repetitions); adding a pointer to the relevant table or figure would strengthen the statement.
  3. Notation for the marginal-variance vector and the mapping from query matrix to marginal variances is introduced without an explicit equation or diagram in the abstract; a compact definition early in the introduction would aid readers unfamiliar with the matrix-mechanism literature.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the work's potential significance, and recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper formulates ResidualPlanner as solving a convex program directly over marginal variances to achieve optimality for Gaussian matrix mechanisms on marginal queries, with ResidualPlanner+ as an explicit scalable relaxation for mixed workloads. This is presented as an algorithmic improvement over prior matrix mechanisms (e.g., HDMM) without reducing the central optimality claim to a fitted parameter, self-citation chain, or definitional equivalence. The abstract and described claims contain no load-bearing self-citations, ansatz smuggling, or renaming of known results as derivations; the work is self-contained against external benchmarks in the matrix mechanism literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claims rest on unstated assumptions about query linearity and convexity that are standard in the matrix mechanism literature.

pith-pipeline@v0.9.0 · 5848 in / 1103 out tokens · 24114 ms · 2026-05-24T08:10:28.598260+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 2 internal anchors

  1. [1]

    Privacy preserving synthetic data release using d eep learning

    Nazmiye Ceren Abay, Y an Zhou, Murat Kantarcioglu, Bhava ni Thuraisingham, and Latanya Sweeney. Privacy preserving synthetic data release using d eep learning. In Machine Learning and Knowledge Discovery in Databases: European Conference , ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I 18 , pages 510–526. Springer, 2019

  2. [2]

    John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Sims on Garfinkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Lecler c, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, and Pavel Zhuravlev . The 2020 census disclosure avoidance system topdown algorithm. Harvard Data Science Review , forthcoming. Preprint https...

  3. [3]

    Differentially Private Release of High-Dimensional Datasets using the Gaussian Copula

    Hassan Jameel Asghar, Ming Ding, Thierry Rakotoarivelo , Sirine Mrabet, and Mohamed Ali Kaafar. Differentially private release of high-dimension al datasets using the gaussian copula. arXiv preprint arXiv:1902.01499 , 2019

  4. [4]

    Differentially private query releas e through adaptive projection

    Sergul A ydore, William Brown, Michael Kearns, Krishnar am Kenthapadi, Luca Melis, Aaron Roth, and Ankit A Siva. Differentially private query releas e through adaptive projection. In International Conference on Machine Learning , pages 457–467. PMLR, 2021

  5. [5]

    Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising

    Borja Balle and Y u-Xiang Wang. Improving the gaussian me chanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learn- ing, ICML, 2018

  6. [6]

    Privacy, accuracy, and consistency too: a holistic solution to contingency table release

    Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen K ale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282, 2007

  7. [7]

    Concentrated differential privacy: Simplifications, extensions, and lower bounds

    Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Proceedings, Part I, of the 14th International Conference o n Theory of Cryptography - V olume 9985, 2016

  8. [8]

    Census Bureau

    U.S. Census Bureau. Decennial census: 2010 summary files . https://www.census.gov/mp/www/cat/decennial_census_2010/

  9. [9]

    Census Bureau

    U.S. Census Bureau. The current population survey (cps) . https://www.census.gov/programs-surveys/cps.html, 2023

  10. [10]

    D ata synthesis via differentially private markov random fields

    Kuntai Cai, Xiaoyu Lei, Jianxin Wei, and Xiaokui Xiao. D ata synthesis via differentially private markov random fields. Proceedings of the VLDB Endowment, 14(11):2190–2202, 2021

  11. [11]

    Differe ntially private high-dimensional data publication via sampling-based inference

    Rui Chen, Qian Xiao, Y u Zhang, and Jianliang Xu. Differe ntially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data m ining, pages 129–138, 2015

  12. [12]

    CVXPY: A Python-embed ded modeling language for convex optimization

    Steven Diamond and Stephen Boyd. CVXPY: A Python-embed ded modeling language for convex optimization. Journal of Machine Learning Research , 17(83):1–5, 2016

  13. [13]

    Differentially private data cubes: Optimizing noise sources and consistency

    Bolin Ding, Marianne Winslett, Jiawei Han, and Zhenhui Li. Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of the 2011 ACM SIGMOD Inter- national Conference on Management of Data , 2011

  14. [14]

    Ecos: A n socp solver for embedded sys- tems

    Alexander Domahidi, Eric Chu, and Stephen Boyd. Ecos: A n socp solver for embedded sys- tems. In 2013 European control conference (ECC), pages 3071–3076. IEEE, 2013

  15. [15]

    Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian dif ferential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodol ogy), 84(1):3–37, 2022

  16. [16]

    UCI machine learning reposi tory, 2019

    Dheeru Dua and Casey Graff. UCI machine learning reposi tory, 2019

  17. [17]

    Our data, ourselves: Privacy via distributed noise generation

    Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry , Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation . In EUROCRYPT, pages 486–503, 2006

  18. [18]

    The power of factorization mechanisms in local and central differential privacy

    Alexander Edmonds, Aleksandar Nikolov, and Jonathan U llman. The power of factorization mechanisms in local and central differential privacy. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages 425–438, 2020. 11

  19. [19]

    Dual query: Practical private query release for high dim ensional data

    Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu , Aaron Roth, and Zhiwei Steven Wu. Dual query: Practical private query release for high dim ensional data. In International Conference on Machine Learning , pages 1170–1178. PMLR, 2014

  20. [20]

    Differentially private chi-squared hypothesis testing: Goodness of fit and independence testin g

    Marco Gaboardi, Hyun Lim, Ryan Rogers, and Salil V adhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testin g. In International conference on machine learning, pages 2111–2120. PMLR, 2016

  21. [21]

    Gurobi Optimizer Reference Manual, 2023

    Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023

  22. [22]

    A sim ple and practical algorithm for dif- ferentially private data release

    Moritz Hardt, Katrina Ligett, and Frank McSherry. A sim ple and practical algorithm for dif- ferentially private data release. Advances in neural information processing systems , 25, 2012

  23. [23]

    In International conference on learning representations , 2019

    James Jordon, Jinsung Y oon, and Mihaela V an Der Schaar.Pate-gan: Generating synthetic data with differential privacy guarantees. In International conference on learning representations , 2019

  24. [24]

    Loan prediction problem dataset

    Kaggle. Loan prediction problem dataset. https://www.kaggle.com/altruistdelhite04/loan-predi ction-probl

  25. [25]

    Accessed: May 8th, 2023

  26. [26]

    Diffe rentially private chi-squared test by unit circle mechanism

    Kazuya Kakizaki, Kazuto Fukuchi, and Jun Sakuma. Diffe rentially private chi-squared test by unit circle mechanism. In International Conference on Machine Learning , pages 1761–1770. PMLR, 2017

  27. [27]

    A new class of private chi- square tests

    Daniel Kifer and Ryan Rogers. A new class of private chi- square tests. In Proceedings of the 20th International Conference on Artificial Intelligence a nd Statistics, AISTATS , volume 17, pages 991–1000, 2016

  28. [28]

    Releas- ing search queries and clicks privately

    Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mis hra, and Alexandros Ntoulas. Releas- ing search queries and clicks privately. In Proceedings of the 18th international conference on W orld wide web, pages 171–180, 2009

  29. [29]

    Differentially Private Hierarchical Count-of-Counts Histograms

    Y u-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay , and Ashwin Machanava- jjhala. Differentially private hierarchical count-of-co unts histograms. arXiv preprint arXiv:1804.00370, 2018

  30. [30]

    Differentially private m-estimators

    Jing Lei. Differentially private m-estimators. Advances in Neural Information Processing Systems, 24, 2011

  31. [31]

    An adaptive mechanism for acc urate query answering under differential privacy

    Chao Li and Gerome Miklau. An adaptive mechanism for acc urate query answering under differential privacy. Proceedings of the VLDB Endowment , 5(6), 2012

  32. [32]

    Optimal error of query sets un der the differentially-private matrix mechanism

    Chao Li and Gerome Miklau. Optimal error of query sets un der the differentially-private matrix mechanism. In Proceedings of the 16th International Conference on Databa se Theory, pages 272–283, 2013

  33. [33]

    The matrix mechanism: Optimizing linear counting queries under diffe rential privacy

    Chao Li, Gerome Miklau, Michael Hay, Andrew Mcgregor, a nd Vibhor Rastogi. The matrix mechanism: Optimizing linear counting queries under diffe rential privacy. The VLDB Journal, 24(6):757–781, December 2015

  34. [34]

    Differentiall y private synthesization of multi- dimensional data using copula functions

    Haoran Li, Li Xiong, and Xiaoqian Jiang. Differentiall y private synthesization of multi- dimensional data using copula functions. In Advances in database technology: proceedings. International conference on extending database technology, volume 2014, page 475. NIH Pub- lic Access, 2014

  35. [35]

    Leverag- ing public data for practical private query release

    Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonath an Ullman, and Steven Wu. Leverag- ing public data for practical private query release. In International Conference on Machine Learning, pages 6968–6977. PMLR, 2021

  36. [36]

    Iterati ve methods for private synthetic data: Unifying framework and new methods

    Terrance Liu, Giuseppe Vietri, and Steven Z Wu. Iterati ve methods for private synthetic data: Unifying framework and new methods. Advances in Neural Information Processing Systems , 34, 2021

  37. [37]

    Private synthetic da ta with hierarchical structure

    Terrance Liu and Zhiwei Steven Wu. Private synthetic da ta with hierarchical structure. arXiv preprint arXiv:2206.05942, 2022

  38. [38]

    Optimizing error of high-dimensional statistical queries under diffe rential privacy

    Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Optimizing error of high-dimensional statistical queries under diffe rential privacy. Proceedings of the VLDB Endowment, 11(10), 2018. 12

  39. [39]

    Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy

    Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Ma chanavajjhala. Hdmm: Opti- mizing error of high-dimensional statistical queries unde r differential privacy. arXiv preprint arXiv:2106.12118, 2021

  40. [40]

    Aim: An adaptive and iterative mechanism for differentially private synthetic data

    Ryan McKenna, Brett Mullins, Daniel Sheldon, and Gerom e Miklau. Aim: An adaptive and iterative mechanism for differentially private synthetic data. arXiv preprint arXiv:2201.12677 , 2022

  41. [41]

    Relaxed marginal consistency for differentially private query answering

    Ryan McKenna, Siddhant Pradhan, Daniel R Sheldon, and G erome Miklau. Relaxed marginal consistency for differentially private query answering. Advances in Neural Information Pro- cessing Systems, 34:20696–20707, 2021

  42. [42]

    Graph ical-model based estimation and inference for differential privacy

    Ryan McKenna, Daniel Sheldon, and Gerome Miklau. Graph ical-model based estimation and inference for differential privacy. In International Conference on Machine Learning , pages 4435–4444. PMLR, 2019

  43. [43]

    New computational aspects of discrepancy theory

    Aleksandar Nikolov. New computational aspects of discrepancy theory . PhD thesis, Rutgers University-Graduate School-New Brunswick, 2014

  44. [44]

    Priview: practical differentially private re- lease of marginal contingency tables

    Wahbeh Qardaji, Weining Y ang, and Ninghui Li. Priview: practical differentially private re- lease of marginal contingency tables. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data , pages 1435–1446, 2014

  45. [45]

    New oracle-efficient algorithms for private synthetic data release

    Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, and Steven Wu. New oracle-efficient algorithms for private synthetic data release. In International Conference on Machine Learning, pages 9765–9774. PMLR, 2020

  46. [46]

    Differentially private sql wit h bounded user contribution

    Royce Wilson, Celia Y uxin Zhang, William Lam, Damien De sfontaines, Daniel Simmons- Marengo, and Bryant Gipson. Differentially private sql wit h bounded user contribution. In Proceedings on Privacy Enhancing T echnologies Symposium, 2020

  47. [47]

    Optimizing fitness- for-use of differentially private linear queries

    Yingtai Xiao, Zeyu Ding, Y uxin Wang, Danfeng Zhang, and Daniel Kifer. Optimizing fitness- for-use of differentially private linear queries. In VLDB, 2021

  48. [48]

    Answering private linear queries adaptively using the common mechanism

    Yingtai Xiao, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Answering private linear queries adaptively using the common mechanism. https://arxiv.org/abs/2212.00135, 2022

  49. [49]

    Dp pro: Differentially private high- dimensional data release via random projection

    Chugui Xu, Ju Ren, Y aoxue Zhang, Zhan Qin, and Kui Ren. Dp pro: Differentially private high- dimensional data release via random projection. IEEE Transactions on Information F orensics and Security, 12(12):3081–3093, 2017

  50. [50]

    Ac- curate and efficient private release of datacubes and contin gency tables

    Grigory Y aroslavtsev, Graham Cormode, Cecilia M Proco piuc, and Divesh Srivastava. Ac- curate and efficient private release of datacubes and contin gency tables. In 2013 IEEE 29th International Conference on Data Engineering (ICDE) , pages 745–756. IEEE, 2013

  51. [51]

    Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies

    Fei Y u, Stephen E Fienberg, Aleksandra B Slavkovi ´c, and Caroline Uhler. Scalable privacy- preserving data sharing methodology for genome-wide assoc iation studies. Journal of biomed- ical informatics, 50:133–141, 2014

  52. [52]

    Convex optimization for linear query processing under approximate differential privacy

    Ganzhao Y uan, Yin Y ang, Zhenjie Zhang, and Zhifeng Hao. Convex optimization for linear query processing under approximate differential privacy. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery an d Data Mining , 2016

  53. [53]

    Low-rank mechanism: Optimizing batch queries under differ ential privacy

    Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Low-rank mechanism: Optimizing batch queries under differ ential privacy. Proc. VLDB En- dow., 5(11):1352–1363, July 2012

  54. [54]

    Optimizing batch linear queries under exact and approximat e differential privacy

    Ganzhao Y uan, Zhenjie Zhang, Marianne Winslett, Xiaok ui Xiao, Yin Y ang, and Zhifeng Hao. Optimizing batch linear queries under exact and approximat e differential privacy. ACM Trans- actions on Database Systems (TODS) , 40(2):1–47, 2015

  55. [55]

    Procopiuc, Dives h Srivastava, and Xiaokui Xiao

    Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Dives h Srivastava, and Xiaokui Xiao. Privbayes: Private data release via bayesian networks. ACM Trans. Database Syst., 42(4), oct 2017

  56. [56]

    Differentially private high- dimensional data publication via markov network

    Wei Zhang, Jingwen Zhao, Fengqiong Wei, and Y unfang Che n. Differentially private high- dimensional data publication via markov network. EAI Endorsed Transactions on Security and Safety, 6(19), 2019

  57. [57]

    Privsyn: Differentially private data synthesis

    Zhikun Zhang, Tianhao Wang, Jean Honorio, Ninghui Li, M ichael Backes, Shibo He, Jiming Chen, and Y ang Zhang. Privsyn: Differentially private data synthesis. 2021. 13 Contents 1 Introduction 1 2 Preliminaries 2 2.1 Differential Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.2 Matrix Mechanism . . . . . . . . . . . . . . ....

  58. [58]

    residual

    For each attribute, we can one-hot encode its attribute values as row vectors. So, for Att1, the attribute value ’a’ is encoded as [1, 0] and ’b’ is encoded as [0, 1]. For Att2, the attribute value ’y’ is encoded as [1, 0] and ’n’ is encoded as [0, 1]. For attribute Att3, the attribute value ’1’ is encoded as [1, 0, 0], the value ’2’ is encoded as [0, 1, ...

  59. [59]

    18∗ 12/ 11/c ≈ 4. 8/c etc. C Example Variance Calculations We next illustrate how the σ 2 A parameters affect the variance of different marginals base d on Theo- rem 4.7. We illustrate it with a toy dataset that has 5 attributes, ea ch with 3 possible values. In this discussion, the variance of a marginal is the largest varian ce of its cells (all cells w...

  60. [60]

    871∗ 2/ 3∗ 1≈ 8

    154∗ 1∗ 1/ 9 + 10. 871∗ 2/ 3∗ 1≈ 8. 154. For the variance on the 5-way marginal, it consists of 1 term f or A′ =∅, 5 terms for when A′ has 1 attribute since there are 5 such A′ (but all the terms have the same value), 10 terms for when A′ has 2 attributes (all have the same value), 10 terms for when A′ has 3 attributes (again these terms all are equal), 5...

  61. [61]

    a,y,1” in the dataset, the s econd component corresponds to record “a,y,2

    Suppose we have three attributes: Att1 takes values ‘a’ or ‘b’; Att2 takes values ‘y’ or ‘n’; Att3 takes values 1 or 2 or 3. ay1 ay2 ay3 an1 an2 an3 by1 by2 by3 bn1 bn2 bn3 bn1: [0, 1]⊗ [0, 1]⊗ [1, 0, 0] 0 0 0 0 0 0 0 0 0 1 0 0 [1, 1]⊗ [1,− 1]⊗ [1,− 1, 0] 1 -1 0 -1 1 0 1 -1 0 -1 1 0 [1,− 1]⊗ [1, 1]⊗ [1, 0,− 1] 1 0 -1 1 0 -1 -1 0 1 -1 0 1 Table 8: Kron pro...

  62. [62]

    Expressing the privacy cost of the optimal mechanism M∗ as a linear combination of the 1/σ 2 A values takes O(∑ A∈ W kload #cells(A)) total time

  63. [63]

    Expressing all of the V ar(A;M∗ ), for A∈ W kload, as a linear combinations of the σ 2 A values can be done in O(∑ A∈ W kload #cells(A)) total time

  64. [64]

    Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )

    Computing all the noisy outputs of the optimal mechanism (i.e., MA(x; σ 2 A) for A ∈ closure(W kload)) takes O ( na ∑ A∈ W kload ∏ Atti∈ A(|Atti|+ 1) ) total time after the true an- swers have been precomputed (Line 1 in Algorithm 1). Note that the total number of cells on marginals in W kload is O (∑ A∈ W kload ∏ Atti∈ A|Atti| )

  65. [65]

    Reconstructing marginals for all A∈ W kload takes O(∑ A∈ W kload|A|#cells(A)2) total time

  66. [66]

    Proof of Theorem 4.8

    Computing the variance of the cells for all of the marginal s for A∈ W kload can be done in O(∑ A∈ W kload #cells(A)) total time. Proof of Theorem 4.8. First we establish that |closure(W kload)| ≤ ∑ A∈ W kload #cells(A). Given an set A∈ W kload, we note that it has 2|A| subsets, so that |closure(A)| = 2|A|. How- ever, #cells(A) is at least 2|A| (because e...

  67. [67]

    The RMSE on the workload that the selection step guarantees is also meas ured

    We evaluate the running time and accuracy of the select ion step Table 9 shows the running time for the selection step of HDMM and ResidualPlanner. The RMSE on the workload that the selection step guarantees is also meas ured. Both HDMM and ResidualPlanner have no trouble here. HDMM is nearly optimal in RMSE and Resid ualPlanner is optimal, as shown by ag...

  68. [68]

    For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals)

    This maximum is always achieved for the sum query (a zer o-dimensional marginal) for the following reason. For d beween 8 and 15, HDMM decides to add noise to all 3-way marginals and nothing else (even though the workload is all ≤ 3 marginals). The privacy loss budget is split equally among them. Thus, each of the (d 3 ) marginals it measures gets N (0, (...