
arXiv: 2606.07883 · v1 · pith:I2MRALYD · submitted 2026-06-05 · 💻 cs.CR · cs.DB

DP4SQL: Differentially Private SQL with Flexible Privacy Policies

Pith reviewed 2026-06-27 21:22 UTC · model grok-4.3

classification 💻 cs.CR cs.DB
keywords: differential privacy, SQL, relational databases, privacy policies, data protection, query answering, plausible deniability

The pith

DP4SQL lets data curators specify flexible privacy policies for relational databases instead of using rigid one-size-fits-all rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing differentially private SQL systems require significant manual rewriting of privacy accountants whenever policies change, such as protecting record existence in some tables but only contents in others, or handling partially public columns. DP4SQL introduces support for customizable plausible deniability requirements that can vary across tables, columns, and parts of records. This customization prevents both under-protection of sensitive data and unnecessary noise in query results. The approach maintains correctness of the privacy guarantees while accommodating mixed public-private data and differing protection levels.

Core claim

DP4SQL is a differentially private SQL system that allows data curators to customize the plausible deniability requirements for their relational databases by supporting flexible policies on which pieces of information about an entity, spread across multiple relations, need protection. It handles partially public data and varying per-part protection levels without requiring manual updates to privacy accountants or proofs of correctness.

What carries the argument

A flexible privacy accounting mechanism that tracks stability and noise requirements across mixed public/private columns and varying per-part protection levels.
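Whatever the accountant derives, its output is ultimately a sensitivity bound that sets the noise level. The sketch below assumes the standard Laplace mechanism (Dwork et al., 2006): noise with scale Δ̂(Q)/ε is added to the true answer. The helper names `laplace_sample` and `noisy_answer` are hypothetical and are not DP4SQL's API; the paper's own mechanisms and parameters may differ.

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # An exponential with rate 1/scale has mean `scale`; the difference of two
    # independent such draws is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_answer(true_answer: float, sensitivity: float, epsilon: float,
                 rng: random.Random) -> float:
    """Release true_answer plus Laplace(sensitivity / epsilon) noise (hypothetical helper)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity == 0:
        # The answer is identical on all neighbouring databases, so it needs no noise.
        return float(true_answer)
    return true_answer + laplace_sample(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # For a fixed epsilon, a tighter bound derived under a better-fitting policy means less noise.
    for delta_hat in (0.0, 1.0, 100.0):
        print(delta_hat, noisy_answer(1000, delta_hat, epsilon=1.0, rng=rng))
```

Using only the standard library keeps the sketch self-contained; a production system would also need to address floating-point attacks on naive Laplace sampling, which this sketch ignores.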

If this is right

  • Curators can protect existence of records in some tables while protecting only contents in others without rewriting the accountant.
  • Queries involving partially public columns receive noise calibrated to the actual sensitivity rather than a worst-case assumption.
  • Different parts of the same record can receive different protection levels while preserving end-to-end privacy accounting.
  • Small policy changes no longer force full re-verification of the privacy mechanism.
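The first two bullets can be made concrete with textbook single-table sensitivities. This sketch assumes that each entity owns at most one record and that a summed attribute is clamped to a range [L, U]. Protecting existence is modeled as a deletion-style neighbour (`"Del"`), and protecting only contents as replacing a record's private values (`"Rep"`). The `Query` fields, the action labels, and the one-record-per-entity simplification are illustrative assumptions. DP4SQL's inference rules also cover joins, grouping, and multiple relations, and they are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A single-table aggregate; every field is a simplification for illustration."""
    agg: str                      # "COUNT" or "SUM"
    filter_private: bool = False  # does the WHERE clause read a replaceable (private) column?
    value_private: bool = True    # SUM only: is the summed column replaceable?
    lower: float = 0.0            # L: clamp bound of the summed column
    upper: float = 0.0            # U: clamp bound of the summed column, L <= U


def sensitivity(q: Query, action: str) -> float:
    """Bound on |Q(D) - Q(D')| when D and D' differ in one entity's record.

    "Del": the record is added or removed (its existence is protected).
    "Rep": only the record's private columns change (its contents are protected).
    """
    if q.lower > q.upper:
        raise ValueError("need L <= U")
    magnitude = max(abs(q.lower), abs(q.upper))  # largest single contribution to a SUM
    width = q.upper - q.lower                    # largest change when one value is replaced
    if action == "Del":
        if q.agg == "COUNT":
            return 1.0
        if q.agg == "SUM":
            return magnitude
    elif action == "Rep":
        if q.agg == "COUNT":
            # The row count is fixed; only a filter on private values can move a row in or out.
            return 1.0 if q.filter_private else 0.0
        if q.agg == "SUM":
            if q.filter_private:
                # The row may enter or leave the sum, and its value may also change.
                return max(width, magnitude) if q.value_private else magnitude
            return width if q.value_private else 0.0
    raise ValueError(f"unsupported combination: {q.agg}, {action}")


if __name__ == "__main__":
    count_all = Query("COUNT")
    balance = Query("SUM", lower=-50.0, upper=50.0)
    print(sensitivity(count_all, "Del"), sensitivity(count_all, "Rep"))  # 1.0 0.0
    print(sensitivity(balance, "Del"), sensitivity(balance, "Rep"))      # 50.0 100.0
```

With a symmetric range, replacing a value can move a SUM twice as far as deleting the record. A COUNT whose filter reads only public columns needs no noise under replacement. The action a policy assigns can therefore change the required noise in either direction.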

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same accounting approach could extend to other query languages or data models that mix public and private attributes.
  • Database schema design might incorporate explicit privacy-level annotations as first-class metadata.
  • Testing could focus on whether mixed-policy queries produce statistically distinguishable outputs under the claimed bounds.

Load-bearing premise

A single flexible privacy accounting mechanism can correctly track stability and noise requirements across mixed public/private columns and varying per-part protection levels without introducing new derivation gaps or post-hoc adjustments.
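The premise concerns bookkeeping across parts of a record that have different protection levels. One simple way to picture that bookkeeping is to give each protected part its own ε budget and charge it under basic sequential composition. This design is an assumption of the sketch, not the paper's. Under the sketch's model, a Laplace release with noise scale b whose sensitivity to changing part p of one record is Δ_p costs Δ_p / b against part p. The class `PerPartAccountant` and the part names are hypothetical.

```python
from typing import Dict


class PerPartAccountant:
    """Hypothetical per-part epsilon tracker based on basic sequential composition.

    Each protected part of a record (for example "demographics" or "financial")
    gets its own budget. A Laplace release with noise scale b whose sensitivity,
    with respect to changing part p of one record, is delta_p costs delta_p / b
    against part p. Parts a query cannot see have sensitivity 0 and are omitted.
    """

    def __init__(self, budgets: Dict[str, float]) -> None:
        self.budgets = dict(budgets)
        self.spent = {part: 0.0 for part in self.budgets}

    def charge(self, scale: float, sensitivities: Dict[str, float]) -> None:
        """Record one release, refusing it (without spending) if any part would overrun."""
        if scale <= 0:
            raise ValueError("noise scale must be positive")
        costs = {part: delta / scale for part, delta in sensitivities.items()}
        for part, cost in costs.items():
            if part not in self.budgets:
                raise KeyError(f"no budget declared for part {part!r}")
            if self.spent[part] + cost > self.budgets[part] + 1e-12:
                raise RuntimeError(f"budget for {part!r} would be exceeded")
        for part, cost in costs.items():
            self.spent[part] += cost

    def remaining(self, part: str) -> float:
        """Budget still available for one part."""
        return self.budgets[part] - self.spent[part]
```

For example, with budgets of 1.0 for demographics and 0.5 for financial data, a demographics-only count released with scale 2.0 spends 0.5 of the demographics budget and leaves the financial budget untouched. The sketch does not model neighbours that change several parts at once, which a real accountant would have to handle.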

What would settle it

A concrete query and policy combination where the system's noise calculation or stability bound differs from what a manual per-policy analysis would require, violating the claimed privacy guarantee.
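A falsification test of this kind can be sketched by brute force. Enumerate every small database over a tiny domain and every replacement neighbour of each. Then compare the largest observed change in a query answer with the bound a rule would derive, for instance U − L for a SUM over a private column clamped to [L, U]. This is a hypothetical single-table harness, not part of DP4SQL. Because it explores only finitely many databases, it can reveal a bound that is too small but cannot certify one.

```python
from itertools import product
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[str, int]  # (public name, private value)


def replacement_neighbours(db: List[Row], domain: Sequence[int]) -> Iterator[List[Row]]:
    """Yield every database that differs from db only in one row's private value."""
    for i, (name, value) in enumerate(db):
        for new_value in domain:
            if new_value != value:
                yield db[:i] + [(name, new_value)] + db[i + 1:]


def empirical_sensitivity(query: Callable[[List[Row]], float],
                          dbs: Sequence[List[Row]],
                          domain: Sequence[int]) -> float:
    """Largest |Q(D) - Q(D')| over the given databases and their replacement neighbours.

    This is only a lower bound on the global sensitivity: it can expose a derived
    bound that is too small, but it cannot certify that a bound is correct.
    """
    worst = 0.0
    for db in dbs:
        base = query(db)
        for neighbour in replacement_neighbours(db, domain):
            worst = max(worst, abs(query(neighbour) - base))
    return worst


if __name__ == "__main__":
    domain = range(6)  # private values clamped to [L, U] = [0, 5]
    dbs = [[("a", x), ("b", y)] for x, y in product(domain, repeat=2)]

    def sum_private(db: List[Row]) -> float:
        return sum(v for _, v in db)

    print(empirical_sensitivity(sum_private, dbs, domain))  # 5, matching U - L
```

If a rule-derived bound were ever smaller than the value this harness reports for the same query and neighbour relation, that would be exactly the kind of counterexample described above.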

Figures

Figures reproduced from arXiv: 2606.07883 by Andrew Cascio, Danfeng Zhang, Daniel Kifer, KinChin Tong, Zeyu Ding.

Figure 1: The data ownership graph of a university schema

Figure 2: Inference rules for base relations. $R^\star$ is a distinguished entity relation and $R$ is any non-entity relation. The $\mathrm{private}(A)$ predicate is true when $A$ is protected by deletion or replacement under $\mathcal{P}_{R^\star}$. Excerpt: Non-Entity Relations. For a non-entity relation $R \neq R^\star$, deriving the plausible deniability action is slightly more complicated due to dependencies; $R$ may have foreign keys that refer to other relations. Let…

Figure 3: Syntax of relational algebra supported by …

Figure 4: Action inference rules for select ($\sigma_\varphi$), project ($\pi_{\mathcal{A}}$), grouping ($\gamma_{\mathcal{A}}$), and grouping with count ($\gamma^{\mathrm{CNT}(*)}_{\mathcal{A}}$). Excerpt: Project. Let $S$ be an expression and $\pi_{\mathcal{A}}(S)$ be a projection of attributes $\mathcal{A}$. If $S$ has action $\mathrm{Rep}^{k}_{\mathcal{A}_1}$, then the action on $\pi_{\mathcal{A}_2}(S)$ depends on whether $\mathcal{A}_1$ and $\mathcal{A}_2$ share any attributes. If they do, then only attributes in $\mathcal{A}_1 \cap \mathcal{A}_2$ can be replaced (T-Prj1). Otherwise, no attributes in $\pi_{\mathcal{A}_2}(S)$ are replaceabl…

Figure 5: Records in red exist in the hypothetical world

Figure 6: Action inference rules for key joins, where the join …

Figure 7: Inference rules to calculate $\hat{\Delta}(Q)$, an upper bound on the global sensitivity $\Delta(Q)$. The final aggregation function may be immediately preceded by a grouping operation. Excerpt: …and upper bounds of possible numeric values for $S.A$, where $L \le U$. In certain cases, the range of an attribute can be updated to a tighter bound, which can decrease the sensitivity of SUM. The sensitivity of CNT is dependent on the underlyin…

Figure 8: Derived sensitivities for each privacy policy and …

Figure 10: Sensitivities for TPC-H queries. Hatching indicates …

Figure 11: Relative error of DP4SQL and Tumult Analytics for TPC-H queries. C and R policies treat the Customer and Supplier relations, respectively, as the entity relations. The dashed line denotes the error for a system that always outputs 0. Excerpt (9.2 Comparison with Tumult Analytics): Following existing work [13, 15], we evaluate on the TPC-H queries that have count aggregations: Q1, Q4, Q13, and Q16. Q1 Pricing Summary Repor…

Figure 12: Rules for maximum frequency upper bound.
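The projection rule quoted in the Figure 4 excerpt amounts to a set intersection. In this sketch, a hypothetical `Rep` record holds the replaceable attributes together with k, and `None` stands for "no replaceable attributes". The excerpt is cut off before it explains what k means or how the no-overlap case is written, so the sketch carries k through unchanged and uses `None` for that case. Both choices are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rep:
    """Hypothetical encoding of the action Rep^k_A: the attributes in `attrs` are replaceable.

    The excerpt does not define k, so this sketch simply carries it along unchanged.
    """
    attrs: FrozenSet[str]
    k: int


def project_action(action: Rep, kept: FrozenSet[str]) -> Optional[Rep]:
    """Action on pi_{A2}(S), given that S has action Rep^k_{A1} and A2 == kept.

    Shared attributes: only A1 ∩ A2 remain replaceable (the T-Prj1 case).
    No shared attributes: nothing in the projection is replaceable, returned as None.
    """
    shared = action.attrs & kept
    if shared:
        return Rep(frozenset(shared), action.k)
    return None


if __name__ == "__main__":
    enrollment = Rep(frozenset({"gpa", "major"}), k=1)
    print(project_action(enrollment, frozenset({"name", "gpa"})))  # only "gpa" stays replaceable
    print(project_action(enrollment, frozenset({"name"})))         # None
```

The attribute names are invented for the example, loosely echoing the university schema in Figure 1.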
Original abstract

The plausible deniability model of differential privacy for single-table datasets is well-understood. However, applying differential privacy to relational databases is much trickier: each application needs flexibility in specifying the pieces of information about an entity, spread across multiple relations, that require plausible deniability guarantees. Existing differentially private SQL systems only support rigid privacy policies. Even seemingly small changes, such as specifying that some tables need to protect the existence of records while others only need to protect the record contents, require significant manual effort in updating their privacy accountants and proving their correctness. One example of a challenge is the presence of partially public data. Public columns in a table (e.g., faculty names in a university dataset and partial course enrollment information) can cause some queries to require more noise (compared to fully private data), while others require less noise. This kind of reasoning is not supported in existing systems. Another example is when different parts of records (e.g., demographics, financial data) require different levels of privacy protection. Again, existing differentially private SQL systems need to rewrite their rules for calculating query stability in order to support such a feature. This paper presents DP4SQL, a differentially private SQL system that allows data curators to better customize the plausible deniability requirements for their relational databases. This avoids the drawbacks of the "one-size-fits-all" systems that would either underprotect the data or inject too much noise into query answers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents DP4SQL, a differentially private SQL system for relational databases that supports flexible privacy policies. It allows data curators to customize plausible deniability requirements across multiple relations, including cases with partially public columns (which can increase or decrease required noise depending on the query) and heterogeneous protection levels for different parts of records (e.g., demographics vs. financial data). The system is claimed to avoid the manual rewrites and new proofs needed in existing rigid DP SQL systems when such policies change.

Significance. If the flexible accounting mechanism correctly handles stability and noise calibration across mixed public/private columns and per-part policies without derivation gaps, the work would advance practical DP deployment for relational data by enabling tailored protection levels that reduce unnecessary noise while maintaining end-to-end guarantees.

major comments (2)
  1. [Abstract] Abstract: the central claim that a single accountant correctly tracks stability and noise for queries over partially public columns and records with heterogeneous protection levels is presented without any stability definitions, case analysis, or composition rules; this is load-bearing for the flexibility claim and cannot be assessed from the given text.
  2. [Abstract] Abstract: no machine-checked proofs, parameter-free derivations, or detailed accountant description are referenced, leaving open the risk that new stability rules for public-column sensitivity or per-part policies introduce hidden assumptions that would invalidate the end-to-end guarantees.
minor comments (1)
  1. The abstract would benefit from a high-level outline of the key technical components (e.g., how the accountant is extended) to allow readers to gauge the scope of the contribution.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and comments on the abstract. The abstract summarizes the high-level contributions; the formal stability definitions, case analyses, composition rules, and accountant details appear in the body of the paper (Sections 3–5 and the appendix). We address each point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that a single accountant correctly tracks stability and noise for queries over partially public columns and records with heterogeneous protection levels is presented without any stability definitions, case analysis, or composition rules; this is load-bearing for the flexibility claim and cannot be assessed from the given text.

    Authors: The abstract is a concise overview and does not contain the full formal development. Section 3 defines stability under partially public columns and per-part heterogeneous policies; Section 4.2 provides the case analysis for how public columns can increase or decrease required noise; Section 5 presents the single accountant together with the composition rules that track both stability and noise calibration end-to-end. These sections supply the load-bearing material for the flexibility claim. revision: no

  2. Referee: [Abstract] Abstract: no machine-checked proofs, parameter-free derivations, or detailed accountant description are referenced, leaving open the risk that new stability rules for public-column sensitivity or per-part policies introduce hidden assumptions that would invalidate the end-to-end guarantees.

    Authors: Section 5 contains a detailed description of the accountant, including the parameter-free derivations for the new stability rules and the explicit statement of all assumptions (e.g., independence of public-column values from private data and the per-part privacy parameters). The proofs are manual but fully spelled out in the appendix; we do not claim machine-checked proofs. The end-to-end guarantees follow directly from the composition theorem stated in Section 5 once the per-query stabilities are computed under the policy. revision: no

Circularity Check

0 steps flagged

No circularity: DP4SQL presents independent system design for flexible privacy policies

full rationale

The paper introduces DP4SQL as a new differentially private SQL system supporting customizable policies for mixed public/private columns and heterogeneous per-part protections. The abstract and description frame this as addressing gaps in existing rigid systems via new accountant mechanisms, without any quoted equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claims to prior inputs by construction. No self-definitional loops, ansatz smuggling, or renaming of known results appear in the provided text. The derivation chain for stability tracking and noise calibration is presented as novel engineering work rather than a mathematical reduction to the paper's own assumptions, making the result self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5797 in / 1060 out tokens · 17472 ms · 2026-06-27T21:22:32.045232+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 9 canonical work pages

  1. [1]

    John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Leclerc, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, and Pavel Zhuravlev. Forthcoming. The 2020 Census Dis... Preprint: https://www.census.gov/library/working-papers/2022/adrm/CED-WP-2022-002.html

  2. [2]

    Skye Berghel, Philip Bohannon, Damien Desfontaines, Charles Estes, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Tom Magerlein, Gerome Miklau, Amritha Pai, William Sexton, and Ruchit Shrestha. 2022. Tumult Analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy. arXiv preprint arXiv:2212.04133 (2022)

  3. [3]

    Mark Bun and Thomas Steinke. 2016. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. In Proceedings, Part I, of the 14th International Conference on Theory of Cryptography - Volume 9985

  4. [4]

    Lei Cao, Dongqing Xiao, Yizhou Yan, Samuel Madden, and Guoliang Li. 2021. ATLANTIC: making database differentially private and faster with accuracy guarantee. In Proceedings of the VLDB Endowment

  5. [5]

    T.P.P. Council. 2014. TPC Benchmark H. https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf

  6. [6]

    Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting Telemetry Data Privately. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS '17). Curran Associates Inc., USA, 3574–3583. http://dl.acm.org/citation.cfm?id=3294996.3295115

  7. [7]

    Jinshuo Dong, Aaron Roth, and Weijie J. Su. 2022. Gaussian differential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84, 1 (2022), 3–37. arXiv:https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssb.12454 doi:10.1111/rssb.12454

  8. [8]

    Wei Dong, Juanru Fang, Ke Yi, Yuchao Tao, and Ashwin Machanavajjhala. 2022. R2T: Instance-optimal truncation for differentially private query evaluation with foreign keys. In Proceedings of the 2022 International Conference on Management of Data. 759–772

  9. [9]

    Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography (New York, NY) (TCC '06). Springer-Verlag, Berlin, Heidelberg, 265–284. doi:10.1007/11681878_14

  10. [10]

    Cynthia Dwork and Aaron Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 3–4 (2014), 211–407

  11. [11]

    Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (Scottsdale, Arizona, USA) (CCS '14). ACM, New York, NY, USA, 1054–1067

  12. [12]

    Google. [n. d.]. Tensorflow Privacy Github. https://github.com/tensorflow/privacy

  13. [13]

    Noah Johnson, Joseph P. Near, and Dawn Song. 2018. Towards practical differential privacy for SQL queries. Proc. VLDB Endow. 11, 5 (Jan. 2018), 526–539. doi:10.1145/3187009.3177733

  14. [14]

    Noah M. Johnson, Joseph P. Near, Joseph M. Hellerstein, and Dawn Song. 2018. Chorus: Differential Privacy via Query Rewriting. CoRR abs/1809.07750 (2018). arXiv:1809.07750 http://arxiv.org/abs/1809.07750

  15. [15]

    Ios Kotsogiannis, Yuchao Tao, Xi He, Maryam Fanaeepour, Ashwin Machanavajjhala, Michael Hay, and Gerome Miklau. 2019. PrivateSQL: a differentially private SQL query engine. Proc. VLDB Endow. 12, 11 (July 2019), 1371–1384. doi:10.14778/3342263.3342274

  16. [16]

    Tumult Labs. 2022. Tumult Analytics. https://tmlt.dev

  17. [17]

    Ashwin Machanavajjhala, Daniel Kifer, John Abowd, Johannes Gehrke, and Lars Vilhuber. 2008. Privacy: Theory meets Practice on the Map. In 2008 IEEE 24th International Conference on Data Engineering. 277–286. doi:10.1109/ICDE.2008.4497436

  18. [18]

    Frank D. McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (Providence, Rhode Island, USA) (SIGMOD '09). Association for Computing Machinery, New York, NY, USA, 19–30. doi:10.1145/1559845.1559850

  19. [19]

    Solomon Messing, Bogdan State, Chaya Nayak, Gary King, and Nate Persily

  20. [20]

    URLs Dataset for RFP.pdf. In Facebook URL Shares. Harvard Dataverse. doi:10.7910/DVN/EIAACS/PMQG9X

  21. [21]

    Ilya Mironov. 2017. Rényi Differential Privacy. In 30th IEEE Computer Security Foundations Symposium, CSF 2017, Santa Barbara, CA, USA, August 21-25, 2017. 263–275

  22. [22]

    Prashanth Mohan, Abhradeep Thakurta, Elaine Shi, Dawn Song, and David Culler

  23. [23]

    GUPT: Privacy Preserving Data Analysis Made Easy. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (Scottsdale, Arizona, USA) (SIGMOD '12). ACM, New York, NY, USA, 349–360

  24. [24]

    Shangfu Peng, Yin Yang, Zhenjie Zhang, Marianne Winslett, and Yong Yu. 2013. Query optimization for differentially private data management systems. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). 1093–1104. doi:10.1109/ICDE.2013.6544900

  25. [25]

    Davide Proserpio, Sharon Goldberg, and Frank McSherry. 2014. Calibrating Data to Sensitivity in Private Data Analysis: A Platform for Differentially-private Analysis of Weighted Datasets. Proc. VLDB Endow. 7, 8 (April 2014), 637–648. doi:10.14778/2732296.2732300

  26. [26]

    Indrajit Roy, Srinath T. V. Setty, Ann Kilzer, Vitaly Shmatikov, and Emmett Witchel. 2010. Airavat: Security and Privacy for MapReduce. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (San Jose, California) (NSDI '10). USENIX Association, Berkeley, CA, USA, 20–20

  27. [27]

    Apple Differential Privacy Team. 2017. Learning with Privacy at Scale. Apple Machine Learning Journal 1, 8 (2017)

  28. [28]

    The OpenDP Team. 2020. The OpenDP White Paper. https://projects.iq.harvard.edu/files/opendp/files/opendp_white_paper_11may2020.pdf

  29. [29]

    Michael Carl Tschantz, Shayak Sen, and Anupam Datta. 2020. SoK: Differential privacy as a causal property. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 354–371

  30. [30]

    Larry Wasserman and Shuheng Zhou. 2010. A statistical framework for differential privacy. J. Amer. Statist. Assoc. 105, 489 (2010), 375–389

  31. [31]

    Royce J. Wilson, Celia Yuxin Zhang, William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson. 2019. Differentially Private SQL with Bounded User Contribution. CoRR abs/1909.01917 (2019). arXiv:1909.01917 http://arxiv.org/abs/1909.01917

  32. [32]

    Jianzhe Yu, Wei Dong, Juanru Fang, Dajun Sun, and Ke Yi. 2024. DOP-SQL: A General-Purpose, High-Utility, and Extensible Private SQL System. Proc. VLDB Endow. 17, 12 (Aug. 2024), 4385–4388. doi:10.14778/3685800.3685881

Figure 12 excerpt (rules for maximum frequency upper bound): $\mathrm{mmf}(\sigma_\varphi(S).A) = \mathrm{mmf}(S.A)$, $\mathrm{mmf}(\pi_{\mathcal{A}}(S).A) = \mathrm{mmf}(S.A)$, $\mathrm{mmf}(\gamma_{\mathcal{A}}(S).A) = \mathrm{mmf}(S \ldots$