Recognition: no theorem link
DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule
Pith reviewed 2026-05-15 19:25 UTC · model grok-4.3
The pith
DPSQL+ combines user-level differential privacy with a minimum frequency rule in a modular SQL library that supports aggregates, joins, and quadratic statistics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DPSQL+ achieves practical accuracy across a wide range of analytical workloads, from basic aggregates to quadratic statistics and join operations, and allows substantially more queries under a fixed global privacy budget than prior libraries. It enforces user-level (ε,δ)-DP together with the minimum frequency rule through three components: a Validator that statically restricts queries to a DP-safe SQL subset, an Accountant that tracks cumulative privacy loss, and a Backend that interfaces with various database engines.
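Quadratic statistics such as variance are harder than plain sums because the squared term has larger sensitivity. The sketch below is the textbook plug-in estimator, not the estimator DPSQL+ uses. It releases a noisy count, sum and sum of squares with Laplace noise, splitting ε equally by basic composition, and combines them. It assumes user-level contribution bounding has already reduced each user to one value, that values are clipped to a hypothetical range [lo, hi], and that neighbouring datasets differ by adding or removing one user.

```python
import random
from typing import Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_variance(values: Sequence[float], lo: float, hi: float, epsilon: float,
                rng: Optional[random.Random] = None) -> float:
    """Plug-in variance estimate from three noisy statistics (illustrative sketch).

    Assumes one value per user (contribution bounding already applied) and
    add/remove neighbouring datasets.
    """
    rng = rng or random.Random()
    eps_each = epsilon / 3.0  # basic composition over three noisy statistics
    clipped = [min(max(v, lo), hi) for v in values]
    bound = max(abs(lo), abs(hi))  # max change in the sum from one user
    n = len(clipped) + laplace(1.0 / eps_each, rng)
    s = sum(clipped) + laplace(bound / eps_each, rng)
    ss = sum(v * v for v in clipped) + laplace(bound * bound / eps_each, rng)
    n = max(n, 1.0)  # guard against tiny or negative noisy counts
    mean = s / n
    return max(ss / n - mean * mean, 0.0)  # a variance cannot be negative
```

The sum-of-squares sensitivity grows with the square of the clipping bound, which is why the choice of [lo, hi] dominates accuracy for quadratic statistics.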
What carries the argument
The Validator that statically restricts incoming queries to a DP-safe subset of SQL, paired with the Accountant that maintains a consistent record of total privacy loss across multiple queries.
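The Accountant's role can be illustrated with the simplest possible ledger. The sketch below assumes basic sequential composition, where ε and δ add up across queries, and refuses any query that would exceed the global budget. The class and method names are hypothetical. Tighter accountants, such as zero-concentrated DP or privacy-loss-distribution accounting, admit more queries under the same budget. The source does not say which rule DPSQL+ uses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class BudgetExceeded(Exception):
    """Raised when a query would push cumulative privacy loss past the budget."""


@dataclass
class BasicAccountant:
    epsilon_budget: float
    delta_budget: float
    spent_epsilon: float = 0.0
    spent_delta: float = 0.0
    ledger: List[Tuple[str, float, float]] = field(default_factory=list)

    def charge(self, query_id: str, epsilon: float, delta: float) -> None:
        # Basic sequential composition: per-query losses simply add up.
        new_eps = self.spent_epsilon + epsilon
        new_delta = self.spent_delta + delta
        # Small tolerances absorb floating-point rounding in repeated sums.
        if new_eps > self.epsilon_budget + 1e-12 or new_delta > self.delta_budget + 1e-18:
            raise BudgetExceeded(query_id)
        self.spent_epsilon, self.spent_delta = new_eps, new_delta
        self.ledger.append((query_id, epsilon, delta))

    def remaining(self) -> Tuple[float, float]:
        return (self.epsilon_budget - self.spent_epsilon,
                self.delta_budget - self.spent_delta)
```

Recording every charge, rather than only the running total, lets the system report the loss attributed to each query.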
If this is right
- Basic aggregate queries can be answered with calibrated noise while still producing results that analysts can use.
- Join operations and quadratic statistics remain feasible inside the privacy and frequency constraints.
- A fixed global privacy budget supports more total queries than earlier DP SQL libraries in the same evaluation setting.
- The same privacy logic can be applied to different database engines without rewriting the validator or accountant.
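The first consequence above, grouped aggregates answered with calibrated noise under a minimum frequency rule, can be sketched for a grouped distinct-user count. The sketch works in three steps:

1. It bounds each user to at most `max_groups_per_user` groups.
2. It counts distinct users per group.
3. It suppresses groups whose true count is below k, adds Gaussian noise, and releases only groups whose noisy count clears a hypothetical threshold `tau`.

The classical Gaussian calibration used here is valid for 0 < ε < 1. How DPSQL+ actually combines the k rule with DP, and how it sets the threshold, is not stated in the source. No privacy proof for the thresholding step is given here.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    # Classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1).
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_group_counts(rows: Iterable[Tuple[Hashable, Hashable]], k: int,
                         max_groups_per_user: int, epsilon: float, delta: float,
                         tau: float, rng: Optional[random.Random] = None
                         ) -> Dict[Hashable, float]:
    rng = rng or random.Random()
    # 1. User-level contribution bounding (first-seen groups kept; real systems often sample).
    groups_of: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for user, group in rows:
        if len(groups_of[user]) < max_groups_per_user:
            groups_of[user].add(group)
    # 2. Distinct users per group.
    counts: Dict[Hashable, int] = defaultdict(int)
    for groups in groups_of.values():
        for g in groups:
            counts[g] += 1
    # 3. Minimum frequency rule, Gaussian noise, then a noisy threshold.
    sigma = gaussian_sigma(math.sqrt(max_groups_per_user), epsilon, delta)
    released: Dict[Hashable, float] = {}
    for g, c in counts.items():
        if c < k:  # fewer than k distinct individuals: never released
            continue
        noisy = c + rng.gauss(0.0, sigma)
        if noisy >= tau:
            released[g] = noisy
    return released
```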
Where Pith is reading between the lines
- Data platforms that must satisfy both privacy regulations and minimum-frequency governance rules could adopt the library as a drop-in query gateway.
- Extending the validator to additional SQL constructs would expand the range of analyses that can be performed without manual workarounds.
- The accounting mechanism could be reused in other query languages if a comparable static validator is built for them.
- Evaluating the system on production workloads with real schema complexity would reveal whether TPC-H results generalize to typical enterprise data.
Load-bearing premise
Statically restricting queries to a DP-safe SQL subset via the Validator preserves sufficient utility for typical exploratory data analysis workloads without needing post-hoc adjustments.
What would settle it
A controlled test that runs a realistic collection of exploratory SQL queries through the Validator and measures both the fraction of queries rejected and the end-to-end accuracy loss on the queries that pass would directly test whether utility remains adequate.
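The proposed test can be expressed as a small harness. The sketch below assumes three hypothetical callables: a validator predicate, an exact-answer oracle, and a private-answer function. It reports the rejection rate together with the median relative error on accepted queries. Relative error is one reasonable metric among several; the source does not fix which one to use.

```python
import statistics
from typing import Callable, Dict, Optional, Sequence, Union


def evaluate_workload(queries: Sequence[str],
                      is_accepted: Callable[[str], bool],
                      true_answer: Callable[[str], float],
                      private_answer: Callable[[str], float]
                      ) -> Dict[str, Optional[Union[int, float]]]:
    accepted = [q for q in queries if is_accepted(q)]
    rejection_rate = 1.0 - len(accepted) / len(queries) if queries else 0.0
    errors = []
    for q in accepted:
        t = true_answer(q)
        p = private_answer(q)
        # Relative error, with a floor to avoid dividing by zero on empty results.
        errors.append(abs(p - t) / max(abs(t), 1e-9))
    return {
        "rejection_rate": rejection_rate,
        "median_relative_error": statistics.median(errors) if errors else None,
        "accepted": len(accepted),
        "rejected": len(queries) - len(accepted),
    }
```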
Original abstract
SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice, DP alone may not satisfy governance requirements such as the minimum frequency rule, which requires each released group (cell) to include contributions from at least k distinct individuals. In this paper, we present DPSQL+, a privacy-preserving SQL library that simultaneously enforces user-level (ε,δ)-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a Validator that statically restricts queries to a DP-safe subset of SQL; (ii) an Accountant that consistently tracks cumulative privacy loss across multiple queries; and (iii) a Backend that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads, from basic aggregates to quadratic statistics and join operations, and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
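The Backend's portability claim amounts to putting a narrow interface between the privacy layer and the database engine. The sketch below shows one way to do this with `typing.Protocol` and Python's built-in `sqlite3` module. `Backend`, `SQLiteBackend` and `distinct_users_per_group` are hypothetical names, not the DPSQL+ API. Table and column names are interpolated directly, so they are assumed to come from a validated schema rather than from user input.

```python
import sqlite3
from typing import Any, List, Protocol, Sequence, Tuple


class Backend(Protocol):
    def execute(self, sql: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]: ...


class SQLiteBackend:
    """Hypothetical adapter; another engine's adapter would expose the same method."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)

    def execute(self, sql: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
        return self.conn.execute(sql, params).fetchall()


def distinct_users_per_group(backend: Backend, table: str, user_col: str,
                             group_col: str) -> List[Tuple[Any, ...]]:
    # Engine-agnostic step: the privacy layer only sees rows returned by the backend.
    # Identifiers are assumed to be pre-validated schema names (not user input).
    sql = (f"SELECT {group_col}, COUNT(DISTINCT {user_col}) "
           f"FROM {table} GROUP BY {group_col} ORDER BY {group_col}")
    return backend.execute(sql)
```

Under this design, the noise, thresholding and accounting logic is written once against `Backend`. Supporting a new engine then only requires a new adapter.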
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents DPSQL+, a modular library for executing SQL queries under user-level (ε,δ)-differential privacy while also enforcing a minimum frequency rule that requires each output group to have contributions from at least k distinct individuals. The system uses a Validator to restrict queries to a DP-safe SQL subset, an Accountant to track privacy loss, and a Backend for database portability. Experiments on the TPC-H benchmark are reported to show practical accuracy for aggregates, quadratic statistics, and joins, and to allow more queries under a fixed privacy budget than prior work.
Significance. If the results hold, this contribution is significant as it bridges differential privacy with practical governance requirements like the minimum frequency rule in a usable SQL interface. The modular design facilitates extensibility across database engines, and the TPC-H experiments provide evidence of utility for analytical workloads. This could enable broader adoption of privacy-preserving analytics in settings requiring both formal DP guarantees and frequency-based protections.
major comments (2)
- [§3.2] Validator: The description of the Validator's static restrictions for enforcing both DP and the minimum frequency rule lacks a formal characterization of the supported query language fragment (e.g., allowed join types, aggregation forms, or group-by clauses). Without this, it is unclear whether typical exploratory queries are supported or require reformulation, which directly impacts the central claim of practical accuracy across wide workloads.
- [§5] Evaluation: The reported gains in query volume under a fixed budget, and the accuracy metrics for quadratic statistics and joins, lack exact counts of queries accepted and rejected by the Validator, the precise privacy budget allocation, and details such as error bars or run-to-run variance. This makes it difficult to assess whether the performance claims generalize beyond the specific TPC-H subset tested.
minor comments (2)
- [§5] The experimental setup would benefit from an explicit table listing the (ε, δ, k) parameter values used across all TPC-H workloads.
- [Figure 4] Figure captions for throughput plots should include the number of runs and any statistical tests performed for the 'substantially more queries' comparison.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and positive recommendation for minor revision. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.
Point-by-point responses
- Referee: [§3.2] Validator: The description of the Validator's static restrictions for enforcing both DP and the minimum frequency rule lacks a formal characterization of the supported query language fragment (e.g., allowed join types, aggregation forms, or group-by clauses). Without this, it is unclear whether typical exploratory queries are supported or require reformulation, which directly impacts the central claim of practical accuracy across wide workloads.
Authors: We agree that an explicit formal characterization of the supported query fragment would improve clarity and help readers assess the scope of supported exploratory queries. In the revised version, we will add a dedicated paragraph and table in §3.2 that enumerates the allowed constructs: supported join types (inner joins on foreign-key relationships only), permitted aggregation functions (SUM, COUNT, AVG, and quadratic forms such as variance), group-by requirements, and the precise conditions under which the Validator enforces the minimum frequency rule. This addition will directly address the concern without altering the underlying implementation. (Revision: yes.)
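The constructs listed in this response can be checked statically. The sketch below validates a query against them: only inner joins along declared foreign keys, and only SUM, COUNT, AVG or a variance aggregate. It operates on a hypothetical, already-parsed query representation (`Query`, `Join`) rather than raw SQL. The aggregate spelling `VARIANCE` is an assumption. The group-by requirements are not spelled out in the source, so they are not encoded here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Column = Tuple[str, str]  # (table, column)

# Hypothetical spelling of the allowed aggregates, including one quadratic form.
ALLOWED_AGGREGATES = {"SUM", "COUNT", "AVG", "VARIANCE"}


@dataclass
class Join:
    kind: str        # e.g. "INNER", "LEFT"
    left: Column
    right: Column


@dataclass
class Query:
    aggregates: List[str]
    joins: List[Join] = field(default_factory=list)


def validate(query: Query, foreign_keys: Set[Tuple[Column, Column]]) -> Optional[str]:
    """Return None if the query is in the sketched DP-safe fragment, else a reason."""
    if not query.aggregates:
        return "non-aggregate queries would release raw rows"
    for agg in query.aggregates:
        if agg.upper() not in ALLOWED_AGGREGATES:
            return f"unsupported aggregate {agg}"
    for j in query.joins:
        if j.kind.upper() != "INNER":
            return f"unsupported join type {j.kind}"
        # Accept the foreign key in either direction.
        if (j.left, j.right) not in foreign_keys and (j.right, j.left) not in foreign_keys:
            return "join is not along a declared foreign key"
    return None
```

For example, a TPC-H join from `orders.o_custkey` to `customer.c_custkey` passes if that foreign key is declared. A LEFT join or a MEDIAN aggregate is rejected with a reason string that the library could report back to the analyst.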
- Referee: [§5] Evaluation: The reported gains in query volume under a fixed budget, and the accuracy metrics for quadratic statistics and joins, lack exact counts of queries accepted and rejected by the Validator, the precise privacy budget allocation, and details such as error bars or run-to-run variance. This makes it difficult to assess whether the performance claims generalize beyond the specific TPC-H subset tested.
Authors: We acknowledge that the current evaluation section would benefit from greater quantitative transparency. In the revision we will expand §5 with: (i) exact counts of queries accepted versus rejected by the Validator on the TPC-H workload, (ii) the concrete per-query privacy-budget allocation policy (including how the global (ε,δ) budget is partitioned), and (iii) error bars or standard deviations computed over multiple independent runs to quantify run-to-run variance. These additions will strengthen the evidence for the reported gains in query volume and accuracy. (Revision: yes.)
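Items (ii) and (iii) can be illustrated together. The sketch below assumes one simple, hypothetical allocation policy: split the global (ε,δ) budget uniformly over n queries under basic composition. It then measures run-to-run variability of a Laplace-noised count by repeating the release with independent noise. For a Laplace count with scale 1/ε, the mean absolute error is 1/ε, so the empirical mean should land near that value.

```python
import random
import statistics
from typing import List, Tuple


def split_budget(epsilon_total: float, delta_total: float,
                 n_queries: int) -> Tuple[float, float]:
    # Uniform split under basic composition: n queries at (eps/n, delta/n) each.
    return epsilon_total / n_queries, delta_total / n_queries


def laplace_count_errors(true_count: int, epsilon: float, runs: int,
                         seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # sensitivity 1: a count over one-row-per-user data
    errors = []
    for _ in range(runs):
        # Laplace(0, scale) as the difference of two exponentials with mean `scale`.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = true_count + noise
        errors.append(abs(noisy - true_count))
    return errors
```

Reporting `statistics.mean(errors)` with `statistics.stdev(errors)` over many runs gives the kind of error bars the referee asks for.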
Circularity Check
No significant circularity in engineering system
Full rationale
The paper describes an engineering library (DPSQL+) with modular components (Validator for DP-safe SQL subset, Accountant for privacy budget tracking, Backend for DB interfacing) and supports its claims of practical accuracy on TPC-H workloads via direct empirical evaluation rather than any mathematical derivation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text; the minimum-frequency rule and DP guarantees are enforced by construction in the architecture but are not presented as derived results that reduce to their own inputs. The contribution is self-contained against external benchmarks.
Appendix A. Details of Evaluation Settings
The experiments in Section 6 employ 10 query patter...