arxiv: 2605.07674 · v1 · submitted 2026-05-08 · 💻 cs.GT · cs.CR · cs.LG

Recognition: 2 theorem links


Differentially Private Auditing Under Strategic Response

Florian A. D. Burnat


Pith reviewed 2026-05-11 01:49 UTC · model grok-4.3

classification 💻 cs.GT · cs.CR · cs.LG

keywords: differentially private auditing, strategic response, Stackelberg game, under-detection gap, privacy budget allocation, AI system regulation, welfare-weighted harm


The pith

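When developers strategically reallocate mitigation efforts, naive uniform or harm-proportional privacy budgets in differentially private audits leave strictly more welfare-weighted undetected harm than non-strategic baselines. This holds provided detectability is heterogeneous, welfare weights are not comonotone with detectability, and the developer's optimum is interior.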

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models AI system audits as a bilevel Stackelberg game in which the auditor first commits to a differential privacy budget allocation across harm dimensions and the developer then chooses mitigation efforts to minimize its cost given the resulting detection probabilities. It proves that uniform or harm-proportional allocations produce a strictly larger welfare-weighted under-detection gap than any fixed non-strategic mitigation plan, provided detectability varies across dimensions, welfare weights are not aligned with detectability, and the developer's optimum is interior. The paper characterizes the welfare-minimizing allocation as the point where four quantities balance: welfare weight, audit miss probability, detectability elasticity, and mitigation-cost curvature. It supplies a single-level reformulation of the game via the developer's KKT conditions and a projected-gradient algorithm that computes the optimal policy from the developer's best-response mapping.
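To make the pieces concrete, here is a minimal Python sketch of one hypothetical instance of the game. The functional forms are illustrative assumptions, not the paper's. Detection probability is p_i(ε_i) = 1 - exp(-κ_i ε_i). Mitigation cost is quadratic, ½ c_i m_i². The developer is charged a penalty P on detected residual harm. The helper names (`detection_prob`, `best_response`, `under_detection_gap`) and all parameter values are hypothetical.

```python
import numpy as np


def detection_prob(eps, kappa):
    """Hypothetical detection curve p_i(eps_i) = 1 - exp(-kappa_i * eps_i)."""
    return 1.0 - np.exp(-kappa * eps)


def best_response(eps, kappa, harm, cost, penalty):
    """Developer's best response in the toy model (assumed, not the paper's).

    The developer minimizes sum_i 0.5*c_i*m_i**2 + P*p_i*(h_i - m_i) over
    m_i in [0, h_i]. The first-order condition gives m_i = P*p_i/c_i; the clip
    to [0, h_i] is where an interior optimum can fail.
    """
    p = detection_prob(eps, kappa)
    return np.clip(penalty * p / cost, 0.0, harm)


def under_detection_gap(eps, kappa, harm, cost, penalty, weights):
    """B_w: welfare-weighted residual harm the audit misses at the best response."""
    m = best_response(eps, kappa, harm, cost, penalty)
    miss = 1.0 - detection_prob(eps, kappa)
    return float(np.sum(weights * miss * (harm - m)))


# Made-up instance: dimension 0 is easy to detect but low-welfare, dimension 2
# is hard to detect but high-welfare, so weights are not comonotone with kappa.
kappa = np.array([2.0, 1.0, 0.3])
harm = np.array([1.5, 1.0, 0.5])
cost = np.array([1.0, 1.0, 1.0])
weights = np.array([0.2, 1.0, 3.0])
penalty, total_eps = 1.0, 3.0

uniform = np.full(3, total_eps / 3)             # naive rule 1: equal split
proportional = total_eps * harm / harm.sum()    # naive rule 2: split by harm
for name, eps in [("uniform", uniform), ("proportional", proportional)]:
    print(name, round(under_detection_gap(eps, kappa, harm, cost, penalty, weights), 4))
```

In this particular instance, the harm-proportional rule shifts budget toward the easy-to-detect, low-welfare dimension, so it yields a larger B_w than the uniform split. The sketch only illustrates the mechanism. It does not reproduce the paper's comparison against a non-strategic baseline.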

Core claim

Naive DP auditing induces a strictly larger welfare-weighted under-detection gap B_w than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with detectability, and the developer's optimum is interior. The optimal auditor allocation equates a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, and the bilevel problem admits a single-level reformulation through the developer's KKT system.
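One way to see where the four factors come from is a hypothetical separable version of the game. This is an illustrative assumption, not the paper's model. The developer pays ½ c_i m_i² to mitigate dimension i, plus an expected penalty P p_i(ε_i)(h_i - m_i) on detected residual harm. The auditor minimizes B_w subject to Σ_i ε_i = E. At an interior developer optimum, the auditor's first-order condition reads:

```latex
\begin{aligned}
m_i^*(\varepsilon_i) &= \frac{P\,p_i(\varepsilon_i)}{c_i},
\qquad
\frac{\partial m_i^*}{\partial \varepsilon_i} = \frac{P\,p_i'(\varepsilon_i)}{c_i},\\[4pt]
B_w(\varepsilon) &= \sum_i w_i\,\bigl(1 - p_i(\varepsilon_i)\bigr)\bigl(h_i - m_i^*(\varepsilon_i)\bigr),\\[4pt]
-\frac{\partial B_w}{\partial \varepsilon_i}
&= w_i\,p_i'(\varepsilon_i)\Bigl[r_i + \bigl(1 - p_i\bigr)\frac{P}{c_i}\Bigr]
 = w_i\,\frac{\eta_i\,p_i}{\varepsilon_i}\Bigl[r_i + \bigl(1 - p_i\bigr)\frac{P}{c_i}\Bigr] = \lambda
\quad \text{for every } i \text{ with } \varepsilon_i > 0.
\end{aligned}
```

The symbols are defined as follows:

- η_i = ε_i p_i'(ε_i)/p_i(ε_i) is the detectability elasticity.
- r_i = h_i - m_i^* is the residual harm.
- λ is the multiplier on the budget constraint. Dimensions with ε_i = 0 satisfy -∂B_w/∂ε_i ≤ λ.

In this toy version, the marginal value of budget on dimension i combines the welfare weight w_i, the miss probability 1 - p_i, the elasticity η_i, and the cost curvature c_i. This mirrors the four factors named above, though the paper's exact condition may differ. If m_i^* sits at a bound, then ∂m_i^*/∂ε_i = 0 and the (1 - p_i)P/c_i term drops out.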

What carries the argument

The welfare-weighted under-detection gap B_w: the welfare-weighted true residual harm that the audit fails to detect once the developer has chosen its strategic best response to the auditor's privacy budget allocation.

If this is right

The auditor's optimal privacy budget allocation must simultaneously account for welfare weight, miss probability, detectability elasticity, and mitigation cost curvature.
The bilevel Stackelberg game can be solved as a single-level program by substituting the developer's KKT conditions into the auditor's objective.
A projected-gradient procedure that differentiates through the developer's best response computes the optimal allocation in practice.
Naive allocations (uniform or harm-proportional), which ignore the developer's strategic reallocation, leave strictly larger welfare-weighted undetected harm than the non-strategic baseline under the stated conditions.
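The single-level reformulation and the hypergradient can be illustrated together with a minimal sketch. It assumes a hypothetical developer cost ½ mᵀQm + P Σ_i p_i(ε_i)(h_i - m_i), with Q symmetric positive definite so that mitigation costs are coupled across dimensions, and it assumes an interior optimum. The stationarity condition Q m* = P p(ε) replaces the inner optimization. Differentiating that condition (the implicit function theorem) gives the hypergradient of B_w. This shows the generic technique, not the paper's SPAD code; all names and values are hypothetical, and the finite-difference comparison is only a sanity check.

```python
import numpy as np


def detection(eps, kappa):
    """Hypothetical detection curve p_i = 1 - exp(-kappa_i*eps_i) and its derivative."""
    p = 1.0 - np.exp(-kappa * eps)
    dp = kappa * np.exp(-kappa * eps)
    return p, dp


def best_response(eps, kappa, Q, penalty):
    """Interior developer optimum from the stationarity condition Q m = P p(eps)."""
    p, _ = detection(eps, kappa)
    return np.linalg.solve(Q, penalty * p)


def gap(eps, kappa, Q, penalty, harm, w):
    """B_w evaluated at the developer's best response."""
    p, _ = detection(eps, kappa)
    m = best_response(eps, kappa, Q, penalty)
    return float(np.sum(w * (1.0 - p) * (harm - m)))


def hypergradient(eps, kappa, Q, penalty, harm, w):
    """dB_w/d eps via implicit differentiation of Q m*(eps) = P p(eps)."""
    p, dp = detection(eps, kappa)
    m = best_response(eps, kappa, Q, penalty)
    dm = np.linalg.solve(Q, penalty * np.diag(dp))   # dm[i, j] = d m_i / d eps_j
    direct = -w * dp * (harm - m)                     # more detection, fewer misses
    indirect = -(w * (1.0 - p)) @ dm                  # developer reallocates effort
    return direct + indirect


kappa = np.array([2.0, 1.0, 0.3])
harm = np.array([1.5, 1.0, 0.5])
w = np.array([0.2, 1.0, 3.0])
Q = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 1.0]])
eps = np.ones(3)

g = hypergradient(eps, kappa, Q, 1.0, harm, w)
h = 1e-6
fd = np.array([
    (gap(eps + h * e, kappa, Q, 1.0, harm, w) - gap(eps - h * e, kappa, Q, 1.0, harm, w)) / (2 * h)
    for e in np.eye(3)
])
print(np.max(np.abs(g - fd)))
```

The `indirect` term is the part a non-strategic analysis would omit: it is the change in B_w caused by the developer moving effort when the audit budget moves.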

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Regulators designing privacy-constrained audits should treat the developer's mitigation plan as endogenous rather than fixed when choosing budget splits.
Simulation experiments that vary the four balancing factors could quantify how much undetected harm is avoided by the optimal policy versus naive rules.
Extending the model to repeated interactions or multiple developers would reveal whether strategic responses become more or less costly over time.
Audit policies that ignore strategic response risk over-protecting high-detectability harms at the expense of high-welfare but harder-to-detect harms.

Load-bearing premise

The developer's mitigation choice is an interior optimum, so that small changes in the audit's detection probabilities can induce reallocation of effort across harm dimensions.
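A quick way to see why interiority is load-bearing uses a hypothetical separable toy model. In it, p_i = 1 - exp(-κ_i ε_i), mitigation cost is ½ c_i m_i², a penalty P applies to detected residual harm, and mitigation is capped at the harm level h_i. Once a dimension's best response hits its cap, a small change in that dimension's budget no longer moves the developer's effort. The function names and numbers below are illustrative assumptions.

```python
import numpy as np


def best_response(eps, kappa, harm, cost, penalty):
    """Toy separable best response m_i = clip(P*p_i/c_i, 0, h_i) (hypothetical model)."""
    p = 1.0 - np.exp(-kappa * eps)
    return np.clip(penalty * p / cost, 0.0, harm)


def interior_mask(eps, kappa, harm, cost, penalty, tol=1e-9):
    """True where the developer's optimum lies strictly inside [0, h_i]."""
    m = best_response(eps, kappa, harm, cost, penalty)
    return (m > tol) & (m < harm - tol)


def effort_sensitivity(eps, kappa, harm, cost, penalty, h=1e-6):
    """Central-difference estimate of d m_i / d eps_i for each dimension."""
    sens = np.zeros_like(eps)
    for i in range(len(eps)):
        step = np.zeros_like(eps)
        step[i] = h
        up = best_response(eps + step, kappa, harm, cost, penalty)
        down = best_response(eps - step, kappa, harm, cost, penalty)
        sens[i] = (up[i] - down[i]) / (2.0 * h)
    return sens


# Made-up instance: dimension 0 has small harm, so its optimum hits the cap h_0.
kappa = np.array([2.0, 1.0])
harm = np.array([0.3, 1.0])
cost = np.array([1.0, 1.0])
eps = np.array([1.0, 1.0])
print(interior_mask(eps, kappa, harm, cost, 1.0))
print(effort_sensitivity(eps, kappa, harm, cost, 1.0))
```

At the capped dimension, the strategic-reallocation channel switches off (sensitivity zero). This is the boundary case the referee raises below.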

What would settle it

The strict inequality would be falsified by a concrete numerical example or simulation satisfying all three conditions (heterogeneous detectability, welfare weights not comonotone with detectability, and an interior developer optimum) in which the welfare-weighted undetected harm under uniform or proportional allocation is nonetheless no larger than under the non-strategic baseline.

Original abstract

Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the welfare-weighted under-detection gap $B_w$, the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response, and prove that naive DP auditing (uniform or harm-proportional allocation) induces a strictly larger $B_w$ than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with detectability, and the developer's optimum is interior. We characterize the optimal auditor allocation as a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, and provide a single-level reformulation of the bilevel problem via the developer's KKT system. We propose Strategic Private Audit Design (SPAD), a projected-gradient algorithm with hypergradients computed through the developer's best response.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a bilevel Stackelberg model for strategic developer responses to DP audits, plus a new B_w gap metric and the SPAD algorithm. However, the headline claim that naive allocations are strictly worse holds only under an interior developer optimum, and the paper never derives sufficient conditions for that interiority.


The core contribution is a game-theoretic setup in which the auditor first commits to a DP budget split across harm dimensions and the developer then reallocates mitigation effort. The setup yields four things:

- the welfare-weighted under-detection gap B_w;
- a four-factor optimality condition balancing welfare weights, miss probabilities, detectability elasticities, and cost curvature;
- a single-level reformulation through the developer's KKT system;
- SPAD, a projected-gradient method that computes hypergradients through the developer's best response.

Referee Report

3 major / 2 minor

Summary. The paper models differentially private auditing of AI systems as a bilevel Stackelberg game in which the auditor commits to a query policy and DP budget allocation across harm dimensions while a strategic developer reallocates mitigation effort. It defines the welfare-weighted under-detection gap B_w as the welfare-weighted true residual harm undetected at the developer's best response, proves that naive (uniform or harm-proportional) allocations induce strictly larger B_w than non-strategic baselines whenever effective detectability is heterogeneous, welfare weights are non-comonotone with detectability, and the developer's optimum is interior, characterizes the optimal auditor policy as a four-factor balance (welfare weight, miss probability, detectability elasticity, mitigation-cost curvature), supplies a single-level KKT reformulation of the bilevel problem, and proposes the SPAD projected-gradient algorithm.

Significance. If the central claims hold, the work supplies a principled game-theoretic framework for audit design that explicitly accounts for strategic developer response under privacy constraints, together with a computable single-level reformulation and algorithm. The explicit four-factor characterization and the B_w metric could guide regulators toward allocations that reduce welfare-weighted undetected harm relative to naive baselines.

major comments (3)

[Main theorem / abstract claim] The main result (abstract and §3–4) asserts a strict B_w inequality only conditionally on an interior developer optimum, yet no lemma derives sufficient conditions on the detectability function, mitigation-cost curvature, or DP budget that guarantee interiority precisely when heterogeneity and non-comonotonicity hold. Boundary solutions (m_i^*=0 or at upper bound) can arise under sharp elasticity differences, rendering the strict inequality indeterminate and the comparison to the non-strategic baseline inconclusive.
[Single-level reformulation section] The single-level reformulation via the developer's KKT system is offered to avoid circularity, but the manuscript supplies neither an independent numerical benchmark (e.g., direct bilevel solve on low-dimensional instances) nor an explicit verification that the recovered allocation is not tautological with the same welfare and detectability parameters used to define B_w.
[Numerical experiments / validation] Numerical validation of boundary behavior and robustness of the interior-optimum assumption is absent; the abstract claims strict inequalities and a single-level reformulation, yet the provided derivations, boundary-condition checks, and Monte-Carlo experiments under heterogeneous detectability are not visible.

minor comments (2)

[Introduction] Define B_w and the four-factor balance explicitly in the introduction before the abstract claim; the current notation for effective detectability and welfare weights appears only later.
[Algorithm section] Clarify the precise projection operator and hypergradient computation steps in the SPAD algorithm description to ensure reproducibility.
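The minor comment asks for the projection operator and the hypergradient steps to be spelled out. The following is a hedged illustration only, not the paper's SPAD. One natural choice is the Euclidean projection onto the budget set {ε ≥ 0, Σ_i ε_i = E}, computed with the standard sort-and-threshold algorithm and alternated with hypergradient steps. The toy model (exponential detection curve, quadratic mitigation cost, penalty P), the step size, and all function names are assumptions.

```python
import numpy as np


def project_to_budget(v, total):
    """Euclidean projection of v onto {eps : eps >= 0, sum(eps) = total}.

    Standard sort-and-threshold algorithm for projecting onto a scaled simplex.
    """
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def gap_and_hypergradient(eps, kappa, harm, cost, penalty, w):
    """B_w and dB_w/d eps in a toy separable model (hypothetical functional forms)."""
    p = 1.0 - np.exp(-kappa * eps)
    dp = kappa * np.exp(-kappa * eps)
    raw = penalty * p / cost                 # unconstrained best response P*p_i/c_i
    m = np.minimum(raw, harm)                # raw >= 0 always, so only the cap can bind
    dm = np.where(raw < harm, penalty * dp / cost, 0.0)  # zero once the cap binds
    gap = float(np.sum(w * (1.0 - p) * (harm - m)))
    grad = -w * dp * (harm - m) - w * (1.0 - p) * dm
    return gap, grad


def spad_sketch(kappa, harm, cost, penalty, w, total, step=0.1, iters=500):
    """Projected hypergradient descent on the auditor's budget split (a sketch)."""
    eps = np.full(len(kappa), total / len(kappa))
    for _ in range(iters):
        _, grad = gap_and_hypergradient(eps, kappa, harm, cost, penalty, w)
        eps = project_to_budget(eps - step * grad, total)
    return eps


kappa = np.array([2.0, 1.0, 0.3])
harm = np.array([1.5, 1.0, 1.0])
cost = np.ones(3)
w = np.array([0.2, 1.0, 3.0])
eps_star = spad_sketch(kappa, harm, cost, 1.0, w, total=3.0)
print(np.round(eps_star, 3))
```

With these parameter values, B_w is a sum of convex one-dimensional terms, so projected descent with this small step cannot increase it. In the general bilevel problem, no such convexity is guaranteed.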

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the careful reading and insightful comments on our paper. We respond to each major comment in turn, indicating where revisions will be made to address the concerns raised.


Referee: The main result (abstract and §3–4) asserts a strict B_w inequality only conditionally on an interior developer optimum, yet no lemma derives sufficient conditions on the detectability function, mitigation-cost curvature, or DP budget that guarantee interiority precisely when heterogeneity and non-comonotonicity hold. Boundary solutions (m_i^*=0 or at upper bound) can arise under sharp elasticity differences, rendering the strict inequality indeterminate and the comparison to the non-strategic baseline inconclusive.

Authors: We appreciate the referee pointing out the conditional nature of our main result. The theorem in Section 3 is indeed stated under the assumption that the developer's optimum is interior, as explicitly noted in the abstract and the theorem statement. We agree that deriving sufficient conditions for interiority would make the result more robust. In the revised version, we will include an additional lemma in the appendix that provides sufficient conditions for an interior optimum, for instance, when the mitigation cost functions are strictly convex and the detectability elasticities are positive and bounded away from zero under the given DP budgets. Additionally, we will discuss boundary cases and note that while the strict inequality may not always hold, the SPAD allocation still provides the minimal B_w. This will clarify the comparison to non-strategic baselines. revision: yes
Referee: The single-level reformulation via the developer's KKT system is offered to avoid circularity, but the manuscript supplies neither an independent numerical benchmark (e.g., direct bilevel solve on low-dimensional instances) nor an explicit verification that the recovered allocation is not tautological with the same welfare and detectability parameters used to define B_w.

Authors: We thank the referee for this observation. The KKT reformulation is derived from the first-order optimality conditions of the developer's problem, which are independent of the auditor's objective, thus avoiding tautology. To address the lack of numerical benchmark, we will add in the revision a low-dimensional numerical example where we compare the solution obtained from the single-level KKT reformulation against a direct nested optimization approach for the bilevel problem. This will serve as an independent verification and demonstrate that the recovered allocations are consistent but derived through different computational paths. revision: yes
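The response promises a low-dimensional comparison between the single-level KKT reformulation and a direct nested solve. A minimal sketch of such a check uses the same kind of hypothetical toy model: an exponential detection curve, quadratic mitigation cost, and a penalty P on detected residual harm. The outer problem is brute-forced on a grid over a two-dimensional budget split. The inner problem is solved twice: once numerically by projected gradient descent, and once from the closed-form stationarity condition. All names and values are illustrative assumptions.

```python
import numpy as np


def detection(eps, kappa):
    return 1.0 - np.exp(-kappa * eps)   # hypothetical detection curve


def gap(eps, m, kappa, harm, w):
    """Welfare-weighted residual harm the audit misses, given mitigation m."""
    return float(np.sum(w * (1.0 - detection(eps, kappa)) * (harm - m)))


def inner_numeric(eps, kappa, harm, cost, penalty, iters=200, lr=0.5):
    """Solve the developer's problem directly by projected gradient descent on m."""
    p = detection(eps, kappa)
    m = np.zeros_like(harm)
    for _ in range(iters):
        grad = cost * m - penalty * p     # d/dm of 0.5*c*m^2 + P*p*(h - m)
        m = np.clip(m - lr * grad, 0.0, harm)
    return m


def inner_kkt(eps, kappa, harm, cost, penalty):
    """Closed-form best response from the developer's stationarity condition."""
    return np.clip(penalty * detection(eps, kappa) / cost, 0.0, harm)


kappa = np.array([2.0, 0.4])
harm = np.array([1.0, 1.0])
cost = np.array([1.0, 2.0])
w = np.array([0.3, 2.0])
penalty, total = 1.0, 2.0
grid = np.linspace(0.0, total, 201)   # eps_1 on a grid, eps_2 = total - eps_1

nested, single = [], []
for e1 in grid:
    eps = np.array([e1, total - e1])
    nested.append(gap(eps, inner_numeric(eps, kappa, harm, cost, penalty), kappa, harm, w))
    single.append(gap(eps, inner_kkt(eps, kappa, harm, cost, penalty), kappa, harm, w))
nested, single = np.array(nested), np.array(single)
print(grid[np.argmin(nested)], grid[np.argmin(single)])
```

Agreement here only confirms the KKT substitution for a smooth, strictly convex inner problem. It says nothing about instances where the developer's optimum sits on a bound.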
Referee: Numerical validation of boundary behavior and robustness of the interior-optimum assumption is absent; the abstract claims strict inequalities and a single-level reformulation, yet the provided derivations, boundary-condition checks, and Monte-Carlo experiments under heterogeneous detectability are not visible.

Authors: We acknowledge that the current manuscript is primarily theoretical and lacks extensive numerical validation for boundary behaviors and the interior assumption. We will incorporate a new experimental section in the revised manuscript featuring Monte-Carlo simulations under heterogeneous detectability settings. These experiments will include boundary condition checks (e.g., cases with sharp elasticity differences leading to m_i^*=0) and robustness tests to illustrate when the interior optimum holds and the strict inequality applies, as well as scenarios where it does not. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained under explicit assumptions


The paper defines a bilevel Stackelberg game between auditor and strategic developer, introduces the welfare-weighted under-detection gap B_w as the residual harm at the developer's best response, and proves a conditional strict inequality for naive allocations. The single-level reformulation via the developer's KKT system is a standard equivalence transformation that converts the bilevel program into a single-level constrained optimization; it does not redefine B_w or the inequality in terms of itself. The interiority condition is stated explicitly as part of the 'whenever' clause rather than derived or smuggled in. No self-citations, fitted parameters renamed as predictions, or ansatzes appear as load-bearing elements in the abstract or described structure. The central claim therefore rests on the model primitives and the listed assumptions without reducing to a tautology or to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard assumptions from bilevel optimization and Stackelberg games plus domain-specific modeling choices about how developers reallocate mitigation effort; no free parameters or invented physical entities are introduced.

axioms (2)

standard math: The developer's best response exists and is characterized by its KKT conditions.
Invoked to obtain the single-level reformulation of the bilevel problem.
domain assumption: Effective detectability is heterogeneous across harm dimensions and welfare weights are not comonotone with detectability.
Required for the strict inequality between naive and optimal allocations.

invented entities (1)

welfare-weighted under-detection gap B_w (no independent evidence)
purpose: Quantify the residual harm that remains undetected after the developer's strategic best response.
New scalar defined to compare audit policies under strategic behavior.

pith-pipeline@v0.9.0 · 5514 in / 1519 out tokens · 31247 ms · 2026-05-11T01:49:31.369456+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: Theorem 3.2: naïve DP auditing induces strictly larger B_w … under heterogeneous detectability, non-comonotone welfare weights, and interior optimum

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

[1] Milad Nasr, Jamie Hayes, Thomas Steinke, Jonathan Hayase, Matthew Jagielski, Abhradeep Thakurta, Alina Oprea, Andreas Terzis, Andreas Marfo, Florian Tramèr. Proceedings of the 32nd USENIX Security Symposium. doi:10.48550/arxiv.2302.07956
[2] Florian Tramèr, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini. 2022.
[3] William Kong, Andrés Muñoz Medina, Mónica Ribero. Proceedings of the IEEE Symposium on Security and Privacy (S&P).
[4] Reza Shokri. 2015.
[5] Milad Nasr, Reza Shokri, Amir Houmansadr. 2019.
[6] Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Reza Shokri. 2022.
[7] Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace. 2023.
[8] Matt Fredrikson, Somesh Jha, Thomas Ristenpart. 2015.
[9] Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, Neil Zhenqiang Gong. 2019.
[10] Mohammad Naseri, Jamie Hayes, Emiliano De Cristofaro. 2022.
[11] Natalia Ponomareva, Hussein Hazimeh, Alexey Kurakin, Zheng Xu, Carson E. Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta. 2023.
[12] John C. Duchi, Michael I. Jordan, Martin J. Wainwright. 2013.
[13] Mark Bun, Jelani Nelson, Uri Stemmer. 2018.
[14] Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman. 2016.
[15] Jingcheng Liu, Kunal Talwar. 2019.
[16] Jaewoo Lee, Daniel Kifer. 2018.
[17] Zhiqi Bu, Jinshuo Dong, Qi Long, Weijie J. Su. 2020.
[18] Ilya Mironov. IEEE Computer Security Foundations Symposium (CSF).
[19] Cynthia Dwork, Nitin Kohli, Deirdre Mulligan. 2019.
[20] Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, Zhiwei Steven Wu. 2018.
[21] Yiling Chen, Chara Podimata, Ariel D. Procaccia, Nisarg Shah. 2018.
[22] Jeremiah Blocki, Nicolas Christin, Anupam Datta, Ariel D. Procaccia, Arunesh Sinha. 2015.
[23] Kobbi Nissim, Claudio Orlandi, Rann Smorodinsky. 2012.
[24] Arpita Ghosh, Katrina Ligett. 2013.
[25] Dmytro Korzhyk, Zhengyu Yin, Christopher Kiekintveld, Vincent Conitzer, Milind Tambe. 2011.
[26] Chao Yan, Bo Li, Yevgeniy Vorobeychik, Aron Laszka, Daniel Fabbri, Bradley Malin. 2019.
[27] Aaron Schlenker, Haifeng Xu, Mina Guirguis, Christopher Kiekintveld, Arunesh Sinha, Milind Tambe, Solomon Sonya, Darryl Balderas, Noah Dunstatter. 2017.
[28] Moritz Hardt, Nimrod Megiddo, Christos Papadimitriou, Mary Wootters. 2016.
[29] Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, Moritz Hardt. 2020.
[30] Changyang He, Nina Baranowska, Josu Andoni Eguíluz Castañeira, Guillem Escriba, Matthias Juentgen, Anna Via, Frederik Zuiderveen Borgesius, Asia J. Biega. Co-designing for Compliance: Multi-party Computation Protocols for Post-Market Fairness Monitoring in Algorithmic Hiring. 2026. doi:10.48550/arXiv.2602.01837 (internal anchor: Pith review)
[31] Yating Yang, Tao Zhang, Quanyan Zhu. 2023.
[32] Sanmay Das, Fang-Yi Yu, Yuang Zhang. 2026.
[33] Zhi-Quan Luo, Jong-Shi Pang, Daniel Ralph. Mathematical Programs with Equilibrium Constraints.