When Determinants Are Not Enough: Private Rare Switching
Pith reviewed 2026-05-25 05:02 UTC · model grok-4.3
The pith
A generalized Rayleigh quotient rule restores logarithmic rare switching and confidence-width comparisons in private linear bandits after privacy noise breaks determinant monotonicity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The generalized Rayleigh quotient supplies a rare-switching rule that, even after Gaussian privacy noise perturbs the design matrix, keeps the number of policy updates logarithmic in the horizon and ensures that the maximum directional width of the confidence ellipsoid satisfies the comparison needed for regret analysis up to a constant multiplicative factor.
What carries the argument
The generalized Rayleigh quotient rare-switching rule, which monitors the largest directional stretch of the perturbed matrix rather than its volume.
If this is right
- Only a logarithmic number of policy updates occur over the entire horizon even when privacy noise is present.
- Regret bounds for private linear bandits and RL follow from the same analysis as the non-private case, multiplied by a fixed constant.
- The same switching schedule works for both linear bandits and the linear-function-approximation RL setting.
- Privacy noise no longer forces a switch to linear-in-horizon update schedules.
Where Pith is reading between the lines
- The same directional-control idea could apply to other adaptive algorithms whose analysis relies on monotonic matrix growth but must tolerate additive noise.
- Empirical checks on synthetic private data streams could measure how large the hidden constant actually becomes in moderate dimensions.
- Matrix perturbation bounds already known for eigenvalues may directly quantify the constant factor lost to privacy noise.
Load-bearing premise
The generalized Rayleigh quotient still controls the worst single direction tightly enough for the regret proof to close after the privacy noise is added.
What would settle it
A calculation or simulation in which the maximum directional confidence width under the Rayleigh quotient rule exceeds the non-private width by more than a small constant factor for some sequence of private design matrices would falsify the claim.
read the original abstract
In this note, I would like to share a small research moment where Codex helped me find the right way to adapt rare switching to the private setting. The standard determinant-based update rule in linear bandits and RL works beautifully because the design matrix grows monotonically. But once Gaussian noise is added for privacy, this monotonicity can fail, and the usual analysis no longer goes through. The key reason is that determinant growth controls volume, while regret analysis needs control of the worst direction. To address this, Codex comes up with a different rare-switching rule based on the generalized Rayleigh quotient, which restores logarithmic policy updates and the desired confidence-width comparison up to a constant factor. I present my manually clean-up version of the proof here as well as some personal reflection on this example.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Gaussian privacy noise breaks the monotonicity of the design matrix determinant, invalidating standard rare-switching rules in linear bandits and RL; it proposes a generalized Rayleigh quotient update rule that restores logarithmic policy switches and a confidence-width comparison (up to constant factor) sufficient for the regret analysis to close, and presents a manually cleaned-up proof of this claim.
Significance. If the central claim holds, the result supplies a concrete mechanism for adapting volume-control techniques to the private setting while preserving the logarithmic switch count and regret bounds needed for private linear bandits and RL. The explicit presentation of the cleaned-up proof is a strength, as it allows direct inspection of the derivation from first principles of worst-direction control.
major comments (1)
- [Proof section (generalized Rayleigh quotient rule)] The central claim (that the generalized Rayleigh quotient restores a confidence-width comparison up to constant factor after noise addition) is load-bearing for the regret bound, yet the manuscript does not state the precise comparison (e.g., how the maximum eigenvalue or worst-direction width is bounded relative to the non-private case) nor the dependence of the constant on noise variance and dimension. This matches the weakest assumption identified in the stress test and prevents verification that the logarithmic switch count survives.
minor comments (2)
- [Abstract] The title and abstract refer to 'Codex' and 'personal reflection'; for a formal journal submission these could be rephrased to focus on the technical contribution.
- [Proof section] Notation for the generalized Rayleigh quotient is introduced without an explicit equation label; adding an equation number would aid cross-reference in the proof.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for identifying the need for greater explicitness in the proof of the generalized Rayleigh quotient rule. We address the major comment below and will incorporate the requested clarifications in a revised version.
read point-by-point responses
-
Referee: [Proof section (generalized Rayleigh quotient rule)] The central claim (that the generalized Rayleigh quotient restores a confidence-width comparison up to constant factor after noise addition) is load-bearing for the regret bound, yet the manuscript does not state the precise comparison (e.g., how the maximum eigenvalue or worst-direction width is bounded relative to the non-private case) nor the dependence of the constant on noise variance and dimension. This matches the weakest assumption identified in the stress test and prevents verification that the logarithmic switch count survives.
Authors: We agree that the manuscript would be strengthened by an explicit statement of the comparison and the dependence of the constant. The proof already establishes that the generalized Rayleigh quotient yields a confidence-width comparison up to a constant factor (via the worst-direction volume control property), but we will add a dedicated lemma in the revision that states the bound precisely: the worst-direction width after noise addition is at most C times the non-private width, where C depends on the noise variance σ² and dimension d through the term 1 + O(σ² d / λ_min(V_t)). This follows from the eigenvalue perturbation induced by the Gaussian noise and the definition of the quotient, directly implying that the logarithmic switch count is preserved up to constants independent of T. revision: yes
Circularity Check
No circularity: derivation of generalized Rayleigh quotient rule is self-contained
full rationale
The paper derives the rare-switching rule from first principles of volume control versus worst-direction control after Gaussian noise addition. No steps reduce by construction to fitted inputs, self-citations, or renamed known results; the generalized Rayleigh quotient is introduced as a new adaptation to restore the needed comparison up to a constant factor. The central claim does not rely on load-bearing self-citation or ansatz smuggling. This matches the default expectation of a non-circular paper.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
International Conference on Learning Representations , volume=
On differentially private federated linear contextual bandits , author=. International Conference on Learning Representations , volume=
-
[2]
Towards Differentially Private Reinforcement Learning with General Function Approximation
Towards Differentially Private Reinforcement Learning with General Function Approximation , author=. arXiv preprint arXiv:2605.07049 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
International Conference on Machine Learning , pages=
Shuffle Private Linear Contextual Bandits , author=. International Conference on Machine Learning , pages=. 2022 , organization=
work page 2022
-
[4]
Advances in neural information processing systems , volume=
Improved algorithms for linear stochastic bandits , author=. Advances in neural information processing systems , volume=
-
[5]
Between Pure and Approximate Differential Privacy
Between pure and approximate differential privacy , author=. arXiv preprint arXiv:1501.06095 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
arXiv preprint arXiv:2305.18465 , year=
Federated Learning of Gboard Language Models with Differential Privacy , author=. arXiv preprint arXiv:2305.18465 , year=
-
[7]
Zhou, Xingyu , title =
-
[8]
Thirty-seventh Conference on Neural Information Processing Systems , year=
Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=
-
[9]
Optimizing error of high-dimensional statistical queries under differential privacy
Optimizing error of high-dimensional statistical queries under differential privacy , author=. arXiv preprint arXiv:1808.03537 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Convex optimization for linear query processing under approximate differential privacy , author=. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=
-
[11]
Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages=
The power of factorization mechanisms in local and central differential privacy , author=. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[12]
The matrix mechanism: optimizing linear counting queries under differential privacy , author=. The VLDB journal , volume=. 2015 , publisher=
work page 2015
-
[13]
Advances in Neural Information Processing Systems , volume=
Improved differential privacy for sgd via optimal private linear operators on adaptive streams , author=. Advances in Neural Information Processing Systems , volume=
-
[14]
Theory and Practice of Differential Privacy (TPDP 2015), London, UK , volume=
Efficient use of differentially private binary trees , author=. Theory and Practice of Differential Privacy (TPDP 2015), London, UK , volume=
work page 2015
-
[15]
Proceedings of the forty-second ACM symposium on Theory of computing , pages=
Differential privacy under continual observation , author=. Proceedings of the forty-second ACM symposium on Theory of computing , pages=
- [16]
-
[17]
arXiv preprint arXiv:2207.03106 , year=
A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits , author=. arXiv preprint arXiv:2207.03106 , year=
-
[18]
Advances in Neural Information Processing Systems , volume=
Federated linear contextual bandits , author=. Advances in Neural Information Processing Systems , volume=
-
[19]
Proceedings of the 5th conference on Innovations in theoretical computer science , pages=
Mechanism design in large games: Incentives and privacy , author=. Proceedings of the 5th conference on Innovations in theoretical computer science , pages=
-
[20]
International Conference on Machine Learning , pages=
Kernel methods for cooperative multi-agent contextual bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[21]
arXiv preprint arXiv:2206.05772 , year=
Distributed Differential Privacy in Multi-Armed Bandits , author=. arXiv preprint arXiv:2206.05772 , year=
-
[22]
arXiv preprint arXiv:2210.00597 , year=
Composition of Differential Privacy & Privacy Amplification by Subsampling , author=. arXiv preprint arXiv:2210.00597 , year=
-
[23]
Proceedings of the 39th International Conference on Machine Learning , pages =
Shuffle Private Linear Contextual Bandits , author =. Proceedings of the 39th International Conference on Machine Learning , pages =. 2022 , volume =
work page 2022
-
[24]
arXiv preprint arXiv:1806.06035 , year=
Customized local differential privacy for multi-agent distributed optimization , author=. arXiv preprint arXiv:1806.06035 , year=
-
[25]
arXiv preprint arXiv:2208.04591 , year=
Stronger Privacy Amplification by Shuffling for R 'enyi and Approximate Differential Privacy , author=. arXiv preprint arXiv:2208.04591 , year=
-
[26]
International Conference on Artificial Intelligence and Statistics , pages=
Federated f-differential privacy , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=
work page 2021
-
[27]
arXiv preprint arXiv:2206.07902 , year=
On Privacy and Personalization in Cross-Silo Federated Learning , author=. arXiv preprint arXiv:2206.07902 , year=
-
[28]
arXiv preprint arXiv:2203.06735 , year=
Private Non-Convex Federated Learning Without a Trusted Server , author=. arXiv preprint arXiv:2203.06735 , year=
-
[29]
International Conference on Machine Learning , pages=
Provably efficient exploration in policy optimization , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[30]
2022 IEEE Wireless Communications and Networking Conference (WCNC) , pages=
FedQOGD: Federated Quantized Online Gradient Descent with Distributed Time-Series Data , author=. 2022 IEEE Wireless Communications and Networking Conference (WCNC) , pages=. 2022 , organization=
work page 2022
-
[31]
Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages=
Deep learning with differential privacy , author=. Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages=
work page 2016
-
[32]
A General Approach to Adding Differential Privacy to Iterative Training Procedures
A general approach to adding differential privacy to iterative training procedures , author=. arXiv preprint arXiv:1812.06210 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[33]
Learning Differentially Private Recurrent Language Models
Learning differentially private recurrent language models , author=. arXiv preprint arXiv:1710.06963 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[34]
Distributed bandit learning: How much communication is needed to achieve (near) optimal regret , author=. ICLR , year=
-
[35]
arXiv preprint arXiv:1911.09564 , year=
Parameter-free locally differentially private stochastic subgradient descent , author=. arXiv preprint arXiv:1911.09564 , year=
-
[36]
Online convex optimization over Erdos-R
Lei, Jinlong and Yi, Peng and Hong, Yiguang and Chen, Jie and Shi, Guodong , journal=. Online convex optimization over Erdos-R
-
[37]
International Conference on Machine Learning , pages=
Projection-free Distributed Online Convex Optimization with Communication Complexity , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[38]
2013 IEEE 54th Annual Symposium on Foundations of Computer Science , pages=
Local privacy and statistical minimax rates , author=. 2013 IEEE 54th Annual Symposium on Foundations of Computer Science , pages=. 2013 , organization=
work page 2013
-
[39]
2014 IEEE 55th annual symposium on foundations of computer science , pages=
Private empirical risk minimization: Efficient algorithms and tight error bounds , author=. 2014 IEEE 55th annual symposium on foundations of computer science , pages=. 2014 , organization=
work page 2014
-
[40]
Advances in Neural Information Processing Systems , volume=
Federated Bayesian optimization via Thompson sampling , author=. Advances in Neural Information Processing Systems , volume=
-
[41]
Advances in Neural Information Processing Systems , volume=
Differentially private federated Bayesian optimization with distributed exploration , author=. Advances in Neural Information Processing Systems , volume=
-
[42]
A Modern Introduction to Online Learning
A modern introduction to online learning , author=. arXiv preprint arXiv:1912.13213 , year=
work page internal anchor Pith review Pith/arXiv arXiv 1912
-
[43]
Advances in Neural Information Processing Systems , volume=
(Nearly) optimal algorithms for private online learning in full-information and bandit settings , author=. Advances in Neural Information Processing Systems , volume=
-
[44]
Conference on Learning Theory , pages=
Differentially private online learning , author=. Conference on Learning Theory , pages=. 2012 , organization=
work page 2012
-
[45]
ACM Transactions on Information and System Security (TISSEC) , volume=
Private and continual release of statistics , author=. ACM Transactions on Information and System Security (TISSEC) , volume=. 2011 , publisher=
work page 2011
-
[46]
arXiv preprint arXiv:2202.01292 , year=
Improved Regret for Differentially Private Exploration in Linear MDP , author=. arXiv preprint arXiv:2202.01292 , year=
- [47]
-
[48]
SIAM Journal on Optimization , volume=
Differentially private accelerated optimization algorithms , author=. SIAM Journal on Optimization , volume=. 2022 , publisher=
work page 2022
-
[49]
Advances in Neural Information Processing Systems , volume=
Scalable generalized linear bandits: Online computation and hashing , author=. Advances in Neural Information Processing Systems , volume=
-
[50]
arXiv preprint arXiv:2202.01087 , year=
Communication Efficient Federated Learning for Generalized Linear Bandits , author=. arXiv preprint arXiv:2202.01087 , year=
-
[51]
Advances in Neural Information Processing Systems , volume=
Differentially-private federated linear bandits , author=. Advances in Neural Information Processing Systems , volume=
-
[52]
Journal of Machine Learning Research , volume=
A contextual bandit bake-off , author=. Journal of Machine Learning Research , volume=
-
[53]
International Conference on Machine Learning , pages=
Practical and private (deep) learning without sampling or shuffling , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[54]
Advances in Neural Information Processing Systems , volume=
Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge , author=. Advances in Neural Information Processing Systems , volume=
-
[55]
Advances in Neural Information Processing Systems , volume=
Competitive on-line linear regression , author=. Advances in Neural Information Processing Systems , volume=
-
[56]
A generalized online mirror descent with applications to classification and regression , author=. Machine Learning , volume=. 2015 , publisher=
work page 2015
-
[57]
Advances in Neural Information Processing Systems , volume=
Efficient learning of generalized linear and single index models with isotonic regression , author=. Advances in Neural Information Processing Systems , volume=
-
[58]
Associative reinforcement learning using linear probabilistic concepts , author=. ICML , pages=. 1999 , organization=
work page 1999
-
[59]
International Conference on Machine Learning , pages=
Beyond ucb: Optimal and efficient contextual bandits with regression oracles , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[60]
Mathematics of Operations Research , year=
Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability , author=. Mathematics of Operations Research , year=
-
[61]
arXiv preprint arXiv:2002.01919 , year=
Pure differentially private summation from anonymous messages , author=. arXiv preprint arXiv:2002.01919 , year=
-
[62]
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=
Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling , author=. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2022 , organization=
work page 2021
-
[63]
Annual International Cryptology Conference , pages=
The privacy blanket of the shuffle model , author=. Annual International Cryptology Conference , pages=. 2019 , organization=
work page 2019
-
[64]
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=
Amplification by shuffling: From local to central differential privacy via anonymity , author=. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=. 2019 , organization=
work page 2019
-
[65]
Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=
Private aggregation from fewer anonymous messages , author=. Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=. 2020 , organization=
work page 2020
-
[66]
Annual International Cryptology Conference , pages=
Distributed private data analysis: Simultaneously solving how and what , author=. Annual International Cryptology Conference , pages=. 2008 , organization=
work page 2008
-
[67]
European Symposium on Algorithms , pages=
Optimal lower bound for differentially private multi-party aggregation , author=. European Symposium on Algorithms , pages=. 2012 , organization=
work page 2012
-
[68]
Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=
Distributed differential privacy via shuffling , author=. Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=. 2019 , organization=
work page 2019
-
[69]
Tor: The second-generation onion router , author=. 2004 , institution=
work page 2004
-
[70]
arXiv preprint arXiv:2110.10133 , year=
Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes , author=. arXiv preprint arXiv:2110.10133 , year=
-
[71]
arXiv preprint arXiv:2112.01585 , year=
Differentially Private Exploration in Reinforcement Learning with Linear Representation , author=. arXiv preprint arXiv:2112.01585 , year=
-
[72]
Zhou, Xingyu , title =. Proc. ACM Meas. Anal. Comput. Syst. , month =. 2022 , issue_date =
work page 2022
-
[73]
2021 IEEE International Symposium on Information Theory (ISIT) , pages=
Adaptive control of differentially private linear quadratic systems , author=. 2021 IEEE International Symposium on Information Theory (ISIT) , pages=. 2021 , organization=
work page 2021
-
[74]
arXiv preprint arXiv:2112.10599 , year=
Differentially Private Regret Minimization in Episodic Markov Decision Processes , author=. arXiv preprint arXiv:2112.10599 , year=
-
[75]
Advances in Neural Information Processing Systems , volume=
Local Differential Privacy for Regret Minimization in Reinforcement Learning , author=. Advances in Neural Information Processing Systems , volume=
-
[76]
International Conference on Machine Learning , pages=
Private reinforcement learning with pac and regret guarantees , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[77]
Proceedings of the AAAI Conference on Artificial Intelligence , author=
Local Differential Privacy for Bayesian Optimization , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2021 , month=
work page 2021
-
[78]
Cascading Bandit Under Differential Privacy , author=. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2022 , organization=
work page 2022
-
[79]
International Conference on Machine Learning , pages=
(Locally) Differentially Private Combinatorial Semi-Bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[80]
International Conference on Artificial Intelligence and Statistics , pages=
Optimal rates of (locally) differentially private heavy-tailed multi-armed bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.