Privacy Preserving Reinforcement Learning with One-Sided Feedback

Guangyan Gan; Hanzhang Qin; Lin William Cong; Zhenzhen Yan

arxiv: 2605.18246 · v1 · pith:DUHPLOKQnew · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Privacy Preserving Reinforcement Learning with One-Sided Feedback

Lin William Cong , Guangyan Gan , Hanzhang Qin , Zhenzhen Yan This is my paper

Pith reviewed 2026-05-20 12:33 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords privacy preserving reinforcement learningone-sided feedbackcontinuous state and action spacessample complexity boundspartial observationstheoretical analysisprivacy guarantees

0 comments

The pith

POOL is a privacy-preserving RL algorithm that matches non-private sample complexity lower bounds in continuous spaces with one-sided feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces POOL to handle reinforcement learning in multi-dimensional continuous state and action spaces under one-sided feedback, where only partial state observations and limited reward information are available. It provides a theoretical analysis showing that POOL's sample complexity bound incorporates the privacy parameter while equaling the lower bounds known for standard non-private RL. A sympathetic reader would care because this indicates privacy protections can be added without increasing the number of samples needed for effective learning. The result suggests that practical privacy-aware RL systems are feasible even in complex, partially observed environments.

Core claim

The authors establish that POOL achieves strong privacy guarantees in the specified RL setting while its sample complexity bound matches the known lower bounds for non-private RL, expressed in terms of the privacy parameter E_rho, time horizon H, and optimality gap alpha.

What carries the argument

POOL, a novel algorithm that integrates privacy mechanisms into RL for one-sided feedback and partial observations in continuous spaces.

If this is right

It is possible to enforce strong privacy guarantees while maintaining high learning efficiency in multi-dimensional continuous environments.
The sample complexity does not increase beyond non-private RL lower bounds despite adding privacy.
This advances practical, privacy-aware RL applications with one-sided feedback.
Learning remains efficient even with partial state observations and subset reward information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar privacy mechanisms might apply to other RL settings like discrete spaces or full feedback without complexity penalties.
Real-world deployments in sensitive areas like healthcare or finance could benefit from testing POOL's performance.
Extensions could explore how varying the privacy parameter affects practical convergence rates.

Load-bearing premise

The setting of one-sided feedback and partial state observations in continuous spaces permits a privacy mechanism whose overhead does not increase the sample complexity beyond the non-private lower bound.

What would settle it

Observing that POOL requires more samples than the established non-private lower bound in a specific continuous state-action task with one-sided feedback would disprove the matching claim.

Figures

Figures reproduced from arXiv: 2605.18246 by Guangyan Gan, Hanzhang Qin, Lin William Cong, Zhenzhen Yan.

**Figure 1.** Figure 1: Relative optimality gap of POOL and baseline methods under varying privacy budgets ρ. Left: synthetic data; Right: realworld data. The x-axis is logarithmically scaled. 5.3 Hyperparameter Analysis (RQ2) We study the effect of key hyperparameters on POOL using synthetic data: horizon length H ∈ {5, 10, 15, 20, 40}, feedback parameter |λ| ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, stateaction dimensionality w+d ∈ … view at source ↗

**Figure 3.** Figure 3: Comparison of POOL and grid-based discretization. Left: relative optimality gap; Right: running time. POOL consistently achieves lower gaps and faster computation across datasets. 5.4 Effectiveness and Efficiency of Discretization (RQ3) We compare POOL’s discretization strategy against standard grid-based methods in terms of relative optimality gap and computational time. Experiments are conducted on synt… view at source ↗

**Figure 2.** Figure 2: Impact of key hyperparameters on relative optimality gap. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

read the original abstract

We study reinforcement learning (RL) in multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. This setting introduces substantial challenges in both learning efficiency and privacy preservation. To address these challenges, we propose POOL, a novel privacy-preserving RL algorithm. We conduct a comprehensive theoretical analysis of POOL, deriving a sample complexity bound that matches the known lower bounds for non-private RL. Here, E_rho denotes the privacy parameter, H is the time horizon, and alpha is the optimality-gap parameter. Our findings show that it is possible to enforce strong privacy guarantees while maintaining high learning efficiency, marking a significant step toward practical, privacy-aware RL in multi-dimensional environments with one-sided feedback.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes POOL, a novel privacy-preserving RL algorithm for multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial state observations and reward signals for only a subset of state-action pairs. It claims to provide a comprehensive theoretical analysis deriving a sample complexity bound that matches the known lower bounds for non-private RL, expressed in terms of the privacy parameter E_rho, horizon H, and optimality gap alpha.

Significance. If the matching bound is rigorously established, the result would represent a meaningful advance in private RL by showing that differential privacy can be enforced in this challenging continuous, partial-observation setting without asymptotic sample overhead. This would be notable given typical privacy costs in exploration and concentration arguments.

major comments (2)

[Abstract and §5] Abstract and §5 (theoretical analysis): the central claim that the sample complexity matches non-private lower bounds is load-bearing for the paper's contribution, yet the text provides no derivation steps, proof sketch, or explicit assumptions on how the privacy noise (scaling with 1/E_rho) is absorbed into existing concentration or covering-number terms without introducing extra poly(H, d, 1/E_rho) factors under one-sided feedback.
[§4 and §5] §4 (algorithm description) and §5: the analysis must show that the one-sided feedback restriction on observable (s,a) pairs does not force additional exploration cost when the privacy mechanism is applied; if the bound relies on specific assumptions about state density or reward subset selection, these must be stated explicitly as they determine whether the matching holds in general regimes.

minor comments (1)

[Abstract] Notation for E_rho, H, and alpha should be introduced consistently in the main text rather than only in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, clarifying the theoretical analysis and adding explicit details where needed to strengthen the presentation.

read point-by-point responses

Referee: [Abstract and §5] Abstract and §5 (theoretical analysis): the central claim that the sample complexity matches non-private lower bounds is load-bearing for the paper's contribution, yet the text provides no derivation steps, proof sketch, or explicit assumptions on how the privacy noise (scaling with 1/E_rho) is absorbed into existing concentration or covering-number terms without introducing extra poly(H, d, 1/E_rho) factors under one-sided feedback.

Authors: We agree that the main text would benefit from a clearer high-level sketch. The complete derivation appears in Appendix B, but we will add a concise proof sketch to Section 5. The privacy noise (Laplace mechanism scaled by 1/E_rho) is incorporated directly into the concentration inequalities for the empirical reward estimates. Because one-sided feedback supplies rewards only for observed pairs and the function class has bounded covering number, the additional deviation term is absorbed into the existing O(1/alpha^2) sample term without introducing new polynomial factors in H, d, or 1/E_rho. We will also state the required Lipschitz and boundedness assumptions explicitly in the revised Section 5. revision: yes
Referee: [§4 and §5] §4 (algorithm description) and §5: the analysis must show that the one-sided feedback restriction on observable (s,a) pairs does not force additional exploration cost when the privacy mechanism is applied; if the bound relies on specific assumptions about state density or reward subset selection, these must be stated explicitly as they determine whether the matching holds in general regimes.

Authors: Section 4 describes how POOL applies the privacy mechanism only to the observed rewards under one-sided feedback. The analysis in Section 5 uses a covering-number argument over the continuous state-action space; the one-sided restriction does not increase exploration cost because the algorithm only needs to visit pairs that contribute to the covering, and unobserved pairs are handled by the uniform lower bound on state density (Assumption 3.2). The reward subset is selected uniformly at random (Assumption 3.3). These assumptions are already present in Section 3 but will be restated and cross-referenced in the revised Section 5 with a short remark explaining why the privacy-augmented bound remains asymptotically identical to the non-private lower bound. revision: yes

Circularity Check

0 steps flagged

No circularity: bound derived from external non-private lower bounds

full rationale

The paper's central result is a sample-complexity upper bound for POOL that is shown to match known lower bounds for non-private RL. No equations reduce this bound to a quantity fitted inside the paper, no self-citation supplies the uniqueness or the matching claim, and the privacy overhead is absorbed into existing concentration terms under the stated one-sided-feedback model. The derivation therefore remains self-contained against external benchmarks and does not collapse to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; full derivations, assumptions, and any fitted quantities are not visible.

axioms (1)

domain assumption The environment is a multi-dimensional continuous-state continuous-action MDP with one-sided feedback.
Stated directly in the abstract as the problem setting.

invented entities (1)

POOL algorithm no independent evidence
purpose: Privacy-preserving learning under one-sided feedback
Introduced in the abstract as the proposed solution.

pith-pipeline@v0.9.0 · 5669 in / 1165 out tokens · 40150 ms · 2026-05-20T12:33:17.508856+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose POOL... sample complexity bound of ˜O((1+Eρ)H³α⁻²)... partial discretization strategy and multi-dimensional piecewise-linear approximation
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Gaussian mechanism... ρ-zCDP... private transition kernels ePh

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 5 internal anchors

[1]

Available at SSRN 4828001 , year=

Don’t Follow RL Blindly: Lower Sample Complexity of Learning Optimal Inventory Control Policies with Fixed Ordering Costs , author=. Available at SSRN 4828001 , year=

work page
[2]

Applied Mathematics: Body and Soul: Volume 2: Integrals and Geometry in IR n , pages=

Piecewise linear approximation , author=. Applied Mathematics: Body and Soul: Volume 2: Integrals and Geometry in IR n , pages=. 2004 , publisher=

work page 2004
[3]

https://www.securityinfowatch.com/retail/article/53098895/the-target-breach-10-years-later , pages=

The Target Breach 10 Years Later , author=. https://www.securityinfowatch.com/retail/article/53098895/the-target-breach-10-years-later , pages=. 2024 , publisher=

work page arXiv 2024
[4]

Management Science , volume=

Data-driven approximation schemes for joint pricing and inventory control models , author=. Management Science , volume=. 2022 , publisher=

work page 2022
[5]

Management Science , year=

Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies , author=. Management Science , year=

work page
[6]

Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages=

Private matchings and allocations , author=. Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages=

work page
[7]

ACM Transactions on Information and System Security (TISSEC) , volume=

Private and continual release of statistics , author=. ACM Transactions on Information and System Security (TISSEC) , volume=. 2011 , publisher=

work page 2011
[8]

Mathematics of Operations Research , volume=

Sampling-based approximation schemes for capacitated stochastic inventory control models , author=. Mathematics of Operations Research , volume=. 2019 , publisher=

work page 2019
[9]

Advances in Neural Information Processing Systems , volume=

Near-optimal time and sample complexities for solving Markov decision processes with a generative model , author=. Advances in Neural Information Processing Systems , volume=

work page
[10]

Advances in neural information processing systems , volume=

Budgeted reinforcement learning in continuous state space , author=. Advances in neural information processing systems , volume=

work page
[11]

International Conference on Machine Learning , pages=

Differentially private episodic reinforcement learning with heavy-tailed rewards , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[12]

Proceedings of the 25th international conference on Machine learning , pages=

Privacy-preserving reinforcement learning , author=. Proceedings of the 25th international conference on Machine learning , pages=

work page
[13]

Advances in Neural Information Processing Systems , volume=

Instance-dependent near-optimal policy identification in linear mdps via online experiment design , author=. Advances in Neural Information Processing Systems , volume=

work page
[14]

Advances in Neural Information Processing Systems , volume=

Non-asymptotic gap-dependent regret bounds for tabular mdps , author=. Advances in Neural Information Processing Systems , volume=

work page
[15]

Advances in Neural Information Processing Systems , volume=

Exploration in structured reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[16]

International Conference on Machine Learning , pages=

Leveraging offline data in online reinforcement learning , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[17]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

How private is your RL policy? An inverse RL based analysis framework , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[18]

Advances in Neural Information Processing Systems , volume=

When privacy meets partial information: A refined analysis of differentially private bandits , author=. Advances in Neural Information Processing Systems , volume=

work page
[19]

2023 International Conference on Machine Learning and Cybernetics (ICMLC) , pages=

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning , author=. 2023 International Conference on Machine Learning and Cybernetics (ICMLC) , pages=. 2023 , organization=

work page 2023
[20]

International Conference on Algorithmic Learning Theory , pages=

Privacy amplification via shuffling for linear contextual bandits , author=. International Conference on Algorithmic Learning Theory , pages=. 2022 , organization=

work page 2022
[21]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Differentially private regret minimization in episodic markov decision processes , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[22]

Advances in Neural Information Processing Systems , volume=

Local differential privacy for regret minimization in reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[23]

Management Science , volume=

Marrying stochastic gradient descent with bandits: Learning algorithms for inventory systems with fixed costs , author=. Management Science , volume=. 2021 , publisher=

work page 2021
[24]

Deep Reinforcement Learning framework for Autonomous Driving

Deep reinforcement learning framework for autonomous driving , author=. arXiv preprint arXiv:1704.02532 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[25]

Machine Learning for Healthcare Conference , pages=

Continuous state-space models for optimal sepsis treatment: a deep reinforcement learning approach , author=. Machine Learning for Healthcare Conference , pages=. 2017 , organization=

work page 2017
[26]

Advances in neural information processing systems , volume=

Fitted Q-iteration in continuous action-space MDPs , author=. Advances in neural information processing systems , volume=

work page
[27]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Online second price auction with semi-bandit feedback under the non-stationary setting , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[28]

Management Science , year=

Bandits atop reinforcement learning: Tackling online inventory models with cyclic demands , author=. Management Science , year=

work page
[29]

Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=

Stochastic one-sided full-information bandit , author=. Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=. 2019 , organization=

work page 2019
[30]

arXiv preprint arXiv:2007.00080 , year=

Provably more efficient q-learning in the one-sided-feedback/full-feedback settings , author=. arXiv preprint arXiv:2007.00080 , year=

work page arXiv 2007
[31]

Journal of Machine Learning Research , volume=

Reinforcement learning in continuous time and space: A stochastic control approach , author=. Journal of Machine Learning Research , volume=

work page
[32]

Advances in neural information processing systems , volume=

Reinforcement learning for continuous stochastic control problems , author=. Advances in neural information processing systems , volume=

work page
[33]

Advances in Neural Information Processing Systems , volume=

Policy optimization for continuous reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[34]

Journal of Machine Learning Research , volume=

Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms , author=. Journal of Machine Learning Research , volume=

work page
[35]

International Conference on Artificial Intelligence and Statistics , pages=

Privacy-constrained policies via mutual information regularized policy gradients , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

work page 2024
[36]

Asian Conference on Machine Learning , pages=

Locally differentially private reinforcement learning for linear mixture markov decision processes , author=. Asian Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[37]

Mathematics of Operations Research , volume=

Provably near-optimal sampling-based policies for stochastic inventory control models , author=. Mathematics of Operations Research , volume=. 2007 , publisher=

work page 2007
[38]

IISE Transactions , volume=

Applying deep learning to the newsvendor problem , author=. IISE Transactions , volume=. 2020 , publisher=

work page 2020
[39]

Management Science , volume=

A practical end-to-end inventory management model with deep learning , author=. Management Science , volume=. 2023 , publisher=

work page 2023
[40]

ACM Computing Surveys , volume=

Reinforcement learning based recommender systems: A survey , author=. ACM Computing Surveys , volume=. 2022 , publisher=

work page 2022
[41]

Advances in Neural Information Processing Systems , volume=

Offline reinforcement learning with differential privacy , author=. Advances in Neural Information Processing Systems , volume=

work page
[42]

Management Science , volume=

Privacy-preserving dynamic personalized pricing with demand learning , author=. Management Science , volume=. 2022 , publisher=

work page 2022
[43]

International Conference on Machine Learning , pages=

Is pessimism provably efficient for offline rl? , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[44]

Theory of Cryptography Conference , pages=

Concentrated differential privacy: Simplifications, extensions, and lower bounds , author=. Theory of Cryptography Conference , pages=. 2016 , organization=

work page 2016
[45]

Advances in Neural Information Processing Systems , volume=

Privacy-preserving q-learning with functional noise in continuous spaces , author=. Advances in Neural Information Processing Systems , volume=

work page
[46]

Empirical Bernstein Bounds and Sample Variance Penalization

Empirical bernstein bounds and sample variance penalization , author=. arXiv preprint arXiv:0907.3740 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[47]

Van Erven, Tim and Harremos, Peter , journal=. R. 2014 , publisher=

work page 2014
[48]

Advances in Neural Information Processing Systems , volume=

Locally differentially private (contextual) bandits learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[49]

International Conference on Machine Learning , pages=

Private reinforcement learning with pac and regret guarantees , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020
[50]

International Conference on Machine Learning , pages=

Improved regret for differentially private exploration in linear mdp , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[51]

International Conference on Artificial Intelligence and Statistics , pages=

Byzantine-robust online and offline distributed reinforcement learning , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023
[52]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Privacy-preserving policy iteration for decentralized POMDPs , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[53]

Operations Research , year=

Optimal and differentially private data acquisition: Central and local mechanisms , author=. Operations Research , year=

work page
[54]

Available at SSRN 4202576 , year=

Privacy-preserving personalized recommender systems , author=. Available at SSRN 4202576 , year=

work page
[55]

Operations Research , volume=

Differential privacy in personalized pricing with nonparametric demand models , author=. Operations Research , volume=. 2023 , publisher=

work page 2023
[56]

Advances in Neural Information Processing Systems , volume=

Bridging central and local differential privacy in data acquisition mechanisms , author=. Advances in Neural Information Processing Systems , volume=

work page
[57]

Management Science , year=

Privacy-preserving personalized revenue management , author=. Management Science , year=

work page
[58]

Operations Research , volume=

The big data newsvendor: Practical insights from machine learning , author=. Operations Research , volume=. 2019 , publisher=

work page 2019
[59]

International colloquium on automata, languages, and programming , pages=

Differential privacy , author=. International colloquium on automata, languages, and programming , pages=. 2006 , organization=

work page 2006
[60]

Foundations and Trends

The algorithmic foundations of differential privacy , author=. Foundations and Trends. 2014 , publisher=

work page 2014
[61]

2022 , publisher=

Introduction to algorithms , author=. 2022 , publisher=

work page 2022
[62]

Operations Research , volume=

The data-driven newsvendor problem: New bounds and insights , author=. Operations Research , volume=. 2015 , publisher=

work page 2015
[63]

2018 , publisher=

Reinforcement learning: An introduction , author=. 2018 , publisher=

work page 2018
[64]

Advances in Neural Information Processing Systems , volume=

(Nearly) optimal algorithms for private online learning in full-information and bandit settings , author=. Advances in Neural Information Processing Systems , volume=

work page
[65]

International Conference on Machine Learning , pages=

The distributed discrete gaussian mechanism for federated learning with secure aggregation , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[66]

Management Science , volume=

Feature-based dynamic pricing , author=. Management Science , volume=. 2020 , publisher=

work page 2020
[67]

Operations Research , volume=

Multiperiod airline overbooking with a single fare class , author=. Operations Research , volume=. 1998 , publisher=

work page 1998
[68]

Conference on Learning Theory , pages=

Algorithmic chaining and the role of partial feedback in online nonparametric learning , author=. Conference on Learning Theory , pages=. 2017 , organization=

work page 2017
[69]

ACM Transactions on Algorithms (TALG) , volume=

Approximate privacy: foundations and quantification , author=. ACM Transactions on Algorithms (TALG) , volume=. 2014 , publisher=

work page 2014
[70]

Management Science , volume=

Closing the gap: A learning algorithm for lost-sales inventory systems with lead times , author=. Management Science , volume=. 2020 , publisher=

work page 2020
[71]

Operations Research , volume=

Multidimensional binary search for contextual decision-making , author=. Operations Research , volume=. 2018 , publisher=

work page 2018
[72]

Advances in Neural Information Processing Systems , volume=

Gaussian Differential Privacy on Riemannian Manifolds , author=. Advances in Neural Information Processing Systems , volume=

work page
[73]

International Conference on Machine Learning , pages=

Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018
[74]

International Conference on Machine Learning , pages=

(Locally) differentially private combinatorial semi-bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020
[75]

Proceedings of the 2018 ACM Conference on Economics and Computation , pages=

Learning to bid without knowing your value , author=. Proceedings of the 2018 ACM Conference on Economics and Computation , pages=

work page 2018
[76]

Advances in Neural Information Processing Systems , volume=

Differentially private contextual linear bandits , author=. Advances in Neural Information Processing Systems , volume=

work page
[77]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Achieving privacy in the adversarial multi-armed bandit , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[78]

Off-Policy Policy Gradient with State Distribution Correction

Off-policy policy gradient with state distribution correction , author=. arXiv preprint arXiv:1904.08473 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1904
[79]

Concentrated Differential Privacy

Concentrated differential privacy , author=. arXiv preprint arXiv:1603.01887 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[80]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

work page

Showing first 80 references.

[1] [1]

Available at SSRN 4828001 , year=

Don’t Follow RL Blindly: Lower Sample Complexity of Learning Optimal Inventory Control Policies with Fixed Ordering Costs , author=. Available at SSRN 4828001 , year=

work page

[2] [2]

Applied Mathematics: Body and Soul: Volume 2: Integrals and Geometry in IR n , pages=

Piecewise linear approximation , author=. Applied Mathematics: Body and Soul: Volume 2: Integrals and Geometry in IR n , pages=. 2004 , publisher=

work page 2004

[3] [3]

https://www.securityinfowatch.com/retail/article/53098895/the-target-breach-10-years-later , pages=

The Target Breach 10 Years Later , author=. https://www.securityinfowatch.com/retail/article/53098895/the-target-breach-10-years-later , pages=. 2024 , publisher=

work page arXiv 2024

[4] [4]

Management Science , volume=

Data-driven approximation schemes for joint pricing and inventory control models , author=. Management Science , volume=. 2022 , publisher=

work page 2022

[5] [5]

Management Science , year=

Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies , author=. Management Science , year=

work page

[6] [6]

Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages=

Private matchings and allocations , author=. Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages=

work page

[7] [7]

ACM Transactions on Information and System Security (TISSEC) , volume=

Private and continual release of statistics , author=. ACM Transactions on Information and System Security (TISSEC) , volume=. 2011 , publisher=

work page 2011

[8] [8]

Mathematics of Operations Research , volume=

Sampling-based approximation schemes for capacitated stochastic inventory control models , author=. Mathematics of Operations Research , volume=. 2019 , publisher=

work page 2019

[9] [9]

Advances in Neural Information Processing Systems , volume=

Near-optimal time and sample complexities for solving Markov decision processes with a generative model , author=. Advances in Neural Information Processing Systems , volume=

work page

[10] [10]

Advances in neural information processing systems , volume=

Budgeted reinforcement learning in continuous state space , author=. Advances in neural information processing systems , volume=

work page

[11] [11]

International Conference on Machine Learning , pages=

Differentially private episodic reinforcement learning with heavy-tailed rewards , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[12] [12]

Proceedings of the 25th international conference on Machine learning , pages=

Privacy-preserving reinforcement learning , author=. Proceedings of the 25th international conference on Machine learning , pages=

work page

[13] [13]

Advances in Neural Information Processing Systems , volume=

Instance-dependent near-optimal policy identification in linear mdps via online experiment design , author=. Advances in Neural Information Processing Systems , volume=

work page

[14] [14]

Advances in Neural Information Processing Systems , volume=

Non-asymptotic gap-dependent regret bounds for tabular mdps , author=. Advances in Neural Information Processing Systems , volume=

work page

[15] [15]

Advances in Neural Information Processing Systems , volume=

Exploration in structured reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[16] [16]

International Conference on Machine Learning , pages=

Leveraging offline data in online reinforcement learning , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[17] [17]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

How private is your RL policy? An inverse RL based analysis framework , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[18] [18]

Advances in Neural Information Processing Systems , volume=

When privacy meets partial information: A refined analysis of differentially private bandits , author=. Advances in Neural Information Processing Systems , volume=

work page

[19] [19]

2023 International Conference on Machine Learning and Cybernetics (ICMLC) , pages=

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning , author=. 2023 International Conference on Machine Learning and Cybernetics (ICMLC) , pages=. 2023 , organization=

work page 2023

[20] [20]

International Conference on Algorithmic Learning Theory , pages=

Privacy amplification via shuffling for linear contextual bandits , author=. International Conference on Algorithmic Learning Theory , pages=. 2022 , organization=

work page 2022

[21] [21]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Differentially private regret minimization in episodic markov decision processes , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[22] [22]

Advances in Neural Information Processing Systems , volume=

Local differential privacy for regret minimization in reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[23] [23]

Management Science , volume=

Marrying stochastic gradient descent with bandits: Learning algorithms for inventory systems with fixed costs , author=. Management Science , volume=. 2021 , publisher=

work page 2021

[24] [24]

Deep Reinforcement Learning framework for Autonomous Driving

Deep reinforcement learning framework for autonomous driving , author=. arXiv preprint arXiv:1704.02532 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[25] [25]

Machine Learning for Healthcare Conference , pages=

Continuous state-space models for optimal sepsis treatment: a deep reinforcement learning approach , author=. Machine Learning for Healthcare Conference , pages=. 2017 , organization=

work page 2017

[26] [26]

Advances in neural information processing systems , volume=

Fitted Q-iteration in continuous action-space MDPs , author=. Advances in neural information processing systems , volume=

work page

[27] [27]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Online second price auction with semi-bandit feedback under the non-stationary setting , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[28] [28]

Management Science , year=

Bandits atop reinforcement learning: Tackling online inventory models with cyclic demands , author=. Management Science , year=

work page

[29] [29]

Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=

Stochastic one-sided full-information bandit , author=. Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=. 2019 , organization=

work page 2019

[30] [30]

arXiv preprint arXiv:2007.00080 , year=

Provably more efficient q-learning in the one-sided-feedback/full-feedback settings , author=. arXiv preprint arXiv:2007.00080 , year=

work page arXiv 2007

[31] [31]

Journal of Machine Learning Research , volume=

Reinforcement learning in continuous time and space: A stochastic control approach , author=. Journal of Machine Learning Research , volume=

work page

[32] [32]

Advances in neural information processing systems , volume=

Reinforcement learning for continuous stochastic control problems , author=. Advances in neural information processing systems , volume=

work page

[33] [33]

Advances in Neural Information Processing Systems , volume=

Policy optimization for continuous reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[34] [34]

Journal of Machine Learning Research , volume=

Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms , author=. Journal of Machine Learning Research , volume=

work page

[35] [35]

International Conference on Artificial Intelligence and Statistics , pages=

Privacy-constrained policies via mutual information regularized policy gradients , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

work page 2024

[36] [36]

Asian Conference on Machine Learning , pages=

Locally differentially private reinforcement learning for linear mixture markov decision processes , author=. Asian Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[37] [37]

Mathematics of Operations Research , volume=

Provably near-optimal sampling-based policies for stochastic inventory control models , author=. Mathematics of Operations Research , volume=. 2007 , publisher=

work page 2007

[38] [38]

IISE Transactions , volume=

Applying deep learning to the newsvendor problem , author=. IISE Transactions , volume=. 2020 , publisher=

work page 2020

[39] [39]

Management Science , volume=

A practical end-to-end inventory management model with deep learning , author=. Management Science , volume=. 2023 , publisher=

work page 2023

[40] [40]

ACM Computing Surveys , volume=

Reinforcement learning based recommender systems: A survey , author=. ACM Computing Surveys , volume=. 2022 , publisher=

work page 2022

[41] [41]

Advances in Neural Information Processing Systems , volume=

Offline reinforcement learning with differential privacy , author=. Advances in Neural Information Processing Systems , volume=

work page

[42] [42]

Management Science , volume=

Privacy-preserving dynamic personalized pricing with demand learning , author=. Management Science , volume=. 2022 , publisher=

work page 2022

[43] [43]

International Conference on Machine Learning , pages=

Is pessimism provably efficient for offline rl? , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021

[44] [44]

Theory of Cryptography Conference , pages=

Concentrated differential privacy: Simplifications, extensions, and lower bounds , author=. Theory of Cryptography Conference , pages=. 2016 , organization=

work page 2016

[45] [45]

Advances in Neural Information Processing Systems , volume=

Privacy-preserving q-learning with functional noise in continuous spaces , author=. Advances in Neural Information Processing Systems , volume=

work page

[46] [46]

Empirical Bernstein Bounds and Sample Variance Penalization

Empirical bernstein bounds and sample variance penalization , author=. arXiv preprint arXiv:0907.3740 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[47] [47]

Van Erven, Tim and Harremos, Peter , journal=. R. 2014 , publisher=

work page 2014

[48] [48]

Advances in Neural Information Processing Systems , volume=

Locally differentially private (contextual) bandits learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[49] [49]

International Conference on Machine Learning , pages=

Private reinforcement learning with pac and regret guarantees , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020

[50] [50]

International Conference on Machine Learning , pages=

Improved regret for differentially private exploration in linear mdp , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022

[51] [51]

International Conference on Artificial Intelligence and Statistics , pages=

Byzantine-robust online and offline distributed reinforcement learning , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023

[52] [52]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Privacy-preserving policy iteration for decentralized POMDPs , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[53] [53]

Operations Research , year=

Optimal and differentially private data acquisition: Central and local mechanisms , author=. Operations Research , year=

work page

[54] [54]

Available at SSRN 4202576 , year=

Privacy-preserving personalized recommender systems , author=. Available at SSRN 4202576 , year=

work page

[55] [55]

Operations Research , volume=

Differential privacy in personalized pricing with nonparametric demand models , author=. Operations Research , volume=. 2023 , publisher=

work page 2023

[56] [56]

Advances in Neural Information Processing Systems , volume=

Bridging central and local differential privacy in data acquisition mechanisms , author=. Advances in Neural Information Processing Systems , volume=

work page

[57] [57]

Management Science , year=

Privacy-preserving personalized revenue management , author=. Management Science , year=

work page

[58] [58]

Operations Research , volume=

The big data newsvendor: Practical insights from machine learning , author=. Operations Research , volume=. 2019 , publisher=

work page 2019

[59] [59]

International colloquium on automata, languages, and programming , pages=

Differential privacy , author=. International colloquium on automata, languages, and programming , pages=. 2006 , organization=

work page 2006

[60] [60]

Foundations and Trends

The algorithmic foundations of differential privacy , author=. Foundations and Trends. 2014 , publisher=

work page 2014

[61] [61]

2022 , publisher=

Introduction to algorithms , author=. 2022 , publisher=

work page 2022

[62] [62]

Operations Research , volume=

The data-driven newsvendor problem: New bounds and insights , author=. Operations Research , volume=. 2015 , publisher=

work page 2015

[63] [63]

2018 , publisher=

Reinforcement learning: An introduction , author=. 2018 , publisher=

work page 2018

[64] [64]

Advances in Neural Information Processing Systems , volume=

(Nearly) optimal algorithms for private online learning in full-information and bandit settings , author=. Advances in Neural Information Processing Systems , volume=

work page

[65] [65]

International Conference on Machine Learning , pages=

The distributed discrete gaussian mechanism for federated learning with secure aggregation , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021

[66] [66]

Management Science , volume=

Feature-based dynamic pricing , author=. Management Science , volume=. 2020 , publisher=

work page 2020

[67] [67]

Operations Research , volume=

Multiperiod airline overbooking with a single fare class , author=. Operations Research , volume=. 1998 , publisher=

work page 1998

[68] [68]

Conference on Learning Theory , pages=

Algorithmic chaining and the role of partial feedback in online nonparametric learning , author=. Conference on Learning Theory , pages=. 2017 , organization=

work page 2017

[69] [69]

ACM Transactions on Algorithms (TALG) , volume=

Approximate privacy: foundations and quantification , author=. ACM Transactions on Algorithms (TALG) , volume=. 2014 , publisher=

work page 2014

[70] [70]

Management Science , volume=

Closing the gap: A learning algorithm for lost-sales inventory systems with lead times , author=. Management Science , volume=. 2020 , publisher=

work page 2020

[71] [71]

Operations Research , volume=

Multidimensional binary search for contextual decision-making , author=. Operations Research , volume=. 2018 , publisher=

work page 2018

[72] [72]

Advances in Neural Information Processing Systems , volume=

Gaussian Differential Privacy on Riemannian Manifolds , author=. Advances in Neural Information Processing Systems , volume=

work page

[73] [73]

International Conference on Machine Learning , pages=

Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018

[74] [74]

International Conference on Machine Learning , pages=

(Locally) differentially private combinatorial semi-bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020

[75] [75]

Proceedings of the 2018 ACM Conference on Economics and Computation , pages=

Learning to bid without knowing your value , author=. Proceedings of the 2018 ACM Conference on Economics and Computation , pages=

work page 2018

[76] [76]

Advances in Neural Information Processing Systems , volume=

Differentially private contextual linear bandits , author=. Advances in Neural Information Processing Systems , volume=

work page

[77] [77]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Achieving privacy in the adversarial multi-armed bandit , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[78] [78]

Off-Policy Policy Gradient with State Distribution Correction

Off-policy policy gradient with state distribution correction , author=. arXiv preprint arXiv:1904.08473 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1904

[79] [79]

Concentrated Differential Privacy

Concentrated differential privacy , author=. arXiv preprint arXiv:1603.01887 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[80] [80]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

work page