Multi-agent Adaptive Mechanism Design

David Simchi-Levi; Qiushi Han; Renfei Tan; Zishuo Zhao

arxiv: 2512.21794 · v4 · submitted 2025-12-25 · 💻 cs.GT · cs.AI· cs.LG· cs.MA· econ.TH

Multi-agent Adaptive Mechanism Design

Qiushi Han , David Simchi-Levi , Renfei Tan , Zishuo Zhao This is my paper

Pith reviewed 2026-05-16 19:30 UTC · model grok-4.3

classification 💻 cs.GT cs.AIcs.LGcs.MAecon.TH

keywords adaptive mechanism designdistributionally robust optimizationonline learningtruthful reportingmulti-agent systemsregret boundssequential gamesincentive compatibility

0 comments

The pith

DRAM learns unknown agent beliefs over time by shrinking ambiguity sets in a robust linear program, guaranteeing truthful reports with high probability and Õ(√T) regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a sequential mechanism that starts with no knowledge of agents' private beliefs and must elicit truthful reports while minimizing total payments. It proposes the Distributionally Robust Adaptive Mechanism, which estimates beliefs from observed reports and repeatedly solves a distributionally robust linear program whose ambiguity sets shrink as data arrives. The resulting procedure keeps truthfulness intact with high probability while driving cumulative excess payments down at rate Õ(√T), and the authors prove that no adaptive mechanism can improve this rate asymptotically. A reader would care because the result supplies the first general adaptive framework that learns incentive constraints on the fly rather than assuming them known in advance.

Core claim

We introduce Distributionally Robust Adaptive Mechanism (DRAM) that combines mechanism design and online learning: at each round the mechanism forms an ambiguity set around the current belief estimate, solves a distributionally robust linear program to determine payments that remain incentive-compatible for every belief inside the set, then shrinks the set using new reports. This process yields truthful reporting with high probability and Õ(√T) cumulative regret against the best fixed mechanism, together with a matching lower bound showing no feasible adaptive mechanism can asymptotically do better.

What carries the argument

The Distributionally Robust Adaptive Mechanism (DRAM), which maintains shrinking ambiguity sets around belief estimates and solves distributionally robust linear programs at each step to enforce truthfulness while reducing payments.

If this is right

The same shrinking-ambiguity construction extends immediately to plug-in estimators that incorporate structured priors.
The framework continues to work when feedback arrives with arbitrary delay.
No non-adaptive mechanism or mechanism that ignores distributional robustness can improve the Õ(√T) rate when beliefs must be learned.
Truthfulness holds with high probability under the maintained rationality assumption even as the mechanism adapts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may be useful for designing repeated procurement or ad auctions where bidders' value distributions are initially unknown.
If agents can learn the mechanism's update rule, the high-probability truthfulness guarantee could erode and new robustness conditions would be needed.
The shrinking ambiguity sets resemble online robust optimization techniques, suggesting possible transfer of regret bounds to other dynamic incentive problems.

Load-bearing premise

Agents remain rational and best-respond to the current mechanism at every round without coordinating or learning the adaptation rule itself.

What would settle it

A sequence of agent reports in which the empirical frequency of truthful reporting falls below 1-δ for some fixed δ>0, or total payments that exceed the optimal fixed-mechanism cost by ω(√T) with probability bounded away from zero.

Figures

Figures reproduced from arXiv: 2512.21794 by David Simchi-Levi, Qiushi Han, Renfei Tan, Zishuo Zhao.

**Figure 2.** Figure 2: Minimum reward gap between truthful reporting and other pure strategies across 1000 runs of a [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: Average cumulative regret over time across 1000 runs of a sequential labeling game. The first [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗

read the original abstract

We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no feasible adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DRAM gives the first general adaptive truthful mechanism with matching Õ(√T) regret via shrinking ambiguity sets in DRO, but the myopic best-response assumption leaves the incentive guarantee open to forward-looking agents.

read the letter

This paper's key contribution is DRAM, a framework for multi-agent sequential mechanism design that learns beliefs adaptively using distributionally robust optimization with shrinking ambiguity sets. It claims high-probability truthfulness and optimal Õ(√T) regret with a matching lower bound, which would be the first such general adaptive truthful mechanism for unknown beliefs. They do a good job showing how to update the robust LP iteratively while controlling the estimation errors through the shrinking sets. The use of online convex optimization for the regret analysis is straightforward and avoids some common pitfalls in adaptive mechanisms. Generalization to plug-in estimators and delayed feedback is a nice touch that makes it more practical. The soft spot is in the incentive part. The analysis assumes agents best-respond to the mechanism in each round treating future changes as fixed. But because reports affect the ambiguity set updates, agents who anticipate the adaptation could deviate to influence future mechanisms. No equilibrium analysis or bound is given for that case, so the truthfulness might not be robust to sophisticated agents. For readers in algorithmic game theory working on online MD, this is worth looking at. The math on the learning side seems clean, and the lower bound is independent. It should go to peer review to check the details on how the robustness closes the incentive gaps.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Distributionally Robust Adaptive Mechanism (DRAM) for sequential multi-agent mechanism design with unknown agent beliefs. DRAM iteratively estimates beliefs from reports and updates a distributionally robust linear program over shrinking ambiguity sets to enforce truthfulness while minimizing the principal's payments. The central claims are that the mechanism guarantees truthful reporting with high probability, achieves Õ(√T) cumulative regret, and that this rate is optimal via a matching information-theoretic lower bound. The framework is presented as generalizing to plug-in estimators and delayed feedback.

Significance. If the high-probability incentive-compatibility argument closes all gaps arising from estimation error and the regret analysis is tight, the result would be a meaningful advance: the first general adaptive mechanism that maintains truthfulness while attaining optimal regret when incentive constraints must be learned on the fly. The matching lower bound and the explicit handling of distributionally robust updates are strengths that would distinguish the work from prior myopic or non-adaptive approaches.

major comments (2)

[§4 (proof of high-probability IC)] The high-probability truthfulness guarantee (abstract and §4) is derived under the assumption that each agent myopically best-responds to the mechanism in the current round. Because DRAM updates the ambiguity set and the robust LP from the entire history of reports, a forward-looking agent can choose a misreport in round t that shifts the estimated distribution and thereby alters payments in rounds t+1 onward. No equilibrium analysis or regret bound is supplied for such strategic foresight or for coordinated agents.
[§5 (regret analysis)] The regret bound is obtained by applying online convex optimization to a sequence of shrinking feasible sets (abstract and §5). It is not shown how the rate at which the ambiguity sets shrink interacts with the additive estimation error that appears in the robust LP; without an explicit propagation bound, it is unclear whether the Õ(√T) claim remains valid once the incentive constraints are enforced with high probability.

minor comments (2)

Notation for the ambiguity-set radius and the robust objective is introduced without a consolidated table; a single reference table would improve readability.
[§6] The lower-bound construction (Theorem 6.1) is information-theoretic and independent of the upper-bound mechanism, but the precise class of adaptive mechanisms to which it applies is stated only informally.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major point below, clarifying the modeling assumptions and strengthening the presentation of the regret analysis where appropriate.

read point-by-point responses

Referee: [§4 (proof of high-probability IC)] The high-probability truthfulness guarantee (abstract and §4) is derived under the assumption that each agent myopically best-responds to the mechanism in the current round. Because DRAM updates the ambiguity set and the robust LP from the entire history of reports, a forward-looking agent can choose a misreport in round t that shifts the estimated distribution and thereby alters payments in rounds t+1 onward. No equilibrium analysis or regret bound is supplied for such strategic foresight or for coordinated agents.

Authors: We agree that the high-probability IC guarantee in §4 is derived under the assumption of myopic best responses. The manuscript does not analyze the case of forward-looking agents who may misreport to influence future ambiguity sets, nor does it treat coordinated agents. This is a genuine limitation of the current analysis. In the revision we will explicitly state the myopic assumption in the model section (§2) and add a remark in §4 discussing the difficulties of extending the result to non-myopic or coordinated settings. revision: yes
Referee: [§5 (regret analysis)] The regret bound is obtained by applying online convex optimization to a sequence of shrinking feasible sets (abstract and §5). It is not shown how the rate at which the ambiguity sets shrink interacts with the additive estimation error that appears in the robust LP; without an explicit propagation bound, it is unclear whether the Õ(√T) claim remains valid once the incentive constraints are enforced with high probability.

Authors: The analysis in §5 already accounts for the interaction: the ambiguity sets are shrunk at a rate proportional to the estimation error (O(√(log T / t)) with high probability from the concentration bounds), ensuring that the additive error term remains dominated by the reduction in set diameter. This permits the direct application of online convex optimization over the time-varying feasible sets while preserving high-probability feasibility of the robust LP, yielding the stated Õ(√T) regret. To improve clarity we will insert an explicit error-propagation lemma (Lemma 5.3) in the appendix that bounds the effect of the estimation error on the sequence of robust LPs. revision: yes

standing simulated objections not resolved

Full subgame-perfect equilibrium analysis and regret bounds for forward-looking or coordinated agents

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard online optimization and information-theoretic lower bound

full rationale

The claimed Õ(√T) regret upper bound is obtained by applying online convex optimization to a sequence of distributionally robust LPs whose ambiguity sets shrink with empirical estimates of beliefs; this is a standard reduction that does not define the target regret quantity in terms of itself. The matching lower bound is information-theoretic and independent of the upper-bound construction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The myopic best-response assumption is an explicit modeling choice rather than a hidden definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard rational-agent best-response, the existence of a distributionally robust LP that preserves dominant-strategy truthfulness, and the ability of online learning to shrink ambiguity sets at a controlled rate. No new entities are postulated.

axioms (2)

domain assumption Agents are rational and best-respond to the posted mechanism in every round.
Invoked throughout the sequential game description to guarantee that truthful reporting is dominant strategy.
domain assumption The ambiguity sets shrink at a rate compatible with online convex optimization regret bounds.
Required for the Õ(√T) cumulative regret claim.

pith-pipeline@v0.9.0 · 5464 in / 1280 out tokens · 37546 ms · 2026-05-16T19:30:39.365688+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce Distributionally Robust Adaptive Mechanism (DRAM)... iteratively updates a distributionally robust linear program with shrinking ambiguity sets... Õ(√T) cumulative regret
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

min E[R_i(X_i,X_j)] s.t. truthfulness, IR, no-free-lunch constraints; margin-δ robust variant

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Dirk Bergemann and Stephen Morris

Second-order stochastic optimization for machine learning in linear time.Journal of Machine Learning Research18, 116 (2017), 1–40. Dirk Bergemann and Stephen Morris

work page 2017
[2]

Omar Besbes, Yonatan Gur, and Assaf Zeevi

Robust mechanism design.Econometrica(2005), 1771–1813. Omar Besbes, Yonatan Gur, and Assaf Zeevi

work page 2005
[3]

David Blackwell

Optimization in online content recommendation services: Beyond click-through rates.Manufacturing & Service Operations Management18, 1 (2016), 15–33. David Blackwell

work page 2016
[4]

Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu

Equivalent comparisons of experiments.The Annals of Mathematical Statistics(1953), 265–272. Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu

work page 1953
[5]

Rui Castro, Fredrik Hellström, and Tim van Erven

Online learning in online auctions.Theoretical Computer Science 324, 2-3 (2004), 137–146. Rui Castro, Fredrik Hellström, and Tim van Erven

work page 2004
[6]

Advances in Neural Information Processing Systems36 (2023), 134–154

Adaptive selective sampling for online prediction with experts. Advances in Neural Information Processing Systems36 (2023), 134–154. Nicolo Cesa-Bianchi, Alex Conconi, and Claudio Gentile

work page 2023
[7]

IEEE Transactions on Information Theory50, 9 (2004), 2050–2057

On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory50, 9 (2004), 2050–2057. Nicolo Cesa-Bianchi, Claudio Gentile, and Yishay Mansour

work page 2004
[8]

Nicolo Cesa-Bianchi and Gábor Lugosi

Regret minimization for reserve prices in second-price auctions.IEEE Transactions on Information Theory61, 1 (2014), 549–564. Nicolo Cesa-Bianchi and Gábor Lugosi. 2006.Prediction, learning, and games. Cambridge university press. Nicolo Cesa-Bianchi, Gábor Lugosi, and Gilles Stoltz

work page 2014
[9]

Nicolo Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz

Minimizing regret with label efficient prediction.IEEE Transactions on Information Theory51, 6 (2005), 2152–2162. Nicolo Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz

work page 2005
[10]

Olivier Chapelle and Lihong Li

Improved second-order bounds for prediction with expert advice.Machine Learning66, 2 (2007), 321–352. Olivier Chapelle and Lihong Li

work page 2007
[11]

Yiling Chen, Yiheng Shen, and Shuran Zheng

An empirical evaluation of thompson sampling.Advances in neural information processing systems24 (2011). Yiling Chen, Yiheng Shen, and Shuran Zheng

work page 2011
[12]

Hana Choi, Carl F Mela, Santiago R Balseiro, and Adam Leary

Truthful data acquisition via peer prediction.Advances in Neural Information Processing Systems33 (2020), 18194–18204. Hana Choi, Carl F Mela, Santiago R Balseiro, and Adam Leary

work page 2020
[13]

Thomas M Cover

Online display advertising markets: A literature review and future directions.Information systems research31, 2 (2020), 556–575. Thomas M Cover. 1999.Elements of information theory. John Wiley & Sons. Jacques Crémer and Richard P McLean

work page 2020
[14]

Econometrica: Journal of the Econometric Society(1988), 1247–1257

Full extraction of the surplus in Bayesian and dominant strategy auctions. Econometrica: Journal of the Econometric Society(1988), 1247–1257. Anirban Dasgupta and Arpita Ghosh

work page 1988
[15]

Mohammad Taghi Hajiaghayi, Robert Kleinberg, and David C Parkes

Batched multi-armed bandits problem.Advances in Neural Information Processing Systems32 (2019). Mohammad Taghi Hajiaghayi, Robert Kleinberg, and David C Parkes

work page 2019
[16]

David Helmbold and Sandra Panizza

A simple adaptive procedure leading to correlated equilibrium.Econometrica68, 5 (2000), 1127–1150. David Helmbold and Sandra Panizza

work page 2000
[17]

Management Science66, 1 (2020), 159–189

Distributionally robust mechanism design. Management Science66, 1 (2020), 159–189. Yuqing Kong

work page 2020
[18]

ACM71, 2 (2024), 1–49

Dominantly truthful peer prediction mechanisms with a finite number of tasks.J. ACM71, 2 (2024), 1–49. Yuqing Kong and Grant Schoenebeck

work page 2024
[19]

Nicolas S Lambert

An information theoretic framework for designing information elicitation mechanisms that reward truth-telling.ACM Transactions on Economics and Computation (TEAC)7, 1 (2019), 1–33. Nicolas S Lambert

work page 2019
[20]

Tor Lattimore and Csaba Szepesvári

Elicitation and evaluation of statistical forecasts.Preprint(2011). Tor Lattimore and Csaba Szepesvári. 2020.Bandit algorithms. Cambridge University Press. Yingkai Li, Jason D Hartline, Liren Shan, and Yifan Wu

work page 2011
[21]

Nolan Miller, Paul Resnick, and Richard Zeckhauser

Auction market design: Recent innovations.Annual Review of Economics11, 1 (2019), 383–405. Nolan Miller, Paul Resnick, and Richard Zeckhauser

work page 2019
[22]

Management Science51, 9 (2005), 1359–1373

Eliciting informative feedback: The peer-prediction method. Management Science51, 9 (2005), 1359–1373. Siddharth Mitra and Aditya Gopalan

work page 2005
[23]

Roger B Myerson

Incentive compatibility and the bargaining problem.Econometrica(1979), 61–73. Roger B Myerson

work page 1979
[24]

Christos Papadimitriou, George Pierrakos, Alexandros Psomas, and Aviad Rubinstein

Optimal auction design.Mathematics of operations research6, 1 (1981), 58–73. Christos Papadimitriou, George Pierrakos, Alexandros Psomas, and Aviad Rubinstein

work page 1981
[25]

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Eric Snowberg

On the complexity of dynamic mechanism design.Games and Economic Behavior134 (2022), 399–427. Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Eric Snowberg

work page 2022
[26]

Goran Radanovic and Boi Faltings

Batched Bandit Problems.The Annals of Statistics44, 2 (2016), 660–681. Goran Radanovic and Boi Faltings

work page 2016
[27]

ACM53, 7 (2010), 78–86

Algorithmic game theory.Commun. ACM53, 7 (2010), 78–86. Bharadwaj Satchidanandan and Munther A Dahleh

work page 2010
[28]

IEEE Transactions on Control of Network Systems11, 1 (2023), 295–306

Incentive compatibility in two-stage repeated stochastic games. IEEE Transactions on Control of Network Systems11, 1 (2023), 295–306. Victor Shnayder, Arpit Agarwal, Rafael Frongillo, and David C Parkes

work page 2023
[29]

InProceedings of the 2016 ACM Conference on Economics and Computation

Informed truthfulness in multi-task peer prediction. InProceedings of the 2016 ACM Conference on Economics and Computation. 179–196. David Simchi-Levi and Yunzong Xu

work page 2016
[30]

John Von Neumann and Oskar Morgenstern

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability.Mathematics of Operations Research47, 3 (2022), 1904–1931. John Von Neumann and Oskar Morgenstern

work page 2022
[31]

Shengling Wang, Xidi Qu, Qin Hu, Xia Wang, and Xiuzhen Cheng

Display advertising with real-time bidding (RTB) and behavioural targeting.Foundations and Trends®in Information Retrieval11, 4-5 (2017), 297–435. Shengling Wang, Xidi Qu, Qin Hu, Xia Wang, and Xiuzhen Cheng

work page 2017
[32]

Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, and Marcelo J Weinberger

An uncertainty-and collusion-proof voting consensus mechanism in blockchain.IEEE/ACM Transactions on Networking31, 5 (2023), 2376–2388. Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, and Marcelo J Weinberger

work page 2023
[33]

Rep(2003),

Inequalities for the l1 deviation of the empirical distribution.Hewlett-Packard Labs, Tech. Rep(2003),

work page 2003
[34]

1985.Game-Theoretic Analysis of Trading Processes.Technical Report

Robert Wilson. 1985.Game-Theoretic Analysis of Trading Processes.Technical Report. Yichi Zhang, Shengwei Xu, David Pennock, and Grant Schoenebeck

work page 1985
[35]

Zishuo Zhao, Xi Chen, and Yuan Zhou

Stochastically Dominant Peer Prediction.arXiv preprint arXiv:2506.02259(2025). Zishuo Zhao, Xi Chen, and Yuan Zhou

work page arXiv 2025
[36]

arXiv preprint arXiv:2406.01794(2024)

It Takes Two: A Peer-Prediction Solution for Blockchain Verifier’s Dilemma. arXiv preprint arXiv:2406.01794(2024). Shuran Zheng, Xuan Qi, Rui Ray Chen, Yongchan Kwon, and James Zou

work page arXiv 2024
[37]

Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao21 Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, and Michael I Jordan

Proper Dataset Valuation by Pointwise Mutual Information.arXiv preprint arXiv:2405.18253(2024). Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao21 Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, and Michael I Jordan

work page arXiv 2024
[38]

Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao22 Appendix A Deferred Proofs A.1 Proof of Proposition 2.1 Suppose the past rounds’ report history isz 1, z2,· · ·, z𝑡−1

The sample complexity of online contract design.arXiv preprint arXiv:2211.05732(2022). Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao22 Appendix A Deferred Proofs A.1 Proof of Proposition 2.1 Suppose the past rounds’ report history isz 1, z2,· · ·, z𝑡−1 . All probability laws and strategies discussed below are conditional on such history. In r...

work page arXiv 2022
[39]

We apply the same strategy as proof of Theorem 3.2, that is, to consider the intermediate solutionM = BR⊺

separately. We apply the same strategy as proof of Theorem 3.2, that is, to consider the intermediate solutionM = BR⊺. The mechanism can be easily acquired byR =( B−1M)⊺. With this reformulation (see Section A.2 for details), the problem can be constructed as min M 𝜅 s.t.∥B −1M∥ max ≤𝜅, M𝑥𝑥 ≥𝑐+𝛿,∀𝑥∈ Y M𝑥 𝑦 ≤𝑐−𝛿,∀𝑥≠𝑦∈ Y M⊺d′ ≤ −𝛿·1, whered ′ 𝑥 =P(𝑋 𝑖 =𝑥). ...

work page 2003
[40]

Suppose we have a policy 𝜋 that maps the i.i.d

The first inequality holds sincelog(1+𝑥) ≤𝑥, and the second holds since𝛿∈ (0,1/4).□ Now we consider the mechanism design problem. Suppose we have a policy 𝜋 that maps the i.i.d. collected data𝐻 𝑡 to a reward mechanism𝑅 𝑡+1. Consider the good event𝐺 𝑘,𝑘∈ {0,1}: 𝐺𝑘 ={𝑅 𝑡+1 satisfies IC constraints and pays less than1+𝛿in expectation} Notice that 𝐺0 and 𝐺1 a...

work page 2013

[1] [1]

Dirk Bergemann and Stephen Morris

Second-order stochastic optimization for machine learning in linear time.Journal of Machine Learning Research18, 116 (2017), 1–40. Dirk Bergemann and Stephen Morris

work page 2017

[2] [2]

Omar Besbes, Yonatan Gur, and Assaf Zeevi

Robust mechanism design.Econometrica(2005), 1771–1813. Omar Besbes, Yonatan Gur, and Assaf Zeevi

work page 2005

[3] [3]

David Blackwell

Optimization in online content recommendation services: Beyond click-through rates.Manufacturing & Service Operations Management18, 1 (2016), 15–33. David Blackwell

work page 2016

[4] [4]

Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu

Equivalent comparisons of experiments.The Annals of Mathematical Statistics(1953), 265–272. Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu

work page 1953

[5] [5]

Rui Castro, Fredrik Hellström, and Tim van Erven

Online learning in online auctions.Theoretical Computer Science 324, 2-3 (2004), 137–146. Rui Castro, Fredrik Hellström, and Tim van Erven

work page 2004

[6] [6]

Advances in Neural Information Processing Systems36 (2023), 134–154

Adaptive selective sampling for online prediction with experts. Advances in Neural Information Processing Systems36 (2023), 134–154. Nicolo Cesa-Bianchi, Alex Conconi, and Claudio Gentile

work page 2023

[7] [7]

IEEE Transactions on Information Theory50, 9 (2004), 2050–2057

On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory50, 9 (2004), 2050–2057. Nicolo Cesa-Bianchi, Claudio Gentile, and Yishay Mansour

work page 2004

[8] [8]

Nicolo Cesa-Bianchi and Gábor Lugosi

Regret minimization for reserve prices in second-price auctions.IEEE Transactions on Information Theory61, 1 (2014), 549–564. Nicolo Cesa-Bianchi and Gábor Lugosi. 2006.Prediction, learning, and games. Cambridge university press. Nicolo Cesa-Bianchi, Gábor Lugosi, and Gilles Stoltz

work page 2014

[9] [9]

Nicolo Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz

Minimizing regret with label efficient prediction.IEEE Transactions on Information Theory51, 6 (2005), 2152–2162. Nicolo Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz

work page 2005

[10] [10]

Olivier Chapelle and Lihong Li

Improved second-order bounds for prediction with expert advice.Machine Learning66, 2 (2007), 321–352. Olivier Chapelle and Lihong Li

work page 2007

[11] [11]

Yiling Chen, Yiheng Shen, and Shuran Zheng

An empirical evaluation of thompson sampling.Advances in neural information processing systems24 (2011). Yiling Chen, Yiheng Shen, and Shuran Zheng

work page 2011

[12] [12]

Hana Choi, Carl F Mela, Santiago R Balseiro, and Adam Leary

Truthful data acquisition via peer prediction.Advances in Neural Information Processing Systems33 (2020), 18194–18204. Hana Choi, Carl F Mela, Santiago R Balseiro, and Adam Leary

work page 2020

[13] [13]

Thomas M Cover

Online display advertising markets: A literature review and future directions.Information systems research31, 2 (2020), 556–575. Thomas M Cover. 1999.Elements of information theory. John Wiley & Sons. Jacques Crémer and Richard P McLean

work page 2020

[14] [14]

Econometrica: Journal of the Econometric Society(1988), 1247–1257

Full extraction of the surplus in Bayesian and dominant strategy auctions. Econometrica: Journal of the Econometric Society(1988), 1247–1257. Anirban Dasgupta and Arpita Ghosh

work page 1988

[15] [15]

Mohammad Taghi Hajiaghayi, Robert Kleinberg, and David C Parkes

Batched multi-armed bandits problem.Advances in Neural Information Processing Systems32 (2019). Mohammad Taghi Hajiaghayi, Robert Kleinberg, and David C Parkes

work page 2019

[16] [16]

David Helmbold and Sandra Panizza

A simple adaptive procedure leading to correlated equilibrium.Econometrica68, 5 (2000), 1127–1150. David Helmbold and Sandra Panizza

work page 2000

[17] [17]

Management Science66, 1 (2020), 159–189

Distributionally robust mechanism design. Management Science66, 1 (2020), 159–189. Yuqing Kong

work page 2020

[18] [18]

ACM71, 2 (2024), 1–49

Dominantly truthful peer prediction mechanisms with a finite number of tasks.J. ACM71, 2 (2024), 1–49. Yuqing Kong and Grant Schoenebeck

work page 2024

[19] [19]

Nicolas S Lambert

An information theoretic framework for designing information elicitation mechanisms that reward truth-telling.ACM Transactions on Economics and Computation (TEAC)7, 1 (2019), 1–33. Nicolas S Lambert

work page 2019

[20] [20]

Tor Lattimore and Csaba Szepesvári

Elicitation and evaluation of statistical forecasts.Preprint(2011). Tor Lattimore and Csaba Szepesvári. 2020.Bandit algorithms. Cambridge University Press. Yingkai Li, Jason D Hartline, Liren Shan, and Yifan Wu

work page 2011

[21] [21]

Nolan Miller, Paul Resnick, and Richard Zeckhauser

Auction market design: Recent innovations.Annual Review of Economics11, 1 (2019), 383–405. Nolan Miller, Paul Resnick, and Richard Zeckhauser

work page 2019

[22] [22]

Management Science51, 9 (2005), 1359–1373

Eliciting informative feedback: The peer-prediction method. Management Science51, 9 (2005), 1359–1373. Siddharth Mitra and Aditya Gopalan

work page 2005

[23] [23]

Roger B Myerson

Incentive compatibility and the bargaining problem.Econometrica(1979), 61–73. Roger B Myerson

work page 1979

[24] [24]

Christos Papadimitriou, George Pierrakos, Alexandros Psomas, and Aviad Rubinstein

Optimal auction design.Mathematics of operations research6, 1 (1981), 58–73. Christos Papadimitriou, George Pierrakos, Alexandros Psomas, and Aviad Rubinstein

work page 1981

[25] [25]

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Eric Snowberg

On the complexity of dynamic mechanism design.Games and Economic Behavior134 (2022), 399–427. Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Eric Snowberg

work page 2022

[26] [26]

Goran Radanovic and Boi Faltings

Batched Bandit Problems.The Annals of Statistics44, 2 (2016), 660–681. Goran Radanovic and Boi Faltings

work page 2016

[27] [27]

ACM53, 7 (2010), 78–86

Algorithmic game theory.Commun. ACM53, 7 (2010), 78–86. Bharadwaj Satchidanandan and Munther A Dahleh

work page 2010

[28] [28]

IEEE Transactions on Control of Network Systems11, 1 (2023), 295–306

Incentive compatibility in two-stage repeated stochastic games. IEEE Transactions on Control of Network Systems11, 1 (2023), 295–306. Victor Shnayder, Arpit Agarwal, Rafael Frongillo, and David C Parkes

work page 2023

[29] [29]

InProceedings of the 2016 ACM Conference on Economics and Computation

Informed truthfulness in multi-task peer prediction. InProceedings of the 2016 ACM Conference on Economics and Computation. 179–196. David Simchi-Levi and Yunzong Xu

work page 2016

[30] [30]

John Von Neumann and Oskar Morgenstern

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability.Mathematics of Operations Research47, 3 (2022), 1904–1931. John Von Neumann and Oskar Morgenstern

work page 2022

[31] [31]

Shengling Wang, Xidi Qu, Qin Hu, Xia Wang, and Xiuzhen Cheng

Display advertising with real-time bidding (RTB) and behavioural targeting.Foundations and Trends®in Information Retrieval11, 4-5 (2017), 297–435. Shengling Wang, Xidi Qu, Qin Hu, Xia Wang, and Xiuzhen Cheng

work page 2017

[32] [32]

Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, and Marcelo J Weinberger

An uncertainty-and collusion-proof voting consensus mechanism in blockchain.IEEE/ACM Transactions on Networking31, 5 (2023), 2376–2388. Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, and Marcelo J Weinberger

work page 2023

[33] [33]

Rep(2003),

Inequalities for the l1 deviation of the empirical distribution.Hewlett-Packard Labs, Tech. Rep(2003),

work page 2003

[34] [34]

1985.Game-Theoretic Analysis of Trading Processes.Technical Report

Robert Wilson. 1985.Game-Theoretic Analysis of Trading Processes.Technical Report. Yichi Zhang, Shengwei Xu, David Pennock, and Grant Schoenebeck

work page 1985

[35] [35]

Zishuo Zhao, Xi Chen, and Yuan Zhou

Stochastically Dominant Peer Prediction.arXiv preprint arXiv:2506.02259(2025). Zishuo Zhao, Xi Chen, and Yuan Zhou

work page arXiv 2025

[36] [36]

arXiv preprint arXiv:2406.01794(2024)

It Takes Two: A Peer-Prediction Solution for Blockchain Verifier’s Dilemma. arXiv preprint arXiv:2406.01794(2024). Shuran Zheng, Xuan Qi, Rui Ray Chen, Yongchan Kwon, and James Zou

work page arXiv 2024

[37] [37]

Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao21 Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, and Michael I Jordan

Proper Dataset Valuation by Pointwise Mutual Information.arXiv preprint arXiv:2405.18253(2024). Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao21 Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, and Michael I Jordan

work page arXiv 2024

[38] [38]

Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao22 Appendix A Deferred Proofs A.1 Proof of Proposition 2.1 Suppose the past rounds’ report history isz 1, z2,· · ·, z𝑡−1

The sample complexity of online contract design.arXiv preprint arXiv:2211.05732(2022). Qiushi Han, David Simchi-Levi, Renfei Tan, and Zishuo Zhao22 Appendix A Deferred Proofs A.1 Proof of Proposition 2.1 Suppose the past rounds’ report history isz 1, z2,· · ·, z𝑡−1 . All probability laws and strategies discussed below are conditional on such history. In r...

work page arXiv 2022

[39] [39]

We apply the same strategy as proof of Theorem 3.2, that is, to consider the intermediate solutionM = BR⊺

separately. We apply the same strategy as proof of Theorem 3.2, that is, to consider the intermediate solutionM = BR⊺. The mechanism can be easily acquired byR =( B−1M)⊺. With this reformulation (see Section A.2 for details), the problem can be constructed as min M 𝜅 s.t.∥B −1M∥ max ≤𝜅, M𝑥𝑥 ≥𝑐+𝛿,∀𝑥∈ Y M𝑥 𝑦 ≤𝑐−𝛿,∀𝑥≠𝑦∈ Y M⊺d′ ≤ −𝛿·1, whered ′ 𝑥 =P(𝑋 𝑖 =𝑥). ...

work page 2003

[40] [40]

Suppose we have a policy 𝜋 that maps the i.i.d

The first inequality holds sincelog(1+𝑥) ≤𝑥, and the second holds since𝛿∈ (0,1/4).□ Now we consider the mechanism design problem. Suppose we have a policy 𝜋 that maps the i.i.d. collected data𝐻 𝑡 to a reward mechanism𝑅 𝑡+1. Consider the good event𝐺 𝑘,𝑘∈ {0,1}: 𝐺𝑘 ={𝑅 𝑡+1 satisfies IC constraints and pays less than1+𝛿in expectation} Notice that 𝐺0 and 𝐺1 a...

work page 2013