Learning to Persuade a Biased Receiver

Milind Tambe; Sadie Zhao; Yiling Chen; Yuqi Pan

arxiv: 2605.15331 · v1 · pith:D3AOF3LZnew · submitted 2026-05-14 · 💻 cs.GT

Learning to Persuade a Biased Receiver

Yuqi Pan , Sadie Zhao , Milind Tambe , Yiling Chen This is my paper

Pith reviewed 2026-05-19 15:28 UTC · model grok-4.3

classification 💻 cs.GT

keywords information designpersuasionbiased belief updatingregret minimizationsafe explorationonline learningrepeated interactions

0 comments

The pith

A sender can learn a receiver's fixed bias in belief updating through safe exploration while achieving O(log log T) regret against a full-information oracle.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates repeated persuasion where the receiver mixes the prior and Bayesian posterior according to an unknown fixed bias parameter. The sender selects signaling schemes each round and only observes the receiver's action, seeking to minimize regret compared to knowing the bias. The safe exploration algorithm uses probes that limit the downside of errors by being conservative, allowing estimation of the bias with minimal impact on persuasion success. This results in regret that grows very slowly as O(log log T), which is shown to be optimal by a matching lower bound. Such a result indicates that effective learning is possible in information design settings even when feedback is limited to actions.

Core claim

For general finite state and action spaces and arbitrary bounded utilities, the safe exploration algorithm achieves O(log log T) regret relative to a full-information oracle that knows the receiver's biased updating rule, with a matching Omega(log log T) lower bound. The algorithm exploits the asymmetric cost of probing: conservative probes incur only local loss, whereas overly aggressive probes may lose the persuasive opportunity entirely.

What carries the argument

The safe exploration algorithm that exploits asymmetric probing costs to learn the single unknown bias parameter while preserving high persuasion value.

If this is right

The O(log log T) regret bound holds for arbitrary finite state and action spaces with bounded utilities.
The algorithm maintains persuasive effectiveness during the learning phase.
Analysis of receiver welfare follows from the learned bias and chosen signals.
Similar regret rates apply in extensions with jointly unknown priors or contextual time-varying environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Techniques like this could inform the design of adaptive systems in marketing or policy where user biases need to be inferred from responses.
The optimality of log log regret suggests that very long horizons are needed to fully amortize the learning cost in persuasion.
Extensions to dynamic biases or multiple receivers might require new probing strategies to retain the same rate.

Load-bearing premise

The receiver applies the same bias to update beliefs in every round and the sender perfectly observes each chosen action but gets no other feedback.

What would settle it

A simulation or experiment tracking cumulative regret over increasing T and verifying whether it stays within O(log log T) or violates the rate would confirm or disprove the bound.

Figures

Figures reproduced from arXiv: 2605.15331 by Milind Tambe, Sadie Zhao, Yiling Chen, Yuqi Pan.

**Figure 1.** Figure 1: Binary instance B = (µ0, qth) = (0.25, 0.60) and true bias α ⋆ = 0.70. inducing action 1. On the receiver side, action 0 is set as optimal at the prior without loss of generality, otherwise the sender can induce action 1 without learning and regret is zero. Since the receiver’s utility difference between actions 1 and 0 is affine in the distorted belief, the action follows a cutoff rule. Let qth ∈ (µ0, 1) … view at source ↗

**Figure 2.** Figure 2: Action regions for µ0 = (0.50, 0.27, 0.23). In state order (e0, e1, e2), uR(a0, ·) = (0, 0, 0), uR(a1, ·) = (−2.1, 0.3, 0.9), and uR(a2, ·) = (−2.1, −0.3, 1.5). We now specify the condition for tie-breaking rules. Define the set of relevant actions: Arel = n a ∈ A : ∃ν ∈ ∆(Ω) that a uniquely maximizes X ω∈Ω (α ⋆ ν(ω)+(1−α ⋆ )µ0(ω))uR(a, ω) o . Actions outside Arel are weakly dominated for the receiver: the… view at source ↗

read the original abstract

We study a repeated information design setting in which the receiver, who is also the decision-maker, updates beliefs in a systematically biased way. More specifically, a distorted posterior in our model can be written as a convex combination of the prior and the Bayesian posterior, governed by a fixed but unknown parameter. Over repeated interactions, the sender chooses persuasive signaling schemes, observes only the receiver's realized actions, and seeks to minimize regret relative to a full-information oracle that knows the receiver's biased updating rule. We propose a safe exploration algorithm for learning the receiver's bias while maintaining high persuasion value. The algorithm exploits the asymmetric cost of probing: conservative probes incur only local loss, whereas overly aggressive probes may lose the persuasive opportunity entirely. For general finite state and action spaces and arbitrary bounded utilities, our method achieves $O(\log\log T)$ regret. A matching $\Omega(\log\log T)$ lower bound shows that this rate is optimal. We further discuss the influence on receiver welfare, as well as extensions to jointly unknown prior and bias, and contextual settings with time-varying priors and utilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies repeated persuasion where a sender designs signals for a receiver who updates beliefs as a convex combination of the prior and Bayesian posterior, governed by a fixed unknown bias parameter. The sender observes only realized actions, chooses persuasive schemes each round, and minimizes regret relative to a full-information oracle knowing the bias. For general finite state/action spaces and arbitrary bounded utilities, a safe exploration algorithm is proposed that exploits asymmetric probing costs to achieve O(log log T) regret, with a matching Omega(log log T) lower bound. Extensions to unknown priors, contextual settings, and receiver welfare are discussed.

Significance. If the analysis holds, the result is significant for online learning in information design: it establishes an optimal double-logarithmic regret rate under model misspecification on the receiver side, enabled by conservative probes that incur only local loss. The matching lower bound and the general finite-space setting with arbitrary utilities are strengths. The paper also provides concrete algorithmic design and discusses welfare implications, which could inform future work on strategic communication and mechanism design with learning.

major comments (2)

[regret analysis of safe exploration algorithm] Regret analysis for the safe exploration procedure: the O(log log T) upper bound requires that conservative probes refine the bias estimate at double-exponential speed while remaining persuasive after each update. For arbitrary bounded utilities the receiver's best response can be discontinuous in the perceived posterior; it is unclear whether the probe construction guarantees that an updated estimate keeps the chosen signal persuasive and informative, which is load-bearing for obtaining the claimed rate rather than a slower polynomial rate.
[lower bound section] Lower bound construction: the Omega(log log T) lower bound relies on the necessity of safe exploration and that discontinuities in best responses do not permit faster learning. A verification that the construction carries through for general utilities (where discontinuities might alter the information-gathering speed) would be needed to confirm optimality.

minor comments (2)

[abstract] Abstract: the precise mathematical definition of the biased posterior (convex combination parameter) should be stated explicitly rather than described only in words.
[algorithm description] The manuscript would benefit from a short illustrative example with discontinuous best response to demonstrate that the safe probes remain valid after estimate updates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments on our paper. We address each major comment below, providing clarifications on the regret analysis and lower bound. We believe these responses resolve the concerns while maintaining the integrity of our results for general settings.

read point-by-point responses

Referee: Regret analysis for the safe exploration procedure: the O(log log T) upper bound requires that conservative probes refine the bias estimate at double-exponential speed while remaining persuasive after each update. For arbitrary bounded utilities the receiver's best response can be discontinuous in the perceived posterior; it is unclear whether the probe construction guarantees that an updated estimate keeps the chosen signal persuasive and informative, which is load-bearing for obtaining the claimed rate rather than a slower polynomial rate.

Authors: We appreciate this observation. In Section 4 of the manuscript, the safe exploration algorithm constructs probes that are persuasive for all bias parameters in the current uncertainty interval. This is achieved by solving for signals that induce the desired action for the worst-case bias in the interval, ensuring that updates to the estimate (which refine the interval) keep the signal persuasive because the new interval is a subset. Although best responses may be discontinuous, the persuasion condition is maintained within the regions where the action is fixed, and our algorithm avoids crossing discontinuity points by using conservative probes. The double-exponential convergence is due to the multiplicative reduction in uncertainty per probe. To make this clearer, we will add a remark in the main text and expand the proof in the appendix to explicitly handle the discontinuity cases for arbitrary utilities. revision: partial
Referee: Lower bound construction: the Omega(log log T) lower bound relies on the necessity of safe exploration and that discontinuities in best responses do not permit faster learning. A verification that the construction carries through for general utilities (where discontinuities might alter the information-gathering speed) would be needed to confirm optimality.

Authors: For the lower bound in Section 5, the construction uses a specific family of utility functions where the best response discontinuities are present but do not allow bypassing the safe exploration requirement. The information-theoretic argument shows that to achieve low regret, the algorithm must perform probes that safely narrow the bias parameter, and any attempt to use aggressive probes risks high regret due to potential loss of persuasion. This holds for general utilities because the lower bound instance is embedded in the general setting, and discontinuities only increase the difficulty of learning without safe probes. We can provide additional details in the revision to verify the construction explicitly for discontinuous cases. revision: partial

Circularity Check

0 steps flagged

No significant circularity; regret bound derived independently of fitted bias parameter

full rationale

The paper presents a safe exploration algorithm that learns a single fixed scalar bias parameter through conservative probes while maintaining persuasion. The O(log log T) upper bound and matching lower bound are obtained via standard analysis of asymmetric-cost exploration against an external full-information oracle benchmark. No step reduces the claimed regret expression to a quantity defined by the same bias parameter, nor does the derivation rely on self-citation chains, imported uniqueness theorems, or ansatz smuggling. The central result remains self-contained with independent content for general finite spaces and arbitrary utilities.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the modeling assumption that the bias is a single fixed scalar governing a convex combination of prior and posterior, plus standard assumptions on finite spaces and bounded utilities. No free parameters are fitted inside the algorithm itself; the bias is the unknown to be learned.

axioms (2)

domain assumption Receiver's distorted posterior is a convex combination of prior and Bayesian posterior with fixed unknown weight
Stated in the abstract as the model of biased updating
domain assumption Sender observes only the realized action after each signal
Explicit in the problem setup

pith-pipeline@v0.9.0 · 5718 in / 1345 out tokens · 28368 ms · 2026-05-19T15:28:38.900846+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 1 internal anchor

[1]

MIT press, 1995

Robert J Aumann, Michael Maschler, and Richard E Stearns.Repeated games with incomplete information. MIT press, 1995

work page 1995
[2]

Over-and underreaction to information: Belief updating with cognitive constraints

Cuimin Ba, J Aislinn Bohren, and Alex Imas. Over-and underreaction to information: Belief updating with cognitive constraints. Technical report, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, 2025

work page 2025
[3]

Markov persuasion processes: Learning to persuade from scratch.arXiv preprint arXiv:2402.03077, 2024

Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. Markov persuasion processes: Learning to persuade from scratch.arXiv preprint arXiv:2402.03077, 2024

work page arXiv 2024
[4]

A meta-analysis of the weight of advice in decision-making.Current Psychology, 42(28): 24516–24541, 2023

Phoebe E Bailey, Tarren Leon, Natalie C Ebner, Ahmed A Moustafa, and Gabrielle Weidemann. A meta-analysis of the weight of advice in decision-making.Current Psychology, 42(28): 24516–24541, 2023

work page 2023
[5]

The base-rate fallacy in probability judgments.Acta Psychologica, 44(3): 211–233, 1980

Maya Bar-Hillel. The base-rate fallacy in probability judgments.Acta Psychologica, 44(3): 211–233, 1980

work page 1980
[6]

Errors in probabilistic reasoning and judgment biases.Handbook of Behavioral Economics: Applications and Foundations 1, 2:69–186, 2019

Daniel J Benjamin. Errors in probabilistic reasoning and judgment biases.Handbook of Behavioral Economics: Applications and Foundations 1, 2:69–186, 2019

work page 2019
[7]

Information design, bayesian persuasion, and bayes correlated equilibrium.American Economic Review, 106(5):586–591, 2016

Dirk Bergemann and Stephen Morris. Information design, bayesian persuasion, and bayes correlated equilibrium.American Economic Review, 106(5):586–591, 2016

work page 2016
[8]

Equivalent comparisons of experiments.The annals of mathematical statistics, pages 265–272, 1953

David Blackwell. Equivalent comparisons of experiments.The annals of mathematical statistics, pages 265–272, 1953

work page 1953
[9]

Online bayesian persua- sion.Advances in neural information processing systems, 33:16188–16198, 2020

Matteo Castiglioni, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Online bayesian persua- sion.Advances in neural information processing systems, 33:16188–16198, 2020

work page 2020
[10]

Multi-receiver online bayesian persuasion

Matteo Castiglioni, Alberto Marchesi, Andrea Celli, and Nicola Gatti. Multi-receiver online bayesian persuasion. InInternational Conference on Machine Learning, pages 1314–1323. PMLR, 2021

work page 2021
[11]

Bias detection via signaling.Advances in Neural Information Processing Systems, 37:69120–69143, 2024

Yiling Chen, Tao Lin, Ariel D Procaccia, Aaditya Ramdas, and Itai Shapira. Bias detection via signaling.Advances in Neural Information Processing Systems, 37:69120–69143, 2024

work page 2024
[12]

Learning to signal in the goldilocks zone: Improving adversary compliance in security games

Sarah Cooney, Kai Wang, Elizabeth Bondi, Thanh Nguyen, Phebe Vayanos, Hailey Winetrobe, Edward A Cranford, Cleotilde Gonzalez, Christian Lebiere, and Milind Tambe. Learning to signal in the goldilocks zone: Improving adversary compliance in security games. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 725–740....

work page 2019
[13]

Toward personalized deceptive signaling for cyber defense using cognitive models.Topics in Cognitive Science, 12(3):992–1011, 2020

Edward A Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Sarah Cooney, Milind Tambe, and Christian Lebiere. Toward personalized deceptive signaling for cyber defense using cognitive models.Topics in Cognitive Science, 12(3):992–1011, 2020

work page 2020
[14]

A theory of learning to infer.Psychological review, 127(3):412, 2020

Ishita Dasgupta, Eric Schulz, Joshua B Tenenbaum, and Samuel J Gershman. A theory of learning to infer.Psychological review, 127(3):412, 2020

work page 2020
[15]

Non-bayesian persuasion.Journal of Political Economy, 130(10):2594–2642, 2022

Geoffroy De Clippel and Xu Zhang. Non-bayesian persuasion.Journal of Political Economy, 130(10):2594–2642, 2022

work page 2022
[16]

Algorithm aversion: people erroneously avoid algorithms after seeing them err.Journal of experimental psychology: General, 144(1):114, 2015

Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Algorithm aversion: people erroneously avoid algorithms after seeing them err.Journal of experimental psychology: General, 144(1):114, 2015

work page 2015
[17]

Algorithmic bayesian persuasion

Shaddin Dughmi and Haifeng Xu. Algorithmic bayesian persuasion. InProceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 412–425, 2016

work page 2016
[18]

Algorithmic persuasion with no externalities

Shaddin Dughmi and Haifeng Xu. Algorithmic persuasion with no externalities. InProceedings of the 2017 ACM Conference on Economics and Computation, pages 351–368, 2017. 11

work page 2017
[19]

Conservatism in human information processing.Formal representation of human judgment, 1968

Ward Edwards. Conservatism in human information processing.Formal representation of human judgment, 1968

work page 1968
[20]

Non-bayesian learning.The BE Journal of Theoretical Economics, 10(1):1–20, 2010

Larry G Epstein, Jawwad Noor, Alvaro Sandroni, et al. Non-bayesian learning.The BE Journal of Theoretical Economics, 10(1):1–20, 2010

work page 2010
[21]

Online bayesian recommendation with no regret

Yiding Feng, Wei Tang, and Haifeng Xu. Online bayesian recommendation with no regret. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 818–819, 2022

work page 2022
[22]

Rationality-robust information design: Bayesian persuasion under quantal response

Yiding Feng, Chien-Ju Ho, and Wei Tang. Rationality-robust information design: Bayesian persuasion under quantal response. InProceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 501–546. SIAM, 2024

work page 2024
[23]

A rothschild-stiglitz approach to bayesian persuasion

Matthew Gentzkow and Emir Kamenica. A rothschild-stiglitz approach to bayesian persuasion. American Economic Review, 106(5):597–601, 2016

work page 2016
[24]

Bayes rule as a descriptive model: The representativeness heuristic.The Quarterly journal of economics, 95(3):537–557, 1980

David M Grether. Bayes rule as a descriptive model: The representativeness heuristic.The Quarterly journal of economics, 95(3):537–557, 1980

work page 1980
[25]

Persuasion with motivated beliefs

David Hagmann and George Loewenstein. Persuasion with motivated beliefs. InOpinion Dynamics & Collective Decisions Workshop, 2017

work page 2017
[26]

Algorithmic persuasion through simulation.arXiv preprint arXiv:2311.18138, 2023

Keegan Harris, Nicole Immorlica, Brendan Lucier, and Aleksandrs Slivkins. Algorithmic persuasion through simulation.arXiv preprint arXiv:2311.18138, 2023

work page arXiv 2023
[27]

Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

Emir Kamenica and Matthew Gentzkow. Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

work page 2011
[28]

The value of knowing a demand curve: Bounds on regret for online posted-price auctions

Robert Kleinberg and Tom Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594–605. IEEE, 2003

work page 2003
[29]

Dynamic Non-Bayesian Persuasion

Masanori Kobayashi. Dynamic non-bayesian persuasion.arXiv preprint arXiv:2508.12328, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Information design with unknown prior.arXiv preprint arXiv:2410.05533, 2024

Ce Li and Tao Lin. Information design with unknown prior.arXiv preprint arXiv:2410.05533, 2024

work page arXiv 2024
[31]

What influences algorithmic decision-making? a systematic literature review on algorithm aversion

Hasan Mahmud, AKM Najmul Islam, Syed Ishtiaque Ahmed, and Kari Smolander. What influences algorithmic decision-making? a systematic literature review on algorithm aversion. Technological forecasting and social change, 175:121390, 2022

work page 2022
[32]

Optimal information disclosure.Journal of political Economy, 118 (5):949–987, 2010

Luis Rayo and Ilya Segal. Optimal information disclosure.Journal of political Economy, 118 (5):949–987, 2010

work page 2010
[33]

Motivated skepticism in the evaluation of political beliefs

Charles S Taber and Milton Lodge. Motivated skepticism in the evaluation of political beliefs. American journal of political science, 50(3):755–769, 2006

work page 2006
[34]

On the bayesian rational assumption in information design

Wei Tang and Chien-Ju Ho. On the bayesian rational assumption in information design. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 9, pages 120–130, 2021

work page 2021
[35]

Bayesian or biased? analytic thinking and political belief updating.Cognition, 204:104375, 2020

Ben M Tappin, Gordon Pennycook, and David G Rand. Bayesian or biased? analytic thinking and political belief updating.Cognition, 204:104375, 2020

work page 2020
[36]

Sequential information design: Markov persuasion process and its efficient reinforcement learning.arXiv preprint arXiv:2202.10678, 2022

Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I Jordan, and Haifeng Xu. Sequential information design: Markov persuasion process and its efficient reinforcement learning.arXiv preprint arXiv:2202.10678, 2022

work page arXiv 2022
[37]

Probabilistic computations: Toward a unified measure of complexity

Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227. IEEE Computer Society, 1977

work page 1977
[38]

SkX t=1 Ht # =E

You Zu, Krishnamurthy Iyer, and Haifeng Xu. Learning to persuade on the fly: Robustness against ignorance. InProceedings of the 22nd ACM Conference on Economics and Computation, pages 927–928, 2021. 12 A Related Work (Detailed Discussion) A large body of empirical and theoretical work shows that Bayesian updating is a useful normative benchmark but often ...

work page 2021
[39]

Each physical round contributes at most ∆Umax regret

Therefore E[Xk]≤ √ 2 δµ0 ,E " NX k=1 Xk # ≤ √ 2N δµ0 =O(log logT). Each physical round contributes at most ∆Umax regret. The hidden constant here is √ 2∆Umax/δµ0. C.3 Relevant Actions Recall the relevant-action set Arel = n a∈A:∃ν∈∆(Ω)thatauniquely maximizes X ω∈Ω (α⋆ν(ω)+(1−α ⋆)µ0(ω))uR(a, ω) o . The following discussion shows that without excluding thos...

work page
[40]

28 Also, bλ+ ≤ 2∥r∥2 δµ0 ≤ 2pϵ δµ0

=µ 0. 28 Also, bλ+ ≤ 2∥r∥2 δµ0 ≤ 2pϵ δµ0 . For utility, write va(ν) := X ω∈Ω ν(ω)u S(a, ω). Then |va(ν)−v a(ν′)| ≤U max p |Ω| ∥ν−ν ′∥2. Therefore, |US(τ)−U S(bτ)| ≤Umax p |Ω| X ν′ i̸=νi λi∥νi −ν ′ i∥2 +U max X i |λi −bλi|+U maxbλ+ ≤U max p |Ω|pϵ+ 2U maxbλ+ ≤U max p |Ω|pϵ+ 4Umax δµ0 pϵ. This proves the claim. C.4.3 Proof of Proposition 3 Since τopt ={(λ i,...

work page
[41]

The true bias is detectable in every context:α ⋆ ≥¯αmin := maxx∈X αmin(Ix)

work page
[42]

The sender utilities are uniformly bounded:U X max := maxx∈X maxa,ω |uS,x(a, ω)|<∞

work page
[43]

For every contextx, the analogue of Proposition 5 holds, and there exists a positive constant c= min x cx >0, ϵ= min x ϵx >0 for the first branch in Proposition 5, and a positive constantδ= min x δx >0for the second branch

work page
[44]

Equivalently, there exist positive constant δµ0 = min x∈X δµ0,x and finite constantsκ, G max, Lb satisfying κ= max x∈X κx, G max = max x∈X Gmax,x, L b = max x∈X Lb,x

The constants in Theorem 2 can be chosen uniformly over x∈ X , see more details in Section C.1. Equivalently, there exist positive constant δµ0 = min x∈X δµ0,x and finite constantsκ, G max, Lb satisfying κ= max x∈X κx, G max = max x∈X Gmax,x, L b = max x∈X Lb,x. Note that the above assumption naturally holds if the context setXis finite. Contextual algori...

work page
[45]

2:Initialize:interval[a, b]←[0,1], step sizeε← 1 2, timet←1

the receiver takes action 1 if and only if the distorted posterior is at leastbq, where 0< µ ∗ < bq <1; 38 Algorithm 9Safe Exploration for Unknown Bias and Unknown Prior 1:Input:horizonT, thresholdbq. 2:Initialize:interval[a, b]←[0,1], step sizeε← 1 2, timet←1. 3:Stage 1: Safe exploration 4:whileb−a > T −1 andt≤Tdo 5:m←a,m prev ←a 6:whilem≤bandt≤Tdo 7:Com...

work page
[46]

persuasion is feasible, namely α∗ ≥α min = bq−µ∗ 1−µ ∗ so thatm ∗ ∈[0,1]is well-defined

work page
[47]

LetΠ SEJ denote Algorithm 9

ties are broken in favor of action1. LetΠ SEJ denote Algorithm 9. Then, for everyT≥4, RegT ΠSEJ;α ∗, µ∗ =O(log logT). In particular, RegT ΠSEJ;α ∗, µ∗ ≤ 1−µ ∗ µ∗ + 1 (2 +⌈log 2 log2 T⌉) + 1. Proof. Let V ∗ denote the full-information benchmark value. As noted above, when (α⋆, µ∗) is known, an optimal scheme is supported on{0, ν ∗}, soV ∗ = µ∗ ν∗ =µ ∗ + (1...

work page

[1] [1]

MIT press, 1995

Robert J Aumann, Michael Maschler, and Richard E Stearns.Repeated games with incomplete information. MIT press, 1995

work page 1995

[2] [2]

Over-and underreaction to information: Belief updating with cognitive constraints

Cuimin Ba, J Aislinn Bohren, and Alex Imas. Over-and underreaction to information: Belief updating with cognitive constraints. Technical report, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, 2025

work page 2025

[3] [3]

Markov persuasion processes: Learning to persuade from scratch.arXiv preprint arXiv:2402.03077, 2024

Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. Markov persuasion processes: Learning to persuade from scratch.arXiv preprint arXiv:2402.03077, 2024

work page arXiv 2024

[4] [4]

A meta-analysis of the weight of advice in decision-making.Current Psychology, 42(28): 24516–24541, 2023

Phoebe E Bailey, Tarren Leon, Natalie C Ebner, Ahmed A Moustafa, and Gabrielle Weidemann. A meta-analysis of the weight of advice in decision-making.Current Psychology, 42(28): 24516–24541, 2023

work page 2023

[5] [5]

The base-rate fallacy in probability judgments.Acta Psychologica, 44(3): 211–233, 1980

Maya Bar-Hillel. The base-rate fallacy in probability judgments.Acta Psychologica, 44(3): 211–233, 1980

work page 1980

[6] [6]

Errors in probabilistic reasoning and judgment biases.Handbook of Behavioral Economics: Applications and Foundations 1, 2:69–186, 2019

Daniel J Benjamin. Errors in probabilistic reasoning and judgment biases.Handbook of Behavioral Economics: Applications and Foundations 1, 2:69–186, 2019

work page 2019

[7] [7]

Information design, bayesian persuasion, and bayes correlated equilibrium.American Economic Review, 106(5):586–591, 2016

Dirk Bergemann and Stephen Morris. Information design, bayesian persuasion, and bayes correlated equilibrium.American Economic Review, 106(5):586–591, 2016

work page 2016

[8] [8]

Equivalent comparisons of experiments.The annals of mathematical statistics, pages 265–272, 1953

David Blackwell. Equivalent comparisons of experiments.The annals of mathematical statistics, pages 265–272, 1953

work page 1953

[9] [9]

Online bayesian persua- sion.Advances in neural information processing systems, 33:16188–16198, 2020

Matteo Castiglioni, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Online bayesian persua- sion.Advances in neural information processing systems, 33:16188–16198, 2020

work page 2020

[10] [10]

Multi-receiver online bayesian persuasion

Matteo Castiglioni, Alberto Marchesi, Andrea Celli, and Nicola Gatti. Multi-receiver online bayesian persuasion. InInternational Conference on Machine Learning, pages 1314–1323. PMLR, 2021

work page 2021

[11] [11]

Bias detection via signaling.Advances in Neural Information Processing Systems, 37:69120–69143, 2024

Yiling Chen, Tao Lin, Ariel D Procaccia, Aaditya Ramdas, and Itai Shapira. Bias detection via signaling.Advances in Neural Information Processing Systems, 37:69120–69143, 2024

work page 2024

[12] [12]

Learning to signal in the goldilocks zone: Improving adversary compliance in security games

Sarah Cooney, Kai Wang, Elizabeth Bondi, Thanh Nguyen, Phebe Vayanos, Hailey Winetrobe, Edward A Cranford, Cleotilde Gonzalez, Christian Lebiere, and Milind Tambe. Learning to signal in the goldilocks zone: Improving adversary compliance in security games. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 725–740....

work page 2019

[13] [13]

Toward personalized deceptive signaling for cyber defense using cognitive models.Topics in Cognitive Science, 12(3):992–1011, 2020

Edward A Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Sarah Cooney, Milind Tambe, and Christian Lebiere. Toward personalized deceptive signaling for cyber defense using cognitive models.Topics in Cognitive Science, 12(3):992–1011, 2020

work page 2020

[14] [14]

A theory of learning to infer.Psychological review, 127(3):412, 2020

Ishita Dasgupta, Eric Schulz, Joshua B Tenenbaum, and Samuel J Gershman. A theory of learning to infer.Psychological review, 127(3):412, 2020

work page 2020

[15] [15]

Non-bayesian persuasion.Journal of Political Economy, 130(10):2594–2642, 2022

Geoffroy De Clippel and Xu Zhang. Non-bayesian persuasion.Journal of Political Economy, 130(10):2594–2642, 2022

work page 2022

[16] [16]

Algorithm aversion: people erroneously avoid algorithms after seeing them err.Journal of experimental psychology: General, 144(1):114, 2015

Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Algorithm aversion: people erroneously avoid algorithms after seeing them err.Journal of experimental psychology: General, 144(1):114, 2015

work page 2015

[17] [17]

Algorithmic bayesian persuasion

Shaddin Dughmi and Haifeng Xu. Algorithmic bayesian persuasion. InProceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 412–425, 2016

work page 2016

[18] [18]

Algorithmic persuasion with no externalities

Shaddin Dughmi and Haifeng Xu. Algorithmic persuasion with no externalities. InProceedings of the 2017 ACM Conference on Economics and Computation, pages 351–368, 2017. 11

work page 2017

[19] [19]

Conservatism in human information processing.Formal representation of human judgment, 1968

Ward Edwards. Conservatism in human information processing.Formal representation of human judgment, 1968

work page 1968

[20] [20]

Non-bayesian learning.The BE Journal of Theoretical Economics, 10(1):1–20, 2010

Larry G Epstein, Jawwad Noor, Alvaro Sandroni, et al. Non-bayesian learning.The BE Journal of Theoretical Economics, 10(1):1–20, 2010

work page 2010

[21] [21]

Online bayesian recommendation with no regret

Yiding Feng, Wei Tang, and Haifeng Xu. Online bayesian recommendation with no regret. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 818–819, 2022

work page 2022

[22] [22]

Rationality-robust information design: Bayesian persuasion under quantal response

Yiding Feng, Chien-Ju Ho, and Wei Tang. Rationality-robust information design: Bayesian persuasion under quantal response. InProceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 501–546. SIAM, 2024

work page 2024

[23] [23]

A rothschild-stiglitz approach to bayesian persuasion

Matthew Gentzkow and Emir Kamenica. A rothschild-stiglitz approach to bayesian persuasion. American Economic Review, 106(5):597–601, 2016

work page 2016

[24] [24]

Bayes rule as a descriptive model: The representativeness heuristic.The Quarterly journal of economics, 95(3):537–557, 1980

David M Grether. Bayes rule as a descriptive model: The representativeness heuristic.The Quarterly journal of economics, 95(3):537–557, 1980

work page 1980

[25] [25]

Persuasion with motivated beliefs

David Hagmann and George Loewenstein. Persuasion with motivated beliefs. InOpinion Dynamics & Collective Decisions Workshop, 2017

work page 2017

[26] [26]

Algorithmic persuasion through simulation.arXiv preprint arXiv:2311.18138, 2023

Keegan Harris, Nicole Immorlica, Brendan Lucier, and Aleksandrs Slivkins. Algorithmic persuasion through simulation.arXiv preprint arXiv:2311.18138, 2023

work page arXiv 2023

[27] [27]

Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

Emir Kamenica and Matthew Gentzkow. Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

work page 2011

[28] [28]

The value of knowing a demand curve: Bounds on regret for online posted-price auctions

Robert Kleinberg and Tom Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594–605. IEEE, 2003

work page 2003

[29] [29]

Dynamic Non-Bayesian Persuasion

Masanori Kobayashi. Dynamic non-bayesian persuasion.arXiv preprint arXiv:2508.12328, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

Information design with unknown prior.arXiv preprint arXiv:2410.05533, 2024

Ce Li and Tao Lin. Information design with unknown prior.arXiv preprint arXiv:2410.05533, 2024

work page arXiv 2024

[31] [31]

What influences algorithmic decision-making? a systematic literature review on algorithm aversion

Hasan Mahmud, AKM Najmul Islam, Syed Ishtiaque Ahmed, and Kari Smolander. What influences algorithmic decision-making? a systematic literature review on algorithm aversion. Technological forecasting and social change, 175:121390, 2022

work page 2022

[32] [32]

Optimal information disclosure.Journal of political Economy, 118 (5):949–987, 2010

Luis Rayo and Ilya Segal. Optimal information disclosure.Journal of political Economy, 118 (5):949–987, 2010

work page 2010

[33] [33]

Motivated skepticism in the evaluation of political beliefs

Charles S Taber and Milton Lodge. Motivated skepticism in the evaluation of political beliefs. American journal of political science, 50(3):755–769, 2006

work page 2006

[34] [34]

On the bayesian rational assumption in information design

Wei Tang and Chien-Ju Ho. On the bayesian rational assumption in information design. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 9, pages 120–130, 2021

work page 2021

[35] [35]

Bayesian or biased? analytic thinking and political belief updating.Cognition, 204:104375, 2020

Ben M Tappin, Gordon Pennycook, and David G Rand. Bayesian or biased? analytic thinking and political belief updating.Cognition, 204:104375, 2020

work page 2020

[36] [36]

Sequential information design: Markov persuasion process and its efficient reinforcement learning.arXiv preprint arXiv:2202.10678, 2022

Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I Jordan, and Haifeng Xu. Sequential information design: Markov persuasion process and its efficient reinforcement learning.arXiv preprint arXiv:2202.10678, 2022

work page arXiv 2022

[37] [37]

Probabilistic computations: Toward a unified measure of complexity

Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227. IEEE Computer Society, 1977

work page 1977

[38] [38]

SkX t=1 Ht # =E

You Zu, Krishnamurthy Iyer, and Haifeng Xu. Learning to persuade on the fly: Robustness against ignorance. InProceedings of the 22nd ACM Conference on Economics and Computation, pages 927–928, 2021. 12 A Related Work (Detailed Discussion) A large body of empirical and theoretical work shows that Bayesian updating is a useful normative benchmark but often ...

work page 2021

[39] [39]

Each physical round contributes at most ∆Umax regret

Therefore E[Xk]≤ √ 2 δµ0 ,E " NX k=1 Xk # ≤ √ 2N δµ0 =O(log logT). Each physical round contributes at most ∆Umax regret. The hidden constant here is √ 2∆Umax/δµ0. C.3 Relevant Actions Recall the relevant-action set Arel = n a∈A:∃ν∈∆(Ω)thatauniquely maximizes X ω∈Ω (α⋆ν(ω)+(1−α ⋆)µ0(ω))uR(a, ω) o . The following discussion shows that without excluding thos...

work page

[40] [40]

28 Also, bλ+ ≤ 2∥r∥2 δµ0 ≤ 2pϵ δµ0

=µ 0. 28 Also, bλ+ ≤ 2∥r∥2 δµ0 ≤ 2pϵ δµ0 . For utility, write va(ν) := X ω∈Ω ν(ω)u S(a, ω). Then |va(ν)−v a(ν′)| ≤U max p |Ω| ∥ν−ν ′∥2. Therefore, |US(τ)−U S(bτ)| ≤Umax p |Ω| X ν′ i̸=νi λi∥νi −ν ′ i∥2 +U max X i |λi −bλi|+U maxbλ+ ≤U max p |Ω|pϵ+ 2U maxbλ+ ≤U max p |Ω|pϵ+ 4Umax δµ0 pϵ. This proves the claim. C.4.3 Proof of Proposition 3 Since τopt ={(λ i,...

work page

[41] [41]

The true bias is detectable in every context:α ⋆ ≥¯αmin := maxx∈X αmin(Ix)

work page

[42] [42]

The sender utilities are uniformly bounded:U X max := maxx∈X maxa,ω |uS,x(a, ω)|<∞

work page

[43] [43]

For every contextx, the analogue of Proposition 5 holds, and there exists a positive constant c= min x cx >0, ϵ= min x ϵx >0 for the first branch in Proposition 5, and a positive constantδ= min x δx >0for the second branch

work page

[44] [44]

Equivalently, there exist positive constant δµ0 = min x∈X δµ0,x and finite constantsκ, G max, Lb satisfying κ= max x∈X κx, G max = max x∈X Gmax,x, L b = max x∈X Lb,x

The constants in Theorem 2 can be chosen uniformly over x∈ X , see more details in Section C.1. Equivalently, there exist positive constant δµ0 = min x∈X δµ0,x and finite constantsκ, G max, Lb satisfying κ= max x∈X κx, G max = max x∈X Gmax,x, L b = max x∈X Lb,x. Note that the above assumption naturally holds if the context setXis finite. Contextual algori...

work page

[45] [45]

2:Initialize:interval[a, b]←[0,1], step sizeε← 1 2, timet←1

the receiver takes action 1 if and only if the distorted posterior is at leastbq, where 0< µ ∗ < bq <1; 38 Algorithm 9Safe Exploration for Unknown Bias and Unknown Prior 1:Input:horizonT, thresholdbq. 2:Initialize:interval[a, b]←[0,1], step sizeε← 1 2, timet←1. 3:Stage 1: Safe exploration 4:whileb−a > T −1 andt≤Tdo 5:m←a,m prev ←a 6:whilem≤bandt≤Tdo 7:Com...

work page

[46] [46]

persuasion is feasible, namely α∗ ≥α min = bq−µ∗ 1−µ ∗ so thatm ∗ ∈[0,1]is well-defined

work page

[47] [47]

LetΠ SEJ denote Algorithm 9

ties are broken in favor of action1. LetΠ SEJ denote Algorithm 9. Then, for everyT≥4, RegT ΠSEJ;α ∗, µ∗ =O(log logT). In particular, RegT ΠSEJ;α ∗, µ∗ ≤ 1−µ ∗ µ∗ + 1 (2 +⌈log 2 log2 T⌉) + 1. Proof. Let V ∗ denote the full-information benchmark value. As noted above, when (α⋆, µ∗) is known, an optimal scheme is supported on{0, ν ∗}, soV ∗ = µ∗ ν∗ =µ ∗ + (1...

work page