Sequential information design: Markov persuasion process and its efficient reinforcement learning. arXiv preprint arXiv:2202.10678, 2022.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.GT (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Learning to Persuade a Biased Receiver
A safe exploration algorithm learns an unknown receiver bias parameter in repeated information design and achieves O(log log T) regret with a matching lower bound.