Decoupling Strategy and Generation in Negotiation Dialogues
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
METRO induces both short-term actions and long-term planning from expert transcripts into a Strategy Forest, outperforming prior methods by 9-10% on two non-collaborative dialogue benchmarks.
A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.
citing papers explorer
- Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
  RL-trained AI double agents using combined ToM and fooling rewards outperform prompted frontier models on a new belief-steering task and show bidirectional emergence between the two skills.
- METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues
  METRO induces both short-term actions and long-term planning from expert transcripts into a Strategy Forest, outperforming prior methods by 9-10% on two non-collaborative dialogue benchmarks.
- Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
  A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.