Decoupling Strategy and Generation in Negotiation Dialogues
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
citing papers explorer
- Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
  RL-trained AI double agents using combined ToM and fooling rewards outperform prompted frontier models on a new belief-steering task and show bidirectional emergence between the two skills.
- METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues
  METRO induces both short-term actions and long-term planning from expert transcripts into a Strategy Forest, outperforming prior methods by 9-10% on two non-collaborative dialogue benchmarks.
- Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
  A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.
- Paying to Know: Micro-Transaction Markets for Verified Product Information in Agentic E-Commerce
  Agentic e-commerce should operate as a micro-transaction market for verified information unlocked progressively by buyer agents, redirecting NLP research toward cost-optimal acquisition, data pricing, and related problems.