Frontier LLMs achieve higher surplus via sequential price discrimination in bilateral trade simulations, while SFT followed by GRPO on Qwen models trades off surplus gains against deal rates and improves consistency across price tiers.
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: cs.GT (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Training Language Models for Bilateral Trade with Private Information