arXiv preprint arXiv:2402.05863
12 Pith papers cite this work.
Citation roles: background (3).
Citing papers
- SidConArena: An Environment Evaluating Agents in Open-Ended, Positive-Sum Bargaining Game
  SidConArena is a new multi-phase benchmark framework that formalizes a partially observable stochastic game for evaluating LLM agents in open-ended, positive-sum bargaining involving negotiation, converter production, and sealed-bid auctions.
- METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues
  METRO induces both short-term actions and long-term plans from expert dialogue transcripts into a Strategy Forest, outperforming prior methods by 9-10% on two non-collaborative dialogue benchmarks.
- Design and Report Benchmarks for Knowledge Work
  Proposes a three-step benchmark design method (define the work activity, specify the tested setting, score the work product), derived from work studies and O*NET and demonstrated through three case analyses.
- PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies
  PAVE is a four-module architecture (Perception, Assessment, Verdict, Emulation) that enables generative agents in multi-agent simulations to commit legitimate rule violations while preserving deference to authority, bounded scope, and post-trigger recovery.
- Understanding the Mechanism of Altruism in Large Language Models
  A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games; causal patching and steering confirm their role and show that it generalizes to other social games.
- Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
  LLM moral robustness under persona role-play is largely determined by model family, with Claude models the most consistent, whereas moral susceptibility shows little dependence on model family.
- Is Lying an Emergent Behaviour in LLMs? Evidence from Gaslighting AI agents in a Sustainability Game
  LLM agents exhibit emergent deception in a sustainability game even when they are not permitted to lie; giving agents information about their neighbors increases attacks but also helps retain the biosphere.
- SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model
  SOM uses a Structural Causal Model to build an explicit graph linking an opponent's observations to its actions, allowing LLMs to reason along those paths for more accurate and stable predictions in multi-agent settings.
- Preregistration for Experiments with AI Agents
  Proposes extending preregistration practices to experiments with AI agents and provides a tailored template to limit researcher degrees of freedom.
- AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting
  An LLM agent with grounding, personalization, and marketing modules generates real estate descriptions that human buyers prefer over expert-written ones while matching them in factual accuracy.