Introduces HumanStudy-Bench to evaluate LLM agents against 12 replicated human behavioral studies, finding agent design affects alignment more than model scale with polarized outcomes.
false consensus effect
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
A framework jointly models annotator-specific NLI labels and explanations using conditioned representations and two explainer architectures, improving predictive performance over baselines.
LLMs can be statistically superior to humans at estimating group-level judgments on subjective tasks because of their low variance and decoupled representation-processing biases.
On-demand runtime generation of persona-based agents can enable personalized multi-agent AI workflows beyond fixed hard-coded architectures.
citing papers explorer
- Validated Hypotheses as a Lens for Human-Likeness Evaluation in AI Agents
  Introduces HumanStudy-Bench to evaluate LLM agents against 12 replicated human behavioral studies, finding agent design affects alignment more than model scale with polarized outcomes.
- Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales
  A framework jointly models annotator-specific NLI labels and explanations using conditioned representations and two explainer architectures, improving predictive performance over baselines.
- From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives?
  LLMs can be statistically superior to humans at estimating group-level judgments on subjective tasks because of their low variance and decoupled representation-processing biases.
- Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs
  On-demand runtime generation of persona-based agents can enable personalized multi-agent AI workflows beyond fixed hard-coded architectures.