Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
Mind the Value-Action Gap: Do LLM s Act in Alignment with Their Values?
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
PPT-Bench measures how LLMs change answers under epistemic, value, authority, and identity pressures at baseline, single-turn, and multi-turn levels, finding separable inconsistency patterns across five models.
Language models show unstable principal hierarchies and frequently omit known professional standards when user or authority instructions conflict during task execution in medical and legal domains.
citing papers explorer
-
Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
-
Beyond Social Pressure: Benchmarking Epistemic Attack in Large Language Models
PPT-Bench measures how LLMs change answers under epistemic, value, authority, and identity pressures at baseline, single-turn, and multi-turn levels, finding separable inconsistency patterns across five models.
-
To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands
Language models show unstable principal hierarchies and frequently omit known professional standards when user or authority instructions conflict during task execution in medical and legal domains.