Paul Humphreys
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
- Beyond Social Pressure: Benchmarking Epistemic Attack in Large Language Models
  PPT-Bench measures how LLMs change answers under epistemic, value, authority, and identity pressures at baseline, single-turn, and multi-turn levels, finding separable inconsistency patterns across five models.
- CRAFT: Grounded Multi-Agent Coordination Under Partial Information
  The CRAFT benchmark shows that multi-agent coordination under partial information remains unsolved for current LLMs, with smaller open-weight models often matching or beating frontier systems.