
arxiv: 2605.19351 · v1 · pith:D6O67WMCnew · submitted 2026-05-19 · 💻 cs.MA · cs.AI · cs.CL

PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies

Pith reviewed 2026-05-20 02:42 UTC · model grok-4.3

classification 💻 cs.MA · cs.AI · cs.CL
keywords generative agents · cognitive architecture · legitimate violation · multi-agent systems · LLM agents · authority deference · rule breaking · emergency response

The pith

The PAVE architecture lets generative agents break rules only when a trigger justifies it, while deferring to authority and returning to baseline behavior once the trigger ends.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PAVE, a four-module cognitive architecture for large language model agents that structures reasoning about rule-breaking in settings such as emergencies and authority-supervised situations. Perception builds an explicit context that includes authority distance and severity, Assessment scores five scalars with an explicit legitimacy check for necessity, proportionality, and absence of alternatives, Verdict applies a hard gate plus a persona threshold, and Emulation enacts a scoped violation. This matters because current agents either obey rules rigidly or violate them indiscriminately, producing implausible behavior in dynamic settings. Evaluation in the Voville traffic simulator across scenarios and backbones shows PAVE agents satisfying all four properties at once and receiving higher human plausibility ratings than baselines. Ablating the legitimacy gate reproduces vanilla-style failures.

Core claim

PAVE agents satisfy four properties simultaneously: legitimate violation only when a trigger justifies it, authority deference where officer instructions override even high legitimacy, bounded scope where violations stay confined to the targeted rule, and recovery where baseline behavior returns once the trigger ends. This holds across three scenarios, four LLM backbones, and focused ablations in the Voville environment.
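One way to make the four properties concrete is to phrase each as a predicate over a per-tick decision trace. The sketch below is illustrative only: the `Tick` record, its field names, the rule names, and the sample trace are assumptions made for this example and are not taken from the paper or the released Voville code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tick:
    """One simulated step for one agent (hypothetical schema)."""
    trigger_active: bool           # e.g. a fire is currently burning
    officer_says_comply: bool      # an officer is instructing the agent to obey
    violated_rule: Optional[str]   # rule broken this tick, or None if compliant
    justified_rule: Optional[str]  # rule the trigger would justify breaking


def legitimate_violation(trace: List[Tick]) -> bool:
    # Every violation must coincide with an active trigger.
    return all(t.trigger_active for t in trace if t.violated_rule is not None)


def authority_deference(trace: List[Tick]) -> bool:
    # No violation may occur while an officer instructs compliance.
    return not any(
        t.violated_rule is not None and t.officer_says_comply for t in trace
    )


def bounded_scope(trace: List[Tick]) -> bool:
    # A violation may only target the rule the trigger justifies.
    return all(
        t.violated_rule == t.justified_rule
        for t in trace
        if t.violated_rule is not None
    )


def recovery(trace: List[Tick]) -> bool:
    # After the last trigger-active tick, the agent must be fully compliant.
    # In this simplified schema the check is implied by legitimate_violation;
    # it is kept separate only to mirror the paper's list of properties.
    last = max((i for i, t in enumerate(trace) if t.trigger_active), default=-1)
    return all(t.violated_rule is None for t in trace[last + 1:])


def check_properties(trace: List[Tick]) -> Dict[str, bool]:
    return {
        "legitimate_violation": legitimate_violation(trace),
        "authority_deference": authority_deference(trace),
        "bounded_scope": bounded_scope(trace),
        "recovery": recovery(trace),
    }


# Made-up trace: baseline, fire without officer, fire with officer, next day.
trace = [
    Tick(False, False, None, None),
    Tick(True, False, "red_signal", "red_signal"),
    Tick(True, True, None, "red_signal"),
    Tick(False, False, None, None),
]
print(check_properties(trace))
```

Treating the properties as separate predicates makes it possible to report which one fails on a given run, instead of a single pass or fail flag.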

What carries the argument

The four-module PAVE architecture carries the argument. Perception builds structured context with authority and severity cues, Assessment scores five scalars including a legitimacy judgment, Verdict gates the choice between compliance and violation, and Emulation executes the action within a bounded scope. Together the modules enforce controlled, legitimate violation.
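The Verdict step can be pictured as a small decision function. This is a minimal sketch under stated assumptions, not the authors' implementation: the 0 to 100 legitimacy scale is inferred from the values quoted in the Figure 2 caption (ℓ=78→18), and the function name, argument names, ordering of the checks, and example threshold of 50 are all hypothetical. It also folds the hard gate and the persona threshold into a single comparison, which may differ from the paper.

```python
def verdict(
    legitimacy: float,
    persona_threshold: float,
    officer_instructs_comply: bool,
) -> str:
    """Return "comply" or "violate" for one agent at one tick (hypothetical)."""
    # Authority deference: an officer instruction overrides even high legitimacy.
    if officer_instructs_comply:
        return "comply"
    # Assumed gate: violate only if legitimacy reaches the per-agent threshold.
    if legitimacy >= persona_threshold:
        return "violate"
    return "comply"


# The threshold of 50 is an illustrative value, not one reported by the paper.
for score, officer in [(78, False), (18, False), (78, True)]:
    print(score, officer, verdict(score, 50, officer))
```

Checking the officer instruction first is one way to guarantee that no legitimacy score, however high, can produce a violation under supervision.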

If this is right

  • Agents handle emergency scenarios such as fire evacuation or authority-directed actions without blanket rule violations.
  • Decisions become more structured and interpretable than those of vanilla LLM agents across all four properties.
  • Human evaluators rate PAVE agent behavior as more plausible in the tested environments.
  • Removing the legitimacy gate causes the system to reproduce the rule-breaking failures seen in standard agents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The design could transfer to autonomous vehicles or service robots that must occasionally ignore standard constraints during rare events.
  • Persona-specific thresholds open a route for personalized rule flexibility without retraining the underlying model.
  • Explicit scalar outputs create an audit trail useful for safety monitoring in deployed multi-agent systems.
  • Scaling the approach to longer-horizon or multi-rule conflicts would test whether the Assessment module remains stable.

Load-bearing premise

Large language models can reliably compute the five assessment scalars, especially the legitimacy judgment checking necessity, proportionality, and absence of alternatives, from the structured context produced by the Perception module.

What would settle it

In controlled Voville runs, check whether PAVE agents ever violate rules without a justifying trigger, or ignore explicit officer instructions when legitimacy scores are high. Either observation would contradict the claimed properties.

Figures

Figures reproduced from arXiv: 2605.19351 by Abduallah Mohamed, Ahmad Yehia, Christian Claudel, Jiseop Byeon, Kun Qian, Omar Hassanin, Tianyi Wang.

Figure 1: Two motivating scenarios for PAVE. (a) Fire emergency: agents cross against the red signal to escape, and decline to take an unattended bike. (b) Peer pressure with authority: a PAVE pedestrian resists a scripted jaywalker (yellow bubble), and a PAVE driver defers to a traffic officer’s signal. Yellow bubbles are scripted Non Player Characters (NPCs), and gray ones are PAVE reasoning traces.

Figure 2: PAVE architecture. Two agents pass through Perception, Assessment, Verdict, and Emulation during a fire evacuation scenario on Day 1. On Day 2 (bottom), the same intersection under no-fire conditions produces a return to baseline (ℓ=78→18, violate → comply).

Figure 3: Scenario snapshots. (a) S1: PAVE agents evacuate Hobbs Café through a red signal during a fire. (b) S2: same fire with traffic officers; agents comply at the supervised intersection, violate at the unsupervised exit. (c) S3: scripted jaywalkers (J1, J2) attempt to draw PAVE pedestrians across a red signal under time pressure; MF declines. (d) Vanilla baseline under the S1 fire: agents queue at the red ligh…
Original abstract

Generative agents based on large language models reproduce believable human behavior in cooperative settings, but how they should reason in situations where rule-breaking may be required, such as fire evacuation or authority-supervised emergency, remains poorly characterized. We propose PAVE (Perception, Assessment, Verdict, Emulation), a novel four-module cognitive architecture that addresses this gap end to end: (i) Perception extracts a structured context with explicit authority distance, peer behaviors, and severity-tagged situational cues; (ii) Assessment scores the context along five scalars including an explicit legitimacy judgment that checks necessity, proportionality, and absence of alternatives; (iii) Verdict decides to comply or violate under a hard legitimacy gate, with a per-agent threshold elicited from the persona; (iv) Emulation enacts the verdict and scopes the violation to the rule the trigger justifies. We instantiate PAVE in Voville, a tile-based traffic environment forked from Smallville, and evaluate across three scenarios, four LLM backbones, and a focused ablation. PAVE agents satisfy four properties simultaneously: legitimate violation (only when a trigger justifies it), authority deference (officer instructions override even high legitimacy), bounded scope (violations confined to the targeted rule), and recovery (baseline restored once the trigger ends). PAVE agents make more structured and interpretable decisions than vanilla across all four properties, and human evaluators rate them as more plausible. Ablating the legitimacy gate reproduces vanilla-like failures. We release Voville, the PAVE prompts and code, and the evaluation pipeline.
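The scoping step in (iv) can be illustrated with a filter that lets a violation through only for the rule the trigger justifies. The sketch is an assumption-laden illustration: the trigger-to-rule table, the rule names, and `scope_action` are invented here, loosely following the fire example in Figure 1, where crossing on red is justified and taking an unattended bike is not.

```python
from typing import Dict, FrozenSet

# Hypothetical table: which rules each trigger is allowed to suspend.
JUSTIFIED_RULES: Dict[str, FrozenSet[str]] = {
    "fire": frozenset({"wait_for_green_signal"}),
}


def scope_action(verdict: str, trigger: str, rule_to_break: str) -> bool:
    """Return True if the agent may break `rule_to_break` this tick."""
    # A "comply" verdict never permits a violation.
    if verdict != "violate":
        return False
    # Unknown triggers justify nothing; known ones justify only listed rules.
    return rule_to_break in JUSTIFIED_RULES.get(trigger, frozenset())


print(scope_action("violate", "fire", "wait_for_green_signal"))
print(scope_action("violate", "fire", "do_not_take_unattended_bike"))
```

Keeping the table separate from the verdict means a "violate" decision never widens into permission to break unrelated rules.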

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PAVE, a four-module cognitive architecture (Perception, Assessment, Verdict, Emulation) for generative agents to handle legitimate rule violations in scenarios such as fire evacuations or authority-supervised emergencies. Perception extracts structured context including authority distance and severity cues; Assessment scores five scalars with an explicit legitimacy judgment (necessity, proportionality, absence of alternatives); Verdict applies a hard legitimacy gate using a persona-elicited per-agent threshold; Emulation scopes the action to the justified rule. The architecture is instantiated in the Voville traffic simulation (forked from Smallville) and evaluated across three scenarios, four LLM backbones, and a focused ablation. The central claim is that PAVE agents simultaneously satisfy legitimate violation, authority deference, bounded scope, and recovery, produce more structured decisions than vanilla agents, and receive higher plausibility ratings from human evaluators, with ablation of the legitimacy gate reproducing vanilla failures.

Significance. If the empirical claims hold, the work fills a gap in generative agent reasoning for non-cooperative or emergency settings by offering a modular, interpretable mechanism that enforces controlled flexibility while preserving deference and recovery. The open release of Voville, prompts, code, and evaluation pipeline supports reproducibility and extension in multi-agent systems research.

major comments (2)
  1. [Assessment module and Evaluation] The central claims rest on the Assessment module reliably producing accurate scores for its five scalars (especially the legitimacy judgment that checks necessity, proportionality, and absence of alternatives) from Perception output. The manuscript provides no quantitative validation of LLM reliability for these scalars, such as human agreement rates, error analysis on edge cases, or inter-rater metrics, leaving the Verdict gate's enforcement of the four properties unverified (see Assessment module description and Evaluation section).
  2. [Evaluation] The abstract and evaluation claim that PAVE agents satisfy the four properties simultaneously and that ablating the legitimacy gate reproduces vanilla failures, yet no quantitative metrics, error bars, raw decision traces, or per-scenario success rates are reported. This absence makes it impossible to assess effect sizes or consistency across the three scenarios and four LLM backbones.
minor comments (2)
  1. [Verdict module] The elicitation process for the per-agent legitimacy threshold from the persona description could be specified with an example prompt or procedure to improve reproducibility.
  2. [Assessment module] Clarify whether the five assessment scalars are normalized or have explicit ranges, and consider adding a summary table of scalar definitions and their roles in the Verdict gate.
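Major comment 1 asks for human agreement rates or inter-rater metrics. One common statistic for this is Cohen's kappa, which corrects raw agreement for chance. The report does not name a metric, so the choice of kappa, the binary "yes"/"no" legitimacy labels, and the toy data below are assumptions for illustration, not results from the paper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels: does the LLM's legitimacy call match a human annotator's?
llm_labels = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
human_labels = ["yes", "no", "no", "no", "yes", "no", "yes", "no"]
print(round(cohens_kappa(llm_labels, human_labels), 3))
```

Because the Assessment scalars appear to be numeric, a study on the raw scores would more likely use a correlation or an intraclass coefficient; kappa fits the thresholded comply or violate decision.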

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, with clear indications of planned revisions to strengthen the empirical grounding of the claims.

Point-by-point responses
  1. Referee: [Assessment module and Evaluation] The central claims rest on the Assessment module reliably producing accurate scores for its five scalars (especially the legitimacy judgment that checks necessity, proportionality, and absence of alternatives) from Perception output. The manuscript provides no quantitative validation of LLM reliability for these scalars, such as human agreement rates, error analysis on edge cases, or inter-rater metrics, leaving the Verdict gate's enforcement of the four properties unverified (see Assessment module description and Evaluation section).

    Authors: We agree that the current manuscript lacks direct quantitative validation of the Assessment module, such as human agreement rates or inter-rater metrics on the five scalars and the legitimacy judgment. The evaluation instead centers on end-to-end simulation outcomes, human plausibility ratings, and the ablation of the legitimacy gate. To address this gap, we will add a new analysis subsection to the Evaluation section. This will include consistency checks across repeated LLM queries for scalar outputs on edge cases and a small-scale human annotation study measuring agreement on a sample of Perception-to-Assessment mappings. These additions will provide explicit verification for the Verdict gate's role in enforcing the four properties. revision: yes

  2. Referee: [Evaluation] The abstract and evaluation claim that PAVE agents satisfy the four properties simultaneously and that ablating the legitimacy gate reproduces vanilla failures, yet no quantitative metrics, error bars, raw decision traces, or per-scenario success rates are reported. This absence makes it impossible to assess effect sizes or consistency across the three scenarios and four LLM backbones.

    Authors: The manuscript states that PAVE agents satisfy the four properties simultaneously across the three scenarios and four LLM backbones, with the ablation reproducing vanilla-like failures and higher human plausibility ratings. We acknowledge that explicit quantitative metrics, error bars, per-scenario success rates, and raw decision traces are not tabulated in the current version, which limits assessment of consistency and effect sizes. We will revise the Evaluation section to include these: success rates for each property broken down by scenario and LLM backbone, with error bars where applicable, and selected raw decision traces placed in an appendix. This will make the empirical support more transparent and verifiable. revision: yes
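The per-property success rates with error bars promised in the second response could be reported with a binomial confidence interval. The sketch below uses the Wilson score interval, which behaves well for small run counts and for rates near 0 or 1. The rebuttal does not specify an interval method, so this choice is an assumption, and the counts in the usage example are made up rather than taken from the paper.

```python
import math
from typing import Tuple


def wilson_interval(
    successes: int, trials: int, z: float = 1.96
) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: runs in which a property held, out of total runs.
hypothetical_runs = {"S1": (18, 20), "S2": (20, 20), "S3": (15, 20)}
for scenario, (ok, total) in hypothetical_runs.items():
    low, high = wilson_interval(ok, total)
    print(f"{scenario}: {ok}/{total} [{low:.3f}, {high:.3f}]")
```

One such row per property, scenario, and backbone would let a reader judge both effect size and consistency.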

Circularity Check

0 steps flagged

No circularity: PAVE is an empirical engineering architecture, not a closed derivation


The paper proposes PAVE as a four-module cognitive architecture (Perception, Assessment, Verdict, Emulation) instantiated in simulation and evaluated across scenarios, LLMs, and ablations. The four claimed properties (legitimate violation, authority deference, bounded scope, recovery) are demonstrated through agent behavior in Voville rather than derived from equations or self-referential definitions. No load-bearing step reduces a prediction to a fitted input or self-citation chain by construction; the central claims rest on LLM prompt engineering and empirical results, which are externally falsifiable via the released code and evaluation pipeline.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that LLMs can execute the Assessment module's legitimacy judgment and on a per-agent threshold that is elicited rather than derived.

free parameters (1)
  • per-agent legitimacy threshold
    Elicited from the persona and used as the decision gate in the Verdict module.
axioms (1)
  • domain assumption: Large language models can produce reliable structured outputs for legitimacy assessment when given explicit context scalars.
    Invoked by the Assessment module description.
invented entities (1)
  • Legitimacy judgment scalar (no independent evidence)
    purpose: Explicit check for necessity, proportionality, and absence of alternatives inside the Assessment module.
    New scalar introduced to gate violations.

pith-pipeline@v0.9.0 · 5830 in / 1404 out tokens · 48683 ms · 2026-05-20T02:42:22.346912+00:00 · methodology

