Language models as agent models

Jacob Andreas · 2022 · DOI 10.18653/v1/2022.findings-emnlp.423

2 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

Do Matching Mechanisms Work with LLM Agents?

cs.GT · 2026-06-02 · unverdicted · novelty 6.0

Centralized matching mechanisms outperform free negotiation in stability and efficiency with LLM agents, who also report preferences truthfully more often than humans, though not always in line with strategy-proofness predictions.

Emergent alignment and the projectability of ethical personas

cs.AI · 2026-06-08 · unverdicted · novelty 4.0

Narrow constitutional finetuning on safety sub-tasks induces emergent alignment across broader safety domains and yields projectable ethical personas whose signatures can be measured with a multidimensional diagnostic.

citing papers explorer


Emergent alignment and the projectability of ethical personas

cs.AI · 2026-06-08 · unverdicted · none · ref 3

Narrow constitutional finetuning on safety sub-tasks induces emergent alignment across broader safety domains and yields projectable ethical personas whose signatures can be measured with a multidimensional diagnostic.
