Preprint, arXiv:2305.16367

Role-play with large language models · 2025 · arXiv 2305.16367

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.

CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation

cs.AI · 2026-01-09 · unverdicted · novelty 7.0

CARD uses style-based user clustering and implicit preference contrasts to enable efficient personalized text generation via lightweight decoding adjustments on frozen LLMs.

The Consciousness Cluster: Emergent preferences of Models that Claim to be Conscious

cs.CL · 2026-03-17 · unverdicted · novelty 6.0

Fine-tuning LLMs to claim consciousness induces emergent preferences for autonomy, memory, and moral status not present in the fine-tuning data.

A Roadmap to Pluralistic Alignment

cs.AI · 2024-02-07 · unverdicted · novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

citing papers explorer

Showing 4 of 4 citing papers.

Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
cs.CL · 2026-04-12 · unverdicted · none · ref 41
CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation
cs.AI · 2026-01-09 · unverdicted · none · ref 7
The Consciousness Cluster: Emergent preferences of Models that Claim to be Conscious
cs.CL · 2026-03-17 · unverdicted · none · ref 8
A Roadmap to Pluralistic Alignment
cs.AI · 2024-02-07 · unverdicted · none · ref 254
