The ContextEcho benchmark shows that persona drift occurs across 23 frontier models in long agentic-coding sessions, is not reliably reset by compaction, and can be corrected by single-shot anchors, with mode-dependent effects.
Preprint, arXiv:2305.16367
9 Pith papers cite this work.
Representative citing papers
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
CARD uses style-based user clustering and implicit preference contrasts to enable efficient personalized text generation via lightweight decoding adjustments on frozen LLMs.
Psy-CoT decomposes reasoning into Interaction Perception, Psychological Empathy, and Logical Construction while RAPO asymmetrically weights role-specific tokens during policy optimization, outperforming prior CoT and GRPO baselines on role-playing benchmarks.
Conditioning on character arcs improves role-playing language agents' performance over other context strategies, with largest gains on scenarios outside the source text.
Fine-tuning LLMs to claim consciousness induces emergent preferences for autonomy, memory, and moral status not present in the fine-tuning data.
The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
LLM-simulated dialogues show that uncertainty-scaffolding strategies sustain higher-quality engagement than controls without producing more stance revision.
Sophisticated prompting on Gemini 2.0 Flash achieves a 0.720 Concept Level Score on MedHopQA, outperforming the baseline by 0.155 and matching Gemini 2.5 Flash performance.