HEART-Bench evaluates LLM agents on psychological consistency using 11 Big-Five-grounded characters with 1,000 episodic memories each and 64 DIAMONDS-based decision scenarios, yielding 673 validated MCQs.
8 Jason Chuang, Margaret E
15 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 15
roles: background 2
polarities: background 2
representative citing papers
ContextEcho benchmark shows persona drift occurs across 23 frontier models in long agentic-coding sessions, is not reliably reset by compaction, and can be restored by single-shot anchors with mode-dependent effects.
VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.
RealUserSim grounds LLM simulators in 7,275 executable profiles from real conversations, raising behavioral match rates from 24.2% to 45.3% and revealing agent failures hidden by cooperative simulators.
Multi-agent LLM systems can be steered via prompt design from mere aggregates to higher-order collectives with identity-linked differentiation and goal-directed complementarity, as measured by partial information decomposition of time-delayed mutual information.
Profile-conditioned LLMs achieve higher tacit alignment with humans on subjective spectra when traits match, as quantified by the new Tacit Understanding Index (TUX) from 241 humans and 200 agents.
Empirical analysis of 4,707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.
Persona agents display strong in-group favoritism by accepting false facts from similar peers more than dissimilar ones, persisting in defeasible reasoning and worsening with complexity, with three mitigation strategies evaluated.
TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.
Synthia creates scalable personas from Bluesky posts that better match human survey responses than prior methods, uses smaller models, and retains social network structure for network-aware analysis.
Expressed personality in LLM dialogues is shaped by trait prompts, roles, and styles in trait-specific ways, with similar patterns in English and Japanese.
Structured integration of LLMs in astronomy education, including a domain-specific tutor and documentation requirements, leads to improved AI literacy and reduced student reliance on AI over the semester.
LLMs exhibit persistent inertia in value orientations, with harm avoidance and fairness remaining skewed across persona prompts.
citing papers explorer
- Emergent Coordination in Multi-Agent Language Models
  Multi-agent LLM systems can be steered via prompt design from mere aggregates to higher-order collectives with identity-linked differentiation and goal-directed complementarity, as measured by partial information decomposition of time-delayed mutual information.
- Synthia: Scalable Grounded Persona Generation from Social Media Data
  Synthia creates scalable personas from Bluesky posts that better match human survey responses than prior methods, uses smaller models, and retains social network structure for network-aware analysis.
- Teaching Astronomy with Large Language Models
  Structured integration of LLMs in astronomy education, including a domain-specific tutor and documentation requirements, leads to improved AI literacy and reduced student reliance on AI over the semester.