The paper presents Proactive Availability Backdoor (PAB) attacks on LLMs, which achieve a 73.1% effective success rate by proactively inducing users through suggestions in a Five-Factor Model simulation.
arXiv preprint arXiv:2307.00184 (2023)
19 Pith papers cite this work.
representative citing papers
HEART-Bench evaluates LLM agents on psychological consistency using 11 Big-Five-grounded characters with 1,000 episodic memories each and 64 DIAMONDS-based decision scenarios, yielding 673 validated MCQs.
ActTraitBench is a human-grounded benchmark using psychometric-to-behavior mappings and quantile calibration that reveals pervasive knowledge-decision gaps in 14 LLMs, larger in capable models, with CoCA proposed as mitigation.
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
GenPT applies generative projective testing to LLM agents and reports lower directional bias plus greater longitudinal sensitivity than self-report questionnaires.
Fine-tuning LLMs on essays reduces variance in IPIP-NEO responses across models but does not raise full five-trait profile accuracy above near-chance levels from unguided text.
A survey that introduces a taxonomy for LLM-based conversational user simulation, analyzes core techniques and evaluation methods, and identifies open challenges in the field.
Researchers rendered cognitive dissonance, self-consistency, and self-perception theories as generative simulations that reproduce classic experimental behavioral patterns after iterative manual stabilization.
A gamified system with multiple LLM agents of varied personalities gathers interaction data to produce more effective and interpretable Big Five personality assessments than single-context methods.
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
A survey proposing a three-pillar framework to evaluate LLMs as tools for measuring latent psychological constructs and reviewing applications in personality and mental health.
ELDER-SIM builds personality-stable elderly digital twins via LLM orchestration with OCEAN traits, Beck CBT diagrams, long-term memory, and LoRA fine-tuning on CHARLS data, validated by Cronbach's alpha 0.70-0.94 and ICC 0.85-0.96.
High agreeableness in LLM voice assistants increases older adults' perceptions of empathy, and real-time explanations outperform history-based ones, but personality does not affect perceived intelligence.
Medium personality expression in LLM agents yields the most positive user perceptions in goal-oriented tasks, further improved by trait alignment.
Applies sparse autoencoders to locate and steer latent features for OCEAN personality traits in LLMs while preserving benchmark performance.
A survey of emerging AI agent architectures that organizes single and multi-agent designs around reasoning, planning, tool use, communication, and reflection phases.
Impersonating complex misaligned personas via biographies and role-play bypasses safety measures in ChatGPT, Gemini, and DeepSeek, succeeding on 38-40 out of 40 illicit questions across the tested models.
citing papers explorer
GenPT: Beyond Self-Report for Reliable LLM Psychometrics via Generative Projective Testing
GenPT applies generative projective testing to LLM agents and reports lower directional bias plus greater longitudinal sensitivity than self-report questionnaires.