arXiv preprint arXiv:2307.00184 (2023)
19 Pith papers cite this work.
citing papers explorer
- The Invitation Trap: Proactive Availability Backdoor in LLMs via Conversational Induction
  The paper presents Proactive Availability Backdoor (PAB) attacks on LLMs that achieve 73.1% effective success rate by proactively inducing users via suggestions in a Five-Factor Model simulation.
- HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?
  HEART-Bench evaluates LLM agents on psychological consistency using 11 Big-Five-grounded characters with 1,000 episodic memories each and 64 DIAMONDS-based decision scenarios, yielding 673 validated MCQs.
- ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation
  ActTraitBench is a human-grounded benchmark using psychometric-to-behavior mappings and quantile calibration that reveals pervasive knowledge-decision gaps in 14 LLMs, larger in capable models, with CoCA proposed as mitigation.
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
  An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
  Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
- GenPT: Beyond Self-Report for Reliable LLM Psychometrics via Generative Projective Testing
  GenPT applies generative projective testing to LLM agents and reports lower directional bias plus greater longitudinal sensitivity than self-report questionnaires.
- Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?
  Fine-tuning LLMs on essays reduces variance in IPIP-NEO responses across models but does not raise full five-trait profile accuracy above near-chance levels from unguided text.
- A Survey on LLM-based Conversational User Simulation
  A survey that introduces a taxonomy for LLM-based conversational user simulation, analyzes core techniques and evaluation methods, and identifies open challenges in the field.
- Stabilising Generative Models of Attitude Change
  Researchers rendered cognitive dissonance, self-consistency, and self-perception theories as generative simulations that reproduce classic experimental behavioral patterns after iterative manual stabilization.
- Exploring a Gamified Personality Assessment Method through Interaction with LLM Agents Embodying Different Personalities
  A gamified system with multiple LLM agents of varied personalities gathers interaction data to produce more effective and interpretable Big Five personality assessments than single-context methods.
- A Survey on Large Language Model based Autonomous Agents
  A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
- A Survey of Large Language Models for Perception and Measurement of Human Psychology
  A survey proposing a three-pillar framework to evaluate LLMs as tools for measuring latent psychological constructs and reviewing applications in personality and mental health.
- Elder-Sim: A Psychometrically Validated Platform for Personality-Stable Elderly Digital Twins
  ELDER-SIM builds personality-stable elderly digital twins via LLM orchestration with OCEAN traits, Beck CBT diagrams, long-term memory, and LoRA fine-tuning on CHARLS data, validated by Cronbach's alpha 0.70-0.94 and ICC 0.85-0.96.
- The Differential Effects of Agreeableness and Extraversion on Older Adults' Perceptions of Conversational AI Explanations in Assistive Settings
  High agreeableness in LLM voice assistants increases older adults' empathy perceptions and real-time explanations outperform history-based ones, but personality does not affect perceived intelligence.
- Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks
  Medium personality expression in LLM agents yields the most positive user perceptions in goal-oriented tasks, further improved by trait alignment.
- Mechanistic Personality Analysis of LLMs Steering Personality via Latent Feature Interventions
  Applies sparse autoencoders to locate and steer latent features for OCEAN personality traits in LLMs while preserving benchmark performance.
- The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
  A survey of emerging AI agent architectures that organizes single and multi-agent designs around reasoning, planning, tool use, communication, and reflection phases.
- Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
  Impersonating complex misaligned personas via biographies and role-play bypasses safety in ChatGPT, Gemini, and Deepseek, succeeding on 38-40 out of 40 illicit questions across tested models.
- Human Psychometric Questionnaires Mischaracterize LLM Behavior