What makes your model a low-empathy or warmth person: Exploring the origins of personality in LLMs

Shu Yang, Shenzhe Zhu, Ruoxuan Bao, Liang Liu, Yu Cheng, Lijie Hu, Mengdi Li, Di Wang · 2024 · arXiv 2410.10863

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

FLAS learns a multi-step velocity field v_t(h,t,c) to steer activations, outperforming prompting with harmonic means of 1.015 and 1.113 on two Gemma models without per-concept tuning.

Psychological Steering of Large Language Models

cs.CL · 2026-04-15 · unverdicted · novelty 7.0

Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.

Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs

cs.CL · 2025-06-08 · unverdicted · novelty 7.0

VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

citing papers explorer

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention
cs.CL · 2026-05-07 · unverdicted · none · ref 12

Psychological Steering of Large Language Models
cs.CL · 2026-04-15 · unverdicted · none · ref 70

Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs
cs.CL · 2025-06-08 · unverdicted · none · ref 47

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
cs.CL · 2026-05-12 · unverdicted · none · ref 112
