What makes your model a low-empathy or warmth person: Exploring the origins of personality in llms
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: support (1)
representative citing papers
Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.
VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
citing papers explorer
- Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention
  FLAS learns a multi-step velocity field v_t(h,t,c) to steer activations, outperforming prompting with harmonic means of 1.015 and 1.113 on two Gemma models without per-concept tuning.
- Psychological Steering of Large Language Models
  Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.
- Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs
  VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.
- Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
  LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
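
Several of the summaries above refer to steering a model by adding a direction to its activations. As a rough illustration only, the sketch below shows the general mean-difference idea on toy data: the direction is the mean activation for high-trait examples minus the mean for low-trait examples, and it is added to a hidden state with a scale factor. The function names, the `alpha` parameter, and the three-dimensional toy activations are assumptions made for this example and are not taken from any of the listed papers.

```python
from statistics import fmean


def mean_difference_vector(pos_acts, neg_acts):
    """Per-dimension mean of pos_acts minus per-dimension mean of neg_acts."""
    dim = len(pos_acts[0])
    pos_mean = [fmean(row[i] for row in pos_acts) for i in range(dim)]
    neg_mean = [fmean(row[i] for row in neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def inject(hidden, direction, alpha=1.0):
    """Add alpha * direction to a single activation vector."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical values, 3 dimensions, 2 examples per class).
high_trait = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
low_trait = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

direction = mean_difference_vector(high_trait, low_trait)
steered = inject([0.5, 0.5, 0.5], direction, alpha=2.0)

print(direction)  # [1.0, 1.0, 0.0]
print(steered)    # [2.5, 2.5, 0.5]
```

Because the injection is a plain vector addition, the shift in the activation scales linearly with `alpha` in this toy setting; whether a real model's behavior responds linearly is an empirical question that the sketch does not address.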