pith. sign in

Angular steering: Behavior control via rotation in activation space.arXiv preprint arXiv:2510.26243

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

years

2026 5

roles

background 1

polarities

background 1

representative citing papers

Psychological Steering of Large Language Models

cs.CL · 2026-04-15 · unverdicted · novelty 7.0

Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.

Adaptive Probe-based Steering for Robust LLM Jailbreaking

cs.CR · 2026-05-19 · unverdicted · novelty 5.0

Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.

citing papers explorer

Showing 5 of 5 citing papers.