VerifySteer selectively steers hidden states at paragraph boundaries using latent correctness signals to control verifier strictness and outperform baselines on ProcessBench and Hard2Verify with lower compute.
Angular steering: Behavior control via rotation in activation space.arXiv preprint arXiv:2510.26243
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 1polarities
background 1representative citing papers
Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.
Edge-based circuits in vision transformers can be automatically recovered to explain and steer model computations for classification and adversarial behaviors.
Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.
citing papers explorer
-
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
VerifySteer selectively steers hidden states at paragraph boundaries using latent correctness signals to control verifier strictness and outperform baselines on ProcessBench and Hard2Verify with lower compute.
-
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
-
Psychological Steering of Large Language Models
Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.
-
Seeing Through Circuits: Faithful Mechanistic Interpretability for Vision Transformers
Edge-based circuits in vision transformers can be automatically recovered to explain and steer model computations for classification and adversarial behaviors.
-
Adaptive Probe-based Steering for Robust LLM Jailbreaking
Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.