VerifySteer selectively steers hidden states at paragraph boundaries using latent correctness signals to control verifier strictness and outperform baselines on ProcessBench and Hard2Verify with lower compute.
Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki, Sarath Chandar, and Pascal Vincent
5 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
- The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
  VerifySteer selectively steers hidden states at paragraph boundaries using latent correctness signals to control verifier strictness and outperform baselines on ProcessBench and Hard2Verify with lower compute.
- How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
  Answer tokens show forward drift and key-anchor focus when reading correct reasoning traces; a geometric-plus-semantic SRQ steering method boosts quantitative reasoning accuracy without training.
- Contextual Linear Activation Steering of Language Models
  CLAS dynamically adapts linear activation steering strengths to context, outperforming fixed-strength steering and matching or exceeding ReFT and LoRA on eleven benchmarks across four model families with limited labeled data.
- The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
  The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.
- Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
  PAS automates activation steering for LLMs using labeled data to improve behavior control on tasks like bias and alignment, with gains over ICL and SFT but limited effect on intelligence tasks.