
arxiv: 2602.08169 · v2 · pith:JU2SVR7Wnew · submitted 2026-02-09 · 💻 cs.LG · cs.CL

Spherical Steering: Geometry-Aware Activation Rotation for Language Models

Pith reviewed 2026-05-21 13:53 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords: spherical steering · activation rotation · inference time steering · language models · geodesic · representation preservation · model control

The pith

Rotating activations along geodesics steers language models more effectively than vector addition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Spherical Steering, a method for controlling language models during inference by rotating their hidden state activations along a geodesic on a sphere toward a desired direction. This approach aims to steer the model without changing the magnitude of the activations, which the authors argue prevents representation collapse and maintains the quality of open-ended text generation. In experiments, it outperforms traditional activation addition methods, achieving gains of around 10 percent on benchmarks such as TruthfulQA, COPA, and StoryCloze. A confidence gate is added to adjust the steering intensity depending on the model's uncertainty about the input. The work emphasizes that maintaining geometric consistency in the activation space provides a better primitive for precise control at inference time.

Core claim

Spherical Steering rotates activations along a geodesic toward a target direction rather than adding a fixed vector, preserving the norm of the hidden states and thereby improving steering performance on multiple-choice tasks while keeping open-ended generation quality intact.
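The contrast between addition and rotation can be written out in a few lines. The notation here is assumed for illustration and does not come from the paper: $h$ is a hidden state, $v$ a steering vector, $\alpha$ the addition strength, and $R$ any rotation (orthogonal) matrix.

```latex
\[
\|h + \alpha v\|^2 \;=\; \|h\|^2 + 2\alpha\,\langle h, v\rangle + \alpha^2\,\|v\|^2 ,
\qquad
\|R h\|^2 \;=\; h^\top R^\top R\, h \;=\; \|h\|^2 .
\]
```

Under this notation, addition leaves the norm unchanged only when $\alpha = 0$ or $\alpha = -2\langle h, v\rangle / \|v\|^2$, whereas a rotation preserves it for every $h$.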

What carries the argument

The geodesic rotation operation that moves the activation vector on the unit sphere in the direction of the target steering vector while keeping its length unchanged.
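A minimal sketch of such a rotation, assuming the standard spherical interpolation formula rescaled to the original norm. The function name `rotate_toward`, the fraction parameter `t`, and the example vectors are hypothetical; the paper's exact parametrization is not given on this page.

```python
import numpy as np

def rotate_toward(h, v, t, eps=1e-8):
    """Rotate h by a fraction t of its angle to v, keeping ||h|| unchanged.

    Assumption: t in [0, 1], where t=0 returns h and t=1 points along v.
    """
    h_norm = np.linalg.norm(h)
    h_hat = h / h_norm
    v_hat = v / np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(h_hat, v_hat), -1.0, 1.0))
    if np.sin(theta) < eps:
        # h is (anti-)parallel to v, so the great circle is not unique;
        # this sketch simply returns h unchanged in that case.
        return h.copy()
    # Spherical interpolation between the two unit directions.
    w = (np.sin((1.0 - t) * theta) * h_hat + np.sin(t * theta) * v_hat) / np.sin(theta)
    return h_norm * w

h = np.array([3.0, 4.0, 0.0])        # toy hidden state with norm 5
v = np.array([0.0, 0.0, 2.0])        # toy steering direction
h_rot = rotate_toward(h, v, t=0.5)   # rotated, norm stays 5
h_add = h + 5.0 * v / np.linalg.norm(v)  # plain addition, norm grows

print(np.linalg.norm(h), np.linalg.norm(h_rot), np.linalg.norm(h_add))
```

In this toy case the rotated vector keeps the length of `h`, while the added vector does not, which is the property the summary above attributes to the method.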

Load-bearing premise

Rotating the activation vector along the geodesic maintains the integrity of the information it carries better than simply adding a vector does, without causing problems in subsequent processing layers.

What would settle it

Measuring whether the performance advantage disappears if the addition baseline is also normalized to keep the same magnitude as the original activation.
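One way to build that control baseline is sketched below, assuming it means "add the steering vector, then rescale to the original norm". The helper names `add_then_renormalize` and `angle_between`, the strength `alpha`, and the toy vectors are hypothetical and not taken from the paper or its code.

```python
import numpy as np

def add_then_renormalize(h, v, alpha):
    """Addition baseline followed by rescaling back to the norm of h."""
    shifted = h + alpha * v
    return shifted * (np.linalg.norm(h) / np.linalg.norm(shifted))

def angle_between(a, b):
    """Angle in radians between two nonzero vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

h = np.array([3.0, 4.0, 0.0])   # toy hidden state with norm 5
v = np.array([0.0, 0.0, 1.0])   # toy unit steering direction
h_ctrl = add_then_renormalize(h, v, alpha=5.0)

# The control keeps the original norm ...
print(np.linalg.norm(h_ctrl))
# ... and the two partial angles sum to the full angle between h and v.
print(angle_between(h, h_ctrl) + angle_between(h_ctrl, v), angle_between(h, v))
```

For positive `alpha` the renormalized sum is a positive combination of `h` and `v`, so it lies on the same great-circle arc that a geodesic rotation follows. Under this reading, the control would mainly isolate how the rotation angle is chosen rather than the path itself.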

read the original abstract

Inference-time steering offers a promising way to control language models (LMs) without retraining. However, standard approaches typically rely on activation addition, which inevitably alters the hidden-state magnitudes raising concerns about representation collapse and degraded open-ended generation. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, preserving signal integrity while steering toward the target concept. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (notably by +10% on TruthfulQA, COPA, and Storycloze), while simultaneously maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control. The code is available at: https://github.com/chili-lab/Spherical-Steering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Spherical Steering, a training-free inference-time method for controlling language models by rotating hidden-state activations along a geodesic on the unit sphere toward a target direction, rather than adding a steering vector. This is intended to preserve activation norms and mitigate representation collapse. The approach incorporates a confidence gate that scales the rotation strength based on input uncertainty. Experiments on multiple-choice benchmarks are reported to yield gains of approximately +10% over addition-based baselines on TruthfulQA, COPA, and StoryCloze, while open-ended generation quality remains comparable to the unsteered model. Code is made available.

Significance. If the central claims hold after verification, the work is significant for offering a geometrically motivated primitive that directly addresses magnitude distortion in activation steering. The norm-preserving rotation and adaptive gate could provide a more robust alternative to vector addition for representation engineering. Code availability is a clear strength supporting reproducibility.

major comments (2)
  1. [Abstract] Abstract: The reported +10% gains on TruthfulQA, COPA, and StoryCloze are presented without any description of the experimental setup, including model and layer choices, how target directions were obtained, exact baselines, number of runs, or statistical tests. This absence is load-bearing because the performance advantage is the primary evidence offered for the superiority of geodesic rotation.
  2. [Method] Method (implied by abstract description): The central claim attributes gains and preserved generation quality to norm-preserving geodesic rotation. However, the method also introduces a confidence gate that dynamically modulates steering strength. No ablation is described that holds steering magnitude fixed while swapping rotation for addition (or vice versa), so it is unclear whether the geometric rotation itself is necessary for the results or whether the gate accounts for most of the lift.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'extensive experiments across multiple-choice benchmarks' is vague; specifying the models, layers, and exact benchmark versions would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, indicating where revisions have been made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The reported +10% gains on TruthfulQA, COPA, and StoryCloze are presented without any description of the experimental setup, including model and layer choices, how target directions were obtained, exact baselines, number of runs, or statistical tests. This absence is load-bearing because the performance advantage is the primary evidence offered for the superiority of geodesic rotation.

    Authors: We agree that the abstract would benefit from additional context on the experimental setup to better support the reported gains. In the revised manuscript, we have expanded the abstract to briefly specify the models (Llama-2-7B and similar), the layers at which steering is applied, the method for obtaining target directions, and the primary baselines. Due to abstract length constraints, comprehensive details on the number of runs and statistical tests remain in the Experiments section, where they are now more prominently highlighted. revision: yes

  2. Referee: [Method] Method (implied by abstract description): The central claim attributes gains and preserved generation quality to norm-preserving geodesic rotation. However, the method also introduces a confidence gate that dynamically modulates steering strength. No ablation is described that holds steering magnitude fixed while swapping rotation for addition (or vice versa), so it is unclear whether the geometric rotation itself is necessary for the results or whether the gate accounts for most of the lift.

    Authors: The referee correctly notes that the confidence gate is an integral component of the proposed method. To clarify the contribution of the geodesic rotation, we have added an ablation study in the revised manuscript. This study compares (i) full Spherical Steering, (ii) rotation without the gate, (iii) addition-based steering with the gate, and (iv) addition without the gate, while controlling for overall steering magnitude. The results show that the rotation provides measurable improvements in both benchmark accuracy and preservation of open-ended generation quality beyond what the gate alone achieves, supporting the geometric motivation. revision: yes
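The four-arm design described in this response could be laid out roughly as follows. This is a sketch under stated assumptions: "controlling for overall steering magnitude" is interpreted as matching the angular displacement across arms, and `hypothetical_gate` is a placeholder that is not the paper's confidence gate. All function names and the toy vectors are illustrative.

```python
import numpy as np
from itertools import product

def unit(x):
    return x / np.linalg.norm(x)

def angle_between(a, b):
    return np.arccos(np.clip(np.dot(unit(a), unit(b)), -1.0, 1.0))

def hypothetical_gate(h, v):
    """Placeholder gate in [0, 1]: steer less when h already aligns with v."""
    return 0.5 * (1.0 - np.dot(unit(h), unit(v)))

def rotate_by(h, v, phi):
    """Rotate h by angle phi toward v on the great circle; norm unchanged."""
    theta = angle_between(h, v)
    w = (np.sin(theta - phi) * unit(h) + np.sin(phi) * unit(v)) / np.sin(theta)
    return np.linalg.norm(h) * w

def add_matched(h, v, phi):
    """Add alpha * unit(v), with alpha chosen so the direction moves by phi."""
    theta = angle_between(h, v)
    alpha = np.linalg.norm(h) * np.sin(phi) / np.sin(theta - phi)
    return h + alpha * unit(v)

def run_arm(h, v, mode, use_gate, max_fraction=0.5):
    """One ablation arm; phi stays strictly below the angle between h and v."""
    theta = angle_between(h, v)
    strength = hypothetical_gate(h, v) if use_gate else 1.0
    phi = strength * max_fraction * theta
    return rotate_by(h, v, phi) if mode == "rotation" else add_matched(h, v, phi)

h = np.array([3.0, 4.0, 0.0])   # toy hidden state with norm 5
v = np.array([0.0, 0.0, 1.0])   # toy steering direction

arms = {}
for mode, use_gate in product(["rotation", "addition"], [True, False]):
    out = run_arm(h, v, mode, use_gate)
    arms[(mode, use_gate)] = out
    print(mode, "gate" if use_gate else "no gate",
          "angle:", np.degrees(angle_between(h, out)),
          "norm:", np.linalg.norm(out))
```

With the angle matched inside each gate setting, the rotation and addition arms differ only in output norm, so any accuracy gap between them could be attributed to norm preservation rather than to steering strength.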

Circularity Check

0 steps flagged

No significant circularity; method defined as independent geometric primitive

full rationale

The paper presents Spherical Steering as a training-free activation rotation along a geodesic, augmented by a confidence gate for dynamic scaling. No equations, derivations, or self-citations are shown that reduce the claimed +10% gains or norm preservation to a fitted quantity defined from the target data, a self-referential definition, or a load-bearing prior result by the same authors. The central proposal relies on geometric description and empirical benchmarks rather than any closed loop where the output is constructed from the inputs by definition. This is the most common honest non-finding for a methods paper introducing a new primitive.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the geometric assumption that activations can be usefully treated as directions on a sphere whose magnitudes should be preserved; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Activation vectors can be normalized and rotated along geodesics without destroying useful information in the representation.
    This premise is required for the claim that rotation avoids representation collapse while still steering effectively.

pith-pipeline@v0.9.0 · 5714 in / 1206 out tokens · 45182 ms · 2026-05-21T13:53:24.099701+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering

    cs.LG · 2026-05 · conditional · novelty 7.0

    VerifySteer selectively steers hidden states at paragraph boundaries using latent correctness signals to control verifier strictness and outperform baselines on ProcessBench and Hard2Verify with lower compute.

  2. Conceptors for Semantic Steering

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    Conceptors as soft projection matrices from bipolar activations offer a multidimensional, compositional, and geometrically principled method for semantic steering in LLMs that outperforms single-vector baselines in mu...