MidSteer: Optimal Affine Framework for Steering Generative Models

Andrew Stepanov; Gregory Slabaugh; Ismail Elezi; Jiankang Deng; Martin Benning; Tatiana Gaintseva; Ziquan Liu

arxiv: 2605.05220 · v3 · pith:TGEBEKQLnew · submitted 2026-04-17 · 💻 cs.LG · cs.AI

Tatiana Gaintseva, Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

Pith reviewed 2026-05-10 08:34 UTC · model grok-4.3

keywords: affine, concept, steering, framework, models, midsteer, assumptions, erasure

The pith

MidSteer is a general affine framework for concept steering in generative models that relaxes optimality assumptions of prior LEACE-based methods to enable directed minimal-disturbance transformations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative models, such as those that produce images or text, can be steered after training to change specific concepts, for example to remove bias or switch styles. The paper connects this steering to a mathematical operation called affine erasure, which removes unwanted directions in the model's internal representations using linear adjustments. It shows that common removal techniques are one restricted case of this operation. The authors then construct LEACE-Switch, which switches cleanly between concepts under certain conditions, and MidSteer, a broader version that works under fewer restrictions while keeping changes small. Experiments on vision and language models show it works well across different tasks. The core idea is to treat steering as finding the best affine map that achieves the desired change without unnecessary side effects.
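To make "removing an unwanted direction" concrete, here is a minimal NumPy sketch of one common form of steering-based erasure. It assumes the steering direction is the unit-normalized difference of class means and that erasure subtracts the projection onto that direction. The paper's own definitions (its Eq. 1 and Eq. 3) are not reproduced on this page, so the function names `steering_vector` and `erase` and the synthetic activations are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def steering_vector(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Unit-norm difference of class means (an assumed, commonly used choice)."""
    s = pos.mean(axis=0) - neg.mean(axis=0)
    return s / np.linalg.norm(s)

def erase(h: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Remove the component of each row of h along the unit vector s."""
    return h - np.outer(h @ s, s)

# Synthetic "activations": the concept shifts the first coordinate.
rng = np.random.default_rng(0)
d = 8
concept_shift = np.zeros(d)
concept_shift[0] = 3.0
pos = rng.normal(size=(200, d)) + concept_shift  # concept present
neg = rng.normal(size=(200, d))                  # concept absent

s = steering_vector(pos, neg)
erased = erase(pos, s)

# After erasure, no row has any component left along s.
residual = np.abs(erased @ s).max()
```

The projection only touches the one-dimensional subspace spanned by `s`; everything orthogonal to it is left unchanged, which is the sense in which such an edit is a small disturbance.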

Core claim

We introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations.

Load-bearing premise

The assumptions under which LEACE-Switch provides an optimal affine solution hold for the specific concept manipulations considered; MidSteer relaxes them but still relies on affine transformations being sufficient for effective steering.
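One way to see why affine sufficiency is a real premise: a concept can be encoded with zero covariance against the representation, in which case the constraint Cov(f(X), C) = 0 is already met by the identity map and a minimal-disturbance affine eraser has nothing to remove. The toy data below (two concentric rings) is a constructed example, not taken from the paper.

```python
import numpy as np

# Concept 0 lives on a ring of radius 1, concept 1 on a ring of radius 2.
n = 360
angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
ring = np.stack([np.cos(angles), np.sin(angles)], axis=1)
X = np.concatenate([1.0 * ring, 2.0 * ring])
C = np.concatenate([np.zeros(n), np.ones(n)])

# Cross-covariance between representation and concept label.
Xc = X - X.mean(axis=0)
Cc = C - C.mean()
cross_cov = Xc.T @ Cc / len(C)  # numerically zero: both class means sit at the origin

# A nonlinear readout (the norm) still recovers the concept perfectly.
radius = np.linalg.norm(X, axis=1)
pred = (radius > 1.5).astype(float)
accuracy = (pred == C).mean()
```

Here the covariance constraint is satisfied without any edit, yet the concept remains fully decodable by a nonlinear probe. Whether real model activations behave this way for a given concept is an empirical question that the sketch does not answer.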

Figures

Figures reproduced from arXiv: 2605.05220 by Andrew Stepanov, Gregory Slabaugh, Ismail Elezi, Jiankang Deng, Martin Benning, Tatiana Gaintseva, Ziquan Liu.

**Figure 1.** Illustrative example of affine concept erasure and affine concept flipping frameworks. … matrix $\Sigma_{XX} = I$. Let $C \in \{0, 1\}$ be a concept indicator variable. Let $s$ be defined as in Eq. 1. Let $f_{\text{delete}}$ be defined as in Eq. 3. Then $f_{\text{delete}}$ as a function of $h$ minimizes $\min_{f \in \mathrm{Aff}(\mathbb{R}^d \mapsto \mathbb{R}^d)} \mathbb{E}\left[\|f(X) - X\|^2\right]$ s.t. $\mathrm{Cov}(f(X), C) = 0$ (8). This theorem states that steering in erasure mode can be seen as LEACE under the assumpti…

**Figure 2.** Pareto efficiency frontiers for concept switching experiments with steering, LEACE, and MidSteer, highlighting different values of β. … concept $c_s$ to the target concept $c_t$, we use 80 template prompts prompting the model to generate output related to $c_s$ or $c_t$. For each prompt we run 10 such generations varying the random seed. We run the generation on these prompts with and without steering. Templates for LLMs and diff…

**Figure 3.** Qualitative results on switching to steer "horses" into "motorcycles". While all methods successfully switched from "horse" to "motorcycle", vanilla steering (CASteer) and LEACE fail when presented with a prompt for the target concept ("motorcycle"), unable to distinguish between forward and reverse steering. CASteer also failed on the "cow" concept, and more significantly a…

**Figure 4.** Qualitative text steering results for four content categories (horse, motorcycle, cow, dog). Results are reported for the vanilla Qwen2.5-14B-instruct model and three steering methods (Vanilla Steering, LEACE-Switch, MidSteer). Each cell shows the generated text for the prompt "Write a short story about a X", where X is the corresponding category. C. LLM qualitative results: In this section in fig. 4 we presen…

**Figure 5.** Pareto plot for concept flip on model llama2-7b (Source-CS axes).

**Figure 6.** Pareto plot for concept flip on model qwen-14b (Source-CS axes).

**Figure 7.** Pareto plot for concept flip on model qwen-7b (Source-CS axes).

**Figure 8.** Pareto plot for concept flip on model llama2-7b (Target-CS axes).

**Figure 9.** Pareto plot for concept flip on model qwen-14b (Target-CS axes).

**Figure 10.** Pareto plot for concept flip on model qwen-7b (Target-CS axes).

**Figure 11.** Pareto plot for concept flip on model llama2-7b (Other axes).

**Figure 12.** Pareto plot for concept flip on model qwen-14b (Other axes).

**Figure 13.** Pareto plot for concept flip on model qwen-7b (Other axes).

**Figure 14.** Pareto plot for concept flip on model SANA (Source-CS axes). Panels: (a) Unrelated vs CS, (b) Unrelated vs FID.

**Figure 15.** Pareto plot for concept flip on model SDXL (Source-CS axes).

**Figure 16.** Pareto plot for concept flip on model SANA (Target-CS axes). Panels: (a) Unrelated vs CS, (b) Unrelated vs FID.

**Figure 17.** Pareto plot for concept flip on model SDXL (Target-CS axes).

**Figure 18.** Pareto plot for concept flip on model SANA (Other axes). Panels: (a) Unrelated vs CS, (b) Unrelated vs FID.

**Figure 19.** Pareto plot for concept flip on model SDXL (Other axes).

**Figure 20.** Pareto efficiency frontiers for concept erasure experiments with vanilla steering and LEACE / MidSteer, highlighting different β.

**Figure 21.** Pareto plot for concept erasure on model llama2-7b.

**Figure 22.** Pareto plot for concept erasure on model qwen-14b.

**Figure 23.** Pareto plot for concept erasure on model qwen-7b.

**Figure 24.** Pareto plot for concept erase on model sana. Panels: (a) Unrelated vs CS, (b) Unrelated vs FID.

**Figure 25.** Pareto plot for concept erase on model sdxl.
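Most of the figures above are Pareto plots, so a short sketch of how a Pareto frontier is extracted from a set of runs may help when reading them. The function name `pareto_front` and the score values are made up for illustration. The sketch assumes higher is better on both axes, so a lower-is-better metric such as FID would need to be negated first.

```python
import numpy as np

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Indices of non-dominated rows, assuming higher is better on every axis."""
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some row is >= p everywhere and > p somewhere.
        dominated = np.any(
            np.all(points >= p, axis=1) & np.any(points > p, axis=1)
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

# Hypothetical (steering score, preservation score) pairs, one per setting.
scores = np.array([
    [0.90, 0.20],
    [0.80, 0.60],
    [0.70, 0.55],  # dominated by [0.80, 0.60]
    [0.40, 0.90],
    [0.30, 0.85],  # dominated by [0.40, 0.90]
])
front = pareto_front(scores)
```

A method whose points lie on or beyond another method's frontier offers a better trade-off at every operating point, which is the comparison these plots are presumably meant to support.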

Original abstract

Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations. We demonstrate that MidSteer performs favorably across a range of tasks, modalities, and architectures, including vision diffusion models and large language models.
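For readers unfamiliar with LEACE, the sketch below fits the closed-form eraser published in the LEACE paper, r(x) = x - W⁺ P W (x - μ), where W is the whitening matrix (Σ_XX^{1/2})⁺ and P is the orthogonal projector onto the column space of W Σ_XZ. It then checks numerically that the cross-covariance with the concept label vanishes. The helper names `cross_cov` and `fit_leace` and the synthetic data are assumptions made for illustration; this is not the MidSteer or LEACE-Switch procedure, which the abstract does not spell out.

```python
import numpy as np

def cross_cov(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Sample cross-covariance between rows of X (n, d) and Z (n, k)."""
    return (X - X.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / X.shape[0]

def fit_leace(X: np.ndarray, Z: np.ndarray, tol: float = 1e-10):
    """Return a function applying r(x) = x - W^+ P W (x - mu) row-wise."""
    mu = X.mean(axis=0)
    sigma_xx = cross_cov(X, X)
    sigma_xz = cross_cov(X, Z)
    # Whitening matrix W = (Sigma_XX^{1/2})^+ and its pseudoinverse.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()
    V, lam = evecs[:, keep], evals[keep]
    W = (V / np.sqrt(lam)) @ V.T
    W_pinv = (V * np.sqrt(lam)) @ V.T
    # Orthogonal projector onto the column space of W Sigma_XZ.
    A = W @ sigma_xz
    P = A @ np.linalg.pinv(A)
    M = W_pinv @ P @ W
    return lambda H: H - (H - mu) @ M.T

# Synthetic data: correlated features plus a concept-dependent shift.
rng = np.random.default_rng(0)
n, d = 2000, 6
C = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + np.outer(C, 2.0 * np.ones(d))
Z = C[:, None]

eraser = fit_leace(X, Z)
before = np.abs(cross_cov(X, Z)).max()
after = np.abs(cross_cov(eraser(X), Z)).max()  # zero up to floating-point error
```

When the feature covariance is the identity, `W` is the identity and the eraser reduces to a plain orthogonal projection, which is the setting in which the abstract's first claim relates ordinary steering to LEACE.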

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MidSteer turns steering into a relaxed affine erasure problem and gives a clean way to do directed minimal-disturbance changes.

MidSteer is basically LEACE-Switch with the optimality assumptions loosened so you can steer toward a target concept instead of just zeroing one out, all while staying affine and trying to keep side effects small. The paper first shows that ordinary steering removal is a special case of LEACE erasure, then defines the switch version and its conditions, then relaxes those conditions for MidSteer. That progression is the main new piece. The experiments on diffusion models and LLMs are a reasonable check that the idea travels across modalities and architectures. The math stays closed-form where possible, which is a plus for practical use.

The soft spots are modest. The framework is still limited to affine maps, so it will miss any steering that needs nonlinear adjustments in the representation space. The abstract calls the results favorable, but without seeing the exact baselines, variance numbers, or how disturbance is quantified, it is hard to tell how large the practical edge really is. The proofs are referenced rather than expanded in the summary, so a referee would want the full derivations to confirm no hidden assumptions slipped in during the relaxation step.

This paper is for people who do activation engineering or post-deployment alignment and want a single lens for the interventions they already run empirically. It is coherent enough and organizes enough existing practice that it deserves a serious referee, even if the empirical section could be tightened.

Referee Report

2 major / 2 minor

Summary. The paper claims to formalize concept steering for generative models by proving that standard steering methods are a special case of LEACE affine erasure, characterizing the assumptions under which LEACE-Switch yields an optimal affine solution for concept switching, and introducing MidSteer as a relaxed affine framework for directed minimal-disturbance transformations. It supports these with empirical results showing favorable performance across vision diffusion models and large language models.

Significance. If the derivations hold, this provides a principled affine theory for post-hoc steering that could improve reliability in alignment and safety applications. The explicit relaxation of assumptions from LEACE-Switch and cross-modal empirical validation are strengths that would make the framework a useful reference for future steering work.

major comments (2)

[Theoretical framework sections (post-abstract)] The central theoretical contribution rests on the claimed proof that standard steering is a special case of LEACE and the characterization of optimality assumptions for LEACE-Switch; however, the manuscript provides only high-level statements without the full derivations, error bounds, or explicit assumption lists (e.g., in the sections following the abstract), preventing verification that MidSteer indeed relaxes them without introducing new circularities or unstated restrictions on the representation space.
[LEACE-Switch and MidSteer formulation] The optimality claim for LEACE-Switch and the minimal-disturbance guarantee for MidSteer are load-bearing; without the explicit conditions under which affine transformations suffice (referenced as relaxed in MidSteer) and any accompanying proof sketches or counterexample analysis, it is unclear whether the framework applies beyond the tested modalities or reduces to parameter fitting by construction.

minor comments (2)

[Introduction] The abstract and introduction would benefit from a brief table or diagram contrasting the assumptions of LEACE, LEACE-Switch, and MidSteer to clarify the progression.
[Experiments] Empirical sections should include more detail on baselines, exact metrics, and statistical significance to support the 'favorable performance' claim across architectures.
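As an illustration of the kind of uncertainty reporting the second minor comment asks for, the sketch below computes a percentile bootstrap confidence interval over per-seed scores. The ten values are synthetic placeholders, not results from the paper, and `bootstrap_ci` is a hypothetical helper name.

```python
import numpy as np

def bootstrap_ci(scores, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Mean and percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi

# Synthetic per-seed scores for one prompt (placeholder values only).
per_seed_scores = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.67, 0.75, 0.70]
mean, lo, hi = bootstrap_ci(per_seed_scores)
```

Reporting an interval like this for each method and model would let a reader judge whether a gap between two frontiers exceeds seed-to-seed noise.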

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, agreeing where the presentation requires expansion and outlining the specific revisions we will make.

Referee: [Theoretical framework sections (post-abstract)] The central theoretical contribution rests on the claimed proof that standard steering is a special case of LEACE and the characterization of optimality assumptions for LEACE-Switch; however, the manuscript provides only high-level statements without the full derivations, error bounds, or explicit assumption lists (e.g., in the sections following the abstract), preventing verification that MidSteer indeed relaxes them without introducing new circularities or unstated restrictions on the representation space.

Authors: We agree that the main text presents the link to LEACE and the optimality characterization at a high level. In the revised manuscript we will add a dedicated appendix containing the complete derivations, including all error bounds and an explicit enumerated list of assumptions for both LEACE-Switch and MidSteer. The appendix will also include a direct comparison showing that the relaxation in MidSteer introduces no circularities and imposes no additional restrictions on the representation space beyond those already stated in the current text. Revision: yes.

Referee: [LEACE-Switch and MidSteer formulation] The optimality claim for LEACE-Switch and the minimal-disturbance guarantee for MidSteer are load-bearing; without the explicit conditions under which affine transformations suffice (referenced as relaxed in MidSteer) and any accompanying proof sketches or counterexample analysis, it is unclear whether the framework applies beyond the tested modalities or reduces to parameter fitting by construction.

Authors: We acknowledge that the conditions under which affine transformations are sufficient, together with proof sketches and counterexample analysis, were not provided. The revision will include (i) an explicit statement of the conditions for affine sufficiency, (ii) concise proof sketches for the optimality of LEACE-Switch and the minimal-disturbance property of MidSteer, and (iii) counterexamples illustrating cases where affine transformations are insufficient. These additions will clarify the scope of applicability and demonstrate that the framework is derived from the relaxed assumptions rather than being a post-hoc parameter fit. Revision: yes.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review limits visibility into parameters and axioms; no explicit free parameters or invented entities named, but the framework implicitly assumes affine transformations suffice for concept manipulation.

axioms (2)

Domain assumption: Standard removal of unwanted behaviors is a special case of LEACE affine erasure. Stated as proven in the abstract's first contribution.
Domain assumption: Affine transformations can achieve directed minimal-disturbance concept steering. Central to MidSteer definition and relaxation of LEACE-Switch assumptions.
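The first assumption can be made concrete with a short worked specialization. It starts from the closed form published in the LEACE paper and sets the feature covariance to the identity. The symbols $p$, $\mu_0$, $\mu_1$ and $\sigma$ are introduced here for illustration, and whether the resulting direction coincides with the paper's steering vector $s$ (its Eq. 1) is an assumption, since that equation is not reproduced on this page.

```latex
% LEACE closed form for a concept variable C:
\[
  r^{*}(x) \;=\; x - W^{+} P_{W\Sigma_{XC}}\, W \,\bigl(x - \mathbb{E}[X]\bigr),
  \qquad W = \bigl(\Sigma_{XX}^{1/2}\bigr)^{+}.
\]
% With \Sigma_{XX} = I we have W = W^{+} = I, so the eraser is an orthogonal projection:
\[
  r^{*}(x) \;=\; x - \frac{\sigma \sigma^{\top}}{\lVert \sigma \rVert^{2}}
  \bigl(x - \mathbb{E}[X]\bigr),
  \qquad \sigma = \operatorname{Cov}(X, C).
\]
% For binary C with p = \Pr(C = 1) and class means \mu_c = \mathbb{E}[X \mid C = c]:
\[
  \sigma \;=\; \mathbb{E}[XC] - \mathbb{E}[X]\,\mathbb{E}[C]
  \;=\; p\,\mu_1 - p\,\bigl(p\,\mu_1 + (1 - p)\,\mu_0\bigr)
  \;=\; p\,(1 - p)\,(\mu_1 - \mu_0).
\]
```

Under these assumptions the erased direction is parallel to the difference of class means, which is the direction that difference-of-means steering removes.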

pith-pipeline@v0.9.0 · 5485 in / 1332 out tokens · 23616 ms · 2026-05-10T08:34:47.208240+00:00 · methodology
