Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

Ioannis Prokopiou; Maximos Kaliakatsos-Papakostas; Pantelis Vikatos; Themos Stafylakis; Theodoros Giannakopoulos

arXiv: 2606.18790 · v1 · pith:23ZEE75Knew · submitted 2026-06-17 · 💻 cs.SD · cs.AI · cs.LG

Pith reviewed 2026-06-26 19:50 UTC · model grok-4.3

classification 💻 cs.SD · cs.AI · cs.LG

keywords activation steering · symbolic music · linear representation hypothesis · gram-schmidt · difference-in-means · music transformer · interpretable AI

The pith

Gram-Schmidt orthogonalization enables independent pitch and duration control via activation steering in music transformers

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that difference-in-means can isolate latent directions corresponding to pitch and duration in a music generation transformer. It introduces a dual steering method that orthogonalizes these directions using Gram-Schmidt to prevent one attribute from interfering with the other. This allows deterministic adjustments to the generated music at inference time. The results show better independence and less quality loss than simply adding the vectors together.

Core claim

Difference-in-means isolates usable directions for pitch and duration in the residual stream, validating the linear representation hypothesis. Gram-Schmidt orthogonalization in the dual steering framework then decouples these directions, which reduces conceptual interference and signal degradation relative to naive addition and permits independent deterministic control despite autoregressive conditioning.

What carries the argument

Dual Steering framework with Gram-Schmidt Orthogonalization, which geometrically decouples entangled steering vectors for different musical attributes.
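The two steps named above can be illustrated with a minimal sketch. It assumes NumPy, synthetic activations in place of real MMT residual-stream activations, and hypothetical names (`diff_mean_direction`, `orthogonalize`, `steer`, `d_model`); it is not the authors' implementation, only one plausible reading of difference-in-means followed by a single Gram-Schmidt step.

```python
import numpy as np

def diff_mean_direction(high_acts, low_acts):
    """Difference-in-means: mean activation of the 'high' set minus the 'low' set."""
    return high_acts.mean(axis=0) - low_acts.mean(axis=0)

def orthogonalize(v, reference):
    """One Gram-Schmidt step: remove from v its component along reference."""
    ref_unit = reference / np.linalg.norm(reference)
    return v - np.dot(v, ref_unit) * ref_unit

rng = np.random.default_rng(0)
d_model = 16  # hypothetical residual-stream width, chosen for illustration

# Synthetic stand-ins for activations of contrastive sets (assumption).
# A shared offset makes the two raw directions overlap, mimicking entanglement.
shared = rng.normal(size=d_model)
dur_axis = rng.normal(size=d_model)
pitch_high = rng.normal(size=(64, d_model)) + 2.0 * shared
pitch_low = rng.normal(size=(64, d_model))
dur_high = rng.normal(size=(64, d_model)) + shared + 2.0 * dur_axis
dur_low = rng.normal(size=(64, d_model))

v_pitch = diff_mean_direction(pitch_high, pitch_low)
v_dur = diff_mean_direction(dur_high, dur_low)
v_dur_orth = orthogonalize(v_dur, v_pitch)  # duration direction with the pitch component removed

def steer(h, alpha, beta):
    """Add both directions to a residual vector h with independent strengths."""
    return h + alpha * v_pitch + beta * v_dur_orth

h = rng.normal(size=d_model)
print("overlap before:", float(np.dot(v_dur, v_pitch)))
print("overlap after: ", float(np.dot(v_dur_orth, v_pitch)))
```

In this sketch the pitch direction is kept as the reference and only the duration direction is projected, so changing `beta` leaves the projection of the steered state onto `v_pitch` unchanged. Which direction the paper treats as the reference is not stated in the text above.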

Load-bearing premise

The Linear Representation Hypothesis holds in the MMT residual stream, allowing Difference-in-Means to isolate usable latent directions for Pitch and Duration.

What would settle it

Measure the change in the unintended attribute when steering with an orthogonalized vector. If the interference remains as high as with naive addition, the claimed benefit of geometric decoupling does not hold.

Figures

Figures reproduced from arXiv: 2606.18790 by Ioannis Prokopiou, Maximos Kaliakatsos-Papakostas, Pantelis Vikatos, Themos Stafylakis, Theodoros Giannakopoulos.

**Figure 1.** The Top-K threshold failure. Top: static cosine ramp vs. PID's dynamic λ(t), which settles lower. Bottom: static ramping zeros target features throughout the ramp; PID's integral accumulation maintains non-zero activations from the onset.

Recent adaptive methods, namely IDS (Vogels et al., 2025), DIRECTER (Kang & Kim, 2026), SVF (Li et al., 2026), and SMITIN (Luo et al., 2025), operate in dense settings where …

**Figure 2.** Temporal PID λ(t) trajectory (steer-up, m_target = 2.0). The controller overshoots briefly at threshold breach, then settles to λ ≈ 1.15, 62% below static SAS's λ = 3.0.

4. Experiments. We evaluate on the SOD corpus using contrastive sets (1,280 samples each, Section A) defined by the 20th/80th percentiles of pitch (≤60 vs. ≥67.6 semitones) and duration (≤6.5 vs. ≥14.5 ticks). SAEs are trained at each MMT layer …
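Two numbers in the Figure 2 text can be checked or illustrated with a short sketch: the 62% figure as the relative reduction from λ = 3.0 to λ ≈ 1.15, and a percentile-based contrastive split. The function names and the synthetic attribute values are assumptions for illustration; the paper's actual data pipeline is not shown in the text above.

```python
import numpy as np

def relative_reduction(baseline, value):
    """Fraction by which value falls below baseline."""
    return (baseline - value) / baseline

# Values taken from the Figure 2 caption: static lambda = 3.0, PID settles near 1.15.
reduction = relative_reduction(3.0, 1.15)
print(f"{reduction:.1%}")  # 61.7%, which rounds to the quoted 62%

def contrastive_split(values, low_pct=20, high_pct=80):
    """Return indices of samples at or below the low percentile and at or above the high one."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [low_pct, high_pct])
    return np.flatnonzero(values <= lo), np.flatnonzero(values >= hi)

# Synthetic per-sample mean pitch values (assumption, not the SOD corpus).
rng = np.random.default_rng(0)
mean_pitch = rng.normal(loc=64.0, scale=5.0, size=1000)
low_idx, high_idx = contrastive_split(mean_pitch)
print(len(low_idx), len(high_idx))
```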

Original abstract

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DiffMean plus Gram-Schmidt gives independent pitch and duration steering in MMT, but the abstract omits numbers and never mentions the PID control from the title.

The main takeaway is that difference-in-means can extract usable directions for pitch and duration in the MMT residual stream, and Gram-Schmidt orthogonalization lets you combine them for independent control.

What the paper does is extend activation steering methods to symbolic music generation. The dual steering framework is the concrete new piece, and they show it cuts down on interference relative to just adding the vectors. The high correlation they report between steering magnitude and attribute shift gives some direct evidence that the directions are meaningful.

That lines up with the linear representation hypothesis holding in this model. The approach is simple and the geometric fix makes sense for reducing entanglement.

The title highlights PID feedback control, but the abstract does not describe any feedback mechanism or how PID is applied. It stays with the DiffMean extraction and orthogonalization steps. If the PID part is important, its absence from the summary is a gap.

The abstract also gives no quantitative details—no specific correlation values, no dataset sizes, no error bars, and no ablation comparisons. Without those, the strength of the reduced interference claim is difficult to judge.

This work targets researchers focused on controllable generation for music and creative applications. Readers looking for practical inference-time methods in transformers would find the framework description relevant.

The paper deserves peer review to evaluate the full experimental setup and whether the PID component actually closes the loop as the title suggests.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates mechanistic interpretability of the Multitrack Music Transformer (MMT) for symbolic music generation. It applies Difference-in-Means to isolate latent directions for attributes such as Pitch and Duration in the residual stream, validates the Linear Representation Hypothesis via reported high correlation between steering magnitude and attribute shift, and introduces a Dual Steering framework with Gram-Schmidt orthogonalization to decouple entangled features. The approach claims to reduce conceptual interference and signal degradation relative to naive vector addition, enabling independent deterministic control under autoregressive conditioning; the title indicates incorporation of PID feedback control to achieve closed-loop modulation.

Significance. If the empirical claims hold with adequate validation, the work would demonstrate a training-free geometric method for fine-grained, interpretable attribute control in music transformers, extending activation steering techniques to a new domain while addressing multi-attribute entanglement.

major comments (2)

[Abstract] Abstract: the central claims of 'high correlation between steering magnitude and attribute shift' and 'reduces conceptual interference and signal degradation compared to naive vector addition' are asserted without any quantitative metrics, error bars, dataset sizes, ablation results, or statistical tests, preventing assessment of whether the evidence supports the claims of usability and superiority.
[Title, Abstract] Title vs. Abstract: the title highlights 'PID Feedback Control' as central to 'Closing the Loop,' yet the abstract describes only DiffMean extraction and Gram-Schmidt orthogonalization with no reference to PID, feedback, or closed-loop adjustment; this leaves unclear how (or whether) PID is used to modulate steering strength for deterministic targets.

minor comments (2)

Provide explicit dataset details, model layer(s) used for direction extraction, and the precise definition of the correlation metric in the methods section.
Clarify whether the orthogonalization is performed once or per generation step, and how the PID controller interfaces with the steering vectors (if present in the full text).
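The request for a precise correlation metric can be made concrete with a small sketch. It assumes the metric is the Pearson coefficient between steering magnitude and measured attribute shift, which is what the simulated rebuttal names; the sweep values and the noisy linear response are synthetic and hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

rng = np.random.default_rng(0)

# Hypothetical sweep of steering magnitudes and a synthetic, noisy linear response.
alphas = np.linspace(-3.0, 3.0, 13)
shifts = 1.5 * alphas + rng.normal(scale=0.5, size=alphas.shape)

r = pearson_r(alphas, shifts)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 on such a sweep is the kind of evidence the linear representation claim would rest on. Reporting it with the number of sweep points and a confidence interval would address the first major comment.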

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will revise the manuscript to improve clarity and completeness.

Referee: [Abstract] Abstract: the central claims of 'high correlation between steering magnitude and attribute shift' and 'reduces conceptual interference and signal degradation compared to naive vector addition' are asserted without any quantitative metrics, error bars, dataset sizes, ablation results, or statistical tests, preventing assessment of whether the evidence supports the claims of usability and superiority.

Authors: We acknowledge that the abstract states these claims at a high level without supporting numbers. The body of the manuscript reports the relevant quantitative results (Pearson correlations between steering magnitude and attribute shift, ablation comparisons of interference metrics, and dataset details). To address the concern, we will revise the abstract to include the key quantitative findings and dataset sizes so that the strength of evidence is evident from the abstract alone.

Revision: yes

Referee: [Title, Abstract] Title vs. Abstract: the title highlights 'PID Feedback Control' as central to 'Closing the Loop,' yet the abstract describes only DiffMean extraction and Gram-Schmidt orthogonalization with no reference to PID, feedback, or closed-loop adjustment; this leaves unclear how (or whether) PID is used to modulate steering strength for deterministic targets.

Authors: The referee correctly identifies an inconsistency. The title foregrounds the PID component, yet the current abstract omits any mention of it. The manuscript does employ PID feedback to close the loop: the proportional-integral-derivative controller continuously adjusts the steering coefficient based on the error between the observed attribute value and the user-specified target, enabling deterministic control under autoregressive generation. We will revise the abstract to explicitly note the use of PID feedback for closed-loop modulation of steering strength.

Revision: yes
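The closed-loop idea described in this response can be sketched as a discrete PID controller that sets the steering coefficient λ from the error between a target and an observed attribute value. Everything below is an assumption for illustration: the gains, the clipping range, and especially `toy_attribute_response`, a made-up first-order response standing in for the model and attribute measurement. It does not reproduce the paper's controller or its reported trajectory.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PIDController:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, target: float, observed: float, dt: float = 1.0) -> float:
        """Return the control signal from proportional, integral, and derivative terms."""
        error = target - observed
        self.integral += error * dt
        # No derivative term on the first step, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def toy_attribute_response(observed: float, lam: float,
                           gain: float = 1.5, inertia: float = 0.7) -> float:
    """Hypothetical plant: the attribute drifts toward gain * lam with some inertia."""
    return inertia * observed + (1.0 - inertia) * gain * lam

pid = PIDController(kp=0.5, ki=0.2, kd=0.05)  # illustrative gains, not from the paper
target, observed, lam = 2.0, 0.0, 0.0
trajectory = []
for step in range(200):
    # Clip the steering coefficient to a hypothetical safe range [0, 3].
    lam = min(max(pid.update(target, observed), 0.0), 3.0)
    observed = toy_attribute_response(observed, lam)
    trajectory.append(lam)

print(f"final lambda = {lam:.3f}, final attribute = {observed:.3f}")
```

In this toy setting the integral term drives the steady-state error to zero, so λ settles at the value the plant needs (target divided by gain) rather than at a preset static strength.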

Circularity Check

0 steps flagged

No significant circularity

The described pipeline applies the standard Difference-in-Means procedure to extract directions under the Linear Representation Hypothesis, followed by Gram-Schmidt orthogonalization for decoupling. Both are pre-existing techniques from the mechanistic interpretability literature and are not derived from or fitted to the paper's own target results. Validation proceeds via direct empirical measurement of correlation between steering magnitude and observed attribute shift, which constitutes an independent test rather than a definitional identity. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing premises in the supplied text. The derivation chain therefore remains self-contained against external methodological benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Ledger populated from abstract only; full paper may contain additional parameters or assumptions.

axioms (1)

Domain assumption: the Linear Representation Hypothesis holds for Pitch and Duration attributes in the MMT residual stream.
Invoked to justify isolating latent directions via the Difference-in-Means methodology.

Reference graph

Works this paper leans on

33 extracted references · 5 linked inside Pith

[1] Activation Steering with a Feedback Controller. Proceedings of the International Conference on Learning Representations.
[2] Multitrack Music Transformer. Proceedings of the …
[3] Activation Addition: Steering Language Models Without Optimization. Advances in Neural Information Processing Systems.
[4] Rimsky, Nina; Gabrieli, Nick; Schulz, Julian; Smith, Meg; Tong, Isabella; Hubinger, Evan. Steering …
[5] Refusal in Language Models Is Mediated by a Single Direction. arXiv preprint arXiv:2406.11717.
[6] Mean Activation Transport for Activation Steering. Proceedings of the International Conference on Learning Representations.
[7] Sparse Autoencoders for Scalable Concept Steering in Large Language Models. arXiv preprint arXiv:2501.15148.
[8] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Transformer Circuits Thread.
[9] Sparse Autoencoders Find Highly Interpretable Features in Language Models. arXiv preprint arXiv:2309.08600.
[10] Toy Models of Superposition. Transformer Circuits Thread.
[11] The Linear Representation Hypothesis and the Geometry of Large Language Models. arXiv preprint arXiv:2311.03658.
[12] Templeton, Adly; Conerly, Tom; Marcus, Jonathan; Lindsey, Jack; Bricken, Trenton; Chen, Brian; Pearce, Adam; Citro, Craig; Ameisen, Emmanuel; Jones, Andy; and others. Scaling Monosemanticity: Extracting Interpretable Features from …
[13] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. Advances in Neural Information Processing Systems.
[14] A Database for Orchestration Research and Its Use for Timbre Interpolation. Proceedings of the International Society for Music Information Retrieval Conference.
[15] Dong, Hao-Wen; Chen, Ke; McAuley, Julian; Berg-Kirkpatrick, Taylor.
[16] Facchiano, Filippo; and others. Activation Patching for Controllable Generation in …
[17] Steering Music Generation with Activation Engineering. arXiv preprint.
[18] Wu, Yinghao Aaron; and others. Fr\' …
[19] Wu, Shangda; and others.
[20] Fradet, Nathan; Briot, Jean-Pierre; Chhel, Fabien; El Fallah Seghrouchni, Amal; Music, Sony Computer Science Laboratories.
[21] In-Distribution Steering: Balancing Control and Coherence in Language Model Generation. arXiv preprint arXiv:2510.13285.
[22] Kang, Minjae; Kim, Jaehyung. Enhancing Instruction Following of …
[23] Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models. arXiv preprint arXiv:2602.01654.
[24] Route Sparse Autoencoder to Interpret Large Language Models. Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[25] Luo, Sarah; and others.
[26] Wu, Yusong; and others.
[27] Attention Is All You Need. Advances in Neural Information Processing Systems.
[28] Rajamanoharan, Senthooran; Conmy, Arthur; Smith, Lewis; Lieberum, Tom; Kramár, János; Nanda, Neel. Jumping Ahead: Improving Reconstruction Fidelity with …
[29] STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning. arXiv preprint arXiv:2506.18831.
[30] Gui, A.; Gamper, H.; Braun, S.; Emmanouilidou, D. Adapting Fr …
[31] Benchmarking music generation models and metrics via human preference studies. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.
[32] Survey on the evaluation of generative models in music. ACM Computing Surveys.
[33] Genre controlled music generation via activation steering. arXiv preprint arXiv:2506.10225.
