Recoverable but Not Stationary:Local Linear Structures in Weights and Activations

Irina Piontkovskaia; Sergey Nikolenko

arxiv: 2606.10929 · v1 · pith:TOL7SXNBnew · submitted 2026-06-09 · 💻 cs.LG · cs.AI

Recoverable but Not Stationary:Local Linear Structures in Weights and Activations

Irina Piontkovskaia , Sergey Nikolenko This is my paper

Pith reviewed 2026-06-27 13:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords LoRAactivation steeringtask vectorslocal linearitygradient structureparameter recoveryneural network geometry

0 comments

The pith

Linear structures in neural network weights and activations recover task behavior through early updates but drift rapidly instead of forming fixed global directions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether linear directions from task vectors, LoRA, and activation steering represent stable global features or local evolving geometries. In synthetic multitask transformers and LoRA adapters on DistilGPT-2 and GPT-2, strong local low-rank structure appears in task gradients near pretrained weights. Static bases fail to capture recovery directions because the useful basis shifts substantially within 100 steps. The first recovery updates nevertheless form a trajectory-prefix basis that captures 77 percent of the LoRA recovery displacement. A single gradient step also creates an activation shift with 0.58 cosine similarity to contrastive activation addition steering vectors, and the same effect appears on Qwen-0.5B BoolQ statements.

Core claim

Strong local low-rank task-gradient structure exists in trained networks but the fixed-task-plane hypothesis is rejected because static bases miss the recovery direction and the useful basis drifts substantially within 100 steps; the first recovery updates form a trajectory-prefix basis capturing 77% of the LoRA recovery displacement, while a single gradient step produces an activation shift with 0.58 cosine to a labelled-contrast CAA steering vector with similar steering effect on Qwen-0.5B BoolQ statements.

What carries the argument

The trajectory-prefix basis formed by the first few recovery updates, which spans most of the LoRA recovery displacement in parameter space and aligns with activation steering shifts.

If this is right

Random parameter search succeeds in high dimensions because a Gaussian local-linear theorem shows local linearity around pretrained weights.
Gradient steps in weight space produce activation changes that can be used directly for steering with measurable cosine alignment.
Linear control of behavior operates through evolving local geometries that partially persist across parameter and activation spaces rather than global task directions.
Recovery trajectories contain recoverable prefix bases that explain most displacement even when later directions change.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the local structures appear in larger models, then small targeted weight updates could steer behavior more precisely than full fine-tuning.
The observed drift between weight and activation spaces suggests hybrid editing methods that combine early LoRA steps with activation steering.
Tracking how the trajectory-prefix basis evolves could improve continual learning by updating the active linear directions instead of assuming fixed planes.

Load-bearing premise

The observed local low-rank structures, 77% capture rate, and 0.58 cosine alignment hold across architectures and tasks without substantial variation.

What would settle it

Measuring that the first recovery updates capture below 30% of total LoRA displacement or that single-step activation shifts fall below 0.2 cosine similarity to steering vectors on a new model or task would show the local structures do not persist.

Figures

Figures reproduced from arXiv: 2606.10929 by Irina Piontkovskaia, Sergey Nikolenko.

**Figure 2.** Figure 2: Trajectory prefixes capture most of the LoRA recovery displacement, and that predicts [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: Per-candidate view (∼1.2k candidates pooled over 6 runs): (a) alignment with the gradient recovery direction predicts gain (r ≈ 0.52); (b) overlap with the static local task plane does not predict gain (r ≈ −0.40). Alternating Raw grad-sum Norm. grad-sum PCGrad-lite Equal-loss-dec. Loss-balanced Joint mixed Seq. A→B Seq. B→A 0.0 0.2 0.4 0.6 0.8 1.0 Mean worst-task EM Alternating Raw grad-sum Norm. grad-sum… view at source ↗

**Figure 4.** Figure 4: Sequential specialization is brittle, simultaneous schedules are robust: mean worst-task [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Measured gradient rank vs. a random-sketch control on Qwen. [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Locating the linear regime on Qwen2.5-0.5B. Each panel shows the first-order ( [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: A single gradient step yields a CAA-like steering vector at late layers (Qwen2.5-0.5B): [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: The clean (deterministic) route to a steering vector. Two independent constructions — a [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: The perturbation-to-steering pipeline (random-search version). The three splits are disjoint [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗

read the original abstract

Task vectors, LoRA, activation steering, and random search around pretrained weights all suggest that learned behaviour can be controlled by linear directions. We ask which linear structures actually exist and on what scale. In a synthetic multitask transformer and LoRA adapters on DistilGPT-2 / GPT-2 we find strong local low-rank task-gradient structure but reject the fixed-task-plane hypothesis: static bases miss the recovery direction, and the useful basis drifts substantially within 100 steps. However, the first recovery updates form a trajectory-prefix basis capturing 77% of the LoRA recovery displacement. We develop random search theory with a Gaussian local-linear theorem that justifies the effectiveness of random parameter search even in very high dimensions. We also study the relation between parameter perturbations and activation steering: a single gradient step produces an activation shift with 0.58 cosine to a labelled-contrast CAA steering vector, with a similar steering effect on Qwen-0.5B BoolQ statements. We validate our results with experiments on synthetic Transformers and LLMs. Our results suggest that linear structures in trained networks are not global task directions, but evolving local geometries that partially persist across parameter and activation spaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Local linear structures drift over short trajectories instead of sitting on fixed planes, with early recovery steps capturing most of the displacement and a gradient-CAA link at 0.58 cosine.

read the letter

The main point is that the useful linear directions for recovery and steering are not stationary global task planes. In the synthetic multitask transformer and the LoRA runs on DistilGPT-2 and GPT-2, static bases miss the recovery direction while the first few updates form a trajectory-prefix basis that covers 77% of the displacement. A single gradient step also produces an activation shift whose cosine with a labelled-contrast CAA vector reaches 0.58 on Qwen-0.5B BoolQ, and the effect is comparable to the steering vector itself.

The work does a clean job rejecting the fixed-plane hypothesis with controlled experiments and adds a Gaussian local-linear theorem that gives a reason random search can still succeed in high dimensions. Connecting parameter-space perturbations to activation-space steering is a direct step past the usual task-vector and LoRA citations.

The numbers 77% and 0.58 are presented without error bars, statistical tests, or an explicit null baseline, so it is still unclear how far they exceed what random vectors or shuffled trajectories would produce in those dimensions. The stress-test concern holds: without that control or the precise definition of recovery displacement, the quantitative support for evolving geometries is weaker than the abstract suggests. The experiments stay on small models, so the degree of non-stationarity on larger systems is open.

People working on model editing and activation steering would get value from the local-geometry angle and the parameter-to-activation link. The paper shows clear thinking and honest engagement with the literature, so it deserves a serious referee even though the statistical framing needs tightening.

Referee Report

2 major / 1 minor

Summary. The paper claims that linear structures in neural network weights and activations are local and evolving rather than fixed global task directions. Experiments on a synthetic multitask transformer and LoRA adapters on DistilGPT-2/GPT-2 show strong local low-rank task-gradient structure, with the first recovery updates forming a trajectory-prefix basis that captures 77% of the LoRA recovery displacement; static bases miss the recovery direction and the useful basis drifts within 100 steps. A Gaussian local-linear theorem is developed to justify random parameter search in high dimensions. A single gradient step produces an activation shift with 0.58 cosine similarity to a labelled-contrast CAA steering vector, with similar steering effects observed on Qwen-0.5B BoolQ statements. Results are validated on synthetic Transformers and LLMs, suggesting evolving local geometries that partially persist across parameter and activation spaces.

Significance. If the empirical findings hold, the work advances understanding of the geometry of trained networks by demonstrating that useful linear directions for recovery and steering are local, non-stationary, and only partially persistent, rather than global fixed planes. A clear strength is the Gaussian local-linear theorem, which supplies a parameter-free theoretical justification for the effectiveness of random search even in very high dimensions. These results could inform more targeted approaches to model editing, task vectors, and activation steering, though the narrow experimental scope limits immediate generality.

major comments (2)

[Abstract] Abstract: the central quantitative claim that the trajectory-prefix basis captures 77% of the LoRA recovery displacement is presented without a null baseline (random subspace, shuffled trajectories, or expected value under the paper's Gaussian local-linear theorem), error bars, or statistical test. In spaces of dimension 10^6–10^8, subspace capture rates of this magnitude can occur by chance depending on how the target displacement is defined, so the evidence that this constitutes a meaningful 'trajectory-prefix basis' is not yet secured.
[Abstract] Abstract: the reported 0.58 cosine similarity between the single-gradient-step activation shift and the CAA steering vector lacks a null distribution, statistical test, details on hyperparameter choices, or data exclusion criteria. Without these, it is impossible to determine whether the alignment exceeds chance levels or what would be predicted by the Gaussian local-linear model introduced in the paper, weakening support for the claimed relation between parameter perturbations and activation steering.

minor comments (1)

[Abstract] The abstract and methods descriptions would benefit from an explicit definition of 'recovery displacement' and the precise construction of the trajectory-prefix basis to allow readers to assess the measurement procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful review and for pointing out the need for stronger statistical grounding of our quantitative claims. We agree these additions will improve the manuscript and address each comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central quantitative claim that the trajectory-prefix basis captures 77% of the LoRA recovery displacement is presented without a null baseline (random subspace, shuffled trajectories, or expected value under the paper's Gaussian local-linear theorem), error bars, or statistical test. In spaces of dimension 10^6–10^8, subspace capture rates of this magnitude can occur by chance depending on how the target displacement is defined, so the evidence that this constitutes a meaningful 'trajectory-prefix basis' is not yet secured.

Authors: We agree that a null baseline is required to establish that the 77% capture is meaningful rather than chance. In revision we will add: (i) overlap with random subspaces of identical dimension, (ii) overlap with shuffled trajectory prefixes, (iii) the analytic expectation under the Gaussian local-linear theorem, (iv) error bars across independent runs, and (v) a statistical test against the null. These will be reported both in the abstract and in the main results section. revision: yes
Referee: [Abstract] Abstract: the reported 0.58 cosine similarity between the single-gradient-step activation shift and the CAA steering vector lacks a null distribution, statistical test, details on hyperparameter choices, or data exclusion criteria. Without these, it is impossible to determine whether the alignment exceeds chance levels or what would be predicted by the Gaussian local-linear model introduced in the paper, weakening support for the claimed relation between parameter perturbations and activation steering.

Authors: We accept that the 0.58 cosine result needs a null distribution and supporting details. The revised manuscript will include: a null distribution obtained from random activation shifts and from permuted CAA vectors, a statistical test of the observed similarity, explicit hyperparameter values for the single gradient step and CAA computation, and data exclusion criteria. We will also compare the empirical value to the prediction of the Gaussian local-linear theorem. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical measurements and developed theorem are self-contained.

full rationale

The paper reports direct experimental measurements (77% subspace capture, 0.58 cosine alignment) on specific models and develops a Gaussian local-linear theorem to justify random search. No derivation step reduces by the paper's own equations to a fitted quantity from the same data, nor relies on load-bearing self-citation chains or ansatzes smuggled from prior author work. The central claims rest on observable quantities and an independently stated theorem rather than tautological redefinitions or predictions forced by construction. This is the normal case of an empirical paper whose quantitative support is external to its own fitting procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard machine-learning assumptions about local linearity around pretrained weights plus a newly stated Gaussian local-linear theorem; no free parameters or invented entities are introduced beyond conventional LoRA ranks and experimental choices.

axioms (2)

domain assumption Local linearity of the loss landscape around pretrained weights
Invoked to justify both the random-search theorem and the interpretation of gradient structures.
ad hoc to paper Gaussian local-linear theorem for random parameter search
Developed in the paper to explain effectiveness of random search in high dimensions.

pith-pipeline@v0.9.1-grok · 5740 in / 1432 out tokens · 27943 ms · 2026-06-27T13:49:44.791650+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 7 linked inside Pith

[1]

Intrinsic dimensionality explains the effectiveness of language model fine-tuning

Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. InAnnual Meeting of the Association for Computational Linguistics (ACL), 2021

2021
[2]

GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InInternational Conference on Machine Learning (ICML), 2018

2018
[3]

Hamprecht

Felix Draxler, Kambis Veschgini, Manfred Salmhofer, and Fred A. Hamprecht. Essentially no barriers in neural network energy landscape. InInternational Conference on Machine Learning (ICML), 2018. URLhttps://arxiv.org/abs/1803.00885

Pith/arXiv arXiv 2018
[4]

Deep ensembles: A loss landscape perspective.arXiv preprint arXiv:1912.02757, 2019

Stanislav Fort, Huiyi Hu, and Balaji Lakshminarayanan. Deep ensembles: A loss landscape perspective.arXiv preprint arXiv:1912.02757, 2019

arXiv 1912
[5]

Neural thickets: Diverse task experts are dense around pretrained weights.arXiv preprint arXiv:2603.12228, 2026

Gan et al. Neural thickets: Diverse task experts are dense around pretrained weights.arXiv preprint arXiv:2603.12228, 2026. URLhttps://arxiv.org/abs/2603.12228

arXiv 2026
[6]

Loss surfaces, mode connectivity, and fast ensembling of DNNs

Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. Loss surfaces, mode connectivity, and fast ensembling of DNNs. InAdvances in Neural Information Processing Systems (NeurIPS), 2018. URLhttps://arxiv.org/abs/1802.10026

Pith/arXiv arXiv 2018
[7]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanisław Jastrzębski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. InInternational Conference on Machine Learning (ICML), 2019

2019
[8]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InInternational Conference on Learning Representations (ICLR), 2022. URLhttps://arxiv.org/abs/2106. 09685

2022
[9]

Editing models with task arithmetic

Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. InInternational Conference on Learning Representations (ICLR), 2023. URLhttps://arxiv.org/abs/2212.04089

Pith/arXiv arXiv 2023
[10]

Overcoming catastrophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017. 19

2017
[11]

Measuring the intrinsic dimension of objective landscapes

Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes. InInternational Conference on Learning Representations (ICLR), 2018. URLhttps://arxiv.org/abs/1804.08838

Pith/arXiv arXiv 2018
[12]

Visualizing the loss landscape of neural nets

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

2018
[13]

Conflict-averse gradient descent for multi-task learning

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2021

2021
[14]

Gradient episodic memory for continual learning

David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

2017
[15]

Fine-tuning language models with just forward passes

Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D Lee, Danqi Chen, and Sanjeev Arora. Fine-tuning language models with just forward passes. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

2023
[16]

Evolution strategies as a scalable alternative to reinforcement learning.arXiv preprint arXiv:1703.03864, 2017

Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning.arXiv preprint arXiv:1703.03864, 2017

Pith/arXiv arXiv 2017
[17]

Multi-task learning as multi-objective optimization

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems (NeurIPS), 2018

2018
[18]

Li, Arnab Sen Sharma, Aaron Mueller, Byron C

Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, and David Bau. Function vectors in large language models. InInternational Conference on Learning Representations (ICLR), 2024. URLhttps://arxiv.org/abs/2310.15213

arXiv 2024
[19]

Steering language models with activation engineering.arXiv preprint arXiv:2308.10248, 2023

Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J Vazquez, Ulisse Mini, and Monte MacDiarmid. Steering language models with activation engineering.arXiv preprint arXiv:2308.10248, 2023. URLhttps://arxiv.org/abs/2308.10248

Pith/arXiv arXiv 2023
[20]

Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo- Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, and Ludwig Schmidt. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. InInternational Conference on Machine Learning (I...

arXiv 2022
[21]

Manning, and Christopher Potts

Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, and Christopher Potts. ReFT: Representation finetuning for language models. In Advances in Neural Information Processing Systems (NeurIPS), 2024. URLhttps://arxiv. org/abs/2404.03592

arXiv 2024
[22]

TIES-merging: Resolving interference when merging models

Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, and Mohit Bansal. TIES-merging: Resolving interference when merging models. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. URLhttps://arxiv.org/abs/2306.01708

arXiv 2023
[23]

Gradient surgery for multi-task learning

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. URLhttps://arxiv.org/abs/2001.06782. 20

arXiv 2020
[24]

prefix is best

Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, et al. Representation engineering: A top-down approach to AI transparency.arXiv preprint arXiv:2310.01405, 2023. URL https://arxiv.org/abs/2310.01405. 21 Table 7: Per-run LoRA random search, ens-k-of-Nand pass@NatK=...

Pith/arXiv arXiv 2023

[1] [1]

Intrinsic dimensionality explains the effectiveness of language model fine-tuning

Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. InAnnual Meeting of the Association for Computational Linguistics (ACL), 2021

2021

[2] [2]

GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InInternational Conference on Machine Learning (ICML), 2018

2018

[3] [3]

Hamprecht

Felix Draxler, Kambis Veschgini, Manfred Salmhofer, and Fred A. Hamprecht. Essentially no barriers in neural network energy landscape. InInternational Conference on Machine Learning (ICML), 2018. URLhttps://arxiv.org/abs/1803.00885

Pith/arXiv arXiv 2018

[4] [4]

Deep ensembles: A loss landscape perspective.arXiv preprint arXiv:1912.02757, 2019

Stanislav Fort, Huiyi Hu, and Balaji Lakshminarayanan. Deep ensembles: A loss landscape perspective.arXiv preprint arXiv:1912.02757, 2019

arXiv 1912

[5] [5]

Neural thickets: Diverse task experts are dense around pretrained weights.arXiv preprint arXiv:2603.12228, 2026

Gan et al. Neural thickets: Diverse task experts are dense around pretrained weights.arXiv preprint arXiv:2603.12228, 2026. URLhttps://arxiv.org/abs/2603.12228

arXiv 2026

[6] [6]

Loss surfaces, mode connectivity, and fast ensembling of DNNs

Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. Loss surfaces, mode connectivity, and fast ensembling of DNNs. InAdvances in Neural Information Processing Systems (NeurIPS), 2018. URLhttps://arxiv.org/abs/1802.10026

Pith/arXiv arXiv 2018

[7] [7]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanisław Jastrzębski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. InInternational Conference on Machine Learning (ICML), 2019

2019

[8] [8]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InInternational Conference on Learning Representations (ICLR), 2022. URLhttps://arxiv.org/abs/2106. 09685

2022

[9] [9]

Editing models with task arithmetic

Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. InInternational Conference on Learning Representations (ICLR), 2023. URLhttps://arxiv.org/abs/2212.04089

Pith/arXiv arXiv 2023

[10] [10]

Overcoming catastrophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017. 19

2017

[11] [11]

Measuring the intrinsic dimension of objective landscapes

Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes. InInternational Conference on Learning Representations (ICLR), 2018. URLhttps://arxiv.org/abs/1804.08838

Pith/arXiv arXiv 2018

[12] [12]

Visualizing the loss landscape of neural nets

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

2018

[13] [13]

Conflict-averse gradient descent for multi-task learning

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2021

2021

[14] [14]

Gradient episodic memory for continual learning

David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

2017

[15] [15]

Fine-tuning language models with just forward passes

Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D Lee, Danqi Chen, and Sanjeev Arora. Fine-tuning language models with just forward passes. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

2023

[16] [16]

Evolution strategies as a scalable alternative to reinforcement learning.arXiv preprint arXiv:1703.03864, 2017

Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning.arXiv preprint arXiv:1703.03864, 2017

Pith/arXiv arXiv 2017

[17] [17]

Multi-task learning as multi-objective optimization

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems (NeurIPS), 2018

2018

[18] [18]

Li, Arnab Sen Sharma, Aaron Mueller, Byron C

Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, and David Bau. Function vectors in large language models. InInternational Conference on Learning Representations (ICLR), 2024. URLhttps://arxiv.org/abs/2310.15213

arXiv 2024

[19] [19]

Steering language models with activation engineering.arXiv preprint arXiv:2308.10248, 2023

Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J Vazquez, Ulisse Mini, and Monte MacDiarmid. Steering language models with activation engineering.arXiv preprint arXiv:2308.10248, 2023. URLhttps://arxiv.org/abs/2308.10248

Pith/arXiv arXiv 2023

[20] [20]

Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo- Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, and Ludwig Schmidt. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. InInternational Conference on Machine Learning (I...

arXiv 2022

[21] [21]

Manning, and Christopher Potts

Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, and Christopher Potts. ReFT: Representation finetuning for language models. In Advances in Neural Information Processing Systems (NeurIPS), 2024. URLhttps://arxiv. org/abs/2404.03592

arXiv 2024

[22] [22]

TIES-merging: Resolving interference when merging models

Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, and Mohit Bansal. TIES-merging: Resolving interference when merging models. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. URLhttps://arxiv.org/abs/2306.01708

arXiv 2023

[23] [23]

Gradient surgery for multi-task learning

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. URLhttps://arxiv.org/abs/2001.06782. 20

arXiv 2020

[24] [24]

prefix is best

Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, et al. Representation engineering: A top-down approach to AI transparency.arXiv preprint arXiv:2310.01405, 2023. URL https://arxiv.org/abs/2310.01405. 21 Table 7: Per-run LoRA random search, ens-k-of-Nand pass@NatK=...

Pith/arXiv arXiv 2023