BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Andrea Blasi Núñez; Annemette Broch Pirchert; Gianluca Barmina; Lukas Galke Poech; Peter Schneider-Kamp

arxiv: 2606.09707 · v1 · pith:ABEABRRZnew · submitted 2026-06-08 · 💻 cs.LG · cs.CL

Gianluca Barmina, Annemette Broch Pirchert, Andrea Blasi Núñez, Lukas Galke Poech, Peter Schneider-Kamp

Pith reviewed 2026-06-27 16:53 UTC · model grok-4.3

keywords: model editing, weight manipulation, declarative configuration, tensor surgery, reproducibility, model upcycling, YAML plans

The pith

BrainSurgery replaces ad-hoc Python scripts with declarative YAML plans for editing neural network weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BrainSurgery as a system for performing reliable modifications on large neural network checkpoints. It lets users specify transformations in YAML files that cover structural changes to layers, mathematical operations on tensors, and reshaping, all addressed through regex patterns and structural selectors. Assertions built into the tool check shapes, data types, and values at each step to block silent mistakes. This approach matters because growing model sizes make one-off scripts difficult to maintain, share, or debug. If the method works as described, common tasks such as upcycling models or extracting adapters can be expressed once and executed repeatedly without custom code.

Core claim

BrainSurgery executes complex transformations through declarative YAML plans. It supports structural modifications, mathematical transformations, and tensor reshaping through expressive regex and structural targeting, while built-in assertions validate tensor shapes, data types, and values to prevent silent errors.
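
To make the idea of a declarative plan concrete, here is a minimal sketch in plain Python. It is not BrainSurgery's plan syntax or implementation: the step keys (`op`, `target`, `factor`, `shape`), the `run_plan` helper, and the tensor names are all hypothetical, and tensors are stand-in nested lists rather than real checkpoint tensors. The plan is written as the list of dicts that a YAML file might parse into.

```python
import re

# Hypothetical plan: the keys below are illustrative, not BrainSurgery syntax.
PLAN = [
    {"op": "scale", "target": r".*\.self_attn\.q_proj\.weight", "factor": 0.5},
    {"op": "assert_shape", "target": r".*\.self_attn\.q_proj\.weight", "shape": (2, 2)},
]


def shape_of(tensor):
    """Return (rows, cols) of a tensor stored as a list of equal-length rows."""
    return (len(tensor), len(tensor[0]))


def run_plan(plan, state):
    """Apply each step to every tensor whose name fully matches the step's regex."""
    for step in plan:
        pattern = re.compile(step["target"])
        names = [name for name in state if pattern.fullmatch(name)]
        if not names:
            # Fail loudly instead of silently doing nothing on a bad pattern.
            raise KeyError(f"no tensor matches {step['target']!r}")
        for name in names:
            if step["op"] == "scale":
                state[name] = [[v * step["factor"] for v in row] for row in state[name]]
            elif step["op"] == "assert_shape":
                if shape_of(state[name]) != tuple(step["shape"]):
                    raise AssertionError(
                        f"{name}: shape {shape_of(state[name])} != {step['shape']}"
                    )
            else:
                raise ValueError(f"unknown op {step['op']!r}")
    return state


# Toy checkpoint with invented tensor names.
checkpoint = {
    "layers.0.self_attn.q_proj.weight": [[1.0, 2.0], [3.0, 4.0]],
    "layers.1.self_attn.q_proj.weight": [[2.0, 0.0], [0.0, 2.0]],
    "layers.0.mlp.up_proj.weight": [[8.0, 8.0], [8.0, 8.0]],
}
edited = run_plan(PLAN, checkpoint)
```

In this sketch the plan is data, so it can be reviewed or stored separately from the interpreter. A step whose regex matches nothing raises an error rather than silently doing nothing, which is one kind of silent mistake the paper says its assertions are meant to block.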

What carries the argument

Declarative YAML plans that abstract storage formats and memory management, using regex and structural targeting to select and alter tensors.

If this is right

Layer restructuring and precision changes can be documented in shareable YAML files instead of scattered code.
Assertions reduce the chance that shape or type mismatches go unnoticed during upcycling workflows.
LoRA extraction and similar adapter operations become repeatable across different base models.
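
For readers unfamiliar with what adapter extraction involves numerically, the following sketch factors the difference between a fine-tuned weight and its base weight with a truncated SVD. It assumes NumPy is available and uses synthetic matrices. The `extract_low_rank` helper is hypothetical, and the procedure used by BrainSurgery's `phlora` operation or by the PHLoRA method may differ.

```python
import numpy as np


def extract_low_rank(base, tuned, rank):
    """Factor the weight delta (tuned - base) into rank-`rank` matrices b @ a."""
    delta = tuned - base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]  # shape (out, rank); singular values folded into b
    a = vt[:rank, :]            # shape (rank, in)
    return b, a


rng = np.random.default_rng(0)
base = rng.standard_normal((6, 4))
# Synthetic fine-tuned weight whose delta is exactly rank 2.
true_delta = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
tuned = base + true_delta

b, a = extract_low_rank(base, tuned, rank=2)
reconstructed = base + b @ a
max_error = float(np.max(np.abs(reconstructed - tuned)))
```

Because the synthetic delta has rank 2, the rank-2 factors reconstruct the fine-tuned weight up to floating-point error. On real checkpoints the delta is generally not exactly low rank, so the reconstruction error depends on the chosen rank.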

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If YAML plans prove general enough, they could serve as a common interchange format for recording weight changes in published models.
Version-control systems could track the YAML plans themselves, creating an auditable history of model modifications.

Load-bearing premise

The declarative YAML plans and their abstractions can express the full range of needed transformations without users reverting to custom scripts, and the assertions will catch all relevant errors during real use.

What would settle it

A standard editing operation, such as merging two checkpoints or applying low-rank updates, that either cannot be written as a YAML plan or passes all assertions yet yields an incorrect model.
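
The second failure mode can be illustrated with a toy example. This is a sketch under assumptions, not BrainSurgery's `assert` or `diff`: the `check_invariants` and `diff` helpers, the tensor name, and the nested-list tensors are hypothetical. It shows an edit with the wrong scale factor passing shape and type checks, while a comparison against an independent reference flags it.

```python
def flatten(tensor):
    """Flatten a tensor stored as a list of rows into one list of values."""
    return [value for row in tensor for value in row]


def check_invariants(state):
    """Local checks: the tensor exists, is 2x2, and holds floats."""
    weight = state["proj.weight"]
    assert len(weight) == 2 and all(len(row) == 2 for row in weight)
    assert all(isinstance(value, float) for value in flatten(weight))


def diff(actual, reference, tol=1e-6):
    """Return the sorted names that are missing on one side or differ in values."""
    mismatched = []
    for name in sorted(set(actual) | set(reference)):
        if name not in actual or name not in reference:
            mismatched.append(name)
            continue
        a, b = flatten(actual[name]), flatten(reference[name])
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            mismatched.append(name)
    return mismatched


reference = {"proj.weight": [[0.5, 1.0], [1.5, 2.0]]}  # intended edit: scale by 0.5
buggy = {"proj.weight": [[2.0, 4.0], [6.0, 8.0]]}      # actual edit: scaled by 2

check_invariants(buggy)              # passes: shape and element type are fine
mismatches = diff(buggy, reference)  # the value-level comparison catches the bug
```

Shape and dtype assertions constrain the form of a tensor, not its contents, so an end-to-end comparison against a reference is what catches a wrong-but-well-formed result in this example.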

Figures

Figures reproduced from arXiv: 2606.09707 by Andrea Blasi Núñez, Annemette Broch Pirchert, Gianluca Barmina, Lukas Galke Poech, Peter Schneider-Kamp.

**Figure 1.** Overview of the BrainSurgery workflow. Checkpoint rewrites are expressed as explicit declarative plans, inspected interactively, and validated through executable checks such as assert and diff. The depicted plan fragment is illustrative and includes advanced operations such as phlora, reflecting that the same workflow supports both simple tensor edits and more complex expert-rewriting pipelines.

**Figure 2.** Full PHLoRA workflow with validation. When assertions, reference comparison, checkpoint I/O, and sharded output are included, the imperative baseline must configure loading, mutation, validation, and persistence explicitly, while BrainSurgery keeps the workflow in one declarative plan.

**Figure 3.** Bulk tensor targeting. The imperative baseline loops over matching checkpoint names; the BrainSurgery fragment expresses the same regex target family and scale operation as one declarative transform.

**Figure 4.** BrainSurgery Web UI figure showing model dump.

**Figure 5.** BrainSurgery Web UI figure showing model move.

**Figure 6.** BrainSurgery Web UI figure showing zoom-in on diff between the original model and the rewritten model after applying scale_.

**Figure 9.** Bulk tensor targeting. The imperative baseline loops over matching checkpoint names; the BrainSurgery fragment expresses the same regex target family and scale operation as one declarative transform.

**Figure 10.** Prefix rewrite. The imperative baseline loops over checkpoint names and manually rewrites matching keys; the BrainSurgery fragment expresses the same regex capture and move as one declarative transform.

**Figure 8.** Example validation as executable invariants. Both sides check the same existence, shape, equality, and deletion post-conditions.

**Figure 11.** Validation with diff. Local invariants can be checked with assert, while end-to-end agreement with an independent reference can be checked by diffing the reference output alias against the output produced by the BrainSurgery plan.

**Figure 12.** Full dense-to-expert MoE workflow with validation. Including checkpoint I/O, reference comparison, and sharded output makes the imperative baseline responsible for loading, mutation, validation, and persistence, while BrainSurgery keeps the same structural rewrite and checks in one plan.

**Figure 13.** Full PHLoRA workflow with validation. When assertions, reference comparison, checkpoint I/O, and sharded output are included, the imperative baseline must configure loading, mutation, validation, and persistence explicitly, while BrainSurgery keeps the workflow in one declarative plan.

**Figure 14.** Full in-place low-rank expert rewrite with validation. The imperative baseline spells out checkpoint loading, SVD-based low-rank reconstruction, dtype conversion, reference comparison, and sharded output; the BrainSurgery plan expresses the same workflow with subtract_, phlora_, add_, cast_, assert, and diff.
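
The imperative side of the prefix rewrite described in the Figure 10 caption corresponds roughly to the pattern below. This is a sketch, not the paper's baseline code: the `move_by_pattern` helper and the `model.` and `backbone.` prefixes are invented for illustration, and one-element lists stand in for tensors.

```python
import re


def move_by_pattern(state, pattern, replacement):
    """Rename every key that fully matches `pattern`; leave other keys untouched."""
    compiled = re.compile(pattern)
    renamed = {}
    for name, tensor in state.items():
        match = compiled.fullmatch(name)
        # Match.expand substitutes capture groups such as \1 into the template.
        new_name = match.expand(replacement) if match else name
        if new_name in renamed:
            # Refuse to overwrite: a silent collision would drop a tensor.
            raise KeyError(f"rename collision on {new_name!r}")
        renamed[new_name] = tensor
    return renamed


# Toy checkpoint with invented key names.
state = {
    "model.layers.0.weight": [1.0],
    "model.norm.weight": [2.0],
    "lm_head.weight": [3.0],
}
moved = move_by_pattern(state, r"model\.(.+)", r"backbone.\1")
```

Using `fullmatch` rather than a substring search keeps a key such as `my_model.x` from being rewritten by accident, and the collision check guards against two source keys mapping to the same destination.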

Original abstract

As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python scripts. Here, we introduce BrainSurgery, a tool for robust and reproducible "tensor surgery" on neural network checkpoints, and provide a system demonstration covering four examples and three case studies from model upcycling to LoRA extraction. By abstracting storage formats and memory management, BrainSurgery executes complex transformations through declarative YAML plans. It supports structural modifications, mathematical transformations, and tensor reshaping through expressive regex and structural targeting, while built-in assertions validate tensor shapes, data types, and values to prevent silent errors. We envision that BrainSurgery will provide a strong foundation for future research through its reproducible and validated operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: a private letter to a colleague

BrainSurgery gives a declarative YAML layer over tensor edits with regex targeting and assertions, which is a modest but useful engineering step for checkpoint work.

The main contribution is a new tool that turns common model-editing steps into YAML plans instead of one-off Python scripts. It handles structural changes, mathematical operations, and reshaping through regex and structural selectors, plus assertions that check shapes, dtypes, and values. That combination is not described in the cited prior work, so the contribution sits in the tooling space rather than in new algorithms.

What works is the abstraction over storage formats and memory, plus the four examples and three case studies on upcycling and LoRA extraction. Those show the interface in action and make the reproducibility claim concrete. For someone who already spends time writing fragile scripts to move weights around, this could cut down on silent errors.

The soft spot is coverage. The stress-test note is right: nothing in the description enumerates which edits stay inside the declarative boundary and which still need custom code. The assertions are presented as preventive, but the paper gives demonstrations rather than a systematic test of what they catch versus what slips through in practice. Without that, the claim that the tool replaces ad-hoc scripts remains plausible but unproven.

This is for ML engineers who maintain or adapt large checkpoints. A practitioner looking for a cleaner workflow would get immediate value if the code is released and documented. A theorist or benchmark-focused reader would not. It is coherent on its own terms and shows honest attention to a real pain point, so it clears the bar for peer review in a tools or systems track even though the scientific advance is narrow.

Referee Report

2 major / 2 minor

Summary. The paper presents BrainSurgery, a tool for reproducible and reliable declarative weight manipulations on neural network checkpoints. It abstracts storage formats and memory management to enable complex transformations (structural modifications, mathematical operations, tensor reshaping) via expressive regex and structural targeting in YAML plans, with built-in assertions for validating shapes, dtypes, and values to avoid silent errors. The work provides four examples and three case studies covering model upcycling and LoRA extraction.

Significance. If the central claims hold, the tool offers a practical advance by replacing ad-hoc Python scripts with validated, declarative workflows, which could improve reproducibility in model editing and upcycling research. The emphasis on assertions and abstractions for storage/memory is a concrete strength; the inclusion of multiple case studies demonstrates utility beyond toy examples.

major comments (2)

[Case studies and examples sections] The claim that regex+structural targeting in YAML plans plus the storage/memory abstractions suffice for the full range of practical transformations (layer restructuring, low-rank factorization, etc.) without fallback to ad-hoc scripts is load-bearing for the paper's contribution, yet the demonstrations provide only positive examples rather than a systematic enumeration or boundary test of supported vs. unsupported operations.
[Assertions and validation description] The assertion system is presented as preventing silent errors, but no evidence is given on its coverage (e.g., which classes of shape/dtype/value errors arise in the case studies and whether they are all caught), which is required to substantiate the reliability claim.

minor comments (2)

Clarify the distinction between the four examples and three case studies, and consider adding a summary table of supported YAML operations.
The manuscript would benefit from explicit discussion of limitations or operations that remain outside the declarative interface.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments correctly identify areas where additional evidence would strengthen the manuscript's claims. We outline targeted revisions below.

Referee: [Case studies and examples sections] The claim that regex+structural targeting in YAML plans plus the storage/memory abstractions suffice for the full range of practical transformations (layer restructuring, low-rank factorization, etc.) without fallback to ad-hoc scripts is load-bearing for the paper's contribution, yet the demonstrations provide only positive examples rather than a systematic enumeration or boundary test of supported vs. unsupported operations.

Authors: We agree that the current presentation relies on positive demonstrations and does not systematically delineate supported versus unsupported operations. In revision we will add a new subsection that enumerates the transformation classes expressible via the YAML syntax and abstractions (structural targeting, regex-based selection, arithmetic and reshaping primitives), provides concrete examples of each, and explicitly notes classes of operations (e.g., certain dynamic control-flow or architecture-specific low-rank updates) that still require fallback scripts. This will make the scope and limitations of the declarative approach transparent. revision: yes
Referee: [Assertions and validation description] The assertion system is presented as preventing silent errors, but no evidence is given on its coverage (e.g., which classes of shape/dtype/value errors arise in the case studies and whether they are all caught), which is required to substantiate the reliability claim.

Authors: We concur that concrete evidence of assertion coverage is needed. The revised manuscript will include an analysis (new table and accompanying text) that logs every assertion executed across the three case studies, reports the specific shape, dtype, and value mismatches that were caught, and discusses error classes that the current assertion set does not yet address. We will also expand the methods section to describe the assertion API and its design rationale more explicitly. revision: yes

Circularity Check

0 steps flagged

No circularity: tool description with no derivations or fitted predictions

The paper is a system demonstration of a software tool for model editing via declarative YAML plans. It contains no equations, no first-principles derivations, no fitted parameters presented as predictions, and no uniqueness theorems or self-citation chains that bear load on any claimed result. All content consists of feature descriptions, examples, and case studies whose validity rests on external reproducibility rather than internal reduction to inputs. This is the expected non-finding for a non-mathematical engineering paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software tool introduction paper rather than a scientific derivation; no free parameters, axioms, or invented entities are involved in any central claim.
