LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

Krishna P. Gummadi; Manish Gupta; Seungeon Lee; Soumi Das

arxiv: 2511.07129 · v3 · submitted 2025-11-10 · 💻 cs.CL · cs.AI · cs.LG

Seungeon Lee, Soumi Das, Manish Gupta, Krishna P. Gummadi

Pith reviewed 2026-05-17 23:32 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG

keywords LoRA · dynamic merging · training-free inference · adapter selection · NLP tasks · parameter-efficient fine-tuning · instance-level adaptation

The pith

LoGo selects and merges the most relevant LoRA adapters for each input instance using signals from a single forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents LoGo, a framework for dynamically handling multiple LoRA adapters on diverse inputs at inference time. It avoids the need for labeled data or further training by pulling signals from just one forward pass to choose which adapters matter most and how to combine them. The goal is to boost results on varied tasks without slowing down the model. Readers would care if this makes fine-tuned language models more practical for real-world use where tasks are not known in advance.

Core claim

The central claim is that LoRA adapters can be selected and merged dynamically, per input instance, without any training: signals extracted from a single forward pass through the adapters identify the most relevant ones and set their contributions, with no labeled data or task-specific tuning required.

What carries the argument

The mechanism of extracting signals from a single forward pass through LoRA adapters to select relevant adapters and determine their merge weights on-the-fly.
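
A minimal sketch of what such a mechanism could look like, under stated assumptions rather than as the paper's implementation. It assumes each adapter is a low-rank pair (A, B) attached to one target block, that the signal is the L2 norm of the adapter's output on the current input (the figure captions name norm and entropy as signal variants, but this page does not give their formulas), and that merge weights are the selected signals normalized to sum to 1. The function names and the toy adapters are hypothetical.

```python
import numpy as np

def adapter_outputs(x, adapters):
    # Each adapter is a hypothetical (A, B) pair; its contribution at the
    # target block is the low-rank update B @ (A @ x).
    return [B @ (A @ x) for A, B in adapters]

def select_and_weight(signals, k):
    # Keep the k adapters with the largest signals and normalize their
    # signals into merge weights (assumed rule, not taken from the paper).
    signals = np.asarray(signals, dtype=float)
    top = np.argsort(signals)[::-1][:k]
    total = signals[top].sum()
    if total > 0:
        weights = signals[top] / total
    else:
        weights = np.full(len(top), 1.0 / len(top))  # fall back to uniform
    return top, weights

def merge_for_instance(x, adapters, k=2):
    # One pass through every adapter yields both the signals and the
    # outputs that are later combined, so no second pass is needed.
    outs = adapter_outputs(x, adapters)
    signals = [np.linalg.norm(o) for o in outs]
    top, weights = select_and_weight(signals, k)
    merged = sum(w * outs[i] for i, w in zip(top, weights))
    return top, weights, merged

# Toy example: three adapters that differ only by a scale factor.
rng = np.random.default_rng(0)
d, r = 8, 2
A = rng.normal(size=(r, d))
B = rng.normal(size=(d, r))
x = rng.normal(size=d)
adapters = [(A, 0.1 * B), (A, 1.0 * B), (A, 3.0 * B)]
top, weights, merged = merge_for_instance(x, adapters, k=2)
```

Because a LoRA update is linear in the input, a weighted sum of adapter outputs equals the output of the correspondingly weighted sum of the adapter matrices, which is why the sketch can merge outputs instead of weights.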

If this is right

Outperforms training-based baselines by up to 3.6% on some tasks across NLP benchmarks.
Stays competitive with baselines on other tasks.
Preserves the original inference throughput.
Applies across 27 datasets in 5 benchmarks and 3 model families.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Such dynamic merging might extend to other efficient adaptation techniques for handling shifting data distributions.
Instance-level decisions could support applications like personalized or context-aware model responses in production systems.
Reducing reliance on labeled data for adapter combination opens paths to unsupervised adaptation strategies.

Load-bearing premise

That signals from a single forward pass through the LoRA adapters suffice to correctly identify relevant adapters and set their contributions without labeled data or extra training.

What would settle it

Testing whether LoGo underperforms training-based methods on a new collection of diverse NLP tasks, or whether the selection process adds measurable latency to inference.
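
The latency half of that test can be sketched with a simple timing harness. This is an illustration on toy NumPy matrices, not a benchmark of LoGo: the layer sizes, the number of adapters, and the norm-based selection rule are all assumptions, and real measurements would need the actual model, hardware, and batching setup.

```python
import time
import numpy as np

def time_per_call(fn, n_repeats=50, warmup=5):
    # Average wall-clock seconds per call, after a few warm-up calls.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_repeats):
        fn()
    return (time.perf_counter() - start) / n_repeats

# Hypothetical toy layer and adapter pool.
rng = np.random.default_rng(0)
d, r, n_adapters = 256, 8, 20
W = rng.normal(size=(d, d))
adapters = [(rng.normal(size=(r, d)), rng.normal(size=(d, r)))
            for _ in range(n_adapters)]
x = rng.normal(size=d)

def base_forward():
    # Frozen base layer only.
    return W @ x

def forward_with_selection(k=5):
    # Base layer plus norm-based top-k selection and weighted merging.
    outs = [B @ (A @ x) for A, B in adapters]
    norms = np.array([np.linalg.norm(o) for o in outs])
    top = np.argsort(norms)[::-1][:k]
    w = norms[top] / norms[top].sum()
    return W @ x + sum(wi * outs[i] for i, wi in zip(top, w))

t_base = time_per_call(base_forward)
t_sel = time_per_call(forward_with_selection)
print(f"base: {t_base:.2e} s/call, with selection: {t_sel:.2e} s/call")
```

Comparing the two timings as the adapter pool grows is one way to check whether the selection step stays negligible relative to the base forward pass.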

Figures

Figures reproduced from arXiv: 2511.07129 by Krishna P. Gummadi, Manish Gupta, Seungeon Lee, Soumi Das.

**Figure 2.** Heatmap illustrating signal patterns across …

**Figure 3.** Alignment between merging weights and task …

**Figure 4.** Comparison of (a) LoRA selection count by …

**Figure 6.** Inference time per token with varying numbers …

**Figure 7.** Heatmaps illustrating signal intensity patterns across LoRA adapters trained on top of the LLaMA-3.1-8B …

**Figure 9.** Performance of LoGo using (a) norm and (b) entropy across datasets with different numbers of selected modules. Adjacent text from the paper: "… slightly higher performance across most datasets, supporting its use as the default configuration in our method. Number of Selected Modules: We analyze the effect of varying the number of selected modules k in LoGo by evaluating performance with k ∈ {3, 5, 10, 20} …"

**Figure 10.** Performance of LoGo using (a) norm and (b) entropy across datasets with different target blocks for signal extraction. Adjacent text from the paper: "… show minor variations in performance across layers, indicating that LoGo is not sensitive to the specific block chosen for signal extraction. This suggests that task-relevant activation patterns are distributed across multiple layers, and that LoGo can robustly estimate adapter relevance …"

**Figure 11.** The LoRA selection count by LoGo with Llama-3.1-8B model for BIG Bench Hard datasets.

**Figure 12.** The LoRA selection count by LoGo with Llama-3.1-8B model for translation datasets. [Plot: selection count per LoRA adapter, grouped by LoRA weight ranking 1 to 5; adapter labels include ag_news_subset, duorc_ParaphraseRC_build_story_around_qa, duorc_SelfRC_build_story_around_qa, math_dataset_algebra__linear_1d, quac, trivia_qa_rc, wiki_bio_comprehension, wiki_bio_key_content, wiki_bio_what_content, wiki_hop_original_generate_object.]

**Figure 13.** The LoRA selection count by LoGo with Llama-3.1-8B model for struct-to-text datasets.

**Figure 14.** The LoRA selection counts by LoGo with Llama-3.1-8B model for closed-book QA datasets. [Plot: selection count per LoRA adapter, grouped by LoRA weight ranking 1 to 5; same adapter labels as Figure 12.]

**Figure 15.** The LoRA selection counts by LoGo with Llama-3.1-8B model for NLI datasets.
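
The Figure 9 and Figure 10 captions name two signal variants, norm and entropy, without giving formulas on this page. The sketch below shows one plausible reading, stated as an assumption: the norm signal is the L2 norm of an adapter's activation, and the entropy signal turns the activation magnitudes into a probability distribution with a softmax and scores the adapter by inverse entropy, so that a more peaked (lower-entropy) activation counts as more relevant. Both the softmax step and the inversion are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax over a 1-D vector.
    z = v - np.max(v)
    e = np.exp(z)
    return e / e.sum()

def norm_signal(h):
    # Larger activation norm is read as higher relevance (assumption).
    return float(np.linalg.norm(h))

def entropy_signal(h, eps=1e-12):
    # Lower entropy of the activation distribution is read as higher
    # relevance, so the signal is the inverse entropy (assumption).
    p = softmax(np.abs(h))
    entropy = -np.sum(p * np.log(p + eps))
    return float(1.0 / (entropy + eps))

flat = np.ones(8)                       # spread-out activation
peaked = np.array([10.0] + [0.0] * 7)   # concentrated activation
print(norm_signal(flat), norm_signal(peaked))
print(entropy_signal(flat), entropy_signal(peaked))
```

Either signal can be passed to a top-k selection step; the choice of k (the captions mention k ∈ {3, 5, 10, 20}) is then a separate knob.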

Original abstract

Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient approach for fine-tuning large language models. However, conventional LoRA adapters are typically trained for a single task, limiting their applicability in real-world settings where inputs may span diverse and unpredictable domains. At inference time, existing approaches combine multiple LoRAs for improving performance on diverse tasks, while usually requiring labeled data or additional task-specific training, which is expensive at scale. In this work, we introduce LoRA on the Go (LoGo), a training-free framework that dynamically selects and merges adapters at the instance level without any additional requirements. LoGo leverages signals extracted from a single forward pass through LoRA adapters, to identify the most relevant adapters and determine their contributions on-the-fly. Across 5 NLP benchmarks, 27 datasets, and 3 model families, LoGo outperforms training-based baselines on some tasks upto a margin of 3.6% while remaining competitive on other tasks and maintaining inference throughput, highlighting its effectiveness and practicality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LoGo's training-free, instance-level LoRA selection via forward-pass signals is a practical idea with some reported gains, but the abstract leaves both the reliability of those signals and the exact method too vague to judge whether the gains of up to 3.6% are robust.

LoGo claims a training-free way to pick and blend multiple LoRA adapters on a per-input basis at inference time. It uses signals from one forward pass to decide which adapters matter and how much to weight them, then reports beating some training-based baselines by up to 3.6% on parts of the test suite while holding throughput steady across 5 benchmarks, 27 datasets, and 3 model families.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces LoRA on the Go (LoGo), a training-free framework for instance-level dynamic selection and merging of multiple LoRA adapters during inference on large language models. It extracts signals from a single forward pass through the adapters to identify the most relevant ones and compute their instance-specific merging weights without labeled data or additional training. Evaluations span 5 NLP benchmarks, 27 datasets, and 3 model families, with claims of outperforming training-based baselines by up to 3.6% on some tasks while remaining competitive elsewhere and preserving inference throughput.

Significance. If the central empirical claims hold under scrutiny, the work would offer a meaningful practical contribution to parameter-efficient fine-tuning by enabling scalable, label-free combination of task-specific LoRAs for unpredictable input domains, reducing the cost of multi-adapter deployment compared to existing merging or routing approaches.

major comments (2)

[§3] §3 (Method): The precise form of the signals extracted from the single forward pass (logits, hidden-state norms, or loss proxies) and the exact procedure for converting them into adapter selection and merging weights are not formalized with equations or pseudocode. This detail is load-bearing for the training-free claim and for assessing whether the signals capture task utility rather than superficial statistics.
[§4] §4 (Experiments): The reported outperformance margins (including the 3.6% figure) are presented without error bars, results across multiple random seeds, or statistical significance tests. This weakens verification of the central claim that LoGo reliably outperforms baselines across the 27 datasets.

minor comments (2)

[Abstract] The abstract uses 'upto' instead of 'up to'; this and similar minor phrasing issues appear in the introduction.
[Figures] Figure captions and the description of the merging process could be expanded for clarity on how instance-level weights are applied at inference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Referee: [§3] §3 (Method): The precise form of the signals extracted from the single forward pass (logits, hidden-state norms, or loss proxies) and the exact procedure for converting them into adapter selection and merging weights are not formalized with equations or pseudocode. This detail is load-bearing for the training-free claim and for assessing whether the signals capture task utility rather than superficial statistics.

Authors: We agree that a formal description is necessary to support the training-free claim and to allow readers to verify that the signals reflect task utility. In the revised manuscript we have added explicit equations and pseudocode in §3 that define the signals extracted from the single forward pass (including logits and hidden-state norms) and the exact procedure used to compute instance-level selection and merging weights.
revision: yes

Referee: [§4] §4 (Experiments): The reported outperformance margins (including the 3.6% figure) are presented without error bars, results across multiple random seeds, or statistical significance tests. This weakens verification of the central claim that LoGo reliably outperforms baselines across the 27 datasets.

Authors: We acknowledge that the lack of error bars and statistical tests limits the strength of the empirical claims. To address this, we have rerun the primary experiments with multiple random seeds on representative tasks from the 27 datasets, added error bars, and included statistical significance tests in the revised §4. These additions confirm that the reported margins, including the 3.6% improvement, remain consistent.
revision: yes
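
For readers who want to see what the requested checks involve, here is a small sketch of error bars across seeds and a paired significance test. It is not the authors' analysis: the helper names are hypothetical, the scores in the usage example are made-up placeholders, and the sign-flip permutation test is just one reasonable choice for paired per-dataset comparisons.

```python
import numpy as np

def mean_and_std(scores):
    # Mean and sample standard deviation (ddof=1) across seeds.
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)

def paired_permutation_pvalue(a, b, n_resamples=10000, seed=0):
    # Two-sided sign-flip permutation test on paired differences:
    # under the null, each difference is equally likely to be +d or -d.
    rng = np.random.default_rng(seed)
    diffs = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    return (np.sum(permuted >= observed) + 1) / (n_resamples + 1)

# Placeholder scores for illustration only (not results from the paper).
method_scores = [71.2, 70.8, 71.5, 70.9, 71.1]
baseline_scores = [70.1, 70.4, 70.0, 70.6, 70.2]
m, s = mean_and_std(method_scores)
p = paired_permutation_pvalue(method_scores, baseline_scores)
print(f"method: {m:.2f} ± {s:.2f}, p = {p:.4f}")
```

With only a handful of seeds the smallest achievable p-value is coarse, so pairing across datasets as well as seeds gives the test more resolution. SciPy's `scipy.stats.permutation_test` offers a maintained alternative to the hand-rolled function.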

Circularity Check

0 steps flagged

Empirical framework with no load-bearing derivation or self-referential reduction

The paper introduces LoGo as a training-free, instance-level selection and merging method that extracts signals from a single forward pass through LoRA adapters and evaluates performance via direct empirical comparisons against baselines on 5 benchmarks, 27 datasets, and 3 model families. No equations or first-principles derivations are presented that reduce predictions to fitted inputs or self-citations; claims rest on experimental results rather than any closed loop of definitions or renamings. This is the standard non-circular outcome for an applied empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions about LoRA adapter combinability and the informativeness of forward-pass signals for relevance scoring; no free parameters or invented entities are explicitly described in the abstract.

axioms (2)

domain assumption: LoRA adapters trained on different tasks can be meaningfully merged via weighted combination based on input-specific signals.
Invoked in the dynamic merging step described in the abstract.
domain assumption: A single forward pass through multiple LoRA adapters produces reliable signals for identifying task relevance without additional supervision.
Central to the training-free selection mechanism.
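
The two assumptions can be written down compactly. The notation below is a sketch introduced here for illustration, not the paper's: $h(x)$ is the input to the target block for instance $x$, $(A_i, B_i)$ are the low-rank factors of adapter $i$, $s_i$ is a norm-based signal (one of the two variants named in the figure captions), $\mathcal{S}(x)$ is the set of selected adapters, and $w_i$ are the merge weights.

```latex
\begin{align}
s_i(x) &= \lVert B_i A_i\, h(x) \rVert_2, \\
\mathcal{S}(x) &= \{\, i : s_i(x) \text{ is among the } k \text{ largest signals} \,\}, \\
w_i(x) &= \frac{s_i(x)}{\sum_{j \in \mathcal{S}(x)} s_j(x)}, \qquad i \in \mathcal{S}(x), \\
\Delta W(x) &= \sum_{i \in \mathcal{S}(x)} w_i(x)\, B_i A_i .
\end{align}
```

The first axiom asserts that $\Delta W(x)$ is a useful update; the second asserts that $s_i(x)$ tracks task relevance well enough for the selection and weighting steps to be trusted.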

pith-pipeline@v0.9.0 · 5486 in / 1318 out tokens · 27633 ms · 2026-05-17T23:32:54.316349+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SCALE-LoRA: Auditing Post-Retrieval LoRA Composition with Residual Merging and View Reliability
cs.AI · 2026-05 · unverdicted · novelty 6.0

SCALE-LoRA proposes a post-retrieval audit framework using sparse residual composition and disagreement-based reliability signals to improve open-pool LoRA adapter reuse on tasks like BIG-Bench Hard.
