pith. machine review for the scientific record.

arxiv: 2507.00029 · v2 · submitted 2025-06-17 · 💻 cs.LG · cs.AI

Recognition: unknown

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

keywords: lora-mixer, routing, attention, lora, specialization, experts, layers, while

Recent attempts to combine low-rank adaptation (LoRA) with mixture-of-experts (MoE) for multi-task adaptation of Large Language Models (LLMs) often replace whole attention/FFN layers with switch experts or append parallel expert branches, undermining parameter efficiency and limiting task specialization. We introduce LoRA-Mixer, a modular MoE framework that routes task-specific LoRA experts into the core projection matrices of the attention module, namely input and output linear layers, rather than primarily targeting FFN blocks. The design delivers fine-grained token-level specialization by fully exploiting the attention mechanism, while remaining drop-in compatible with Transformers and state-space models (SSMs), since linear projection layers are ubiquitous. To train robust routers from limited data while promoting stable, selective decisions and high expert reuse, LoRA-Mixer employs an adaptive Routing Specialization Loss (RSL) that jointly enforces global load balance and input-aware specialization via an entropy-shaping objective. The framework supports two regimes: (i) joint optimization of adapters and router with a differentiable hard-soft top-k routing scheme, and (ii) plug-and-play routing over frozen, pre-trained LoRA modules sourced from public repositories. Across 15 benchmarks, including MedQA, GSM8K, HumanEval, and GLUE, RSL-optimized LoRA-Mixer outperforms state-of-the-art routing and LoRA-MoE baselines while using 48 percent of their trainable parameters, with gains of 3.79, 2.90, and 3.95 percentage points on GSM8K, CoLA, and ARC-C, respectively. Cross-model transfer and adapter reuse experiments further demonstrate the approach's versatility and data efficiency. Our code is available at https://github.com/hustcselwb/LoRA-Mixer.
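The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain top-k softmax gating in place of the paper's hard-soft top-k scheme, and its auxiliary loss is a generic stand-in (a Switch-style load-balance term plus a mean routing-entropy term), not the actual Routing Specialization Loss, whose formula the abstract does not give. All names (`route_top_k`, `lora_mixer_projection`, `auxiliary_routing_loss`, `entropy_weight`, `scale`) are hypothetical, and pure-Python lists are used only to keep the example dependency-free.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def route_top_k(router, x, k):
    """Pick the k experts with the largest router logits for one token.

    Returns (gates, probs): gates maps expert index -> weight renormalized
    over the selected experts; probs is the full softmax over all experts.
    """
    logits = matvec(router, x)
    probs = softmax(logits)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    gates = {i: probs[i] / norm for i in top}
    return gates, probs

def lora_mixer_projection(W, experts, router, x, k=2, scale=1.0):
    """Frozen projection plus routed low-rank updates (assumed form):
    y = W x + scale * sum_i g_i * B_i (A_i x) over the selected experts."""
    gates, probs = route_top_k(router, x, k)
    y = matvec(W, x)
    for i, g in gates.items():
        A, B = experts[i]                     # A: r x d_in, B: d_out x r
        delta = matvec(B, matvec(A, x))
        y = [yi + scale * g * di for yi, di in zip(y, delta)]
    return y, gates, probs

def auxiliary_routing_loss(all_gates, all_probs, num_experts, entropy_weight=0.1):
    """Generic stand-in loss (NOT the paper's RSL): load balance + entropy."""
    n = len(all_probs)
    # f[i]: fraction of tokens that selected expert i; p[i]: mean router probability.
    f = [sum(1 for g in all_gates if i in g) / n for i in range(num_experts)]
    p = [sum(pr[i] for pr in all_probs) / n for i in range(num_experts)]
    balance = num_experts * sum(fi * pi for fi, pi in zip(f, p))
    entropy = sum(-sum(q * math.log(q) for q in pr if q > 0) for pr in all_probs) / n
    return balance + entropy_weight * entropy

# Toy setup: a 2x2 identity projection and three rank-1 experts (illustrative values).
W = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),
    ([[0.0, 1.0]], [[0.0], [1.0]]),
    ([[1.0, 1.0]], [[1.0], [1.0]]),
]
router = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]

tokens = [[1.0, 0.0], [0.0, 1.0]]
outputs = [lora_mixer_projection(W, experts, router, t, k=1) for t in tokens]
loss = auxiliary_routing_loss([o[1] for o in outputs], [o[2] for o in outputs], 3)
```

In this toy setup each token is routed to a different expert, so the base weights `W` stay untouched and only the small `A`, `B` factors and the router would be trained. A real implementation would apply the same pattern to the attention input and output projections with batched tensor operations.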

This paper has not been read by Pith yet.


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics

    cs.SI 2026-04 unverdicted novelty 7.0

    IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on rea...

  2. GateMOT: Q-Gated Attention for Dense Object Tracking

    cs.CV 2026-04 unverdicted novelty 6.0

    GateMOT proposes Q-Gated Attention to enable linear-complexity, spatially aware attention for state-of-the-art dense object tracking on benchmarks like BEE24.

  3. OmniTrend: Content-Context Modeling for Scalable Social Popularity Prediction

    cs.CV 2026-04 unverdicted novelty 6.0

    OmniTrend predicts popularity by combining separate content attractiveness and contextual exposure predictors using cross-modal and exogenous signals.

  4. HotComment: A Benchmark for Evaluating Popularity of Online Comments

    cs.AI 2026-04 unverdicted novelty 6.0

    HotComment is a new multimodal benchmark that quantifies online comment popularity via content quality assessment, interaction-based prediction, and agent-simulated user engagement, accompanied by the StyleCmt stylist...

  5. Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

    cs.MM 2026-04 unverdicted novelty 5.0

    A new joint spatio-temporal enlargement model for micro-video popularity prediction using frame scoring for long sequences and a topology-aware memory bank for unbounded historical associations.

  6. CurEvo: Curriculum-Guided Self-Evolution for Video Understanding

    cs.CV 2026-04 unverdicted novelty 4.0

    CurEvo integrates curriculum guidance into self-evolution to structure autonomous improvement of video understanding models, yielding gains on VideoQA benchmarks.