Loramoe: Alleviate world knowledge for- getting in large language models via moe-style plugin

Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang · 2024 · arXiv 2312.09979

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.

Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

ScaleEarth conditions remote sensing VLMs on continuous GSD via CS-HLoRA and a visual GSD predictor, creating a closed training loop with GeoScale-VQA to achieve SOTA on Earth observation benchmarks.

Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.

Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

SVD-guided orthogonal subspace projection in LoRA enables stable performance across 30 sequential unlearning tasks on CIFAR-100 and MNIST, unlike naive static fusion which collapses retained accuracy.

MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter and memory usage.

Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

cs.LG · 2025-06-26 · unverdicted · novelty 6.0

MoRAM frames continual learning as incremental addition of rank-1 adapters viewed as self-activating key-value associative memory units in a mixture-of-experts setup.

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

cs.LG · 2025-06-17 · unverdicted · novelty 6.0

LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.

Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

cs.CL · 2024-06-17 · unverdicted · novelty 5.0

Introduces Tree Generation (TG-SFT) to generate synthetic instruction-tuning data from LLMs, reducing catastrophic forgetting when fine-tuning MLLMs on domain-specific or multimodal data.

citing papers explorer

Showing 8 of 8 citing papers.

Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning cs.CV · 2026-05-11 · unverdicted · none · ref 6
DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.
Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs cs.CV · 2026-05-08 · unverdicted · none · ref 4
ScaleEarth conditions remote sensing VLMs on continuous GSD via CS-HLoRA and a visual GSD predictor, creating a closed training loop with GeoScale-VQA to achieve SOTA on Earth observation benchmarks.
Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning cs.LG · 2026-04-29 · unverdicted · none · ref 14
DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.
Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA cs.LG · 2026-04-14 · unverdicted · none · ref 12
SVD-guided orthogonal subspace projection in LoRA enables stable performance across 30 sequential unlearning tasks on CIFAR-100 and MNIST, unlike naive static fusion which collapses retained accuracy.
MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning cs.LG · 2026-04-10 · unverdicted · none · ref 154
MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter and memory usage.
Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts cs.LG · 2025-06-26 · unverdicted · none · ref 23
MoRAM frames continual learning as incremental addition of rank-1 adapters viewed as self-activating key-value associative memory units in a mixture-of-experts setup.
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing cs.LG · 2025-06-17 · unverdicted · none · ref 24
LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression cs.CL · 2024-06-17 · unverdicted · none · ref 11
Introduces Tree Generation (TG-SFT) to generate synthetic instruction-tuning data from LLMs, reducing catastrophic forgetting when fine-tuning MLLMs on domain-specific or multimodal data.

Loramoe: Alleviate world knowledge for- getting in large language models via moe-style plugin

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer