A Survey on Mixture of Experts in Large Language Models

2025 · arXiv 2025.355402

7 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SDG-MoE: Signed Debate Graph Mixture-of-Experts

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

SDG-MoE introduces learned signed interaction graphs and disagreement-gated deliberation among experts in MoE architectures, yielding a 19.8% improvement in validation perplexity over the strongest baseline.

One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception

cs.CV · 2026-05-18 · conditional · novelty 6.0

UniTrans pretrains a bank of translator experts and learns combination coefficients from modality mappings in a scene-invariant latent space to enable zero-shot any-to-any feature translation for heterogeneous collaborative perception.

Temporally Extended Mixture-of-Experts Models

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

Temporally extended MoE layers, built on the option-critic framework with deliberation costs, cut switching rates below 5% while retaining most of the model's capability on MATH, MMLU, and MMMLU.

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.

Rethinking IRSTD: Single-Point Supervision Guided Encoder-only Framework is Enough for Infrared Small Target Detection

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

SPIRE recasts IRSTD as centroid regression using single-point supervision and a high-resolution probabilistic encoder, matching prior performance with lower compute cost and fewer false alarms.

CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering

cs.CV · 2026-04-18 · unverdicted · novelty 5.0

CoGR-MoE improves VQA by using concept-guided expert routing with option feature reweighting and contrastive learning to achieve consistent yet flexible reasoning across answer options.

ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference

cs.PF · 2025-08-22 · unverdicted · novelty 5.0

ShadowNPU presents shadowAttn, a co-designed sparse attention system that uses NPU pilot compute and techniques like graph bucketing and per-head sparsity to minimize CPU/GPU fallback during on-device LLM inference while maintaining accuracy.

citing papers explorer


SDG-MoE: Signed Debate Graph Mixture-of-Experts cs.LG · 2026-05-08 · unverdicted · none · ref 24 · 2 links
One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception cs.CV · 2026-05-18 · conditional · none · ref 3
Temporally Extended Mixture-of-Experts Models cs.LG · 2026-04-22 · unverdicted · none · ref 4
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models cs.LG · 2026-04-07 · unverdicted · none · ref 4
Rethinking IRSTD: Single-Point Supervision Guided Encoder-only Framework is Enough for Infrared Small Target Detection cs.CV · 2026-04-07 · unverdicted · none · ref 23
CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering cs.CV · 2026-04-18 · unverdicted · none · ref 2
ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference cs.PF · 2025-08-22 · unverdicted · none · ref 16
