SDG-MoE introduces learned signed interaction graphs and disagreement-gated deliberation among experts in MoE architectures, yielding 19.8% better validation perplexity than the strongest baseline.
A Survey on Mixture of Experts in Large Language Models , ISSN=
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
UniTrans pretrains a bank of translator experts and learns combination coefficients from modality mappings in a scene-invariant latent space to enable zero-shot any-to-any feature translation for heterogeneous collaborative perception.
Temporally extended MoE layers using the option-critic framework with deliberation costs cut switching rates below 5% while retaining most capability on MATH, MMLU, and MMMLU.
TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.
SPIRE turns IRSTD into centroid regression via single-point supervision and a high-resolution probabilistic encoder, matching prior performance with lower compute and false alarms.
CoGR-MoE improves VQA by using concept-guided expert routing with option feature reweighting and contrastive learning to achieve consistent yet flexible reasoning across answer options.
ShadowNPU presents shadowAttn, a co-designed sparse attention system that uses NPU pilot compute and techniques like graph bucketing and per-head sparsity to minimize CPU/GPU fallback during on-device LLM inference while maintaining accuracy.
citing papers explorer
-
SDG-MoE: Signed Debate Graph Mixture-of-Experts
SDG-MoE introduces learned signed interaction graphs and disagreement-gated deliberation among experts in MoE architectures, yielding 19.8% better validation perplexity than the strongest baseline.
-
One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception
UniTrans pretrains a bank of translator experts and learns combination coefficients from modality mappings in a scene-invariant latent space to enable zero-shot any-to-any feature translation for heterogeneous collaborative perception.
-
Temporally Extended Mixture-of-Experts Models
Temporally extended MoE layers using the option-critic framework with deliberation costs cut switching rates below 5% while retaining most capability on MATH, MMLU, and MMMLU.
-
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.
-
Rethinking IRSTD: Single-Point Supervision Guided Encoder-only Framework is Enough for Infrared Small Target Detection
SPIRE turns IRSTD into centroid regression via single-point supervision and a high-resolution probabilistic encoder, matching prior performance with lower compute and false alarms.
-
CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering
CoGR-MoE improves VQA by using concept-guided expert routing with option feature reweighting and contrastive learning to achieve consistent yet flexible reasoning across answer options.
-
ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
ShadowNPU presents shadowAttn, a co-designed sparse attention system that uses NPU pilot compute and techniques like graph bucketing and per-head sparsity to minimize CPU/GPU fallback during on-device LLM inference while maintaining accuracy.