From sparse to soft mixtures of experts

Puigcerver, J · 2023 · arXiv 2308.00951

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

cs.CV · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

SplatWeaver uses cardinality Gaussian experts and pixel-level routing to dynamically allocate varying numbers of Gaussian primitives for generalizable novel view synthesis.

When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Standard top-k routers in MoE language models often select suboptimal routes for difficult tokens, and updating only the final router layer raises pass@K on AIME and HMMT benchmarks across multiple models.

Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

B-MoE: A Body-Part-Aware Mixture-of-Experts "All Parts Matter" Approach to Micro-Action Recognition

cs.CV · 2026-03-25 · unverdicted · novelty 7.0

B-MoE framework achieves state-of-the-art performance on micro-action recognition by using region-specific experts and cross-attention routing.

Path-Constrained Mixture-of-Experts

cs.LG · 2026-03-18 · unverdicted · novelty 7.0

PathMoE constrains expert paths in MoE models by sharing router parameters across layer blocks, yielding more concentrated paths, better performance on perplexity and tasks, and no need for auxiliary losses.

FaceMoE: Mixture of Experts for Low-Resolution Face Recognition

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

FaceMoE introduces a MoE transformer with top-k routed specialized FFN experts for resolution-aware feature extraction in low-resolution face recognition, outperforming prior methods on eleven datasets.

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

cs.CL · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

AGoQ delivers up to 52% lower memory use and 1.34x faster training for 8B-32B LLaMA models by using near-4-bit adaptive activations and 8-bit gradients while preserving pretraining convergence and downstream accuracy.

Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

Patch-wise sparse MoE layers in CNNs for semantic segmentation yield architecture-dependent gains up to 3.9 mIoU on Cityscapes and BDD100K with low overhead, but show strong design sensitivity.

MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter and memory usage.

Tight Clusters Make Specialized Experts

cs.LG · 2025-02-21 · unverdicted · novelty 6.0

Introduces Adaptive Clustering router for MoE models that scales features to identify tight expert clusters, yielding faster convergence, robustness to corruption, and performance gains.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

cs.LG · 2026-05-22 · accept · novelty 5.0

A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations cs.RO · 2026-04-27 · unverdicted · none · ref 46 · 2 links
ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

From sparse to soft mixtures of experts

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer