Journal of Machine Learning Research

Switch Transformers: Scaling to Trillion Parameter Models with Simple, Efficient Sparsity

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

BoolXLLM: LLM-Assisted Explainability for Boolean Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.

Mixture of Experts for Low-Resource LLMs

cs.CL · 2026-05-17 · unverdicted · novelty 5.0

Pre-trained MoE models exhibit deep-layer routing collapse for low-resource languages like Hebrew, largely corrected by continual pre-training on balanced bilingual data, with consistent patterns observed in Japanese.

When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.

Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching

cs.CL · 2026-05-13 · accept · novelty 5.0

At tiny scale, MoE transformers lower validation loss versus dense models when active parameters match but raise it when total stored parameters match.

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

cs.RO · 2026-05-07 · unverdicted · novelty 5.0

VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.

citing papers explorer

Showing 5 of 5 citing papers.

BoolXLLM: LLM-Assisted Explainability for Boolean Models · cs.AI · 2026-05-12 · unverdicted · none · ref 53
Mixture of Experts for Low-Resource LLMs · cs.CL · 2026-05-17 · unverdicted · none · ref 2
When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing · cs.CV · 2026-05-15 · unverdicted · none · ref 5
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching · cs.CL · 2026-05-13 · accept · none · ref 2
VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts · cs.RO · 2026-05-07 · unverdicted · none · ref 15
