Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39

William Fedus, Barret Zoph, Noam Shazeer · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

cs.AI · 2026-05-19 · conditional · novelty 7.0

BOHM extracts multi-resolution attribution trees from existing routing weights in hierarchical AI systems, providing zero-cost explanations that correlate with SHAP when routing is near-optimal.

Perceive, Route and Modulate: Dynamic Pattern Recalibration for Time Series Forecasting

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

Dynamic Pattern Recalibration (DPR) adds a perceive-route-modulate pipeline that generates time-aware modulation vectors to recalibrate hidden states in forecasting models, improving performance across architectures with low overhead.

citing papers explorer

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

cs.AI · 2026-05-19 · conditional · none · ref 10

Perceive, Route and Modulate: Dynamic Pattern Recalibration for Time Series Forecasting

cs.LG · 2026-05-07 · unverdicted · none · ref 32
