Expert calibration suffices for MoE calibration under distribution shifts in hard-routed models but not soft-routed ones; adversarial reweighting improves the accuracy-calibration tradeoff across models and shifts.
arXiv preprint arXiv:2408.11598
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Derives theoretical bounds for three structured proper losses and reports that controlled experiments do not support claims of general superiority over cross-entropy.
citing papers explorer
- Toward Calibrated Mixture-of-Experts Under Distribution Shift
Expert calibration suffices for MoE calibration under distribution shifts in hard-routed models but not soft-routed ones; adversarial reweighting improves the accuracy-calibration tradeoff across models and shifts.