Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (ICLR), 2021a

Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng · 2024 · arXiv 2403.07652

4 Pith papers cite this work.

citation-role summary: method (1)

citation-polarity summary: use method (1)

representative citing papers

Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning

cs.LG · 2026-02-13 · unverdicted · novelty 7.0

Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in a single communication round along with built-in robustness and per-sample contribution scores.

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

cs.AI · 2026-05-14 · conditional · novelty 6.0

BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.

XPERT: Expert Knowledge Transfer for Effective Training of Language Models

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.

DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

DIMoE-Adapters uses self-calibrated expert evolution and prototype-guided selection to dynamically grow and allocate experts, outperforming prior continual learning methods on vision-language models.

citing papers explorer

Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning cs.LG · 2026-02-13 · unverdicted · none · ref 14
BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE cs.AI · 2026-05-14 · conditional · none · ref 9
XPERT: Expert Knowledge Transfer for Effective Training of Language Models cs.CL · 2026-05-09 · unverdicted · none · ref 60
DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models cs.CV · 2026-05-08 · unverdicted · none · ref 31