FlexMoE produces nested pruned subnetworks for MoE LLMs across budgets using channel importance ranking and discrete action learning, followed by a single recovery fine-tune at a mid-range budget. It retains 99.8% of performance with 50% of expert parameters pruned.
Training matryoshka mixture-of-experts for elastic inference-time expert utilization
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
representative citing papers
PARCEL is a new visual tokenization architecture combining pool-anchored resampling with conditioned elastic queries to enhance performance-efficiency tradeoffs in LVLMs over prior matryoshka methods.
VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
m3BERT uses a three-stage Matryoshka pretraining approach on a bidirectional encoder to support variable embedding sizes while outperforming prior models on large-scale retrieval tasks.
citing papers explorer
- m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder
m3BERT uses a three-stage Matryoshka pretraining approach on a bidirectional encoder to support variable embedding sizes while outperforming prior models on large-scale retrieval tasks.