VisMMoE exploits visual-expert affinity via token pruning to achieve up to 2.68x faster VL-MoE inference on memory-constrained hardware while keeping accuracy competitive.
Communication Scaling for Future Transformers on Future Hardware
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
CompPow makes the case that component-aware power management inside GPUs can yield 10% higher energy efficiency and 5% better performance for ML workloads.
citing papers explorer
-
VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading
VisMMoE exploits visual-expert affinity via token pruning to achieve up to 2.68x faster VL-MoE inference on memory-constrained hardware while keeping accuracy competitive.
-
CompPow: A Case for Component-level GPU Power Management
CompPow makes the case that component-aware power management inside GPUs can yield 10% higher energy efficiency and 5% better performance for ML workloads.