Scaling laws for neural language models. arXiv, 2020.
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling
  Cluster-aware Upcycling breaks the symmetry among MoE experts by initializing each expert from the SVD subspace of a semantic activation cluster and the router from the cluster centroids. Combined with self-distillation, this yields better zero- and few-shot performance on CLIP ViT models than standard upcycling.
- When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
  Newer text-to-image models produce less diverse synthetic data, so classifiers trained solely on that data show declining accuracy on real test sets.
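The cluster-aware initialization summarized for the first citing paper can be illustrated with a minimal NumPy sketch. It reflects one plausible reading of that summary, not the paper's actual method. Assumptions (not from the source): clusters come from plain k-means over the activations entering a dense FFN layer; each expert is the dense up-projection `w_in` restricted to the top-`rank` SVD directions of its cluster; router rows are the L2-normalized cluster centroids. The helpers `kmeans` and `cluster_aware_init` are hypothetical names. Self-distillation is omitted, and the random inputs stand in for real CLIP ViT activations.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def cluster_aware_init(acts, w_in, num_experts, rank):
    """Hypothetical cluster-aware upcycling init (an assumption, not the paper's code).

    acts: (n, d) activations entering the dense FFN layer.
    w_in: (d, h) dense FFN up-projection being upcycled.
    Returns router (num_experts, d) and a list of expert matrices, each (d, h).
    """
    centroids, labels = kmeans(acts, num_experts)
    # Router rows: L2-normalized centroids, so logits = router @ x score an
    # input by its alignment with each cluster.
    router = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-12)
    experts = []
    for j in range(num_experts):
        members = acts[labels == j]
        if len(members) < 2:
            experts.append(w_in.copy())  # too few points: fall back to a plain copy
            continue
        # Leading right-singular vectors of the centered cluster span its dominant subspace.
        _, _, vt = np.linalg.svd(members - members.mean(axis=0), full_matrices=False)
        basis = vt[:rank].T          # (d, r) with orthonormal columns
        projector = basis @ basis.T  # (d, d) symmetric projector onto that subspace
        # For a row input x, x @ (P @ w_in) == (x @ P) @ w_in: project, then apply w_in.
        experts.append(projector @ w_in)
    return router, experts


# Usage on synthetic stand-in data.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 16))
w_in = rng.normal(size=(16, 64))
router, experts = cluster_aware_init(acts, w_in, num_experts=4, rank=4)
print(router.shape, experts[0].shape)  # (4, 16) (16, 64)
```

The design choice this sketch illustrates: under copy-based upcycling every expert starts identical, whereas here each expert and router row starts from a different cluster, so experts differ from the first step.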