Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.
The role of permutation invariance in linear mode connectivity of neural networks. arXiv preprint arXiv:2110.06296
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
Fisher-MoE prunes sparse intermediate dimensions in MoE FFNs ranked by Fisher importance, delivering 50% compression that preserves capability while cutting memory use by ~45% and raising throughput by 21%.
MCWC aligns permutation-symmetric blocks across layers to enable sequential prediction and residual entropy coding, improving rate-accuracy tradeoffs versus quantization and prior codecs on language and vision models.
ODE-M formulates continual model merging as a barrier-aware ODE trajectory in parameter space, using first-order feedback and a utility-aware schedule to balance retained knowledge and new task performance.
DiMS is a physics-inspired dynamical sampler guaranteed to exactly sample reparameterization-invariant minimum level sets in neural network loss landscapes.
Representations learned by large AI models are converging toward a shared statistical model of reality.