BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2 Pith papers cite this work, both from 2026.
Representative citing papers:
- Continual Distillation of Teachers from Different Domains
  SE2D stabilizes continual distillation across heterogeneous teachers by preserving logits on external unlabeled data to mitigate forgetting of unseen knowledge.
- ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation
  ACE-Merging estimates task input covariances from parameter differences to enable closed-form, data-free merging that reduces interference and outperforms prior baselines on vision and language tasks.