|              | k'=1  | k'=2  | k'=4  | k'=6  |
| Top-k        | 45.48 | 47.88 | 48.05 | 47.57 |
| EMoE         | 45.68 | 48.22 | 49.00 | 49.50 |
|   w/o co-act
4 Pith papers cite this work.
citing papers explorer
- Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
  SSU mitigates catastrophic forgetting in low-resource LLM target-language adaptation by scoring and column-wise freezing source-critical parameters, reducing source degradation to ~3% versus ~20% for full fine-tuning while matching target performance.
- Graph-Based Alternatives to LLMs for Human Simulation
  GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three evaluation settings.
- Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
  EMoE trains MoE models so they maintain performance when the number of activated experts changes at inference, expanding the usable range to 2-3 times the training k with higher peak results.
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics
  UI-Oceanus shows that continual pre-training on forward dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains scaling by data volume.