6 Pith papers cite this work.
Citing papers by year: 2026 (6).
Citing papers explorer
- Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining
  Multilingual pretraining develops translation in two phases: early copying driven by surface similarities, followed by generalizing mechanisms while copying is refined.
- Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution
  ASASR recasts generative super-resolution flow into Sobolev Riemannian geometry via spectrally colored noise kernels and parametric adversaries from the Riesz Representation Theorem to enforce structural fidelity.
- Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
  Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes in proportion to their computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.
- The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
  Task-aligned supervised geometric stability predicts linear steerability with high accuracy, while unsupervised stability detects representational drift earlier and with fewer false alarms than CKA or Procrustes.
- Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning
  Qreg+NWLU improves forgetting mitigation and knowledge transfer in value-based multi-cyclic CRL by using dynamic Q-value rehearsal and applying regularization immediately instead of waiting until after the first task.
- REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations