{"total":13,"items":[{"citing_arxiv_id":"2606.01045","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Child-directed speech facilitates production, not comprehension, in BabyLMs","primary_cat":"cs.CL","submitted_at":"2026-05-31T06:27:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CDS-trained BabyLMs show earlier and more appropriate production in a new frame-completion task while FineWeb-edu models lead on comprehension benchmarks, indicating current tests underestimate CDS benefits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19409","ref_index":1,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unlocking the Potential of Continual Model Merging: An ODE Perspective","primary_cat":"cs.LG","submitted_at":"2026-05-19T06:03:20+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14546","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Discovering Physical Directions in Weight Space: Composing Neural PDE Experts","primary_cat":"cs.LG","submitted_at":"2026-05-14T08:25:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09471","ref_index":136,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Statistical Cost of 
Adaptation in Multi-Source Transfer Learning","primary_cat":"math.ST","submitted_at":"2026-05-10T10:56:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17751","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation","primary_cat":"cs.LG","submitted_at":"2026-04-20T03:11:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22823","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging","primary_cat":"cs.CV","submitted_at":"2026-04-18T09:38:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PivotMerge merges heterogeneous multimodal pre-trained models via shared-space decomposition to filter conflicts and layer-wise weights based on alignment contributions, outperforming baselines on multimodal 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16426","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Functional Similarity Metric for Neural Networks: Overcoming Parametric Ambiguity via Activation Region Analysis","primary_cat":"cs.LG","submitted_at":"2026-04-04T17:50:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A functional similarity metric for ReLU networks uses normalized activation region signatures and MinHash to overcome parametric symmetries like neuron permutation and scaling.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Activation patterns may \"flicker,\" making derived statistics unstable. 3.3 More advanced algorithmic approaches (often for network alignment) There exist more complex methods primarily aimed at aligning neurons between two networks, closely related to our task of constructing a similarity metric. • Optimal assignment-based methods: \"Git Re-Basin: Merging Models modulo Permutation Symmetries\" [1] and \"Optimizing Mode Connectivity via Neuron Alignment\" [32] solve a linear assignment problem to match neurons based on weight similarity or activation correlation, finding an optimal permutation for alignment. 
• Combinatorial structure analysis (sign patterns): Studies such as [16] examine the combinatorial structure of linear regions (chambers) defined by ReLU networks via sign patterns of pre-activations."},{"citing_arxiv_id":"2604.02719","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications","primary_cat":"cs.CV","submitted_at":"2026-04-03T04:22:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MOMO merges sensor-specific models from three Mars orbital instruments at matched validation loss stages to form a foundation model that outperforms ImageNet, Earth observation, sensor-specific, and supervised baselines on nine Mars-Bench tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In Figure 3, each plot presents the interpolated loss surface among the HiRISE, CTX, and THEMIS models. Across all merging strategies, the task-vector-based merged model consistently achieves an equal or lower loss compared to its constituent models. This demonstrates that task-vector merging can produce models that outperform the originals from which they were derived. Prior work [2, 20, 61] suggests that optimal merging occurs when constituent models lie within the same loss basin, as this promotes stability and enhanced performance. Figure 3 shows that relative to both LE and ES, the EVL strategy selects model checkpoints that are more closely aligned in weight space. 
Consequently, EVL is expected to yield the most stable and best-performing merged model."},{"citing_arxiv_id":"2603.24350","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evidence of an Emergent \"Self\" in Continual Robot Learning","primary_cat":"cs.RO","submitted_at":"2026-03-25T14:27:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Continual learning robots form a significantly more stable invariant subnetwork than constant-task controls, and preserving it improves adaptation while damaging it hurts performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.20102","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Steerable Adversarial Scenario Generation through Test-Time Preference Alignment","primary_cat":"cs.AI","submitted_at":"2025-09-24T13:27:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SAGE reframes adversarial scenario generation as multi-objective preference alignment, using hierarchical group-based optimization and test-time linear interpolation of two expert policies to enable steerable control over adversariality-realism trade-offs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.14951","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Flat Channels to Infinity in Neural Loss Landscapes","primary_cat":"cs.LG","submitted_at":"2025-06-17T20:04:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Neural loss landscapes contain flat channels to infinity along which gradient flow leads pairs of neurons to implement gated linear 
units.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2405.07987","ref_index":206,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Platonic Representation Hypothesis","primary_cat":"cs.LG","submitted_at":"2024-05-13T17:58:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Representations learned by large AI models are converging toward a shared statistical model of reality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2212.04089","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Editing Models with Task Arithmetic","primary_cat":"cs.LG","submitted_at":"2022-12-08T05:50:53+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"saving compute with little harm in accuracy. Limitations. Task vectors are restricted to models with the same architecture, since they depend on element-wise operations on model weights. Further, in all of our experiments we perform arithmetic operations only on models fine-tuned from the same pre-trained initialization, although emerging work shows promise in relaxing this assumption [2]. We also note that some architectures are very popular, and have \"standard\" initializations-e.g., at the time of writing there are over 3,000 models on Hugging Face Hub fine-tuned from the same BERT-base initialization [17], and over 800 models fine-tuned from the same T5-small initialization. 
8 Published as a conference paper at ICLR 2023 7 RELATED WORK"}],"limit":50,"offset":0}