pith. sign in

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

citation-role summary

background 1 baseline 1

citation-polarity summary

representative citing papers

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

BoostLoRA: Growing Effective Rank by Boosting Adapters

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

BoostLoRA grows effective adapter rank linearly via iterative boosting on hard examples with orthogonal low-rank updates, outperforming both single-shot ultra-low-rank adapters and full fine-tuning on math and code tasks with zero added inference overhead.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

A Roadmap to Pluralistic Alignment

cs.AI · 2024-02-07 · unverdicted · novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

citing papers explorer

Showing 9 of 9 citing papers.

  • Editing Models with Task Arithmetic cs.LG · 2022-12-08 · accept · none · ref 106

    Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

  • BoostLoRA: Growing Effective Rank by Boosting Adapters cs.LG · 2026-04-30 · unverdicted · none · ref 34

    BoostLoRA grows effective adapter rank linearly via iterative boosting on hard examples with orthogonal low-rank updates, outperforming both single-shot ultra-low-rank adapters and full fine-tuning on math and code tasks with zero added inference overhead.

  • Atomic-Probe Governance for Skill Updates in Compositional Robot Policies cs.RO · 2026-04-29 · unverdicted · none · ref 48 · 2 links

    A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

  • Flamingo: a Visual Language Model for Few-Shot Learning cs.CV · 2022-04-29 · unverdicted · none · ref 128

    Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

  • Early Data Exposure Improves Robustness to Subsequent Fine-Tuning cs.LG · 2026-05-12 · conditional · none · ref 20

    Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.

  • Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts cs.LG · 2026-04-20 · unverdicted · none · ref 44

    BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.

  • EmbeddingGemma: Powerful and Lightweight Text Representations cs.CL · 2025-09-24 · unverdicted · none · ref 25

    A 300M-parameter open embedding model sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.

  • A Roadmap to Pluralistic Alignment cs.AI · 2024-02-07 · unverdicted · none · ref 295

    The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

  • CoCa: Contrastive Captioners are Image-Text Foundation Models cs.CV · 2022-05-04 · accept · none · ref 52

    CoCa unifies contrastive and generative pretraining in one image-text model to reach 86.3% zero-shot ImageNet accuracy and new state-of-the-art results on multiple downstream benchmarks.