FeatCal: Feature Calibration for Post-Merging Models
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-14 20:23 UTC · model grok-4.3
The pith
Feature drift in merged models decomposes into upstream propagation and local mismatch, and it can be corrected layer by layer with closed-form calibration on a small calibration set.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Feature drift between merged and expert models can be decomposed into upstream propagation and local mismatch; tracking this drift through layers in forward order links it to output degradation and motivates an efficient closed-form layer-wise calibration that reduces drift while remaining close to the merged weights.
What carries the argument
The decomposition of feature drift into upstream propagation and local mismatch, which enables derivation of layer-wise closed-form weight calibration updates.
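Schematically, the decomposition is an add-and-subtract identity on layer features (notation ours, not the paper's): with merged model layers $f^m_\ell$, expert layers $f^e_\ell$, and hidden states $h^m_{\ell-1}$, $h^e_{\ell-1}$ from the previous layer,

```latex
\delta_\ell
  = f^m_\ell(h^m_{\ell-1}) - f^e_\ell(h^e_{\ell-1})
  = \underbrace{f^m_\ell(h^m_{\ell-1}) - f^m_\ell(h^e_{\ell-1})}_{\text{upstream propagation}}
  \;+\; \underbrace{f^m_\ell(h^e_{\ell-1}) - f^e_\ell(h^e_{\ell-1})}_{\text{local mismatch}}
```

The first term carries drift inherited from earlier layers through the merged layer's map; the second is the disagreement between merged and expert weights on the same input, which is the part a layer-local calibration can target.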
If this is right
- FeatCal reaches 85.5% accuracy on CLIP-ViT-B/32 Task Arithmetic versus 77.0% and 78.8% for Surgery and ProbSurgery.
- On FLAN-T5-base GLUE it reaches 85.2% versus 83.7% and 82.2%.
- Eight examples per task yield 82.9% on CLIP-ViT-B/32, and calibration with 256 examples per task finishes in 53 seconds, roughly four times faster than both baselines.
- No gradient descent, iterative optimization, or added modules are required.
Where Pith is reading between the lines
- The same forward-order calibration could be applied to merging methods other than task arithmetic.
- Feature-drift tracking may allow selective recalibration of only the layers where drift accumulates most.
- The closed-form solution could support incremental merging by updating only new layers when additional tasks are added.
Load-bearing premise
The decomposition of feature drift into upstream propagation and local mismatch captures the dominant cause of output degradation, and the layer-wise closed-form calibration on a small set generalizes without harming unrelated capabilities.
What would settle it
Applying the FeatCal updates fails to reduce measured feature drift or raise accuracy above the uncalibrated merged model and the Surgery baselines on a held-out calibration set from the same tasks.
Original abstract
Model merging combines task experts into one model and avoids joint training, retraining, or deploying many expert models, but the merged model often still underperforms task experts. We study this performance gap through feature drift, the difference between features produced by the merged model and by the expert on the same input. Our theory decomposes this drift into upstream propagation and local mismatch, tracks how it propagates and combines through later layers in forward order, and links final feature drift to output drift. This view motivates FeatCal, which uses a small calibration set to calibrate the merged model weights layer by layer in forward order, reducing feature drift while staying close to merged weights and preserving the benefits of model merging. FeatCal uses an efficient closed-form solution to update model weights, with no gradient descent, iterative optimization, or extra modules. On the main CLIP and GLUE benchmarks, FeatCal beats Surgery and ProbSurgery, the closest post-merging calibration baselines: 85.5% vs. 77.0%/78.8% on CLIP-ViT-B/32 Task Arithmetic (TA) and 85.2% vs. 83.7%/82.2% on FLAN-T5-base GLUE. On CLIP-ViT-B/32, 8 examples per task reach 82.9%, and 256 examples per task take 53 seconds, about 4x faster than both baselines, showing better sample efficiency and lower calibration cost.
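The abstract's "efficient closed-form solution" is consistent with a ridge-style update (the paper cites Hoerl-Kennard ridge regression and Tikhonov regularization). A minimal sketch of one layer's calibration, assuming a linear layer and a penalty anchoring the solution to the merged weights; the function name and exact objective are ours, not the paper's:

```python
import numpy as np

def calibrate_layer(X, H_expert, W_merged, lam=1e-2):
    """Closed-form ridge update for one linear layer (illustrative sketch).

    Solves  min_W ||X @ W - H_expert||_F^2 + lam * ||W - W_merged||_F^2,
    i.e. match expert features while staying close to the merged weights.

    X        : (n, d_in)  layer inputs from the merged model on the calibration set
    H_expert : (n, d_out) target features from the task expert on the same inputs
    W_merged : (d_in, d_out) merged weights, used as the regularization anchor
    """
    d_in = X.shape[1]
    A = X.T @ X + lam * np.eye(d_in)     # (d_in, d_in), invertible for lam > 0
    B = X.T @ H_expert + lam * W_merged  # (d_in, d_out)
    return np.linalg.solve(A, B)         # one linear solve, no gradient descent
```

As `lam` grows the update collapses to the merged weights; as it shrinks it approaches the least-squares fit to the expert features. Applying such an update layer by layer in forward order, recomputing `X` from the already-calibrated earlier layers, matches the abstract's description, though the paper's actual formulation may differ in detail.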
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that feature drift in merged models can be decomposed into upstream propagation and local mismatch, which propagates through layers in forward order and links to output degradation. This motivates FeatCal, a post-merging calibration method that applies layer-wise closed-form weight updates in forward order on a small calibration set to reduce drift while remaining close to the merged weights, without gradients, optimization, or extra modules. It reports benchmark wins over Surgery and ProbSurgery: 85.5% vs. 77.0%/78.8% on CLIP-ViT-B/32 Task Arithmetic and 85.2% vs. 83.7%/82.2% on FLAN-T5-base GLUE, plus efficiency results (82.9% with 8 examples/task; 53s for 256 examples/task).
Significance. If the decomposition holds and sequential calibration generalizes without reintroducing drift or harming unrelated capabilities, FeatCal would offer a practical, low-cost way to close the gap between merged models and task experts, strengthening model merging as an alternative to joint training or multi-model deployment.
Major comments (1)
- [Method (drift decomposition and sequential update)] The central derivation assumes sequential forward-order closed-form calibration on local mismatch leaves residual upstream drift small after non-linearities (ReLU/GELU/attention) propagate changes in activation scale and distribution to later layers. No analytic bound, post-correction drift measurement, or ablation of the decomposition is supplied to verify that the full-pass residual remains negligible; this assumption is load-bearing for the claim that the method reduces total drift without re-solving the system.
Minor comments (1)
- [Abstract and Experiments] Abstract and results sections report benchmark numbers and timing but omit any error bars, statistical significance tests, or details on how the 8-example and 256-example regimes were sampled.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the drift decomposition. We address the major comment below and will revise the manuscript to incorporate additional verification.
Point-by-point responses
-
Referee: The central derivation assumes sequential forward-order closed-form calibration on local mismatch leaves residual upstream drift small after non-linearities (ReLU/GELU/attention) propagate changes in activation scale and distribution to later layers. No analytic bound, post-correction drift measurement, or ablation of the decomposition is supplied to verify that the full-pass residual remains negligible; this assumption is load-bearing for the claim that the method reduces total drift without re-solving the system.
Authors: We agree that the manuscript does not supply an analytic bound on residual upstream drift after non-linearities, nor does it report explicit post-correction drift measurements across layers or an ablation isolating the sequential decomposition. The derivation proceeds from the forward-order propagation of local mismatch and relies on the empirical observation that layer-wise closed-form updates reduce total drift without iterative re-solving. In the revision we will add: (i) layer-wise feature drift measurements before and after FeatCal on the calibration set, (ii) an ablation comparing sequential forward calibration against a simultaneous (non-sequential) variant, and (iii) a brief discussion of why a tight analytic bound is intractable for general non-linear activations while the empirical gains on CLIP and GLUE benchmarks support the practical utility of the approach.
Revision: yes
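Item (i) of the rebuttal, layer-wise drift measurement, is straightforward to instrument. A minimal sketch, assuming per-layer activations have been collected from both models on the same calibration inputs (helper name and metric choice are ours):

```python
import numpy as np

def layer_drift(feats_merged, feats_expert):
    """Mean per-example L2 feature drift between two models, layer by layer.

    feats_merged, feats_expert: lists of (n_examples, dim_l) activation arrays,
    one per layer, collected on the same calibration inputs.
    Returns one scalar drift value per layer.
    """
    return [
        float(np.linalg.norm(a - b, axis=1).mean())
        for a, b in zip(feats_merged, feats_expert)
    ]
```

Running this before and after calibration, layer by layer, would directly test whether residual upstream drift stays small after the non-linearities the referee flags.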
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper presents a decomposition of feature drift into upstream propagation and local mismatch, then derives a layer-wise closed-form weight update using a small held-out calibration set. This process computes updates directly from data examples and does not reduce by construction to fitted parameters, self-citations, or tautological definitions. The central claims rest on empirical improvements over baselines on CLIP and GLUE tasks rather than on any load-bearing self-citation or ansatz smuggled from prior work. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: feature drift is the primary driver of the performance gap between merged and expert models.
Invented entities (1)
- feature drift (decomposed into upstream propagation and local mismatch): no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "Our theory decomposes this drift into upstream propagation and local mismatch... closed-form solution to update model weights, with no gradient descent..."
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat ≃ Nat recovery · tagged unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "layer by layer in forward order... regularized regression problem with a closed-form update"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, and Ludwig Schmidt. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In ICML, 2022.
-
[2]
Michael S. Matena and Colin A. Raffel. Merging models with fisher-weighted averaging. In NeurIPS, 2022
-
[3]
Editing models with task arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. In ICLR, 2023.
-
[4]
TIES-Merging: Resolving interference when merging models
Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, and Mohit Bansal. TIES-Merging: Resolving interference when merging models. In NeurIPS, 2023.
-
[5]
AdaMerging: Adaptive model merging for multi-task learning
Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, and Dacheng Tao. AdaMerging: Adaptive model merging for multi-task learning. In ICLR, 2024.
-
[6]
Language models are Super Mario: Absorbing abilities from homologous models as a free lunch
Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, and Yongbin Li. Language models are Super Mario: Absorbing abilities from homologous models as a free lunch. In ICML, 2024.
-
[7]
Model Breadcrumbs: Scaling multi-task model merging with sparse masks
MohammadReza Davari and Eugene Belilovsky. Model Breadcrumbs: Scaling multi-task model merging with sparse masks. In ECCV, 2024.
-
[8]
Merging by matching models in task parameter subspaces
Derek Tam, Mohit Bansal, and Colin Raffel. Merging by matching models in task parameter subspaces. TMLR, 2024.
-
[9]
Model merging by uncertainty-based gradient matching
Nico Daheim, Thomas Möllenhoff, Edoardo M. Ponti, Iryna Gurevych, and Mohammad Emtiyaz Khan. Model merging by uncertainty-based gradient matching. In ICLR, 2024.
-
[10]
Whoever started the interference should end it: Guiding data-free model merging via task vectors
Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, and Chun Yuan. Whoever started the interference should end it: Guiding data-free model merging via task vectors. In ICML, 2025.
-
[11]
Representation surgery for multi-task model merging
Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, and Dacheng Tao. Representation surgery for multi-task model merging. In ICML, 2024.
-
[12]
Representation surgery in model merging with probabilistic modeling
Qi Wei, Shuo He, Enneng Yang, Tingcong Liu, Haobo Wang, Lei Feng, and Bo An. Representation surgery in model merging with probabilistic modeling. In ICML, 2025.
-
[13]
Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xingwei Wang, Xiaocun Cao, Jie Zhang, and Dacheng Tao. SurgeryV2: Bridging the gap between model merging and multi-task learning with deep representation surgery. arXiv preprint arXiv:2410.14389, 2024.
-
[14]
Parameter-efficient interventions for enhanced model merging
Marcin Osial, Daniel Marczak, and Bartosz Zieliński. Parameter-efficient interventions for enhanced model merging. In Proceedings of the 2025 SIAM International Conference on Data Mining, 2025.
-
[15]
Task arithmetic in the tangent space: Improved editing of pre-trained models
Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard. Task arithmetic in the tangent space: Improved editing of pre-trained models. In NeurIPS, 2023.
-
[16]
Dataless knowledge fusion by merging weights of language models
Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, and Pengxiang Cheng. Dataless knowledge fusion by merging weights of language models. In ICLR, 2023.
-
[17]
RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging
The-Hai Nguyen, Huu-Tien Dang, Takeshi Suzuki, and Le-Minh Nguyen. RegMean++: Enhancing effectiveness and generalization of regression mean for model merging. arXiv preprint arXiv:2508.03121, 2025.
-
[18]
Wenju Sun, Qingyong Li, Wen Wang, Yang Liu, Yangliao Geng, and Boyang Li. Towards minimizing feature drift in model merging: Layer-wise task vector fusion for adaptive knowledge integration. In NeurIPS, 2025.
-
[19]
Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 1970.
-
[20]
A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. V. H. Winston & Sons, distributed by Halsted Press, 1977.
-
[22]
FusionBench: A unified library and comprehensive benchmark for deep model fusion
Anke Tang, Li Shen, Yong Luo, Enneng Yang, Han Hu, Lefei Zhang, Bo Du, and Dacheng Tao. FusionBench: A unified library and comprehensive benchmark for deep model fusion. JMLR, 2025.
-
[23]
MergeBench: A benchmark for merging domain-specialized LLMs
Yifei He, Siqi Zeng, Yuzheng Hu, Rui Yang, Tong Zhang, and Han Zhao. MergeBench: A benchmark for merging domain-specialized LLMs. arXiv preprint arXiv:2505.10833, 2025.
-
[24]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.
-
[25]
SUN database: Large-scale scene recognition from abbey to zoo
Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
-
[26]
3D object representations for fine-grained categorization
Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In ICCV Workshops, 2013.
-
[27]
Remote sensing image scene classification: Benchmark and state of the art
Gong Cheng, Junwei Han, and Xiaoqiang Lu. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE, 2017.
-
[28]
Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019.
-
[29]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
-
[30]
The german traffic sign recognition benchmark: A multi-class classification competition
Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. The german traffic sign recognition benchmark: A multi-class classification competition. In The 2011 International Joint Conference on Neural Networks, 2011.
-
[31]
Gradient-based learning applied to document recognition
Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 1998.
-
[32]
Describing textures in the wild
Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In CVPR, 2014.
-
[33]
Automated flower classification over a large number of classes
Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing, 2008.
-
[34]
Rotation equivariant CNNs for digital pathology
Bastiaan S. Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. Rotation equivariant CNNs for digital pathology. In Medical Image Computing and Computer Assisted Intervention, 2018.
-
[35]
Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Ji...
-
[36]
Cats and dogs
Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. Cats and dogs. In CVPR, 2012.
-
[37]
Adam Coates, Andrew Y. Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.
-
[38]
Learning multiple layers of features from tiny images
Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009
-
[39]
Food-101: Mining discriminative components with random forests
Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101: Mining discriminative components with random forests. In ECCV, 2014.
-
[40]
Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms
Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
-
[41]
EMNIST: Extending MNIST to handwritten letters
Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre van Schaik. EMNIST: Extending MNIST to handwritten letters. In International Joint Conference on Neural Networks, 2017.
-
[42]
Deep learning for classical japanese literature
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature. In NeurIPS Workshop on Machine Learning for Creativity and Design, 2018.
-
[43]
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 2013.
-
[44]
OpenAI. Rendered SST-2 Dataset. https://github.com/openai/CLIP/blob/main/data/rendered-sst2.md, 2021.
-
[45]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR, 2020.
-
[46]
Finetuned language models are zero-shot learners
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. Finetuned language models are zero-shot learners. In ICLR, 2022.
-
[47]
Scaling instruction-finetuned language models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping H...
-
[48]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018
-
[49]
Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. Neural network acceptability judgments. TACL, 2019.
-
[50]
Adina Williams, Nikita Nangia, and Samuel R. Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In NAACL-HLT, 2018.
-
[51]
William B. Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing, 2005.
-
[52]
SQuAD: 100,000+ questions for machine comprehension of text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In EMNLP, 2016.
-
[53]
The PASCAL recognising textual entailment challenge
Ido Dagan, Oren Glickman, and Bernardo Magnini. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment. Springer, 2006.
-
[54]
SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In SemEval, 2017.
-
[55]
LoRA: Low-rank adaptation of large language models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR, 2022.
-
[56]
Qi Zhou, Yiming Zhang, Yanggan Gu, Yuanyi Wang, Zhaoyi Yan, Zhen Li, Chi Yung Chung, and Hongxia Yang. Model fusion for scalable and sustainable artificial intelligence: A review and outlook. Journal of Modern Power Systems and Clean Energy, 2026.
-
[57]
Democratizing AI through model fusion: A comprehensive review and future directions
Qi Zhou, Yiming Zhang, Yanggan Gu, Yuanyi Wang, Zhijie Sang, Zhaoyi Yan, Zhen Li, Shengyu Zhang, Fei Wu, and Hongxia Yang. Democratizing AI through model fusion: A comprehensive review and future directions. Nexus, 2025.
-
[58]
Model Merging Scaling Laws in Large Language Models
Yuanyi Wang, Yanggan Gu, Yiming Zhang, Qi Zhou, Zhaoyi Yan, Congkai Xie, Xinyao Wang, Jianbo Yuan, and Hongxia Yang. Model merging scaling laws in large language models. arXiv preprint arXiv:2509.24244, 2025.
-
[59]
Yuanyi Wang, Yanggan Gu, Zihao Wang, Kunxi Li, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, and Hongxia Yang. MergePipe: A budget-aware parameter management system for scalable LLM merging. arXiv preprint arXiv:2602.13273, 2026.
-
[60]
Yanggan Gu, Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Fei Wu, and Hongxia Yang. InfiFPO: Implicit model fusion via preference optimization in large language models. arXiv preprint arXiv:2505.13878, 2025.
-
[61]
Capturing nuanced preferences: Preference-aligned distillation for small language models
Yanggan Gu, Junzhuo Li, Sirui Huang, Xin Zou, Zhenghua Li, and Xuming Hu. Capturing nuanced preferences: Preference-aligned distillation for small language models. In Findings of ACL, 2025.
-
[62]
InfiGFusion: Graph-on-logits distillation via efficient Gromov-Wasserstein for model fusion
Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Yanggan Gu, Fei Wu, and Hongxia Yang. InfiGFusion: Graph-on-logits distillation via efficient Gromov-Wasserstein for model fusion. arXiv preprint arXiv:2505.13893, 2025.
-
[63]
Exploring response uncertainty in MLLMs: An empirical evaluation under misleading scenarios
Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Jungang Li, Jingyu Wang, Peijie Jiang, Aiwei Liu, Jia Liu, and Xuming Hu. Exploring response uncertainty in MLLMs: An empirical evaluation under misleading scenarios. In EMNLP, 2025.
-
[64]
Yifan Yang, Jinjia Li, Kunxi Li, Puhao Zheng, Yuanyi Wang, Zheyan Qu, Yang Yu, Jianmin Wu, Ming Li, and Hongxia Yang. InfiCoEvalChain: A blockchain-based decentralized framework for collaborative LLM evaluation. arXiv preprint arXiv:2602.08229, 2026.
-
[65]
Wenjun Wang, Shuo Cai, Congkai Xie, Mingfa Feng, Yiming Zhang, Zhen Li, Kejing Yang, Ming Li, Jiannong Cao, and Hongxia Yang. InfiR2: A comprehensive FP8 training recipe for reasoning-enhanced language models. arXiv preprint arXiv:2509.22536, 2025.