E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

arXiv: 2605.16882 · v1 · pith:MSLFK6E5 · submitted 2026-05-16 · 💻 cs.CL


Wenjun Wang, Yanggan Gu, Shuo Cai, Yuanyi Wang, Pengkai Wang, Jianmin Wu, Hongxia Yang

Pith reviewed 2026-05-19 20:56 UTC · model grok-4.3

classification 💻 cs.CL

keywords: post-merge quantization · model merging · expert-guided calibration · merged-weight anchoring · low-bit deployment · quantization deviation · CLIP-ViT · task arithmetic



The pith

Expert-guided calibration with source experts and merged-weight anchoring makes post-merge quantization reliable for multi-task models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that applying standard post-training quantization (PTQ) directly to a merged model couples two separate errors: the usual low-bit reconstruction error and an additional deviation inherited from how merging blended the original experts. To decouple them, E-PMQ uses the original expert models to supply layer-wise output targets during calibration while anchoring the quantized weights to the already-merged weights. A sympathetic reader would care because model merging and quantization are both practical routes to low-resource deployment, yet combining them directly has been unreliable because the two deviations were not handled separately.
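The coupling can be made concrete with an exact identity. For expert i with weights W_i, merged weights W_m, quantized merged weights Q̂, and calibration inputs X, the layer-output error relative to the expert splits as W_i X − Q̂X = (W_i − W_m)X + (W_m − Q̂)X: the first term is the merging deviation, the second the quantization deviation. A standard PTQ method such as GPTQ, run on W_m, only tries to shrink the second term. The sketch below checks the identity numerically; the averaging merge, the round-to-nearest quantizer, and all shapes are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-row round-to-nearest quantization, returned in dequantized (float) form."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0  # guard all-zero rows against division by zero
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
d_out, d_in, n = 16, 32, 128
w_base = rng.normal(size=(d_out, d_in))
# Three hypothetical "experts": small perturbations of a shared base layer.
experts = [w_base + 0.1 * rng.normal(size=(d_out, d_in)) for _ in range(3)]
w_merged = np.mean(experts, axis=0)   # stand-in merge (plain averaging)
q_hat = quantize_rtn(w_merged, bits=4)

x = rng.normal(size=(d_in, n))        # calibration inputs for expert 0's task
w_i = experts[0]
total = (w_i - q_hat) @ x             # error of the deployed layer vs. expert 0
merging_dev = (w_i - w_merged) @ x    # inherited from merging
quant_dev = (w_merged - q_hat) @ x    # introduced by low-bit reconstruction

for name, err in [("total", total), ("merging", merging_dev), ("quantization", quant_dev)]:
    print(f"{name:>12}: {np.linalg.norm(err):.2f}")
```

Because the two terms can cancel or reinforce each other, shrinking only the quantization term need not shrink the total error against each expert.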

Core claim

The paper formulates the post-merge quantization (PMQ) setting and shows that E-PMQ, which pairs expert-guided output targets during layer-wise calibration with merged-weight anchoring, mitigates both the quantization deviation and the expert-relative merging deviation, producing large accuracy gains on merged vision and language models.

What carries the argument

Expert-guided output targets from source experts during layer-wise calibration together with merged-weight anchoring to preserve integrated merged behavior.
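A minimal sketch of how the two ingredients can combine in one layer, assuming the layer-wise objective quoted from the paper further down this page (layer index ℓ dropped): min_Q Σ_i ‖Q X_i − W_i X_i‖_F² + λ‖Q − W_m‖_F². Setting the gradient to zero gives the continuous minimizer Q* = (Σ_i W_i H_i + λW_m)(Σ_i H_i + λI)⁻¹ with H_i = X_i X_iᵀ, so the expert targets pull Q toward each W_i along directions that task's activations excite, while λ anchors everything else to W_m. The helper names, the rule for the "adaptive" λ (a scaled mean activation energy), and the round-to-nearest step are hypothetical; the paper reports results relative to GPTQ, and its actual solver is not reproduced here.

```python
import numpy as np

def anchored_target(expert_ws, expert_xs, w_merged, lam):
    """Continuous minimizer of sum_i ||Q X_i - W_i X_i||_F^2 + lam * ||Q - W_m||_F^2."""
    d_in = w_merged.shape[1]
    a = lam * np.eye(d_in)          # accumulates sum_i H_i + lam * I
    rhs = lam * w_merged            # accumulates sum_i W_i H_i + lam * W_m
    for w_i, x_i in zip(expert_ws, expert_xs):
        h_i = x_i @ x_i.T           # per-task input Gram matrix H_i = X_i X_i^T
        a = a + h_i
        rhs = rhs + w_i @ h_i
    # Q a = rhs with a symmetric, so solve a Q^T = rhs^T.
    return np.linalg.solve(a, rhs.T).T

def energy_lambda(expert_xs, alpha=0.01):
    """Hypothetical adaptive anchor strength: alpha * total activation energy / d_in."""
    d_in = expert_xs[0].shape[0]
    return alpha * sum(float(np.sum(x * x)) for x in expert_xs) / d_in

def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantization (dequantized back to float)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def expert_loss(q, expert_ws, expert_xs):
    """Sum of squared output errors of layer q against each expert's own outputs."""
    return sum(float(np.sum((q @ x - w @ x) ** 2)) for w, x in zip(expert_ws, expert_xs))

rng = np.random.default_rng(0)
d_out, d_in, n = 8, 16, 64
w_base = rng.normal(size=(d_out, d_in))
experts = [w_base + 0.2 * rng.normal(size=(d_out, d_in)) for _ in range(3)]
xs = [rng.normal(size=(d_in, n)) for _ in experts]   # per-task calibration inputs
w_merged = np.mean(experts, axis=0)                  # stand-in merge

lam = energy_lambda(xs)
q_star = anchored_target(experts, xs, w_merged, lam)
q_hat = quantize_rtn(q_star, bits=4)

print("expert loss, W_m  :", round(expert_loss(w_merged, experts, xs), 1))
print("expert loss, Q*   :", round(expert_loss(q_star, experts, xs), 1))
print("expert loss, Q_hat:", round(expert_loss(q_hat, experts, xs), 1))
```

Because the objective is quadratic in Q with Hessian Σ_i H_i + λI, it equals tr((Q − Q*)(Σ_i H_i + λI)(Q − Q*)ᵀ) plus a constant. An error-compensating rounder such as GPTQ could therefore be pointed at Q* with that Hessian in place of the naive rounding used here.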

If this is right

On eight-task CLIP-ViT-B/32 merging, 4-bit E-PMQ reaches 73.6% accuracy versus 65.0% for 4-bit GPTQ under Task Arithmetic, and 74.8% versus 69.1% under TIES-Merging.
On the harder 20-task CLIP-ViT-L/14 setting, E-PMQ reaches 76.7% versus 34.8% for GPTQ.
On FLAN-T5-base GLUE merging, E-PMQ reaches 83.34% versus 78.26% for GPTQ.
The same anchoring and expert-target technique applies across different merging methods without requiring joint retraining.
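The reported gains are easier to compare side by side as absolute and relative improvements. This snippet only re-derives arithmetic from the four GPTQ-versus-E-PMQ pairs quoted in the abstract; it adds no new results.

```python
# GPTQ baseline vs. E-PMQ accuracy (percent), as quoted in the abstract
# (the abstract states 4-bit explicitly for the CLIP-ViT-B/32 rows).
RESULTS = {
    "CLIP-ViT-B/32, 8 tasks, Task Arithmetic": (65.0, 73.6),
    "CLIP-ViT-B/32, 8 tasks, TIES-Merging": (69.1, 74.8),
    "CLIP-ViT-L/14, 20 tasks": (34.8, 76.7),
    "FLAN-T5-base, GLUE": (78.26, 83.34),
}

def gains(baseline: float, e_pmq: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in % of baseline)."""
    delta = e_pmq - baseline
    return round(delta, 2), round(100 * delta / baseline, 1)

for setting, (gptq, e_pmq) in RESULTS.items():
    pp, rel = gains(gptq, e_pmq)
    print(f"{setting}: +{pp} pp ({rel}% relative)")
```

The absolute gains come to 8.6, 5.7, 41.9 and 5.08 points; the 20-task CLIP-ViT-L/14 setting is the only one where E-PMQ more than doubles the baseline (about 120% relative).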

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If original experts are routinely discarded after merging, the method would require either keeping them or regenerating equivalent targets, limiting plug-and-play use.
The same separation of merging deviation from quantization deviation could be tested on other post-processing steps such as pruning or distillation of merged models.
Layer-wise calibration guided by experts may generalize to new merging algorithms beyond Task Arithmetic and TIES.
The approach implies that post-merge pipelines benefit from retaining some access to source models specifically for calibration stages.
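Several of these extensions depend on which algorithm produced the merged weights. For reference, here is a minimal NumPy sketch of the two merges evaluated in the paper, applied to flat parameter arrays: Task Arithmetic (base plus a scaled sum of task vectors) and TIES-Merging (trim, elect sign, disjoint mean). The default `scale` and `density` values are placeholders, not the paper's settings.

```python
import numpy as np

def task_arithmetic(base, finetuned, scale=0.3):
    """Task Arithmetic: theta_m = theta_0 + scale * sum_i (theta_i - theta_0)."""
    task_vectors = [ft - base for ft in finetuned]
    return base + scale * np.sum(task_vectors, axis=0)

def ties_merge(base, finetuned, density=0.2, scale=1.0):
    """TIES-Merging on flat arrays: trim, elect sign, disjoint mean."""
    tvs = np.stack([ft - base for ft in finetuned])   # (n_tasks, n_params)
    # 1) Trim: keep roughly the top-`density` fraction of each task vector by magnitude
    #    (ties at the threshold may keep a few extra entries).
    trimmed = np.zeros_like(tvs)
    for t, tv in enumerate(tvs):
        k = max(1, int(round(density * tv.size)))
        threshold = np.sort(np.abs(tv))[-k]
        trimmed[t] = np.where(np.abs(tv) >= threshold, tv, 0.0)
    # 2) Elect sign: per parameter, the sign with the larger total magnitude,
    #    which equals the sign of the sum of the trimmed values.
    elected = np.sign(trimmed.sum(axis=0))
    # 3) Disjoint mean: average only the nonzero entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = agree.sum(axis=0)
    merged_tv = (trimmed * agree).sum(axis=0) / np.maximum(counts, 1)
    return base + scale * merged_tv

base = np.zeros(4)
finetuned = [np.array([1.0, -2.0, 0.1, 3.0]), np.array([2.0, 1.0, -0.1, -1.0])]
print(task_arithmetic(base, finetuned, scale=0.5))
print(ties_merge(base, finetuned, density=0.5))
```

In the toy example, Task Arithmetic lets the two opposing task vectors partly cancel, whereas TIES keeps only the entries that agree with the elected sign, which is one reason the two merges leave different merging deviations for a post-merge quantizer to deal with.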

Load-bearing premise

Source expert weights remain available after merging and can supply reliable output targets during calibration without introducing distribution shift or extra bias relative to the merged model's integrated behavior.

What would settle it

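Run the same layer-wise calibration with the expert outputs replaced by outputs of the merged model itself. If this control shows no accuracy gain over the GPTQ baseline, or even degrades accuracy, the gains can be attributed to the expert targets; if it matches E-PMQ, they cannot.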
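Under the layer-wise objective quoted from the paper further down this page, this control has a closed form worth knowing before running it. Assuming the merged-model outputs W_m X_i replace the expert targets W_i X_i (layer index ℓ dropped), the objective collapses to a weighted distance from W_m:

```latex
\sum_i \lVert Q X_i - W_m X_i \rVert_F^2 + \lambda \lVert Q - W_m \rVert_F^2
  = \operatorname{tr}\!\left( \Delta \Big( \sum_i H_i + \lambda I \Big) \Delta^{\top} \right),
  \qquad \Delta = Q - W_m, \quad H_i = X_i X_i^{\top}
```

For λ > 0 the middle matrix is positive definite, so the objective is zero only at Q = W_m: before rounding, the control returns the merged weights unchanged, and with GPTQ-style error-compensated rounding it becomes GPTQ on W_m with damping λ. On this reading the control sits close to the GPTQ baseline the paper already reports, so a null result for it would be expected, and the informative quantity is how far the expert targets lift E-PMQ above it. This is an inference from the quoted objective, not a claim made in the paper.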

Figures

Figures reproduced from arXiv: 2605.16882 by Hongxia Yang, Jianmin Wu, Pengkai Wang, Shuo Cai, Wenjun Wang, Yanggan Gu, Yuanyi Wang.

**Figure 3.** Bit-width analysis on CLIP-ViT-B/32. E-PMQ consistently outperforms GPTQ from 3-bit to 8-bit.

**Figure 2.**

Original abstract

Low-resource deployment constraints have made model quantization essential for deploying neural networks while preserving performance. Meanwhile, model merging has become an increasingly practical low-resource strategy for integrating multiple task- or domain-specialized experts into a single model without joint training or multi-model serving. Together, quantization and model merging enable an efficient low-resource deployment pipeline by integrating multiple experts into one low-bit model. We formulate this setting as Post-Merge Quantization (PMQ). We show that directly applying post-training quantization (PTQ) to a merged model is unreliable because two distinct deviations are coupled: the quantization deviation introduced by low-bit reconstruction and the expert-relative merging deviation inherited from model merging. To mitigate these deviations, we propose E-PMQ, an expert-guided PMQ framework that uses source expert weights to provide expert-guided output targets during layer-wise calibration, together with merged-weight anchoring to stabilize the calibration and preserve the integrated behavior of the merged model. On CLIP-ViT-B/32 eight-task merging, E-PMQ improves 4-bit GPTQ from 65.0% to 73.6% under Task Arithmetic and from 69.1% to 74.8% under TIES-Merging. On harder settings, E-PMQ improves GPTQ from 34.8% to 76.7% on 20-task CLIP-ViT-L/14 and from 78.26% to 83.34% on FLAN-T5-base GLUE. These results demonstrate that E-PMQ enables effective post-merge quantization and low-bit deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

E-PMQ shows solid accuracy gains on merged CLIP and T5 models, but the expert-target approach risks optimizing toward the source experts rather than the actual merged behavior.


The core takeaway is that this paper gives a workable recipe for quantizing a merged model without losing much performance, and the reported lifts are large enough to notice. On the eight-task CLIP-ViT-B/32 setup, it moves 4-bit GPTQ from 65.0% to 73.6% under Task Arithmetic and from 69.1% to 74.8% under TIES-Merging; the 20-task ViT-L/14 jump from 34.8% to 76.7% is even more striking. The new piece is the explicit pairing of expert-guided output targets with merged-weight anchoring inside the post-merge quantization loop. That combination is more than a routine extension of standard PTQ or merging papers, and the authors are right to flag the coupled-deviations problem as something plain GPTQ does not handle well. The anchoring step looks like a reasonable way to keep the quantized weights from drifting too far from the merged weights during calibration.

The main soft spot is target fidelity: because the calibration targets come from the individual experts rather than from the merged model itself, it is not clear how much of the gain is faithful low-bit preservation of the merged model and how much is steering the quantized model back toward the original expert behaviors. The abstract gives point estimates with no error bars or ablation tables, so we cannot yet tell whether the anchoring fully compensates, or whether the method works mainly when the merge is simple. The experimental protocol is also only sketched, which makes it hard to judge how sensitive the results are to the choice of calibration data or layers.

This paper is aimed at people who already merge models and now need to ship them at 4 bits; a practitioner who has the source experts on hand could try the recipe tomorrow and see whether the numbers hold. It is worth sending to peer review because the deployment setting is real and the gains are big enough to matter if they replicate, even though the current evidence is still preliminary.

Referee Report

2 major / 2 minor

Summary. The paper introduces E-PMQ, a framework for post-merge quantization (PMQ) of models formed by merging multiple task- or domain-specialized experts. It identifies that standard post-training quantization applied to merged models suffers from coupled quantization deviation and expert-relative merging deviation. To mitigate this, E-PMQ uses source expert weights to supply expert-guided output targets during layer-wise calibration, combined with merged-weight anchoring to stabilize calibration and preserve the merged model's integrated behavior. Experiments report substantial accuracy gains over baseline 4-bit GPTQ, including lifts from 65.0% to 73.6% on 8-task CLIP-ViT-B/32 Task Arithmetic merging, from 69.1% to 74.8% under TIES-Merging, from 34.8% to 76.7% on 20-task CLIP-ViT-L/14, and from 78.26% to 83.34% on FLAN-T5-base GLUE.

Significance. If the central claims hold and the method generalizes beyond the reported settings, this work would be significant for practical low-resource deployment pipelines that combine model merging with quantization. It provides a concrete approach to handling the interaction between merging and low-bit reconstruction without requiring joint retraining, with reported gains that could enable more reliable multi-expert models on constrained hardware. The explicit separation of deviations and use of anchoring represent a targeted extension of existing PTQ techniques.

major comments (2)

[Abstract] Abstract: The abstract reports concrete accuracy lifts (e.g., 34.8% to 76.7% on 20-task CLIP-ViT-L/14) but provides no error bars, ablation details, or full experimental protocol; this makes it difficult to assess reliability and isolate whether gains stem from mitigating coupled deviations or other factors.
[Method] Method section (around the description of expert-guided targets and anchoring): The method supplies layer-wise calibration targets from individual source experts rather than from the merged model itself. Because merging (Task Arithmetic or TIES) produces a non-linear combination of expert behaviors, the expert outputs on a given input can differ systematically from the merged model's outputs; if this mismatch is large, the quantization optimizes toward expert-specific distributions instead of the integrated merged distribution, and merged-weight anchoring may only partially compensate.

minor comments (2)

[Experiments] Experiments: Clarify the calibration dataset size, sampling strategy, and any steps taken to ensure the expert targets do not introduce distribution shift relative to the merged model.
[Notation] Notation and figures: Ensure consistent use of symbols for merged weights versus expert weights and add legends or captions that explicitly distinguish the anchoring mechanism in any diagrams.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate the corresponding revisions to the manuscript.


Referee: [Abstract] Abstract: The abstract reports concrete accuracy lifts (e.g., 34.8% to 76.7% on 20-task CLIP-ViT-L/14) but provides no error bars, ablation details, or full experimental protocol; this makes it difficult to assess reliability and isolate whether gains stem from mitigating coupled deviations or other factors.

Authors: We agree that the abstract would benefit from additional context on reliability. Within the abstract's length limit, we have revised it to note that reported accuracies are means over three random seeds, with standard deviations given in the experimental results (Section 4). Full protocols, ablation studies, and an analysis isolating the contribution of coupled-deviation mitigation appear in Sections 3 and 5 of the revised manuscript. revision: yes
Referee: [Method] Method section (around the description of expert-guided targets and anchoring): The method supplies layer-wise calibration targets from individual source experts rather than from the merged model itself. Because merging (Task Arithmetic or TIES) produces a non-linear combination of expert behaviors, the expert outputs on a given input can differ systematically from the merged model's outputs; if this mismatch is large, the quantization optimizes toward expert-specific distributions instead of the integrated merged distribution, and merged-weight anchoring may only partially compensate.

Authors: We acknowledge the referee's point on potential output mismatch arising from non-linear merging. Our design intentionally uses expert outputs as calibration targets to supply specialized, high-fidelity signals while merged-weight anchoring explicitly penalizes deviation from the merged weights during the quantization optimization. This combination is intended to preserve integrated behavior. We have expanded the method section with a new paragraph discussing the rationale, added a quantitative comparison of expert versus merged output distributions on calibration data, and included an ablation isolating the anchoring term to demonstrate its compensatory effect. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical method using independent expert targets

full rationale

The paper proposes E-PMQ as a practical framework that supplies layer-wise calibration targets from source expert weights and applies merged-weight anchoring. These are external inputs to the merged model rather than quantities defined in terms of the final quantized output or fitted directly to the reported accuracy gains. The claimed improvements (e.g., 34.8% to 76.7% on 20-task ViT-L/14) are measured outcomes on held-out benchmarks after applying the procedure; no equation or step equates the result to a self-defined fit, a renamed known pattern, or a load-bearing self-citation chain. The approach remains falsifiable against external data and standard PTQ baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The method rests on the standard assumption that layer-wise calibration with external targets can correct quantization error; no explicit free parameters, new axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5841 in / 1042 out tokens · 29621 ms · 2026-05-19T20:56:41.109768+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

min_Q sum_i ||Q X_i^ℓ - W_i X_i^ℓ||_F² + λ ||Q - W_m||_F² with adaptive λ from activation energy
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

Post-Merge Quantization couples quantization deviation and expert-relative merging deviation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

and Mozer, Michael C

Alexander, James A. and Mozer, Michael C. , title =. Advances in Neural Information Processing Systems 7 , editor =

work page

[2] [2]

and Beeman, David , title =

Bower, James M. and Beeman, David , title =

work page

[3] [3]

and Schnell, Eric and Barkai, Edi , title =

Hasselmo, Michael E. and Schnell, Eric and Barkai, Edi , title =. Journal of Neuroscience , volume =

work page

[4] [4]

Journal of Modern Power Systems and Clean Energy , volume=

Model Fusion for Scalable and Sustainable Artificial Intelligence: A Review and Outlook , author=. Journal of Modern Power Systems and Clean Energy , volume=. 2026 , publisher=

work page 2026

[5] [5]

Nexus , year=

Democratizing AI through model fusion: A comprehensive review and future directions , author=. Nexus , year=

work page

[6] [6]

2026 , eprint=

MergePipe: A Budget-Aware Parameter Management System for Scalable LLM Merging , author=. 2026 , eprint=

work page 2026

[7] [7]

Model Merging Scaling Laws in Large Language Models

Model merging scaling laws in large language models , author=. arXiv preprint arXiv:2509.24244 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[8] [8]

arXiv preprint arXiv:2505.13878 , year=

InfiFPO: Implicit model fusion via preference optimization in large language models , author=. arXiv preprint arXiv:2505.13878 , year=

work page arXiv

[9] [9]

arXiv preprint arXiv:2505.13893 , year=

Infigfusion: Graph-on-logits distillation via efficient gromov-wasserstein for model fusion , author=. arXiv preprint arXiv:2505.13893 , year=

work page arXiv

[10] [10]

arXiv preprint arXiv:2602.08229 , year=

InfiCoEvalChain: A Blockchain-Based Decentralized Framework for Collaborative LLM Evaluation , author=. arXiv preprint arXiv:2602.08229 , year=

work page arXiv

[11] [11]

2025 , eprint=

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models , author=. 2025 , eprint=

work page 2025

[12] [12]

Journal of Machine Learning Research , volume=

FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion , author=. Journal of Machine Learning Research , volume=

work page

[13] [13]

International Conference on Machine Learning , pages=

Learning Transferable Visual Models From Natural Language Supervision , author=. International Conference on Machine Learning , pages=

work page

[14] [14]

Journal of Machine Learning Research , volume=

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , author=. Journal of Machine Learning Research , volume=

work page

[15] [15]

International Conference on Learning Representations , year=

Finetuned Language Models are Zero-Shot Learners , author=. International Conference on Learning Representations , year=

work page

[16] [16]

Journal of Machine Learning Research , volume=

Scaling Instruction-Finetuned Language Models , author=. Journal of Machine Learning Research , volume=

work page

[17] [17]

International Conference on Learning Representations , year=

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , author=. International Conference on Learning Representations , year=

work page

[18] [18]

International Conference on Machine Learning , year=

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time , author=. International Conference on Machine Learning , year=

work page

[19] [19]

Advances in Neural Information Processing Systems , year=

Merging Models with Fisher-Weighted Averaging , author=. Advances in Neural Information Processing Systems , year=

work page

[20] [20]

International Conference on Learning Representations , year=

Editing Models with Task Arithmetic , author=. International Conference on Learning Representations , year=

work page

[21] [21]

Advances in Neural Information Processing Systems , year=

TIES-Merging: Resolving Interference When Merging Models , author=. Advances in Neural Information Processing Systems , year=

work page

[22] [22]

2024 , eprint=

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch , author=. 2024 , eprint=

work page 2024

[23] [23]

International Conference on Machine Learning , year=

Up or Down? Adaptive Rounding for Post-Training Quantization , author=. International Conference on Machine Learning , year=

work page

[24] [24]

International Conference on Learning Representations , year=

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction , author=. International Conference on Learning Representations , year=

work page

[25] [25]

International Conference on Learning Representations , year=

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers , author=. International Conference on Learning Representations , year=

work page

[26] [26]

Proceedings of Machine Learning and Systems , year=

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration , author=. Proceedings of Machine Learning and Systems , year=

work page

[27] [27]

International Conference on Machine Learning , year=

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models , author=. International Conference on Machine Learning , year=

work page

[28] [28]

Advances in Neural Information Processing Systems , year=

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers , author=. Advances in Neural Information Processing Systems , year=

work page

[29] [29]

IEEE Conference on Computer Vision and Pattern Recognition , year=

SUN Database: Large-scale Scene Recognition from Abbey to Zoo , author=. IEEE Conference on Computer Vision and Pattern Recognition , year=

work page

[30] 3D Object Representations for Fine-Grained Categorization. IEEE International Conference on Computer Vision Workshops.

[31] Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proceedings of the IEEE.

[32] EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[33] Reading Digits in Natural Images with Unsupervised Feature Learning. NeurIPS Workshop on Deep Learning and Unsupervised Feature Learning.

[34] The German Traffic Sign Recognition Benchmark: A multi-class classification competition. International Joint Conference on Neural Networks.

[35] Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE.

[36] Describing Textures in the Wild. IEEE Conference on Computer Vision and Pattern Recognition.

[37] Automated Flower Classification over a Large Number of Classes. Indian Conference on Computer Vision, Graphics and Image Processing.

[38] Rotation Equivariant CNNs for Digital Pathology. Medical Image Computing and Computer Assisted Intervention.

[39] Challenges in Representation Learning: A Report on Three Machine Learning Contests. Neural Networks.

[40] Cats and Dogs. IEEE Conference on Computer Vision and Pattern Recognition.

[41] An Analysis of Single-Layer Networks in Unsupervised Feature Learning. International Conference on Artificial Intelligence and Statistics.

[42] Learning Multiple Layers of Features from Tiny Images.

[43] Food-101 – Mining Discriminative Components with Random Forests. European Conference on Computer Vision.

[44] Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747.

[45] EMNIST: Extending MNIST to handwritten letters. International Joint Conference on Neural Networks.

[46] Deep Learning for Classical Japanese Literature. arXiv preprint arXiv:1812.01718.

[47] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Conference on Empirical Methods in Natural Language Processing.

[48] Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors. Proceedings of the 42nd International Conference on Machine Learning, 2025.

[49] Representation Surgery for Multi-Task Model Merging. Preprint, 2024.