pith. sign in

Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Vision-Language Models (VLMs) such as CLIP have revolutionized zero-shot classification and safety-critical tasks, including Out-of-Distribution (OOD) detection. However, their high computational cost hinders efficient real-world deployment. While quantization is a standard solution for efficiency, its broader impact on reliability metrics beyond simple Top-1 accuracy remains critically under-explored. In this study, we conduct a large-scale evaluation of VLM quantization across a comprehensive experimental suite of over 700k evaluation runs with varying configurations. We find that, contrary to the assumption that quantization's noise degrades performance, it can simultaneously improve accuracy, calibration, OOD detection, and robustness to noise, though not to covariate shift or spurious correlations. We leverage these counterintuitive findings to characterize the mechanics of quantization beyond simple regularization: we show that quantization dampens high-rank spectral components, compelling the model to rely more heavily on robust, low-rank features. Ultimately, this spectral filtering effect drives the observed improvements in generalization and noise tolerance, establishing a pathway to deploy faster, more reliable VLMs by utilizing quantization beyond its conventional role.

fields

cs.SE 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

How Robustly do LLMs Understand Execution Semantics?

cs.SE · 2026-02-24 · unverdicted · novelty 6.0

Frontier LLMs like GPT-5.2 show large accuracy drops on perturbed program-output prediction tasks while open-source reasoning models remain more stable, exposing limits in code semantics understanding.

citing papers explorer

Showing 1 of 1 citing paper.

  • How Robustly do LLMs Understand Execution Semantics? cs.SE · 2026-02-24 · unverdicted · none · ref 6 · internal anchor

    Frontier LLMs like GPT-5.2 show large accuracy drops on perturbed program-output prediction tasks while open-source reasoning models remain more stable, exposing limits in code semantics understanding.