arxiv: 2605.21060 · v1 · pith:52V7YDRLnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI · stat.ML

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

Pith reviewed 2026-05-21 05:35 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords multiclass calibration · vector quantization · local calibration · Dirichlet parameterization · latent space partitioning · machine learning reliability

The pith

Vector quantization creates region-specific calibration maps for multiclass models by sharing parameters across a partitioned latent space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that global calibration assumes calibration error is uniform across the entire latent space, while many local methods discard information through dimensionality reduction. Instead, it partitions the space by vector quantization and builds calibration maps from shared codeword-dependent factors. An indexed parameterization of Dirichlet concentrations then lets parameters be reused across regions. This produces heterogeneous maps that still calibrate well where data is sparse. A sympathetic reader would care because high-stakes applications need reliable uncertainty estimates everywhere, not just on average.
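As a minimal sketch of the partitioning step described above, assuming plain nearest-codeword assignment under Euclidean distance (the Voronoi rule mentioned in the Figure 1 caption), the snippet below maps latent vectors to region indices. The codebook values, the function name `assign_codewords`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def assign_codewords(z, codebook):
    """Return, for each row of z, the index of its nearest codeword."""
    # Squared Euclidean distances, shape (n_points, n_codewords).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
# Hypothetical 2-D codebook with three well-separated codewords.
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
# Four toy latent points scattered tightly around each codeword.
z = np.vstack([c + 0.1 * rng.standard_normal((4, 2)) for c in codebook])

regions = assign_codewords(z, codebook)
print(regions)
```

Each distinct index identifies one cell of the partition; a calibration map can then be looked up by that index rather than by the raw latent vector.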

Core claim

By inducing a structured partition of the representation space with vector quantization and using an indexed parameterization of Dirichlet concentrations, the method constructs compositional, region-specific calibration maps that generalize to data-sparse regions while preserving global calibration and predictive performance.

What carries the argument

A vector-quantization partition of the representation space combined with an indexed parameterization of Dirichlet concentrations; together they enable structured parameter sharing across regions.
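One way to picture structured parameter sharing is sketched below. It starts from the Dirichlet calibration map of Kull et al. (2019), softmax(W log p + b), and, purely as an assumption for illustration, builds each region's W and b by summing factors selected by that region's codeword indices. The paper's actual indexed parameterization is not reproduced on this page, so the composition rule, the array shapes, and the names `calibrate`, `A`, and `B` are hypothetical.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def calibrate(p, idx, A, B):
    """Apply a region-specific Dirichlet-style calibration map.

    p:   (n, k) uncalibrated class probabilities, strictly positive
    idx: (n, w) codeword index per sub-vector position (hypothetical layout)
    A:   (w, C, k, k) codeword-dependent matrix factors
    B:   (w, C, k)    codeword-dependent bias factors
    """
    k = p.shape[1]
    pos = np.arange(idx.shape[1])
    # ASSUMPTION: a region's parameters are the sum of its selected factors.
    W = np.eye(k) + A[pos, idx].sum(axis=1)   # (n, k, k)
    b = B[pos, idx].sum(axis=1)               # (n, k)
    logits = np.einsum("nij,nj->ni", W, np.log(p)) + b
    return softmax(logits)

rng = np.random.default_rng(0)
n, k, w, C = 5, 3, 2, 4
p = softmax(rng.standard_normal((n, k)))
idx = rng.integers(0, C, size=(n, w))

# With every factor at zero the map reduces to the identity.
identity_out = calibrate(p, idx, np.zeros((w, C, k, k)), np.zeros((w, C, k)))
# With small random factors the output is still a valid distribution.
out = calibrate(p, idx,
                0.1 * rng.standard_normal((w, C, k, k)),
                0.1 * rng.standard_normal((w, C, k)))
```

Under this layout, w index positions with C codewords each store w·C factors yet can express up to C^w distinct region maps, which is one sense in which a sparsely populated region could reuse parameters learned elsewhere.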

If this is right

  • Local calibration error decreases on standard benchmarks while global calibration and accuracy remain competitive.
  • Calibration maps become heterogeneous and adapt to different parts of the latent space.
  • Parameter sharing across quantized regions reduces the need for separate maps in data-poor areas.
  • The approach avoids the information loss that accompanies dimensionality-reduction steps in prior local methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same VQ-plus-indexed-parameter idea might be tried with other partitioning schemes such as clustering or decision trees.
  • If the codebook size is treated as a hyperparameter, one could test whether larger codebooks further improve local calibration at the cost of more parameters.
  • The method could be combined with post-hoc recalibration techniques that operate on top of the VQ indices.

Load-bearing premise

That partitioning the latent space with vector quantization and tying Dirichlet parameters to codewords produces useful sharing that improves calibration in sparse regions without adding new biases.

What would settle it

Measure local calibration error on a held-out test set deliberately constructed to cover regions of the latent space that are sparsely populated in the calibration data; if error in those regions does not drop relative to a strong global baseline, or if a new bias metric rises, the claim is falsified.
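A test of this kind needs calibration error reported per region rather than on average. The sketch below uses ordinary top-label expected calibration error computed separately inside each region as a stand-in; the paper's own local metric (LCE) is not defined on this page, so the binning scheme and the function names `top_label_ece` and `ece_by_region` are assumptions, and the toy inputs are made up.

```python
import numpy as np

def top_label_ece(probs, labels, n_bins=10):
    """Top-label ECE with equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    # Map each confidence to a bin; confidence 1.0 goes in the last bin.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def ece_by_region(probs, labels, regions):
    """Compute the ECE separately for every region index that occurs."""
    return {int(r): top_label_ece(probs[regions == r], labels[regions == r])
            for r in np.unique(regions)}

# Toy data: region 0 is confident and right, region 1 is confident and wrong.
probs = np.array([[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.9, 0.1]])
labels = np.array([0, 0, 1, 1])
regions = np.array([0, 0, 1, 1])

per_region = ece_by_region(probs, labels, regions)
pooled = top_label_ece(probs, labels)
```

In this toy case the pooled figure hides the fact that all of the miscalibration sits in one region, which is the kind of heterogeneity a falsification test would need to expose.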

Figures

Figures reproduced from arXiv: 2605.21060 by Andrea Passerini, Andrea Pugnana, Cesare Barbera, Giovanni De Toni, Lorenzo Perini.

Figure 1: Figure 1a depicts our approach; Figure 1b shows how Voronoi tessellation works, assigning ...
Figure 2: Local calibration metrics over five runs ...
Figure 3: Local calibration in density-based sub-bins over five runs ...
Figure 4: Global calibration metrics over five runs ...
Figure 5: Local calibration in density-based sub-bins over five runs on ...
Figure 6: LCE (left) and NLL (right) for various calibration-set sizes on TissueMNIST.
Figure 7: LCE (left) and NLL (right) for various calibration-set sizes on Weather.
Figure 8: Local calibration in density-based sub-bins over five runs ...
Figure 9: Local calibration in density-based sub-bins over five runs ...

Original abstract

Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a compositional approach to multiclass calibration, where region-specific calibration maps are constructed from shared codeword-dependent factors. We instantiate this idea via Vector Quantization (VQ), which induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations that enables parameter sharing across regions. Our approach learns heterogeneous calibration maps that generalize well even to sparse regions of the latent space. Experiments on benchmark datasets show significant improvements in local calibration while maintaining competitive global calibration and predictive performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes Divide et Calibra, a compositional multiclass calibration method that partitions the representation space via Vector Quantization (VQ) and uses an indexed parameterization of Dirichlet concentrations to share parameters across regions. It claims this yields heterogeneous, locally adaptive calibration maps that generalize effectively to sparse latent-space regions, with experiments on benchmark datasets demonstrating improved local calibration alongside competitive global calibration and predictive performance.

Significance. If the central claims are substantiated, the work offers a scalable alternative to global calibration (which assumes homogeneity) and dimensionality-reduction-based local methods (which incur information loss). The compositional construction via VQ-induced partitions and indexed Dirichlet sharing could enable better handling of heterogeneous calibration errors in high-stakes settings. The manuscript provides experimental results on standard benchmarks but does not include reproducible code, machine-checked proofs, or parameter-free derivations.

major comments (2)
  1. [§3.2] §3.2 (VQ partition and indexed Dirichlet parameterization): the central claim that codeword-dependent factors enable bias-free parameter sharing that improves calibration specifically in sparse regions is load-bearing but under-supported; the joint objective must be shown to align codeword assignment with calibration-error homogeneity rather than reconstruction loss alone, otherwise the indexed parameterization may simply average toward a global map.
  2. [Table 2] Table 2 / local-ECE breakdown: without an ablation that isolates the transfer effect to low-density codewords (e.g., by varying codebook size or freezing the VQ encoder), it remains unclear whether reported gains in sparse regions stem from the proposed sharing mechanism or from other modeling choices such as the Dirichlet concentration learning.
minor comments (3)
  1. The abstract states 'significant improvements' in local calibration; the main text should report exact local-ECE deltas, confidence intervals, and the number of runs to allow readers to assess practical magnitude.
  2. [Notation] Notation for the indexed Dirichlet concentrations (Eq. (X)) should explicitly define how the codeword index selects the concentration vector to prevent ambiguity in the parameter-sharing construction.
  3. [Related Work] Related-work discussion should more explicitly contrast the compositional VQ approach against prior local calibration techniques that also employ clustering or partitioning.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying the methodological rationale and outlining targeted revisions to strengthen the empirical and explanatory support.

  1. Referee: [§3.2] §3.2 (VQ partition and indexed Dirichlet parameterization): the central claim that codeword-dependent factors enable bias-free parameter sharing that improves calibration specifically in sparse regions is load-bearing but under-supported; the joint objective must be shown to align codeword assignment with calibration-error homogeneity rather than reconstruction loss alone, otherwise the indexed parameterization may simply average toward a global map.

    Authors: We agree that the alignment between VQ partitions and calibration-error homogeneity requires clearer exposition. The VQ encoder is trained to minimize reconstruction loss on the latent representations, thereby grouping inputs with similar representations into the same codeword. The indexed Dirichlet parameterization then assigns a distinct concentration vector to each codeword; these parameters are optimized jointly with the calibration loss. Because codewords cluster representationally similar inputs, the shared parameters within a codeword effectively transfer strength from dense to sparse regions without introducing bias from dissimilar inputs. In the revision we will expand §3.2 with a paragraph that (i) formalizes the joint objective, (ii) states the assumption that representational proximity implies calibration-error homogeneity, and (iii) notes that the sharing is therefore bias-free conditional on that clustering. We will also add a short remark contrasting this construction with a purely global Dirichlet model. revision: partial

  2. Referee: [Table 2] Table 2 / local-ECE breakdown: without an ablation that isolates the transfer effect to low-density codewords (e.g., by varying codebook size or freezing the VQ encoder), it remains unclear whether reported gains in sparse regions stem from the proposed sharing mechanism or from other modeling choices such as the Dirichlet concentration learning.

    Authors: We concur that an ablation isolating the transfer effect would strengthen the claims. In the revised manuscript we will add two controlled experiments: (1) training with codebook sizes K = 8, 16, 32, 64 and reporting local-ECE stratified by codeword occupancy, and (2) a frozen-VQ variant in which the encoder is pretrained once and then held fixed while only the indexed Dirichlet parameters are learned. These results will be presented in an extended Table 2 together with a new column showing the performance gap between low- and high-density codewords, thereby isolating the contribution of the sharing mechanism. revision: yes
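The occupancy-stratified comparison proposed in the second response could be organised roughly as follows. This is a sketch under stated assumptions: codewords are split at the median occupancy, the per-codeword error can come from any local calibration metric, and the counts and error values below are invented placeholders rather than results from the paper. The function name `occupancy_gap` is hypothetical.

```python
import numpy as np

def occupancy_gap(counts, errors):
    """Mean error in low- vs high-occupancy codewords, and their difference.

    counts: number of calibration points assigned to each codeword
    errors: a local calibration error per codeword (any metric)
    """
    counts = np.asarray(counts)
    errors = np.asarray(errors, dtype=float)
    used = counts > 0                      # ignore codewords with no points
    thresh = np.median(counts[used])       # ASSUMPTION: median split
    low = used & (counts <= thresh)
    high = used & (counts > thresh)
    low_err = errors[low].mean() if low.any() else float("nan")
    high_err = errors[high].mean() if high.any() else float("nan")
    return low_err, high_err, low_err - high_err

# Placeholder numbers for four codewords (not measured values).
counts = [100, 3, 80, 5]
errors = [0.02, 0.10, 0.03, 0.08]
low_err, high_err, gap = occupancy_gap(counts, errors)
print(f"low={low_err:.3f} high={high_err:.3f} gap={gap:.3f}")
```

Repeating this for each codebook size and for the frozen-VQ variant would give one gap per configuration, so a shrinking gap could be attributed to sharing only if it persists when the other modeling choices are held fixed.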

Circularity Check

0 steps flagged

No circularity: derivation builds on standard VQ and Dirichlet concepts without reduction to inputs by construction

The paper's central proposal uses Vector Quantization to induce a partition of the representation space and an indexed parameterization of Dirichlet concentrations to enable parameter sharing. The abstract and available text present this as a compositional construction for heterogeneous calibration maps, with claimed generalization to sparse regions supported by experiments rather than any definitional equivalence or fitted-input renaming. No equations are shown that equate a prediction to its own fitting procedure, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled via prior work. The approach relies on established VQ and Dirichlet machinery whose independence from the target calibration improvement is not contradicted by the provided material. This is the expected non-finding for a method paper whose load-bearing steps remain externally verifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that VQ creates a useful discrete partition allowing parameter sharing; no free parameters or invented entities are explicitly introduced beyond standard VQ and Dirichlet modeling.

axioms (1)
  • Domain assumption: vector quantization induces a structured partition of the representation space suitable for region-specific calibration.
    Invoked to justify composing local maps from shared factors.

pith-pipeline@v0.9.0 · 5676 in / 1074 out tokens · 25459 ms · 2026-05-21T05:35:01.249149+00:00 · methodology

A Proofs

A.1 Proof of Proposition 1

Proof. It suffices to show that

$$\min_{q \in Q} \|z - q\|^2 = \sum_{i=1}^{w} \min_{j \in \{1,\dots,|C|\}} \|z^{(i)} - c_j\|^2.$$

Take any index vector $s$:

$$\|z - q_s\|^2 = \left\|\left(z^{(1)} - c_{s(1)}, \dots, z^{(w)} - c_{s(w)}\right)\right\|^2 = \sum_{i=1}^{w} \|z^{(i)} - c_{s(i)}\|^2.$$

Because the Euclidean norm is ...
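The identity in the proof fragment can be checked numerically on a small case. The sketch below assumes, as the fragment suggests, that every composite codeword q_s is the concatenation of w codewords drawn from one shared codebook C; the dimensions and random values are arbitrary choices for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
w, d_sub, n_codes = 3, 2, 4                   # arbitrary small sizes
C = rng.standard_normal((n_codes, d_sub))     # shared codebook (assumed)
z = rng.standard_normal((w, d_sub))           # z split into w sub-vectors

# Left-hand side: brute-force search over all |C|^w composite codewords q_s.
brute = min(
    ((z.ravel() - C[list(s)].ravel()) ** 2).sum()
    for s in itertools.product(range(n_codes), repeat=w)
)

# Right-hand side: minimise each sub-vector independently, then add up.
d2 = ((z[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (w, n_codes)
separable = d2.min(axis=1).sum()
best_s = d2.argmin(axis=1)                    # index vector attaining the min

print(np.isclose(brute, separable))
```

The practical consequence is that the nearest composite codeword can be found with w independent searches over |C| entries instead of one search over |C|^w.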