pith. sign in

arxiv: 2604.10952 · v1 · submitted 2026-04-13 · 💻 cs.LG

UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees

Pith reviewed 2026-05-10 16:20 UTC · model grok-4.3

classification 💻 cs.LG
keywords prototype selectionoptimal transportsubmodular maximizationimbalanced classificationgreedy algorithmuniform weights
0
0 comments X

The pith

By recasting uniform prototype selection as a partial optimal transport problem, the method turns an intractable super-additive objective into a submodular one that a greedy algorithm approximates within a factor of 1 minus 1/e.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UniPROT, a subset selection approach that chooses prototypical examples from a source distribution so their uniform-weighted distribution matches a target distribution as closely as possible under optimal transport distance. Conventional selection methods often favor majority classes because they rely on implicit importance scores. The authors reformulate the marginal constraints of the transport problem to obtain a partial optimal transport objective that is submodular. This change permits an efficient greedy algorithm together with a proof that its output is at least a (1-1/e) fraction as good as the best possible solution to the original super-additive problem. On imbalanced classification tasks and on large-language-model pretraining and finetuning under domain shift, the resulting prototypes improve coverage of rare classes while leaving majority-class accuracy unchanged.

Core claim

UniPROT minimizes the optimal transport distance between a uniformly weighted prototype distribution and the target distribution; a partial-optimal-transport relaxation of the marginal constraints converts this cardinality-constrained super-additive maximization into a submodular maximization problem, for which the standard greedy algorithm is guaranteed to return a solution whose value is at least (1-1/e) times the optimal value of the original objective.

What carries the argument

The partial optimal transport reformulation of the marginal constraints that converts the original uniform-weighted OT objective into a submodular set function.

If this is right

  • A simple greedy pass yields prototypes whose uniform weights improve minority-class recall on standard imbalanced benchmarks without lowering majority-class accuracy.
  • The same procedure can be applied during both pretraining and finetuning of large language models when the source data exhibit domain imbalance.
  • Because the objective is submodular, the method scales to large source sets where exact solvers for the original formulation are infeasible.
  • Enforcing equal prototype weights removes the skew toward majority classes that arises from implicit importance scores in prior subset-selection methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same relaxation technique could be tested on other cardinality-constrained matching problems that currently lack submodular structure.
  • Because the algorithm is deterministic and fast, it can be inserted as a data-selection step inside existing training pipelines without changing model architecture.
  • One could measure how sensitive final accuracy is to small violations of the uniform-weight assumption on real-world datasets.

Load-bearing premise

The partial optimal transport relaxation must stay close enough to the original uniform-weighted OT objective that the approximation guarantee still produces prototypes that are useful for the downstream task.

What would settle it

On a small imbalanced dataset, compute the exact optimum of the original super-additive OT objective and compare it with the greedy solution of the reformulated objective; if the two selected subsets differ substantially in minority-class coverage or in realized OT distance to the target, the reformulation does not preserve sufficient fidelity.

Figures

Figures reproduced from arXiv: 2604.10952 by Bamdev Mishra, Ganesh Ramakrishnan, Karthik S. Gurumoorthy, Prateek Chanda, Pratik Jawanpuria, Prayas Agrawal.

Figure 1
Figure 1. Figure 1: k-medoids (4) consistently learn skewed weights for prototypes on CIFAR10. The plot shows ranked weights of pro￾totypes, averaged over 5 runs. In particular, if Pˆ is a solution of (4) with the correspond￾ing γPˆ such that l(Pˆ) = ⟨S, γPˆ⟩, then µPˆ = γPˆ1. If the learned µPˆ is skewed, it implies that some prototypes have higher mass (importance) than the others. Empirically, this is commonly observed as … view at source ↗
Figure 2
Figure 2. Figure 2: Exact marginal gain versus the proposed approx [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Minority Class Accuracy Analysis:On CIFAR10-LT and synthetic MNIST, UniPROT outperforms k-medoids on minority classes. (Top Left): Average minority-class accuracy vs. prototype count. (Bottom Left): CIFAR10 minority class-wise accuracy (avg. over 3 runs). (Top Right): MNIST with Induced Skew: (i) classes 0 and 5 both at 5% each (other classes at 90%), and (ii) classes 0 and 7 both at 3% each (other classes… view at source ↗
Figure 4
Figure 4. Figure 4: Validation perplexity dynamics on PHI-3, ZEPHYR-3B and PHI-2 during training vs top 3 best-performing baselines on MATHINSTRUCT. UniPROT-PS consistently outperforms other baselines [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Validation log-pplx with changing prototype percentage (left) and ablation table (right) on Z [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Perplexity dynamics on Pretraining LLAMA-3 500M and 60M for 20k steps. Effect of number of selected prototypes. We finetune ZEPYR-3B on MATHINSTRUCT for 2048 steps while varying the selection ratio r ∈ 50%, 25%, 12.5%. Fig￾ure 5 reports the validation perplexity. We observe that UniPROT remains consistently stable across all ratios. In contrast, COLM degrades noticeably as r decreases, while GREATS shows a… view at source ↗
Figure 7
Figure 7. Figure 7: Approximate vs. actual marginal gain as a function of the entropic regularization parameter [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Validation perplexity when batch size=256 [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Zephyr-3b Validation perplexity when bs=128 and subset ratio 25% and prototype selection is batch wise. [PITH_FULL_IMAGE:figures/full_fig_p023_9.png] view at source ↗
read the original abstract

Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes. We present $\methodprop$, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained maximization of a \emph{super-additive} objective, which is generally intractable to approximate efficiently. To address this, we propose a principled reformulation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this reformulation enables a greedy algorithm with a $(1-1/e)$ approximation guarantee relative to the original super-additive maximization problem. Empirically, we showcase that enforcing uniform prototype weights in UniPROT consistently improves minority-class representation in imbalanced classification benchmarks without compromising majority-class accuracy. In both finetuning and pretraining regimes for large language models under domain imbalance, UniPROT enforces uniform source contributions, yielding robust performance gains. Our results establish UniPROT as a scalable, theoretically grounded solution for uniform-weighted prototype selection. Our code is publicly available at GitHub\footnote{Code: https://github.com/efficiency-learning/UniPROT}

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces UniPROT, a framework for selecting a fixed-cardinality set of prototypes that minimizes the optimal transport distance to a target distribution under uniform prototype weights. The core technical contribution is a reformulation of the resulting cardinality-constrained super-additive maximization problem into a partial optimal transport objective that is shown to be submodular; a greedy algorithm is then proved to achieve a (1-1/e) approximation guarantee relative to the original problem. Experiments on imbalanced classification benchmarks and domain-imbalanced LLM pretraining/finetuning report improved minority-class representation without loss of majority-class accuracy.

Significance. If the claimed (1-1/e) guarantee is shown to transfer from the partial-OT surrogate to the original uniform-weighted OT objective, the work supplies a principled, scalable method for prototype selection that directly counters majority-class bias in importance-score approaches. The submodular reformulation plus public code release are concrete strengths that would facilitate adoption in data-selection pipelines for large models.

major comments (2)
  1. [Theoretical analysis / proof of approximation guarantee] Theoretical analysis section (proof of the approximation guarantee): The manuscript asserts that the partial-OT reformulation yields a submodular objective whose greedy maximizer approximates the original super-additive uniform-OT maximization to within (1-1/e). No explicit error bound is supplied relating the value of the relaxed partial-OT objective to the original OT objective (e.g., via the relaxation parameter or the difference between the induced transport plans). Without such a quantitative link, the stated guarantee applies only to the surrogate and does not necessarily carry over to the uniform-prototype problem claimed in the abstract and introduction.
  2. [Experimental evaluation] Experimental protocol (imbalanced classification and LLM sections): The reported gains rely on the partial-OT reformulation, yet no ablation or sensitivity analysis is presented on the relaxation parameter that controls the marginal relaxation. This parameter directly affects both the fidelity of the surrogate and the practical quality of the selected prototypes, making it load-bearing for validating that the theoretical guarantee remains meaningful in the reported benchmarks.
minor comments (3)
  1. [Abstract] Abstract: The GitHub link appears only as a footnote; spelling out the full URL in the main text would improve immediate accessibility.
  2. [Preliminaries / notation] Notation: The symbols used for prototype weights, marginal constraints, and the partial-transport relaxation parameter should be defined once in a central notation table or paragraph and used consistently thereafter.
  3. [Figures] Figure captions: Captions for the LLM domain-imbalance plots should explicitly state the number of runs, the exact imbalance ratios, and whether error bars represent standard deviation or standard error.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each major comment below with clarifications and commit to revisions that strengthen the presentation of the theoretical guarantees and experimental validation.

read point-by-point responses
  1. Referee: [Theoretical analysis / proof of approximation guarantee] Theoretical analysis section (proof of the approximation guarantee): The manuscript asserts that the partial-OT reformulation yields a submodular objective whose greedy maximizer approximates the original super-additive uniform-OT maximization to within (1-1/e). No explicit error bound is supplied relating the value of the relaxed partial-OT objective to the original OT objective (e.g., via the relaxation parameter or the difference between the induced transport plans). Without such a quantitative link, the stated guarantee applies only to the surrogate and does not necessarily carry over to the uniform-prototype problem claimed in the abstract and introduction.

    Authors: We appreciate the referee pointing out the need for an explicit quantitative link. The partial-OT reformulation is constructed so that its objective is a controlled relaxation of the original uniform-OT objective, with the relaxation parameter governing the marginal slack; the submodularity and greedy guarantee are proved directly on this surrogate. To make the transfer rigorous, we will add a new proposition in the revised theoretical analysis section that supplies an explicit additive error bound between the original OT value and the partial-OT objective, expressed in terms of the relaxation parameter and the induced transport plans. This will clarify that the (1-1/e) guarantee extends to the original problem up to this controllable error term, consistent with the claims in the abstract and introduction. revision: yes

  2. Referee: [Experimental evaluation] Experimental protocol (imbalanced classification and LLM sections): The reported gains rely on the partial-OT reformulation, yet no ablation or sensitivity analysis is presented on the relaxation parameter that controls the marginal relaxation. This parameter directly affects both the fidelity of the surrogate and the practical quality of the selected prototypes, making it load-bearing for validating that the theoretical guarantee remains meaningful in the reported benchmarks.

    Authors: We agree that sensitivity analysis on the relaxation parameter is essential to demonstrate that the theoretical guarantee remains practically meaningful. In the revised manuscript we will add dedicated ablation studies in both the imbalanced classification and LLM pretraining/finetuning sections. These will vary the relaxation parameter over a representative range, reporting its impact on minority-class accuracy, overall performance, prototype distribution uniformity, and the gap between surrogate and original OT objectives. This will provide direct empirical support for the fidelity of the surrogate in the reported benchmarks. revision: yes

Circularity Check

0 steps flagged

No circularity: reformulation applies standard submodular maximization theorem to a derived objective without reducing the guarantee to a self-definition or fitted input

full rationale

The claimed derivation proceeds by reformulating the cardinality-constrained super-additive OT maximization into a partial-OT objective that is asserted to be submodular, then directly invoking the known (1-1/e) greedy approximation theorem for monotone submodular maximization. This theorem is an external, well-established result in combinatorial optimization and is not derived from or dependent on the paper's OT formulation, any fitted parameters, or prior self-citations by the authors. The reformulation step is presented as a mathematical construction that yields submodularity, after which the approximation guarantee follows from the general theory rather than from any tautological redefinition of the original objective. No load-bearing claim reduces by construction to its own inputs; the theoretical result remains independent of the empirical evaluations and does not rely on self-referential definitions or uniqueness theorems imported from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from optimal transport theory and submodular function maximization; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Optimal transport distance is a suitable metric for measuring how well a uniform prototype distribution represents the target distribution
    Invoked in the problem formulation as the objective to minimize.
  • domain assumption The partial OT reformulation yields a submodular objective whose greedy maximization approximates the original super-additive problem
    Central to the proof of the (1-1/e) guarantee.

pith-pipeline@v0.9.0 · 5574 in / 1137 out tokens · 57186 ms · 2026-05-10T16:20:35.689215+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    [Yes] (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm

    For all models and algorithms presented, check if you include: (a) A clear description of the mathematical setting, assumptions, algorithm, and/or model. [Yes] (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm. [Yes] (c) (Optional) Anonymized source code, with spec- ification of all dependencies, including extern...

  2. [2]

    [Yes] (b) Complete proofs of all theoretical results

    For any theoretical claim, check if you include: (a) Statements of the full set of assumptions of all theoretical results. [Yes] (b) Complete proofs of all theoretical results. [Yes] (c) Clear explanations of any assumptions. [Yes]

  3. [3]

    [Yes] (b) All the training details (e.g., data splits, hyperpa- rameters, how they were chosen)

    For all figures and tables that present empirical results, check if you include: (a) The code, data, and instructions needed to repro- duce the main experimental results (either in the supplemental material or as a URL). [Yes] (b) All the training details (e.g., data splits, hyperpa- rameters, how they were chosen). [Yes] (c) A clear definition of the spe...

  4. [4]

    (a) Citations of the creator If your work uses existing assets

    If you are using existing assets (e.g., code, data, mod- els) or curating/releasing new assets, check if you in- clude: Chanda et al. (a) Citations of the creator If your work uses existing assets. [Yes] (b) The license information of the assets, if applica- ble. [Not Applicable] (c) New assets either in the supplemental material or as a URL, if applicabl...

  5. [5]

    [Not Applicable] (b) Descriptions of potential participant risks, with links to Institutional Review Board (IRB) ap- provals if applicable

    If you used crowdsourcing or conducted research with human subjects, check if you include: (a) The full text of instructions given to participants and screenshots. [Not Applicable] (b) Descriptions of potential participant risks, with links to Institutional Review Board (IRB) ap- provals if applicable. [Not Applicable] (c) The estimated hourly wage paid t...

  6. [6]

    Non-negativity:h(P)≥0∀P⊆S

  7. [7]

    Monotonicity:h(P 2)≥h(P1)∀P1 ⊆P2 ⊆S

  8. [8]

    Proof.We prove each property in turn (refer to the definition ofh(P)in (6))

    Super-additivity over disjoint sets:h(P 1∪P2)≥h(P1)+h(P 2)∀P1∩P2 =ϕ. Proof.We prove each property in turn (refer to the definition ofh(P)in (6))

  9. [9]

    Non-negativity.The non-negativity follows from the definition ofh(⋅)in (6) namely,h(P)∶=max γ∈Γ(1P ,∣P∣1/n) ⟨S,γ⟩, where the similarity matrixSis a non-negative matrix and the transport plan is enforced to non-negative

  10. [10]

    To prove monotonicity ofh(.), it is sufficient to show thath(P 2)≥h(P 1)

    Monotonicity.Consider a subsetP 1 and define a setP 2 =P 1∪{xi}for anyx i ∉P1. To prove monotonicity ofh(.), it is sufficient to show thath(P 2)≥h(P 1). To this end, letγ P1 be the argmax forh(P 1)and consider the sub-matrix γP1 (IP1 ,∶)which is the restriction of the optimal solution to the points inP 1. We note thatγ P1(i,∶)=0;x i ∉P1. We construct a fe...

  11. [11]

    Letγ P1 (IP1 ,∶)andγP2 (IP2 ,∶)represent the sub-matrices of the respective optimal solutions to the points inP 1 andP 2

    Super-additivity over disjoint sets.Consider two disjoint setsP 1 andP 2. Letγ P1 (IP1 ,∶)andγP2 (IP2 ,∶)represent the sub-matrices of the respective optimal solutions to the points inP 1 andP 2. For the disjoint union setP=P 1∪P2, we construct a feasible transport plan ˆγas: ˆγ(I P ,∶)=[γP1 (IP1 ,∶)⊺,γ P2 (IP2 ,∶)⊺] ⊺ , and ˆγ(j,∶)=0forx j ∉P. Evaluating...

  12. [12]

    The proof follows along similar lines of the non-negativity proof in Lemma 1

    Non-negativity:f(P)≥0∀P⊆S. The proof follows along similar lines of the non-negativity proof in Lemma 1

  13. [13]

    Monotonicity:f(P 2)≥f(P 1)∀P1 ⊆P2 ⊆S. Akin to the monotonicity proof in Lemma 1, for a super-setP 2 = P1∪{xi};xi ∉P1, we can construct a feasible transport ˆγusing the optimal solutionγ P1 as: ˆγ(I P2 ,∶)=[γP1 (IP1 ,∶)⊺, v ∥v∥1 ] ⊺ , wherev=k1/n−γ ⊺ P1 1≥0. Following similar lines to the argument in Lemma 1, we obtain the monotonicity result

  14. [14]

    ghost inner-product

    Submodularity:To prove submodularity of functionf(P)in (7), we first note the following result (Kawano et al., 2022, Lemma 2). Lemma 6.[(Kawano et al., 2022, Lemma 2)] Letl, m, nbe positive integers. Given a positive valuedm×nmatrixS>0, the following set functionψ∶2m→R+ is a submodular function: ψ(P)=max γ∈Γ≤(1P/l,1n/n) ⟨S,γ⟩,(11) where as defined earlier...

  15. [15]

    However, these approaches typicallylearn synthetic samplesor optimize continuous representations, rather than selecting a subset from a given discrete pool

    leverage OT distances to measure distributional fidelity between synthetic and real data. However, these approaches typicallylearn synthetic samplesor optimize continuous representations, rather than selecting a subset from a given discrete pool. As a result, they differ fundamentally from subset selection problems with combinatorial constraints. OT baryc...