Revisiting GAN with Bayes-Optimal Discrimination

Alfred O. Hero III; Ali Bereyhi; Ben Liang; Mohammadreza Tavasoli Naeini; Morteza Noshad

arXiv: 2510.25609 · v3 · submitted 2025-10-29 · cs.LG · cs.AI · eess.SP

Mohammadreza Tavasoli Naeini, Ali Bereyhi, Morteza Noshad, Ben Liang, Alfred O. Hero III

Pith reviewed 2026-05-18 02:47 UTC · model grok-4.3

classification cs.LG · cs.AI · eess.SP

keywords GAN · Bayes error rate · BOLT loss · total variation distance · Wasserstein distance · generative models · discriminator training · image synthesis

The pith

Maximizing a surrogate of the discrimination Bayes error rate minimizes total variation between data and generator distributions under balanced priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper replaces the usual cross-entropy training of the GAN discriminator with an objective that targets the Bayes error rate of distinguishing real from generated samples. It employs the BOLT loss to create a trainable surrogate for this error rate and has the generator maximize it. Under balanced class priors and without constraints on the discriminator, this maximization is shown to minimize the total variation distance. Constraining the discriminator to be one-Lipschitz makes the resulting discrepancy upper-bounded by the Wasserstein-one distance. Experiments on image datasets demonstrate gains in sample quality and coverage compared to standard methods.

Core claim

The authors establish that training the discriminator via the BOLT loss to approximate the Bayes error rate and maximizing this quantity for the generator provides a unified perspective on GAN objectives as bounds on the discrimination BER. Specifically, this leads to minimization of total variation under unconstrained discriminators with balanced priors, and to a discrepancy upper-bounded by the Wasserstein-1 distance when the discriminator is constrained to be 1-Lipschitz. This approach is claimed to achieve a better trade-off between training stability and convergence to the true data distribution.

What carries the argument

The BOLT loss as a surrogate for the discrimination Bayes error rate, which the generator maximizes to achieve distribution matching.

Load-bearing premise

The BOLT loss acts as a sufficiently close and optimizable stand-in for the actual Bayes error rate achieved by an optimal discriminator.

What would settle it

Directly estimating the total variation distance after training with the proposed objective versus standard cross-entropy on matched architectures and observing whether it is consistently smaller would test the minimization claim.
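A minimal sketch of such a check, assuming each sample has first been reduced to a single scalar feature in a known range (a histogram estimate is not practical on raw images). The function name `histogram_tv`, the equal-width binning, and the toy samples are illustrative assumptions and do not come from the paper.

```python
def histogram_tv(real, fake, bins, lo, hi):
    """Plug-in total variation estimate between two 1-D samples.

    Both samples are binned into `bins` equal-width cells on [lo, hi];
    TV is then half the L1 distance between the two bin frequencies.
    """
    width = (hi - lo) / bins

    def frequencies(xs):
        counts = [0] * bins
        for x in xs:
            # Clamp the index so values at or beyond the edges land in an end bin.
            idx = max(0, min(int((x - lo) / width), bins - 1))
            counts[idx] += 1
        return [c / len(xs) for c in counts]

    p, q = frequencies(real), frequencies(fake)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy scalar features standing in for real and generated samples.
real_features = [0.1, 0.2, 0.6, 0.9]
fake_features = [0.1, 0.7, 0.8, 0.95]
tv_estimate = histogram_tv(real_features, fake_features, bins=2, lo=0.0, hi=1.0)
```

Running the same estimator on generators trained with each objective, with the feature map and bin count held fixed, would give a like-for-like comparison. The estimate depends on the bin count, so it is best read as a relative measure.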

Original abstract

We propose an alternative to the standard GAN training approach, in which the discriminator is a binary classifier trained by cross-entropy to distinguish real samples from generated ones. Instead, we directly target the discrimination Bayes error rate (BER). To this end, we use the recently proposed Bayes optimal learning threshold (BOLT) loss and train the generator to maximize a surrogate of the discrimination BER. This viewpoint gives a unified perspective on GAN training: different objectives can be interpreted as parameterized bounds on the discrimination BER that describe a trade-off between smoothness and tightness. We show that, under balanced class priors, maximizing the surrogate BER with an unconstrained discriminator minimizes the total variation between the data and generator distributions. By constraining the discriminator to be $1$-Lipschitz, the proposed maximization objective defines a discrepancy that is upper-bounded by the Wasserstein-1 distance, thereby linking it to Wasserstein GAN. Experiments on several image-generation datasets under matched architectures and optimization settings show that GAN training using the surrogate BER improves sample quality and coverage over standard baselines. This analysis suggests that the proposed Bayesian viewpoint can achieve a better trade-off between training stability and convergence of the generator to the data distribution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper reframes GAN training as maximizing a BOLT surrogate for the Bayes error rate. The reframing unifies several existing objectives and links them to total variation and Wasserstein-1 distance under specific constraints, but the core equivalence depends on properties of the surrogate that the paper does not demonstrate.

The paper's main move is to drop cross-entropy for the discriminator and instead maximize a surrogate of the Bayes error rate using the BOLT loss. This produces a unified view where prior GAN objectives appear as parameterized bounds on that error rate, trading off smoothness against tightness. Under balanced priors the unconstrained case is claimed to minimize total variation, and the 1-Lipschitz version is upper-bounded by Wasserstein-1 distance. Experiments on image datasets with matched architectures and settings report gains in sample quality and coverage over standard baselines.

Referee Report

2 major / 2 minor

Summary. The paper proposes replacing the standard cross-entropy discriminator in GANs with the BOLT loss to directly target a surrogate of the discrimination Bayes error rate (BER). It frames existing GAN objectives as parameterized bounds on this BER that trade off smoothness and tightness. Under balanced priors, maximizing the surrogate BER with an unconstrained discriminator is claimed to minimize total variation between data and generator distributions; constraining the discriminator to be 1-Lipschitz yields a discrepancy upper-bounded by the Wasserstein-1 distance. Experiments on image datasets report improved sample quality and coverage relative to standard baselines under matched architectures.

Significance. If the surrogate equivalence and theoretical links hold, the work supplies a Bayesian lens that unifies GAN variants and could improve the stability-convergence trade-off. The explicit connections to total variation and Wasserstein distance, together with the empirical gains, would strengthen the case for BER-based training. The absence of free parameters in the core derivation and the falsifiable prediction of improved coverage are positive features.

major comments (2)

[Theoretical results] Theoretical results section: the central claim that maximizing the BOLT surrogate BER minimizes total variation (under balanced priors) requires that the BOLT-optimal discriminator coincides with the true Bayes-optimal classifier or that the attained surrogate value is strictly monotonic in the true BER. The manuscript should supply an explicit argument or lemma establishing this identity or monotonicity; without it the generator update no longer targets TV and the distribution-matching guarantee does not follow.
[Theoretical results] Proof of the 1-Lipschitz case: the statement that the proposed objective is upper-bounded by the Wasserstein-1 distance is load-bearing for the link to WGAN. The derivation should be checked for any hidden dependence on the specific form of the BOLT loss; if the bound holds only for the true BER and not automatically for the surrogate, the claim needs qualification or an additional inequality.

minor comments (2)

[Abstract / Introduction] The abstract and introduction should clarify whether the BOLT loss is used exactly as published or with any modifications; a brief equation or reference to the original BOLT formulation would help.
[Experiments] Experimental section: while architectures are matched, the manuscript should report the precise BOLT hyper-parameters (threshold schedule, etc.) and confirm that the same optimizer settings were used for all baselines to ensure the comparison isolates the loss choice.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The major comments focus on strengthening the theoretical links between the BOLT surrogate and the claimed distribution distances. We address each point below and have revised the manuscript to incorporate explicit arguments where needed.

Referee: [Theoretical results] Theoretical results section: the central claim that maximizing the BOLT surrogate BER minimizes total variation (under balanced priors) requires that the BOLT-optimal discriminator coincides with the true Bayes-optimal classifier or that the attained surrogate value is strictly monotonic in the true BER. The manuscript should supply an explicit argument or lemma establishing this identity or monotonicity; without it the generator update no longer targets TV and the distribution-matching guarantee does not follow.

Authors: We agree that an explicit argument is required to rigorously connect maximization of the BOLT surrogate to minimization of total variation. The manuscript builds on the established property that BOLT is a consistent surrogate loss whose minimizer recovers the Bayes-optimal classifier. In the revised version we add Lemma 3.2, which proves that the BOLT surrogate value is strictly monotonic with respect to the true Bayes error rate under balanced priors. The proof proceeds by showing that any deviation from the Bayes decision boundary increases the surrogate loss by at least a positive multiple of the increase in BER, thereby ensuring that generator updates targeting the surrogate also minimize TV. We believe this addition closes the gap identified by the referee.
Revision: yes

Referee: [Theoretical results] Proof of the 1-Lipschitz case: the statement that the proposed objective is upper-bounded by the Wasserstein-1 distance is load-bearing for the link to WGAN. The derivation should be checked for any hidden dependence on the specific form of the BOLT loss; if the bound holds only for the true BER and not automatically for the surrogate, the claim needs qualification or an additional inequality.

Authors: We re-examined the 1-Lipschitz derivation. The upper bound follows directly from the Kantorovich-Rubinstein representation applied to the class of 1-Lipschitz functions and holds for any discriminator in that class, independent of the training loss. Because the BOLT surrogate is optimized within the same 1-Lipschitz constraint set, the resulting discrepancy remains upper-bounded by W1. In the revision we insert a short remark after the proof clarifying this loss-independence and noting that the bound is inherited from the function class rather than from the particular surrogate. No additional inequality is required. revision: yes
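The Kantorovich-Rubinstein duality invoked here writes the Wasserstein-1 distance as the supremum of E_P[f] − E_Q[f] over all 1-Lipschitz functions f, so the mean gap of any single 1-Lipschitz critic cannot exceed W1. The sketch below checks that inequality on toy 1-D samples, where W1 between equal-size empirical distributions reduces to the mean absolute difference of the sorted samples. The helper names, the toy data, and the critic functions are assumptions chosen for illustration; the mean gap is the WGAN-style critic objective, not the paper's BOLT surrogate.

```python
import math


def w1_empirical(xs, ys):
    """W1 between two equal-size 1-D empirical distributions (sorted coupling)."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_gap(f, xs, ys):
    """Difference of sample means of f under the two samples."""
    return sum(f(x) for x in xs) / len(xs) - sum(f(y) for y in ys) / len(ys)


real = [0.0, 1.0, 2.0, 4.0]
fake = [0.5, 1.5, 3.0, 3.5]
w1 = w1_empirical(real, fake)

# Each critic below is 1-Lipschitz on the real line.
critics = {
    "identity": lambda x: x,
    "negated": lambda x: -x,
    "tanh": math.tanh,
    "abs_shift": lambda x: abs(x - 2.0),
}
gaps = {name: mean_gap(f, real, fake) for name, f in critics.items()}

# Every gap is bounded by W1, as the duality predicts.
bounded = all(gap <= w1 + 1e-12 for gap in gaps.values())
```

The check says nothing about how tight the bound is for a particular trained discriminator; it only illustrates why a bound of this kind is inherited from the function class.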

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external BOLT properties and standard probability

The paper's key results follow from the known identity BER = ½ − ½ TV(P_data, P_G) for the true Bayes-optimal discriminator under balanced priors, extended to the BOLT surrogate via its stated properties as a recently proposed loss. No quoted step reduces a claimed prediction or discrepancy to a fitted parameter, self-definition, or unverified self-citation chain. The 1-Lipschitz constraint and Wasserstein upper bound are presented as derived consequences rather than tautological renamings. The central claims retain independent mathematical content outside the paper's own inputs.
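The identity BER = ½ − ½ TV can be verified directly on discrete distributions: with equal priors, the Bayes-optimal rule assigns each outcome to whichever distribution gives it more mass, so the error is half the sum of the pointwise minima. The short sketch below computes both sides; the function names and the example probability vectors are illustrative assumptions.

```python
def total_variation(p, q):
    """TV distance between two pmfs on the same finite support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def bayes_error_balanced(p, q):
    """Bayes error of telling p from q when each has prior probability 1/2.

    The optimal rule picks the distribution with the larger mass at each
    outcome, so the mass it gets wrong is the smaller of the two.
    """
    return 0.5 * sum(min(a, b) for a, b in zip(p, q))


p_data = [0.5, 0.3, 0.2]
p_gen = [0.2, 0.3, 0.5]

ber = bayes_error_balanced(p_data, p_gen)
tv = total_variation(p_data, p_gen)

# Both sides of the identity agree up to floating-point rounding.
identity_holds = abs(ber - (0.5 - 0.5 * tv)) < 1e-12
```

When the generator matches the data distribution exactly, TV is 0 and the BER reaches its maximum of ½, which is why maximizing the true BER over the generator is equivalent to minimizing TV.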

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the BOLT loss being an effective surrogate for Bayes error rate, the validity of balanced class priors for the total-variation result, and the 1-Lipschitz constraint for the Wasserstein link. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: Balanced class priors allow the surrogate BER maximization to minimize total variation distance.
  Invoked to obtain the total-variation minimization result (abstract, theoretical paragraph).
domain assumption: BOLT loss provides a trainable surrogate for the discrimination Bayes error rate.
  Central to replacing cross-entropy and enabling the generator objective (abstract, method description).

pith-pipeline@v0.9.0 · 5754 in / 1505 out tokens · 33937 ms · 2026-05-18T02:47:37.380932+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Under balanced class priors, maximizing the surrogate BER with an unconstrained discriminator minimizes the total variation between the data and generator distributions. By constraining the discriminator to be 1-Lipschitz, the proposed maximization objective defines a discrepancy that is upper-bounded by the Wasserstein-1 distance.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

Theorem 3 (BOLT vs TV). … D^(π)(g) + D^(1-π)(g) ≥ TV(P_data,P_G) with equality when π=0.5.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.