Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

Neelkamal Bhuyan

arxiv: 2605.20502 · v1 · pith:ASWUZ5MOnew · submitted 2026-05-19 · cs.LG · cs.AI · cs.CV · stat.AP · stat.ML

Pith reviewed 2026-05-21 06:49 UTC · model grok-4.3

keywords: out-of-distribution detection · diffusion models · multi-encoder fusion · representation space · Tippett combination · distribution shifts · in-distribution diagnostics

The pith

EncMin2L fuses encoder-specific diffusion likelihoods with a two-level minimum gate to detect all four distribution shift types at 0.94 AUROC or higher without out-of-distribution labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a fusion method for out-of-distribution detection that runs separate representation-space diffusion models on each of several encoders and then combines their outputs. It first uses only in-distribution data to measure which encoder responds most to each kind of shift, then applies a two-level min gate and Tippett minimum p-value rule to produce one calibrated score. The resulting detector reaches at least 0.94 AUROC on global domain, semantic, texture, and corruption shifts while using 2.3 times fewer parameters than a single large multi-encoder model. A sympathetic reader would care because machine-learning systems need reliable ways to flag unexpected inputs across the full range of real-world changes without requiring labeled examples of those changes.

Core claim

EncMin2L is an encoder-agnostic two-level min(·)-gate that aggregates per-encoder representation-space diffusion likelihood scores after calibration from in-distribution diagnostics; it achieves at least 0.94 AUROC simultaneously on global domain changes, semantic divergence, texture differences, and covariate corruptions while operating without any out-of-distribution labels and at 2.3 times lower parameter cost than monolithic multi-encoder baselines.

What carries the argument

EncMin2L, a two-level min(·)-gate that combines calibrated per-encoder diffusion-based likelihood detectors via Tippett minimum p-value aggregation.
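
To make the gate concrete, here is a minimal Python sketch of a two-level minimum gate. It is not the paper's implementation: the class name `TwoLevelMinGate`, the smoothed empirical-CDF p-value in `ecdf_p`, and the choice to recalibrate the Level 1 minimum with an empirical CDF fitted on the same in-distribution scores are all assumptions made for illustration. Scores are treated as log-likelihoods, so unusually low values map to small p-values.

```python
import random
from bisect import bisect_right


def ecdf_p(sorted_ref, x):
    """Smoothed left-tail p-value: share of reference values <= x."""
    return (bisect_right(sorted_ref, x) + 1) / (len(sorted_ref) + 1)


class TwoLevelMinGate:
    """Hypothetical two-level min gate calibrated on ID scores only."""

    def __init__(self, id_scores):
        # id_scores[k] = (normed_logliks, unnormed_logliks) for encoder k.
        self.refs = [(sorted(n), sorted(u)) for n, u in id_scores]
        self.min_refs = []
        for (n_ref, u_ref), (n, u) in zip(self.refs, id_scores):
            # Level-1 minima on ID data; their ECDF is the recalibration map.
            minima = [min(ecdf_p(n_ref, a), ecdf_p(u_ref, b))
                      for a, b in zip(n, u)]
            self.min_refs.append(sorted(minima))

    def score(self, sample):
        # sample[k] = (normed_loglik, unnormed_loglik) for encoder k.
        recalibrated = []
        for (n_ref, u_ref), m_ref, (a, b) in zip(self.refs, self.min_refs, sample):
            level1 = min(ecdf_p(n_ref, a), ecdf_p(u_ref, b))  # within-encoder min
            recalibrated.append(ecdf_p(m_ref, level1))        # back to ~uniform
        return min(recalibrated)                              # cross-encoder min


# Synthetic stand-in scores, not data from the paper.
rng = random.Random(0)
id_scores = [([rng.gauss(0, 1) for _ in range(500)],
              [rng.gauss(0, 1) for _ in range(500)]) for _ in range(3)]
gate = TwoLevelMinGate(id_scores)
typical = [(0.0, 0.0)] * 3
unusual = [(0.0, 0.0), (-6.0, 0.0), (0.0, 0.0)]  # only one encoder objects
print(gate.score(typical), gate.score(unusual))
```

Because the final score is a minimum, a single encoder that finds the input unlikely is enough to pull the fused score down, which is the behaviour the minimum gate is meant to provide.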

If this is right

OOD detection becomes possible using only in-distribution data and no out-of-distribution examples or labels.
Encoder specialization is quantified from in-distribution data alone via the class-conditional F-test and log-likelihood shift under synthetic corruptions.
Performance across all four shift types is maintained while parameter count drops by a factor of 2.3 relative to monolithic baselines.
Tippett minimum p-value combination produces a single calibration-stable OOD score from the per-encoder detectors.
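
The two diagnostics named above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's definition: `eta_squared` is the standard one-way ANOVA effect size (between-class sum of squares over total sum of squares) applied to a scalar per-sample quantity, and `delta_mu` is taken to be the mean log-likelihood on clean in-distribution data minus the mean on synthetically corrupted copies. The function names and the sign convention are hypothetical.

```python
from statistics import fmean


def eta_squared(values, labels):
    """One-way ANOVA effect size: between-class SS divided by total SS."""
    grand_mean = fmean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return ss_between / ss_total if ss_total > 0 else 0.0


def delta_mu(clean_loglik, corrupted_loglik):
    """Mean log-likelihood drop caused by a synthetic corruption."""
    return fmean(clean_loglik) - fmean(corrupted_loglik)


# Toy numbers chosen by hand, not measurements from the paper.
print(eta_squared([1.0, 1.0, 3.0, 3.0], [0, 0, 1, 1]))
print(delta_mu([-1.0, -1.0], [-3.0, -5.0]))
```

Under these assumptions, a larger `eta_squared` indicates that class identity explains more of the variation (a proxy for semantic sensitivity), and a larger `delta_mu` indicates a stronger reaction to corruption.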

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same minimum-gate fusion could be applied to other families of per-encoder detectors besides representation-space diffusion models.
Lower parameter cost may allow the detector to run on edge hardware where a monolithic model would not fit.
The in-distribution diagnostics could be used to select or prune encoders when building new multi-encoder systems for targeted shift coverage.

Load-bearing premise

The two in-distribution diagnostics accurately identify each encoder's sensitivity to specific shift types without any out-of-distribution data or labels.

What would settle it

Run EncMin2L on a new benchmark suite containing all four shift types and observe whether AUROC falls below 0.94 on any single shift type under the same training and fusion protocol.
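
A sketch of that check, assuming the fused score behaves like a p-value (lower means more likely out-of-distribution). The helper names `auroc` and `failing_shifts` are hypothetical, and the scores in the example are hand-picked toy values rather than benchmark results.

```python
def auroc(id_scores, ood_scores):
    """P(OOD score < ID score), counting ties as one half."""
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood < ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def failing_shifts(auroc_by_shift, threshold=0.94):
    """Names of shift types whose AUROC falls below the claimed floor."""
    return sorted(name for name, value in auroc_by_shift.items()
                  if value < threshold)


# Toy fused scores for one shift type (illustrative only).
id_scores = [0.62, 0.48, 0.91, 0.33]
ood_scores = [0.02, 0.05, 0.40, 0.01]
results = {"toy_shift": auroc(id_scores, ood_scores)}
print(results, failing_shifts(results))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sanity check but would normally be replaced by a rank-based computation on full benchmarks.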

Figures

Figures reproduced from arXiv: 2605.20502 by Neelkamal Bhuyan.

**Figure 2.** Architecture of the two-level min(·)-gate (EncMin2L). Each encoder produces two p-values: $r_{k,n}$ (blue, normed representation) and $r_{k,u}$ (orange, unnormed representation). Level 1 takes the within-encoder minimum, re-CDF recalibration (Proposition 1) restores uniformity, and Level 2 applies the cross-encoder Tippett minimum (Algorithm 1) to produce the final score $s(z)$.
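
A short calculation shows why a recalibration step sits between the two levels. As a simplifying assumption that is not taken from the paper, suppose the two within-encoder p-values, written here as $r_{k,n}$ and $r_{k,u}$, are independent and uniform on in-distribution data, and that the $K$ recalibrated per-encoder values are independent as well. The symbols $m_k$, $q_k$, $F_k$, and $K$ are introduced only for this illustration.

$$
\begin{aligned}
m_k &= \min(r_{k,n},\, r_{k,u}), & \Pr(m_k \le t) &= 1 - (1-t)^2 = 2t - t^2 \ne t \quad (0 < t < 1),\\
q_k &= F_k(m_k), & \Pr(q_k \le t) &= t \quad \text{when } F_k \text{ is the (continuous) ID CDF of } m_k,\\
s &= \min_{1 \le k \le K} q_k, & \Pr(s \le t) &= 1 - (1-t)^K .
\end{aligned}
$$

The raw within-encoder minimum is therefore no longer uniform, and passing it through its own in-distribution CDF restores uniformity before the cross-encoder minimum is taken. Under the independence assumption, Tippett's rule rejects at level $\alpha$ when $s \le 1 - (1-\alpha)^{1/K}$. The p-values of a single encoder are unlikely to be independent in practice, so an empirical estimate of $F_k$ is the natural substitute for the closed form.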

Original abstract

We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts -- global domain changes, semantic divergence, texture differences, and covariate corruptions -- through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs). We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L -- an encoder-agnostic two-level $\min(\cdot)$-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at $2.3\times$ lower parameter cost. Two ID-data diagnostics -- $\eta^2$ (class-conditional F-test) and $\Delta\mu$ (log-likelihood shift under synthetic corruptions) -- quantify encoder specialization, while a Tippett minimum $p$-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves $\geq 0.94$ AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper gives a workable ID-only recipe for fusing per-encoder diffusion detectors across four shift types, but the whole thing rests on whether the two proposed diagnostics actually rank encoders by real OOD performance.

Hey, I looked at arXiv:2605.20502 on Tippett-minimum fusion of representation-space diffusion models for multi-encoder OOD detection. The core idea is to train separate RDMs on each encoder using only ID data, then use two ID-only scalars to decide which encoder is likely to be reliable for a given shift type before combining their scores. They call the gate EncMin2L and aggregate with Tippett's minimum p-value method. That combination is presented as new, and the claim is that it hits at least 0.94 AUROC on domain, semantic, texture, and corruption shifts while using 2.3 times fewer parameters than monolithic baselines. The efficiency angle and the label-free calibration are the parts that could matter in practice.

The paper does a reasonable job spelling out the pipeline and showing why handling multiple shift categories at once is useful. The diagnostics (class-conditional eta squared F-test and log-likelihood shift under synthetic corruptions) are a direct attempt to solve the specialization problem without OOD labels, which is a genuine practical constraint.

The soft spot is exactly the one in the stress-test note. Nothing in the abstract or the high-level description shows that eta squared or delta mu actually correlate with per-encoder AUROC on held-out semantic or domain shifts. If that correlation is weak or absent, the min-gate fusion has no reliable way to pick the right detector for each regime, and the simultaneous high performance claim becomes harder to trust. I would want to see the correlation plots or rank-order tables in the full results before accepting the central argument. The experimental protocol details are also thin in the summary, so it is not yet clear how fair the baselines are or whether error bars and multiple runs are reported.

This is the kind of work that might interest people who need OOD detection to survive varied real-world shifts without extra labeled data. A reader already working on representation diffusion or multi-model fusion could extract the recipe and test the diagnostics themselves. I would send it for peer review. The method is concrete enough that referees can check the correlation evidence and the experimental controls directly.

Referee Report

2 major / 2 minor

Summary. The paper proposes EncMin2L, a Tippett-minimum p-value fusion of per-encoder representation-space diffusion models (RDMs) for OOD detection across global domain, semantic, texture, and covariate shifts. It introduces two ID-data-only diagnostics—η² (class-conditional F-test) and Δμ (log-likelihood shift under synthetic corruptions)—to identify each encoder's sensitivity to specific shift types, then aggregates per-encoder likelihood scores via a min-gate without any OOD labels or data, claiming ≥0.94 AUROC across all four shift types simultaneously while using 2.3× fewer parameters than monolithic multi-encoder baselines.

Significance. If the ID-only diagnostics reliably predict per-encoder performance on held-out real shifts, the method would offer a practical, calibration-stable, and label-free route to fusing multiple encoders for broad-spectrum OOD detection, with clear efficiency advantages over existing representation-space diffusion detectors.

major comments (2)

[Abstract and experimental sections] Strong AUROC figures (≥0.94 across all four shift types) and efficiency claims are stated without any experimental protocol, baseline definitions, statistical tests, or error bars, which makes it impossible to assess whether the numbers support the central claim of outperforming SOTA representation-space diffusion OOD detectors.
[§3 (Method), ID-data diagnostics subsection] The premise that η² (class-conditional variance ratio) and Δμ (log-likelihood change under synthetic corruptions) correctly rank encoders by sensitivity to each shift type is load-bearing for the Tippett min-p fusion and the “no OOD labels” advantage, yet no correlation is shown between these scalars and actual per-encoder AUROC on held-out semantic or domain shifts.
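
The correlation asked for in the second major comment could be checked with a rank correlation between a diagnostic and per-encoder AUROC. The sketch below implements Spearman's coefficient with average ranks for ties. The helper names are hypothetical, and the three encoder values are made-up placeholders that only show the call pattern; they are not results from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank vectors (undefined if either is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder values for three hypothetical encoders (NOT from the paper).
eta2_per_encoder = [0.10, 0.35, 0.60]
semantic_auroc_per_encoder = [0.71, 0.80, 0.93]
print(spearman(eta2_per_encoder, semantic_auroc_per_encoder))
```

With only a handful of encoders the coefficient is coarse, so a rank-order table alongside the number would be more informative than the coefficient alone.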

minor comments (2)

[§3.2] The precise mathematical definition of the EncMin2L min-gate and the calibration procedure for the combined score would benefit from an explicit equation.
[Figures/Tables] Figure captions and axis labels for any AUROC tables or shift-type breakdowns should explicitly list the exact benchmarks and baselines used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and insightful comments on our submission. We have carefully considered the points raised regarding the presentation of experimental results and the validation of our ID-data diagnostics. We provide responses to each major comment below, indicating where revisions will be made to the manuscript.

Referee: [Abstract and experimental sections] Strong AUROC figures (≥0.94 across all four shift types) and efficiency claims are stated without any experimental protocol, baseline definitions, statistical tests, or error bars, which makes it impossible to assess whether the numbers support the central claim of outperforming SOTA representation-space diffusion OOD detectors.

Authors: We note that the abstract serves as a concise summary of key findings, while the experimental sections provide the detailed methodology. To ensure the claims are fully supported and easily verifiable, we will expand the experimental section to include a dedicated subsection on the evaluation protocol, explicit definitions and references for all baselines, results of statistical tests (e.g., paired t-tests or Wilcoxon tests for AUROC comparisons), and error bars representing variability across random seeds or data splits. These additions will allow readers to better assess the robustness of the reported ≥0.94 AUROC and efficiency gains.
Revision: yes

Referee: [§3 (Method), ID-data diagnostics subsection] The premise that η² (class-conditional variance ratio) and Δμ (log-likelihood change under synthetic corruptions) correctly rank encoders by sensitivity to each shift type is load-bearing for the Tippett min-p fusion and the “no OOD labels” advantage, yet no correlation is shown between these scalars and actual per-encoder AUROC on held-out semantic or domain shifts.

Authors: The η² and Δμ diagnostics are derived from statistical properties of the ID data to predict encoder sensitivity without requiring OOD samples, which underpins the label-free nature of EncMin2L. Although the manuscript shows the effectiveness of the fused detector, we concur that a direct empirical correlation analysis would bolster confidence in the diagnostics' predictive power. Accordingly, we will include in the revised version a new figure or table that plots or tabulates the relationship between the computed η²/Δμ values for each encoder and their individual AUROC performance on held-out shifts of semantic and domain types. This will demonstrate the ranking ability of the diagnostics.
Revision: yes
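
The paired comparison promised in the first response can be illustrated with an exact sign-flip permutation test on per-seed AUROC differences. This is a stand-in for the paired t-test or Wilcoxon test the authors mention, chosen because it needs no distributional assumptions and fits in a few lines. The function name is hypothetical, and the differences below are invented placeholders, not reported results.

```python
from itertools import product


def sign_flip_p_value(diffs):
    """Two-sided exact p-value for H0: paired differences are symmetric about 0."""
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Enumerate all 2^n sign assignments; practical for roughly n <= 20 seeds.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return hits / total


# Placeholder AUROC differences (method minus baseline) over six seeds.
# These numbers are illustrative only and do NOT come from the paper.
diffs = [0.012, 0.008, 0.015, 0.010, 0.009, 0.011]
print(sign_flip_p_value(diffs))
```

With $n$ seeds the smallest attainable two-sided p-value is $2 / 2^n$, so at least six seeds are needed before this test can fall below 0.05.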

Circularity Check

0 steps flagged

No significant circularity; derivation relies on ID-only diagnostics and external benchmarks

The paper's core construction uses two ID-data statistics (η² class-conditional F-test and Δμ log-likelihood shift under synthetic corruptions) to rank encoder sensitivities, then applies a Tippett min-p fusion to produce the final OOD score. These steps are defined and calibrated exclusively on in-distribution data and synthetic corruptions; the reported AUROC performance is measured on held-out OOD benchmarks that are not used in the diagnostic fitting or fusion calibration. No equation reduces a claimed prediction to a quantity fitted on the evaluation data, no self-citation chain supplies a load-bearing uniqueness result, and the method does not rename a known empirical pattern as a new derivation. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review prevents exhaustive enumeration; the listed items capture the main new components and background assumptions referenced in the summary.

axioms (1)

domain assumption: Representation-space diffusion models yield useful likelihood scores for OOD detection when applied per encoder.
Implicit foundation for the per-encoder RDM detectors.

invented entities (2)

EncMin2L (no independent evidence)
purpose: Encoder-agnostic two-level min-gate for fusing and calibrating per-encoder diffusion likelihoods.
Newly introduced fusion component.

η² and Δμ diagnostics (no independent evidence)
purpose: Quantify encoder specialization to shift types from ID data alone.
Newly proposed statistical checks.

