DAVIS: OOD Detection via Dominant Activations and Variance for Increased Separation

Abid Hassan; Nenad Medvidovic; Saad Shafiq; Tuan Ngo

arxiv: 2601.22703 · v2 · submitted 2026-01-30 · 💻 cs.CV

Pith reviewed 2026-05-16 09:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords out-of-distribution detection · OOD · global average pooling · feature enrichment · post-hoc detection · channel-wise variance · dominant activations · neural network robustness

The pith

Adding channel-wise variance and dominant activations to features after global average pooling improves out-of-distribution detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that global average pooling throws away channel-wise variance and maximum activation values that carry useful signals for telling in-distribution samples from out-of-distribution ones. DAVIS recovers those signals by concatenating them directly to the usual GAP feature vector before feeding it to any downstream OOD scorer. This change produces large drops in false-positive rates on CIFAR-10, CIFAR-100, and ImageNet across ResNet, DenseNet, and EfficientNet families. Because the fix is post-hoc and architecture-agnostic, it can be applied to already-trained models without retraining. If the added statistics really are the missing piece, then many existing OOD pipelines can be strengthened by preserving more than the mean of each channel.

Core claim

The paper claims that the channel-wise variance and the dominant (maximum) activation values present in activation maps before global average pooling are highly discriminative for OOD detection. Concatenating these two statistics to the standard GAP-derived feature vector enriches the representation, reduces information loss, and yields substantially better separation between in-distribution and out-of-distribution inputs when measured by FPR95 on CIFAR-10, CIFAR-100, and ImageNet-1k.

What carries the argument

DAVIS augmentation that appends per-channel variance and maximum activation values to the penultimate feature vector to restore statistics discarded by global average pooling.
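
As a rough illustration of the enrichment described above, the sketch below computes the three per-channel statistics from a pre-pooling activation map and concatenates them. It follows this page's description only; the function name `davis_features`, the `(C, H, W)` layout, and the use of NumPy are assumptions made for illustration, and the paper's exact combination rule may differ.

```python
import numpy as np


def davis_features(fmap: np.ndarray) -> np.ndarray:
    """Summarise a (C, H, W) pre-pooling activation map as [mean | variance | max]."""
    channels = fmap.shape[0]
    flat = fmap.reshape(channels, -1)      # one row of spatial activations per channel
    gap = flat.mean(axis=1)                # what global average pooling keeps
    variance = flat.var(axis=1)            # channel-wise (population) variance
    dominant = flat.max(axis=1)            # dominant (maximum) activation
    return np.concatenate([gap, variance, dominant])


# Hypothetical input: a ReLU-like map with 512 channels on a 4x4 spatial grid.
rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(512, 4, 4)), 0.0)
features = davis_features(fmap)
print(features.shape)  # (1536,)
```

The enriched vector is three times as wide as the GAP vector, so a scorer that reuses the original classifier weights would need some adaptation; this page does not say how the paper handles that step.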

If this is right

FPR95 drops by 48% on CIFAR-10 with ResNet-18 and by 38% on CIFAR-100 with ResNet-34.
A 27% reduction appears on ImageNet-1k with MobileNet-v2.
The same enrichment works across ResNet, DenseNet, and EfficientNet without architecture-specific tuning.
The improvement supplies a concrete reason to move beyond mean-only summaries in post-hoc OOD methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same channel statistics might help other representation-based safety tasks such as domain-shift detection or adversarial-example flagging.
Selecting or weighting only the most informative channels could further reduce the added dimensionality.
If pooling loss is the root issue, then architectures that avoid global average pooling entirely may already enjoy some of the benefit DAVIS tries to restore.

Load-bearing premise

Channel-wise variance and dominant activations remain reliably discriminative for OOD inputs when simply concatenated to the GAP vector, without creating new failure modes or requiring any retraining.

What would settle it

Running the same OOD detectors on the same benchmarks with and without the added variance-plus-maximum features and finding no reduction, or an increase, in FPR95 would falsify the central claim.
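
A minimal sketch of how such a with/without comparison could be scored, assuming the common convention that higher scores mean "more in-distribution" and that FPR95 is the fraction of OOD samples accepted at the threshold that retains 95% of ID samples. The function name and the synthetic scores are illustrative stand-ins, not results or code from the paper.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores) -> float:
    """False-positive rate on OOD data at the threshold keeping ~95% of ID data."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    threshold = np.percentile(id_scores, 5.0)        # ~95% of ID scores lie at or above it
    return float(np.mean(ood_scores >= threshold))   # OOD samples wrongly accepted as ID


# Synthetic scores for two hypothetical detectors evaluated on the same ID set.
rng = np.random.default_rng(0)
id_scores = rng.normal(loc=2.0, size=10_000)
ood_scores_a = rng.normal(loc=0.0, size=10_000)    # weaker separation
ood_scores_b = rng.normal(loc=-1.0, size=10_000)   # stronger separation

fpr_a = fpr_at_95_tpr(id_scores, ood_scores_a)
fpr_b = fpr_at_95_tpr(id_scores, ood_scores_b)
print(f"detector A FPR95={fpr_a:.3f}, detector B FPR95={fpr_b:.3f}")
```

Running the same function on real detector scores, once with the plain GAP features and once with the enriched features, gives the paired numbers that the falsification test calls for.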

Figures

Figures reproduced from arXiv: 2601.22703 by Abid Hassan, Nenad Medvidovic, Saad Shafiq, Tuan Ngo.

**Figure 1.** Dominant activations provide a stronger OOD signal than mean activations. The plot shows the average activation gap between ID (CIFAR-10) and OOD (Texture) samples for each unit in the penultimate layer of a pre-trained ResNet-18. The gap derived from the dominant (maximum) activation (blue) is consistently and significantly larger than the gap from the standard mean activation (orange). Our work is motiv…

**Figure 2.** Using dominant activations improves OOD score separation. Left: OOD scores based on standard mean activations show significant overlap between the ID (CIFAR-10) and OOD (Texture) distributions, leading to poor separability. Right: Leveraging dominant (maximum) activations shifts the OOD score distribution away from the ID scores. Both plots show energy scores from a ResNet-18.

**Figure 3.** Feature statistics for ID (ImageNet) vs. OOD (Texture) samples on an EfficientNet-b0 backbone. While the mean µ(x) shows poor separation, both the standard deviation σ(x) and the maximum m(x) maintain a clear separation between ID and OOD activations.

**Figure 4.** $\mathbb{E}_{x \sim D_{\text{in}}}[\mu(x)] \ge \mathbb{E}_{x \sim D_{\text{out}}}[\mu(x)]$ (15a), $\mathbb{E}_{x \sim D_{\text{in}}}[m(x)] \ge \mathbb{E}_{x \sim D_{\text{out}}}[m(x)]$ (15b), $\mathbb{E}_{x \sim D_{\text{in}}}[\sigma(x)] \ge \mathbb{E}_{x \sim D_{\text{out}}}[\sigma(x)]$ (15c). This fundamental property enables the network to perform both its primary classification task and OOD detection effectively.

**Figure 5.** Comparison of the separation gap $\Delta$ achieved by different statistical features, averaged over all test samples. Left: incorporating the standard deviation ($\Delta_{\mu,\sigma}$) yields a larger separation gap than using the mean activation alone ($\Delta_{\mu}$). Right: using the maximum activation ($\Delta_{m}$) yields a larger separation gap than using the mean activation ($\Delta_{\mu}$). Results are shown for a ResNet-50…

**Figure 6.** Comparison of scoring functions. The MSP score depends only on the single maximum softmax

**Figure 7.** Illustration of penultimate layer features derived from median (left) and entropy (right) statistics. For both measures, out-of-distribution (OOD) samples (Texture) exhibit consistently higher values than in-distribution (ID) samples (CIFAR-100), creating an "inverted separation" that is challenging for standard OOD scoring. (Model: DenseNet-101)

**Figure 8.** Feature statistics for ID (CIFAR-100) vs. OOD (Texture) samples on a ResNet18-SiLU backbone. While the mean µ(x) shows poor separation, both the standard deviation σ(x) and the maximum m(x) maintain a clear separation between ID and OOD activations. To isolate this effect, we evaluated ResNet-18 with SiLU instead of ReLU (termed ResNet18-SiLU) on the CIFAR benchmarks. This experimen…
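
The captions for Figures 2 and 6 refer to energy and MSP scores. As a hedged sketch of the two standard definitions (not code from the paper), the block below computes both from a logit vector. The sign convention, where the energy score is `T * logsumexp(logits / T)` and higher means more in-distribution, and the function names are assumptions for illustration.

```python
import numpy as np


def msp_score(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability per row of logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # shift for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)


def energy_score(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Negative free energy, T * logsumexp(logits / T), per row of logits."""
    scaled = logits / temperature
    peak = scaled.max(axis=-1, keepdims=True)
    log_sum_exp = peak.squeeze(-1) + np.log(np.exp(scaled - peak).sum(axis=-1))
    return temperature * log_sum_exp


# The second row is the first row shifted by +2: same softmax, larger logits.
logits = np.array([[2.0, 0.0], [4.0, 2.0]])
print(msp_score(logits))
print(energy_score(logits))
```

In this toy input the two rows receive the same MSP but different energy scores, which is one way to see why a score built on logit magnitudes can react to changes in the penultimate features that the softmax maximum ignores.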

Abstract

Detecting out-of-distribution (OOD) inputs is a critical safeguard for deploying machine learning models in the real world. However, most post-hoc detection methods operate on penultimate feature representations derived from global average pooling (GAP), a lossy operation that discards valuable distributional statistics from the pre-pooling activation maps. We contend that these overlooked statistics, particularly channel-wise variance and dominant (maximum) activations, are highly discriminative for OOD detection. We introduce DAVIS, a simple and broadly applicable post-hoc technique that enriches feature vectors by incorporating these crucial statistics, directly addressing the information loss from GAP. Extensive evaluations show DAVIS sets a new benchmark across diverse architectures, including ResNet, DenseNet, and EfficientNet. It achieves significant reductions in the false positive rate (FPR95), with improvements of 48.26% on CIFAR-10 using ResNet-18, 38.13% on CIFAR-100 using ResNet-34, and 26.83% on ImageNet-1k benchmarks using MobileNet-v2. Our analysis reveals the underlying mechanism for this improvement, providing a principled basis for moving beyond the mean in OOD detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DAVIS adds per-channel variance and max activations to GAP features for OOD detection and reports sizable FPR95 drops across models, but the specific stats need ablations to show they are the real driver.

The main thing here is that DAVIS keeps more than just the mean from activation maps by appending channel-wise variance and the dominant activation value to the usual GAP vector, then uses that richer feature for post-hoc OOD scoring. The authors test it on ResNet, DenseNet, and EfficientNet families and report clear FPR95 reductions, such as 48% on CIFAR-10 with ResNet-18, 38% on CIFAR-100 with ResNet-34, and 27% on ImageNet-1k with MobileNet-v2. The method stays simple and architecture-agnostic, which is the practical strength: no retraining, just extra statistics pulled before pooling. They also include some analysis of how these extra numbers separate in-distribution from out-of-distribution samples better than the mean alone. That cross-architecture coverage and the focus on a real deployment pain point give the work a usable core.

The soft spots sit in the experimental controls. The abstract shows no error bars, no ablation tables that swap in other per-channel statistics, and limited detail on how the enriched vector interacts with different downstream detectors like Mahalanobis or energy-based scores. If the lift comes mainly from higher dimensionality or any second-moment cue rather than variance and max specifically, the DAVIS recipe is not yet proven load-bearing. The stress-test note on that point holds until the full results are checked.

This paper is aimed at people who already run post-hoc OOD methods on vision models and want a quick, cheap upgrade to try. A practitioner could implement it in a few lines and measure the difference on their own data. It has enough of a concrete proposal and testable claims to go to peer review rather than a desk reject, even if the current version would benefit from tighter ablations and variance reporting.

Referee Report

3 major / 1 minor

Summary. The paper introduces DAVIS, a post-hoc OOD detection method that augments standard GAP-derived feature vectors with channel-wise variance and dominant (maximum) activations extracted from pre-GAP activation maps. It claims this directly mitigates information loss from pooling and yields large empirical gains, including 48.26% FPR95 reduction on CIFAR-10 (ResNet-18), 38.13% on CIFAR-100 (ResNet-34), and 26.83% on ImageNet-1k (MobileNet-v2), establishing new benchmarks across ResNet, DenseNet, and EfficientNet families while providing an analysis of the underlying mechanism.

Significance. If the reported FPR95 gains prove robust under standard controls (error bars, ablations, and multiple seeds), DAVIS would offer a lightweight, architecture-agnostic improvement to existing post-hoc detectors. The emphasis on moving beyond the mean statistic could influence feature design in OOD and related tasks, provided the specific choice of variance-plus-max is shown to be load-bearing rather than incidental.

major comments (3)

[Experiments] Experiments section (and abstract): the headline FPR95 reductions (48.26%, 38.13%, 26.83%) are stated without error bars, number of random seeds, or statistical significance tests. This omission makes it impossible to judge whether the improvements exceed typical run-to-run variance in OOD benchmarks.
[Method] Method section: no ablation isolates the contribution of variance versus dominant activations versus simply increasing feature dimensionality; the paper therefore does not demonstrate that these two particular per-channel statistics are privileged over other descriptors (e.g., min, skewness, or even random projections).
[Experiments] Experiments section, comparison tables: baseline results for recent post-hoc methods (e.g., current Mahalanobis variants, ReAct, or ASH) are not reported with the same protocol, so the claim of “setting a new benchmark” cannot be verified from the given numbers alone.

minor comments (1)

[Abstract] The abstract states “our analysis reveals the underlying mechanism” but the provided text does not indicate whether this analysis includes quantitative diagnostics (e.g., distribution plots of variance/max on ID vs. OOD) or remains qualitative.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below and will revise the manuscript to incorporate the suggested improvements where they strengthen the work.

Referee: Experiments section (and abstract): the headline FPR95 reductions (48.26%, 38.13%, 26.83%) are stated without error bars, number of random seeds, or statistical significance tests. This omission makes it impossible to judge whether the improvements exceed typical run-to-run variance in OOD benchmarks.

Authors: We agree that reporting variability is important for robustness claims. In the revised manuscript we will rerun all experiments over five random seeds, report mean FPR95 values with standard deviations and error bars in both tables and the abstract, and include paired statistical significance tests (e.g., Wilcoxon) against the strongest baseline to confirm the gains exceed run-to-run variance.
Revision: yes

Referee: Method section: no ablation isolates the contribution of variance versus dominant activations versus simply increasing feature dimensionality; the paper therefore does not demonstrate that these two particular per-channel statistics are privileged over other descriptors (e.g., min, skewness, or even random projections).

Authors: We will add a dedicated ablation subsection that (i) isolates variance alone, dominant activations alone, and their combination, (ii) compares against alternative per-channel descriptors (min, skewness, kurtosis) at identical dimensionality, and (iii) contrasts with random projections of the same added dimension. This will demonstrate whether the chosen statistics are load-bearing.
Revision: yes

Referee: Experiments section, comparison tables: baseline results for recent post-hoc methods (e.g., current Mahalanobis variants, ReAct, or ASH) are not reported with the same protocol, so the claim of “setting a new benchmark” cannot be verified from the given numbers alone.

Authors: We will re-evaluate all cited baselines (including updated Mahalanobis variants, ReAct, and ASH) under the identical protocol, data splits, preprocessing, and scoring used for DAVIS. Updated tables will report these re-computed numbers so that the benchmark claims can be directly verified.
Revision: yes
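
The ablation promised in the second response could be set up along the following lines. This is a sketch under stated assumptions, not the authors' protocol: the variant names, the choice of `min` as the alternative descriptor, and the Gaussian random-projection control are all hypothetical, and the feature map is assumed to be a `(C, H, W)` NumPy array.

```python
import numpy as np


def ablation_variants(fmap: np.ndarray, seed: int = 0) -> dict:
    """Build feature vectors for several ablation arms from one (C, H, W) map."""
    channels = fmap.shape[0]
    flat = fmap.reshape(channels, -1)
    gap, var = flat.mean(axis=1), flat.var(axis=1)
    peak, low = flat.max(axis=1), flat.min(axis=1)

    # Control arm: a random projection of the raw map to 2*C extra dimensions,
    # matching the width that variance + max add. The same seed must be used
    # for every sample so the projection matrix stays fixed across the dataset.
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(2 * channels, fmap.size)) / np.sqrt(fmap.size)
    random_extra = projection @ fmap.ravel()

    return {
        "gap": gap,
        "gap+var": np.concatenate([gap, var]),
        "gap+max": np.concatenate([gap, peak]),
        "gap+var+max": np.concatenate([gap, var, peak]),
        "gap+var+min": np.concatenate([gap, var, low]),   # alternative descriptor
        "gap+random": np.concatenate([gap, random_extra]),
    }


# Small hypothetical map (8 channels, 4x4 grid) to keep the projection tiny.
fmap = np.maximum(np.random.default_rng(1).normal(size=(8, 4, 4)), 0.0)
variants = ablation_variants(fmap)
for name, vec in variants.items():
    print(name, vec.shape)
```

Keeping the `gap+var+max`, `gap+var+min`, and `gap+random` arms at the same width is what lets a comparison separate "these particular statistics help" from "any extra dimensions help".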

Circularity Check

0 steps flagged

No circularity; empirical proposal without derivation chain or self-referential fits

The paper introduces DAVIS as a post-hoc method that concatenates channel-wise variance and dominant activations to GAP-derived features for OOD detection. No equations, derivations, or fitted parameters are described that reduce the claimed improvements to inputs by construction. Claims rest entirely on empirical benchmarks across ResNet, DenseNet, and EfficientNet; no load-bearing self-citation supports the core mechanism, and no uniqueness theorem is invoked. The skeptic concern targets the discriminative power of the chosen statistics, which is a correctness question rather than circularity. This matches the default expectation of no significant circularity for an empirical technique.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method is presented as a direct, parameter-free post-hoc enrichment of existing features.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[3] Set all values in $h(x)$ less than $t$ to zero.

[4] Let $s_2 = \sum h(x)$, the sum after pruning.

[5] Scale all non-zero values in $h(x)$ by $\exp(s_1/s_2)$. The final model output becomes $f_{\text{ASH}}(x) = W^\top h_{\text{ASH}}(x) + b$, which is then used to compute the energy score $S_{\text{Energy}}(x; f_{\text{ASH}}) \in \mathbb{R}$ for OOD detection. SCALE (Xu et al., 2024): It is a post-hoc method designed to enhance out-of-distribution (OOD) detection by adaptively scaling the activation of the penultimate layer $h(x)$ before... (work page: 2024)

[6] Compute the $p$-th percentile threshold $t$ of $h(x)$.

[7] Let $s_1 = \sum h(x)$, the sum of all activation values before pruning.

[8] Construct a binary mask $\mathbf{1}\{h(x) \ge t\}$ that keeps only the top-$p$ activations.

[9] Let $s_2 = \sum h(x) \cdot \mathbf{1}\{h(x) \ge t\}$, the sum of the top-$p$ activations.

[10] Compute the scaling ratio $r = s_1 / s_2$.

[11] Scale the original activations by $\exp(r)$: $h_{\text{SCALE}}(x) = \exp(r) \cdot h(x)$. The final model output is then computed with the scaled activations, and the energy score is used for OOD detection: $f_{\text{SCALE}}(x) = W^\top h_{\text{SCALE}}(x) + b$, $S_{\text{Energy}}(x; f_{\text{SCALE}}) \in \mathbb{R}$. B STATISTICAL ANALYSIS In this section, we present a detailed statistical analysis of our method, DAVIS, demonstrating how it ... (work page: arXiv 2020)

[12] This dimension was chosen to be comparable to other models in our evaluation, such as the ResNet variants (512) and DenseNet-101 (342). Following established protocols (Sun et al., 2021; Sun & Li, 2022; Djurisic et al., 2023), all models were trained from scratch for 100 epochs using SGD with a momentum of 0.9, a weight decay of 0.0001, and a batch size o... (work page: 2021)

[13] ... to ensure our re-implementation was consistent with the authors' reported optimal setting, providing a fair comparison. Computational Environment. All CIFAR model training and OOD detection experiments were conducted on an Apple M2 Max system with 96 GB of RAM. The experiments were implemented in Python using PyTorch (v2.1) and the Torchvision library. F A... (work page: arXiv 2017)
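
The pruning and scaling steps quoted in the extracted fragments above can be sketched as follows. The fragments arrive out of order, so the step ordering here is a best-effort reading: the function names, the `percentile` parameter (values below that percentile are treated as pruned), and the assumption of non-negative activations with a positive sum are illustrative choices, not details confirmed by this page.

```python
import numpy as np


def ash_prune_and_scale(h: np.ndarray, percentile: float = 90.0) -> np.ndarray:
    """Zero activations below the threshold, then scale survivors by exp(s1 / s2)."""
    t = np.percentile(h, percentile)   # percentile threshold t of h(x)
    s1 = h.sum()                       # sum of all activations before pruning
    pruned = np.where(h >= t, h, 0.0)  # set values less than t to zero
    s2 = pruned.sum()                  # sum after pruning (assumed > 0)
    return pruned * np.exp(s1 / s2)


def scale_activations(h: np.ndarray, percentile: float = 90.0) -> np.ndarray:
    """Scale every activation by exp(r) with r = s1 / s2, without pruning."""
    t = np.percentile(h, percentile)
    s1 = h.sum()
    mask = h >= t                      # binary mask keeping the top activations
    s2 = (h * mask).sum()              # sum of the kept activations (assumed > 0)
    r = s1 / s2                        # scaling ratio
    return np.exp(r) * h


h = np.array([1.0, 2.0, 3.0, 4.0])
print(ash_prune_and_scale(h, percentile=50.0))
print(scale_activations(h, percentile=50.0))
```

In both cases the modified activations would then pass through the classifier head, $f(x) = W^\top h(x) + b$, and the energy score would be computed on the resulting logits; the only difference visible in the fragments is whether sub-threshold activations are zeroed or kept.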
