
arxiv: 1907.08338 · v1 · pith:7I7FRDYRnew · submitted 2019-07-19 · eess.AS · cs.LG · cs.SD · stat.ML

Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

Pith reviewed 2026-05-24 19:11 UTC · model grok-4.3

classification: eess.AS · cs.LG · cs.SD · stat.ML
keywords: anomaly detection · autoencoder · unsupervised learning · sound processing · kernel density estimation · batch training · anomaly score

The pith

Batch uniformization weights each sample's anomaly score by the reciprocal of its mini-batch density estimate to equalize scores across frequent and rare normal sounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autoencoders for unsupervised anomaly detection in sounds are typically trained by minimizing the average anomaly score over mini-batches of normal data. Frequent-normal sounds dominate this average, leaving rare-normal sounds with higher scores. The paper introduces batch uniformization, which instead minimizes a weighted average using the reciprocal of each sample's probability density as the weight. Because the true density is unknown, it is estimated by kernel density estimation performed separately on each mini-batch. Verification and objective experiments indicate that this produces more uniform anomaly scores and improves detection performance.

Core claim

Replacing the unweighted mean anomaly score with a density-weighted mean, where the weight for each sample is the reciprocal of its kernel density estimate computed on the current mini-batch, allows the training process to reduce anomaly scores for both frequent-normal and rare-normal sounds at the same time.

What carries the argument

Batch uniformization: the replacement of the sample-mean anomaly score loss with a weighted mean whose weights are the reciprocal of kernel density estimates obtained independently on each training mini-batch.
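The weighting described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the isotropic Gaussian kernel, the fixed `bandwidth`, the normalization of the weights to sum to one, and the function names `kde_density` and `batch_uniformization_loss` are all hypothetical choices made for the example. In actual training the weights would presumably be treated as constants while the scores are differentiated with respect to the model parameters.

```python
import numpy as np

def kde_density(batch, bandwidth):
    """Gaussian KDE of each sample, estimated from the same mini-batch."""
    n, d = batch.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = batch[:, None, :] - batch[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Log of the isotropic Gaussian kernel normalizer.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    kernel = np.exp(log_norm - sq_dist / (2.0 * bandwidth ** 2))
    # Average kernel value over the batch (the sample itself is included).
    return kernel.mean(axis=1)

def batch_uniformization_loss(scores, batch, bandwidth):
    """Weighted mean of anomaly scores with reciprocal-density weights."""
    weights = 1.0 / kde_density(batch, bandwidth)
    weights = weights / weights.sum()  # assumption: weights normalized to sum to 1
    return float(np.sum(weights * scores)), weights

# Toy mini-batch: 8 "frequent" points near the origin, 2 "rare" points far away.
rng = np.random.default_rng(0)
batch = np.vstack([rng.normal(0.0, 0.1, size=(8, 2)),
                   rng.normal(3.0, 0.1, size=(2, 2))])
# Hypothetical anomaly scores: low for frequent samples, high for rare ones.
scores = np.concatenate([np.full(8, 1.0), np.full(2, 5.0)])

plain_mean = float(scores.mean())
weighted, weights = batch_uniformization_loss(scores, batch, bandwidth=0.5)
print(plain_mean, weighted, weights.round(3))
```

In this toy batch the two rare points receive larger weights than any frequent point, so the weighted loss is pulled toward their higher scores, which is the behavior the section describes.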

If this is right

  • Anomaly scores become more uniform across normal sounds, regardless of how often each kind of sound occurs in the training data.
  • No external labels or additional normal-sound data are required beyond what is already used for standard training.
  • In principle, the same weighting could be applied to any DNN anomaly detector trained by minimizing a per-sample anomaly score, whether reconstruction-based or not.
  • Detection performance improves on the sound anomaly detection tasks examined in the verification experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same density-weighting idea could be tested in image or sensor anomaly detection, where some normal examples are also far more common than others.
  • Performance may depend on mini-batch size because kernel density estimation is performed inside each batch.
  • If the density estimates are inaccurate, the weighting could amplify rather than reduce score variation among normal sounds.

Load-bearing premise

The probability density of each sample can be accurately estimated by kernel density estimation performed independently on each training mini-batch, and weighting anomaly scores by the reciprocal of this density will produce constant anomaly scores for both frequent- and rare-normal sounds.
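A short worked identity may help show why reciprocal-density weights are expected to equalize scores. The notation is assumed for illustration: $A_\theta(x)$ is the anomaly score, $p(x)$ the true density of normal sounds, $\Omega$ a bounded region on which $p(x) > 0$, and $U(\Omega)$ the uniform distribution on that region. The identity holds for the exact density; with a mini-batch KDE estimate in place of $p$ it is only approximate.

```latex
\mathbb{E}_{x \sim p}\!\left[\frac{A_\theta(x)}{p(x)}\right]
  = \int_{\Omega} p(x)\,\frac{A_\theta(x)}{p(x)}\,dx
  = \int_{\Omega} A_\theta(x)\,dx
  = |\Omega|\;\mathbb{E}_{x \sim U(\Omega)}\!\left[A_\theta(x)\right]
```

Read this way, minimizing the reciprocal-density-weighted mean corresponds, up to the constant $|\Omega|$, to minimizing the mean anomaly score under a uniform distribution over the normal region, so frequent and rare normal sounds contribute equally.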

What would settle it

Running the standard training and the batch-uniformization training on the same dataset of normal sounds that contains both frequent and rare examples, then observing no improvement or a drop in detection AUC or similar metrics, would falsify the central claim.
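The comparison proposed above needs a detection metric computed the same way for both training methods. The following sketch computes ROC AUC from anomaly scores via pairwise comparison. The function name `roc_auc` and all score values are made up for illustration; they are not results from the paper.

```python
import numpy as np

def roc_auc(normal_scores, anomaly_scores):
    """ROC AUC: probability that a random anomaly scores above a random normal."""
    normal = np.asarray(normal_scores, dtype=float)
    anomaly = np.asarray(anomaly_scores, dtype=float)
    # Compare every anomalous score with every normal score.
    greater = (anomaly[:, None] > normal[None, :]).sum()
    ties = (anomaly[:, None] == normal[None, :]).sum()
    return (greater + 0.5 * ties) / (anomaly.size * normal.size)

# Synthetic toy scores for two hypothetical detectors on the same test clips.
anomalies = [0.8, 1.0]
auc_a = roc_auc([0.2, 0.3, 0.9], anomalies)  # one rare-normal clip scores high
auc_b = roc_auc([0.3, 0.3, 0.4], anomalies)  # normal scores are more uniform
print(auc_a, auc_b)
```

The toy numbers only illustrate the mechanism: a single high-scoring normal clip is enough to lower AUC, which is the failure mode that uniform normal scores are meant to avoid.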

Figures

Figures reproduced from arXiv: 1907.08338 by Masataka Yamaguchi, Noboru Harada, Shin Murata, Shoichiro Saito, Yuma Koizumi.

Figure 1: Training procedure of conventional and proposed meth…
Figure 2: PDFs of x = (x1, x2)^T calculated from the AE's anomaly score by (6) on each grid point in −3 ≤ x1, x2 ≤ 3. The dotted line denotes r = 2 (the borderline between normal and anomaly).
Figure 4: Evaluation results.
Original abstract

Use of an autoencoder (AE) as a normal model is a state-of-the-art technique for unsupervised-anomaly detection in sounds (ADS). The AE is trained to minimize the sample mean of the anomaly score of normal sounds in a mini-batch. One problem with this approach is that the anomaly score of rare-normal sounds becomes higher than that of frequent-normal sounds, because the sample mean is strongly affected by frequent-normal samples, resulting in preferentially decreasing the anomaly score of frequent-normal samples. To decrease anomaly scores for both frequent- and rare-normal sounds, we propose batch uniformization, a training method for unsupervised-ADS for minimizing a weighted average of the anomaly score on each sample in a mini-batch. We used the reciprocal of the probabilistic density of each sample as the weight, more intuitively, a large weight is given for rare-normal sounds. Such a weight works to give a constant anomaly score for both frequent- and rare-normal sounds. Since the probabilistic density is unknown, we estimate it by using the kernel density estimation on each training mini-batch. Verification- and objective-experiments show that the proposed batch uniformization improves the performance of unsupervised-ADS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes batch uniformization for training autoencoders in unsupervised anomaly detection in sounds (ADS). The standard per-mini-batch mean anomaly score loss is replaced by a weighted average whose weights are the reciprocal of per-sample kernel density estimates obtained via KDE performed independently inside each mini-batch; the goal is to drive anomaly scores toward constancy for both frequent- and rare-normal sounds. Verification and objective experiments are stated to show performance gains over the baseline approach.

Significance. If the weighting mechanism functions as intended, the method would address a plausible bias in mean-based AE training that favors frequent normal samples. The per-batch KDE formulation is a lightweight, parameter-light modification that could be useful in imbalanced sound datasets; however, its practical significance hinges on whether the density estimates remain informative in the high-dimensional regimes typical of audio features.

major comments (2)
  1. [Method description (abstract and §3)] The central mechanism assumes that per-mini-batch KDE supplies reliable relative densities for the reciprocal weighting to equalize anomaly scores. In high-dimensional audio feature spaces (spectrograms or filter banks) with typical small batch sizes, KDE is subject to the curse of dimensionality, rendering estimates noisy or bandwidth-dominated; this directly threatens whether the claimed uniformization occurs or whether any gains are incidental. This assumption is load-bearing for the paper's contribution.
  2. [Experiments (abstract and §4/§5)] The abstract states that experiments demonstrate improvement but supplies no information on baselines, metrics (e.g., AUC, precision-recall), dataset splits, number of runs, or statistical significance testing. Without these details it is impossible to assess whether reported gains support the uniformization claim or arise from post-hoc choices.
minor comments (1)
  1. [Abstract] The phrase 'probabilistic density' should be replaced by the standard term 'probability density' for terminological precision.
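The first major comment can be made concrete with a small experiment. The sketch below measures how much of each sample's Gaussian KDE estimate comes from the sample's own kernel term as the feature dimension grows. It assumes standard-normal synthetic data, a fixed bandwidth of 1, and a batch of 64; the function name `self_term_share` is hypothetical and none of these settings come from the paper.

```python
import numpy as np

def self_term_share(n, d, bandwidth=1.0, seed=0):
    """Mean fraction of each sample's KDE estimate contributed by itself."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Unnormalized Gaussian kernel; the diagonal (self term) is exactly 1.
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return float(np.mean(1.0 / kernel.sum(axis=1)))

low_dim = self_term_share(n=64, d=2)
high_dim = self_term_share(n=64, d=64)
print(low_dim, high_dim)
```

When the share approaches 1, every sample's density estimate is essentially its own kernel peak, the estimates become nearly identical, and the reciprocal weights collapse toward a plain mean. A larger bandwidth delays this effect but smooths away the density differences the weighting relies on, which is the "bandwidth-dominated" regime the comment mentions.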

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly where appropriate.

  1. Referee: [Method description (abstract and §3)] The central mechanism assumes that per-mini-batch KDE supplies reliable relative densities for the reciprocal weighting to equalize anomaly scores. In high-dimensional audio feature spaces (spectrograms or filter banks) with typical small batch sizes, KDE is subject to the curse of dimensionality, rendering estimates noisy or bandwidth-dominated; this directly threatens whether the claimed uniformization occurs or whether any gains are incidental. This assumption is load-bearing for the paper's contribution.

    Authors: We acknowledge that KDE can be affected by the curse of dimensionality in high-dimensional feature spaces. Our method relies on relative densities within each mini-batch (not absolute densities): a multiplicative error shared by all samples only rescales the loss, so only the ratios between the samples' density estimates affect the reciprocal weights. This reduces sensitivity to absolute accuracy. Bandwidth was chosen via a rule-of-thumb on log-mel features. To strengthen the paper we will add a paragraph in §3 discussing bandwidth selection, a small sensitivity study, and the limitation that performance may degrade if batch size is too small relative to dimensionality. revision: yes

  2. Referee: [Experiments (abstract and §4/§5)] The abstract states that experiments demonstrate improvement but supplies no information on baselines, metrics (e.g., AUC, precision-recall), dataset splits, number of runs, or statistical significance testing. Without these details it is impossible to assess whether reported gains support the uniformization claim or arise from post-hoc choices.

    Authors: We agree the abstract is insufficiently informative. Sections 4 and 5 already specify the DCASE 2016/2017 datasets, log-mel features, AUC-ROC as the primary metric, comparison against the standard mean-loss AE baseline, and results averaged over 5 random seeds. We will revise the abstract to explicitly state the metric (AUC), the datasets used, and that gains are consistent across multiple runs. If statistical significance testing is not already reported in §5 we will add it (paired t-test or similar) as part of the revision. revision: yes
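The paired test mentioned in the second response could look like the following sketch, which computes the paired t statistic over per-seed AUC values using only the standard library. The AUC numbers are placeholders, not results from the paper, and the function name `paired_t_statistic` is hypothetical. Turning the statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for the paired differences b - a."""
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1

# Placeholder AUC values for five seeds; NOT results from the paper.
auc_mean_loss = [0.80, 0.82, 0.79, 0.81, 0.80]
auc_weighted = [0.83, 0.84, 0.82, 0.83, 0.84]
t_stat, dof = paired_t_statistic(auc_mean_loss, auc_weighted)
print(t_stat, dof)
```

Pairing by seed matters here: both methods share initialization and data order within a seed, so the test compares per-seed differences rather than two independent groups.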

Circularity Check

0 steps flagged

No circularity: method is a proposed heuristic validated by experiment


The paper introduces batch uniformization as a training procedure that weights anomaly scores by the reciprocal of per-mini-batch KDE density estimates. The abstract and description frame this as an algorithmic change whose benefit is demonstrated through verification and objective experiments on sound anomaly detection. No equations, derivations, or self-citations are presented that reduce the claimed performance gain to a quantity defined by the method itself or to a fitted parameter renamed as a prediction. The weighting scheme is explicitly constructed from the proposed procedure, but the improvement is treated as an empirical outcome rather than a mathematical identity. This matches the default case of a self-contained algorithmic contribution without load-bearing circular steps.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that mini-batch KDE yields usable density estimates for weighting and that the resulting weighted loss produces the desired uniform anomaly scores. The only free parameter is the KDE bandwidth, and no invented entities are introduced.

free parameters (1)
  • KDE bandwidth
    Bandwidth for kernel density estimation must be chosen or tuned; the abstract does not specify how it is set.
axioms (1)
  • domain assumption: Kernel density estimation on individual mini-batches provides a sufficiently accurate estimate of the underlying sample density for weighting purposes.
    Invoked to compute the reciprocal weights used in the proposed loss.
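Since the bandwidth is the one free parameter in this ledger, it may help to see what a rule-of-thumb choice looks like. The sketch below uses Scott's rule, which scales each dimension's sample standard deviation by n^(-1/(d+4)). Which rule the authors used is not stated here, so Scott's rule and the function name `scott_bandwidth` are assumptions made for illustration.

```python
import numpy as np

def scott_bandwidth(batch):
    """Per-dimension Gaussian KDE bandwidth from Scott's rule of thumb."""
    n, d = batch.shape
    factor = n ** (-1.0 / (d + 4))
    # Scale the sample standard deviation of each feature dimension.
    return factor * batch.std(axis=0, ddof=1)

# Synthetic mini-batch: 16 samples with 4 feature dimensions.
rng = np.random.default_rng(0)
batch = rng.standard_normal((16, 4))
h = scott_bandwidth(batch)
print(h)
```

Because the factor depends on both batch size and dimension, a bandwidth chosen this way changes whenever either is changed, which ties the bandwidth to the batch-size sensitivity raised elsewhere in this review.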


