Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

Masataka Yamaguchi; Noboru Harada; Shin Murata; Shoichiro Saito; Yuma Koizumi

arXiv: 1907.08338 · v1 · pith:7I7FRDYRnew · submitted 2019-07-19 · 📡 eess.AS · cs.LG · cs.SD · stat.ML

Pith reviewed 2026-05-24 19:11 UTC · model grok-4.3

classification 📡 eess.AS · cs.LG · cs.SD · stat.ML

keywords anomaly detection · autoencoder · unsupervised learning · sound processing · kernel density estimation · batch training · anomaly score

The pith

Batch uniformization weights each sample's anomaly score by the reciprocal of its mini-batch density estimate to equalize scores across frequent and rare normal sounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autoencoders for unsupervised anomaly detection in sounds are typically trained by minimizing the average anomaly score over mini-batches of normal data. Frequent-normal sounds dominate this average, leaving rare-normal sounds with higher scores. The paper introduces batch uniformization, which instead minimizes a weighted average using the reciprocal of each sample's probabilistic density as the weight. Densities are estimated via kernel density estimation performed separately on each mini-batch. Verification and objective experiments indicate that this produces more uniform anomaly scores and improves detection performance.

Core claim

Replacing the unweighted mean anomaly score with a density-weighted mean, where the weight for each sample is the reciprocal of its kernel density estimate computed on the current mini-batch, allows the training process to reduce anomaly scores for both frequent-normal and rare-normal sounds at the same time.
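In symbols, the change can be sketched as follows. The notation is assumed here for illustration and is not copied from the paper's numbered equations: $A_\theta$ is the anomaly score, $\{x_n\}_{n=1}^{N}$ is one mini-batch, and $K_h$ is a kernel with bandwidth $h$. Normalizing the weights so that they sum to one is an editorial choice made to keep the two losses on the same scale.

```latex
\begin{align}
J^{\mathrm{mean}}_\theta &= \frac{1}{N}\sum_{n=1}^{N} A_\theta(x_n), \\
\hat{p}(x_n) &= \frac{1}{N}\sum_{m=1}^{N} K_h(x_n - x_m), \\
w_n &= \frac{1/\hat{p}(x_n)}{\sum_{m=1}^{N} 1/\hat{p}(x_m)}, \\
J^{\mathrm{BU}}_\theta &= \sum_{n=1}^{N} w_n \, A_\theta(x_n).
\end{align}
```

Under this sketch, if every sample in the batch receives the same density estimate, then $w_n = 1/N$ and the weighted loss reduces to the plain mean.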

What carries the argument

Batch uniformization: the replacement of the sample-mean anomaly score loss with a weighted mean whose weights are the reciprocal of kernel density estimates obtained independently on each training mini-batch.
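A minimal sketch of that weighting, assuming a Gaussian kernel with a single shared bandwidth and weights normalized to sum to one. The function names, the toy batch, and the bandwidth value are hypothetical and chosen only for illustration; the paper's own kernel and normalization may differ.

```python
import math


def in_batch_kde(batch, bandwidth):
    """Gaussian KDE of every sample, using only the samples of the same mini-batch.

    The Gaussian normalizing constant is omitted (assumption): it is the same
    for all samples and cancels once the weights are normalized.
    """
    n = len(batch)
    densities = []
    for x in batch:
        kernel_sum = 0.0
        for y in batch:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            kernel_sum += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # The self-term (y == x) contributes 1, so the estimate is never zero.
        densities.append(kernel_sum / n)
    return densities


def uniformization_weights(densities):
    """Reciprocal-density weights, normalized to sum to one."""
    inverse = [1.0 / p for p in densities]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_anomaly_loss(scores, weights):
    """Weighted average of per-sample anomaly scores."""
    return sum(w * a for w, a in zip(weights, scores))


# Toy mini-batch: three "frequent" samples near the origin and one "rare" sample.
batch = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
scores = [0.2, 0.2, 0.2, 1.0]  # hypothetical anomaly scores, not paper data

dens = in_batch_kde(batch, bandwidth=0.5)
weights = uniformization_weights(dens)
plain_mean = sum(scores) / len(scores)
weighted = weighted_anomaly_loss(scores, weights)

print("weights:", [round(w, 3) for w in weights])
print("plain mean:", plain_mean, "weighted:", round(weighted, 3))
```

In this toy batch the isolated sample receives the largest weight, so its high score contributes more to the weighted loss than to the plain mean.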

If this is right

Anomaly scores become more uniform across frequent and rare normal sounds in the training data.
No external labels or additional normal-sound data are required beyond what is already used for standard training.
The same weighting can be applied to any reconstruction-based or score-based DNN anomaly detector.
Detection performance improves on the sound anomaly detection tasks examined in the verification experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same density-weighting idea could be tested in image or sensor anomaly detection, where some kinds of normal examples also occur far more often than others.
Performance may depend on mini-batch size because kernel density estimation is performed inside each batch.
If the density estimates are inaccurate, the weighting could amplify rather than reduce score variation among normal sounds.
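One way to watch for the amplification risk during training is to track how concentrated the weights are in each mini-batch. The sketch below uses the Kish effective sample size, a standard diagnostic for weighted samples; applying it here is an editorial suggestion and not something the paper reports. The weight vectors are made-up examples.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    Equals the batch size for uniform weights and approaches 1 when a
    single sample carries almost all of the weight.
    """
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    return total * total / sum_of_squares


uniform_weights = [0.25, 0.25, 0.25, 0.25]   # hypothetical: no reweighting at all
skewed_weights = [0.97, 0.01, 0.01, 0.01]    # hypothetical: one sample dominates

ess_uniform = effective_sample_size(uniform_weights)
ess_skewed = effective_sample_size(skewed_weights)

print(ess_uniform, round(ess_skewed, 3))
```

A value close to 1 in a batch of many samples would mean that a single density estimate is steering the gradient, which is the situation the last point above warns about.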

Load-bearing premise

The probabilistic density of each sample can be accurately estimated by kernel density estimation performed independently on each training mini-batch, and weighting anomaly scores by the reciprocal of this density will produce constant anomaly scores for both frequent- and rare-normal sounds.

What would settle it

Running the standard training and the batch-uniformization training on the same dataset of normal sounds that contains both frequent and rare examples, then observing no improvement or a drop in detection AUC or similar metrics, would falsify the central claim.
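The comparison described above comes down to computing the same detection metric for both training runs on the same test scores. A minimal sketch of a rank-based ROC AUC follows; the function name and all score values are hypothetical placeholders for illustration and are not results from the paper.

```python
def roc_auc(normal_scores, anomalous_scores):
    """ROC AUC as the probability that an anomalous sample scores higher
    than a normal sample, counting ties as one half."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


# Placeholder scores for two training runs evaluated on the same test clips.
anomalous = [0.6, 0.8]
normal_run_a = [0.1, 0.2, 0.9]  # one normal clip scores above every anomaly
normal_run_b = [0.3, 0.3, 0.4]  # normal scores are flatter

auc_run_a = roc_auc(normal_run_a, anomalous)
auc_run_b = roc_auc(normal_run_b, anomalous)

print(round(auc_run_a, 3), round(auc_run_b, 3))
```

In practice the comparison would be repeated over several random seeds before drawing a conclusion in either direction.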

Figures

Figures reproduced from arXiv: 1907.08338 by Masataka Yamaguchi, Noboru Harada, Shin Murata, Shoichiro Saito, Yuma Koizumi.

**Figure 2.** PDFs of x = (x1, x2)ᵀ calculated from the AE's anomaly score by (6) on each grid point in −3 ≤ x1, x2 ≤ 3. The dotted line denotes r = 2 (the borderline between normal and anomaly).

3.2. Batch uniformization using kernel density estimation. Training with (13) can be realized in roughly two ways: (i) selecting samples in a mini-batch so that the histogram of the mini-batch becomes uniform, a.k.a. mini-batch di…

**Figure 4.** Evaluation results.

… and the element-wise absolute value. Thus, the dimension of x was D = M × (2C + 1). The first input vector type (FCN40) used M = 40 and C = 5, and the second input vector type (FCN64) used M = 64 and C = 10. As an implementation for the gradient method, the AMSgrad [27] was used. We fix the learning rate for the initial 100 epochs and decrease it linearly between 100–200 epochs down to …

Original abstract

Use of an autoencoder (AE) as a normal model is a state-of-the-art technique for unsupervised-anomaly detection in sounds (ADS). The AE is trained to minimize the sample mean of the anomaly score of normal sounds in a mini-batch. One problem with this approach is that the anomaly score of rare-normal sounds becomes higher than that of frequent-normal sounds, because the sample mean is strongly affected by frequent-normal samples, resulting in preferentially decreasing the anomaly score of frequent-normal samples. To decrease anomaly scores for both frequent- and rare-normal sounds, we propose batch uniformization, a training method for unsupervised-ADS for minimizing a weighted average of the anomaly score on each sample in a mini-batch. We used the reciprocal of the probabilistic density of each sample as the weight, more intuitively, a large weight is given for rare-normal sounds. Such a weight works to give a constant anomaly score for both frequent- and rare-normal sounds. Since the probabilistic density is unknown, we estimate it by using the kernel density estimation on each training mini-batch. Verification- and objective-experiments show that the proposed batch uniformization improves the performance of unsupervised-ADS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Batch uniformization tries to counter frequency bias in AE anomaly scores via inverse mini-batch KDE weights, but those density estimates are probably too noisy to deliver the claimed effect.

The paper identifies a real training bias in autoencoder anomaly detection for sounds: minimizing the mean anomaly score lets frequent normal samples dominate, leaving rare normal samples with higher scores. Their fix is batch uniformization, which replaces the mean with a weighted average using the reciprocal of per-sample density estimated by KDE inside each mini-batch. This is meant to pull up the influence of rare samples so anomaly scores become more uniform across the normal class.

The approach is a direct, concrete adjustment rather than a theoretical overhaul, and it is new in its packaging for this audio ADS setting. The motivation is clearly stated and the weighting rule follows logically from the problem description. Experiments are said to show gains, which at least suggests the method is worth testing in practice.

The soft spot is the reliance on KDE density estimates computed independently per mini-batch. Audio features are high-dimensional and batches are small, so the estimates are dominated by bandwidth choice and noise rather than true relative densities. That undercuts the mechanism: the weights may not actually equalize scores as intended, and reported improvements could stem from incidental regularization instead. The abstract gives no numbers on bandwidth selection, baseline comparisons, or significance, so the evidence strength is hard to gauge from what's visible.

This is for people already working on unsupervised sound anomaly detection with autoencoders who want a simple training tweak to try. It is coherent enough on its own terms to deserve referee time, though reviewers will need to see whether the KDE step holds up or if the gains are robust to that choice.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes batch uniformization for training autoencoders in unsupervised anomaly detection in sounds (ADS). The standard per-mini-batch mean anomaly score loss is replaced by a weighted average whose weights are the reciprocal of per-sample kernel density estimates obtained via KDE performed independently inside each mini-batch; the goal is to drive anomaly scores toward constancy for both frequent- and rare-normal sounds. Verification and objective experiments are stated to show performance gains over the baseline approach.

Significance. If the weighting mechanism functions as intended, the method would address a plausible bias in mean-based AE training that favors frequent normal samples. The per-batch KDE formulation is a lightweight, parameter-light modification that could be useful in imbalanced sound datasets; however, its practical significance hinges on whether the density estimates remain informative in the high-dimensional regimes typical of audio features.

major comments (2)

[Method description (abstract and §3)] The central mechanism assumes that per-mini-batch KDE supplies reliable relative densities for the reciprocal weighting to equalize anomaly scores. In high-dimensional audio feature spaces (spectrograms or filter banks) with typical small batch sizes, KDE is subject to the curse of dimensionality, rendering estimates noisy or bandwidth-dominated; this directly threatens whether the claimed uniformization occurs or whether any gains are incidental. This assumption is load-bearing for the paper's contribution.
[Experiments (abstract and §4/§5)] The abstract states that experiments demonstrate improvement but supplies no information on baselines, metrics (e.g., AUC, precision-recall), dataset splits, number of runs, or statistical significance testing. Without these details it is impossible to assess whether reported gains support the uniformization claim or arise from post-hoc choices.
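The dimensionality concern in the first major comment can be made concrete with a rule-of-thumb bandwidth. The sketch below uses Scott's rule, in which the per-dimension bandwidth is the data standard deviation times n^(-1/(d+4)); this rule is an assumption picked for illustration, since the page does not say which rule the authors used. The dimension 440 follows from D = M × (2C + 1) with M = 40 and C = 5 as quoted in the Figure 4 text; the batch sizes are hypothetical.

```python
def scott_factor(n_samples, n_dims):
    """Scott's rule-of-thumb factor: bandwidth = factor * per-dimension std."""
    return n_samples ** (-1.0 / (n_dims + 4))


# Low-dimensional reference case versus an FCN40-sized feature vector.
low_dim = scott_factor(128, 2)
high_dim_small_batch = scott_factor(128, 440)
high_dim_large_batch = scott_factor(10000, 440)

print(round(low_dim, 3))
print(round(high_dim_small_batch, 3), round(high_dim_large_batch, 3))
```

With 440 dimensions the factor stays close to 1 whether the batch holds 128 or 10000 samples, so under this rule the bandwidth is set almost entirely by the rule itself rather than by how much data the batch contains.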

minor comments (1)

[Abstract] The phrase 'probabilistic density' should be replaced by the standard term 'probability density' for terminological precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly where appropriate.

Referee: [Method description (abstract and §3)] The central mechanism assumes that per-mini-batch KDE supplies reliable relative densities for the reciprocal weighting to equalize anomaly scores. In high-dimensional audio feature spaces (spectrograms or filter banks) with typical small batch sizes, KDE is subject to the curse of dimensionality, rendering estimates noisy or bandwidth-dominated; this directly threatens whether the claimed uniformization occurs or whether any gains are incidental. This assumption is load-bearing for the paper's contribution.

Authors: We acknowledge that KDE can be affected by the curse of dimensionality in high-dimensional feature spaces. Our method relies on relative densities within each mini-batch (not absolute densities): a common scale factor in the density estimates does not change how the reciprocal weights are distributed across samples, only the ratios between the samples' densities do; this reduces sensitivity to absolute accuracy. Bandwidth was chosen via a rule-of-thumb on log-mel features. To strengthen the paper we will add a paragraph in §3 discussing bandwidth selection, a small sensitivity study, and the limitation that performance may degrade if batch size is too small relative to dimensionality. revision: yes
Referee: [Experiments (abstract and §4/§5)] The abstract states that experiments demonstrate improvement but supplies no information on baselines, metrics (e.g., AUC, precision-recall), dataset splits, number of runs, or statistical significance testing. Without these details it is impossible to assess whether reported gains support the uniformization claim or arise from post-hoc choices.

Authors: We agree the abstract is insufficiently informative. Sections 4 and 5 already specify the DCASE 2016/2017 datasets, log-mel features, AUC-ROC as the primary metric, comparison against the standard mean-loss AE baseline, and results averaged over 5 random seeds. We will revise the abstract to explicitly state the metric (AUC), the datasets used, and that gains are consistent across multiple runs. If statistical significance testing is not already reported in §5 we will add it (paired t-test or similar) as part of the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: method is a proposed heuristic validated by experiment

The paper introduces batch uniformization as a training procedure that weights anomaly scores by the reciprocal of per-mini-batch KDE density estimates. The abstract and description frame this as an algorithmic change whose benefit is demonstrated through verification and objective experiments on sound anomaly detection. No equations, derivations, or self-citations are presented that reduce the claimed performance gain to a quantity defined by the method itself or to a fitted parameter renamed as a prediction. The weighting scheme is explicitly constructed from the proposed procedure, but the improvement is treated as an empirical outcome rather than a mathematical identity. This matches the default case of a self-contained algorithmic contribution without load-bearing circular steps.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that mini-batch KDE yields usable density estimates for weighting and that the resulting weighted loss produces the desired uniform anomaly scores. Beyond the KDE bandwidth, no free parameters are explicitly introduced, and there are no invented entities.

free parameters (1)

KDE bandwidth
Bandwidth for kernel density estimation must be chosen or tuned; the abstract does not specify how it is set.
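Why the bandwidth matters can be seen on a toy batch. The sketch below assumes a Gaussian kernel and normalized reciprocal weights, with hypothetical names and values, and compares three bandwidths. With a very small bandwidth every sample only "sees" itself; with a very large one every sample sees all others equally. In both cases the weights collapse to uniform and the weighted loss falls back to the plain mean.

```python
import math


def bu_weights(batch, bandwidth):
    """Normalized reciprocal-density weights from an in-batch Gaussian KDE.

    The kernel's normalizing constant is omitted (assumption); it cancels
    in the normalized weights.
    """
    n = len(batch)
    inverse = []
    for x in batch:
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * bandwidth ** 2))
            for y in batch
        ) / n
        inverse.append(1.0 / density)
    total = sum(inverse)
    return [v / total for v in inverse]


# Three clustered samples and one isolated sample (made-up values).
batch = [[0.0], [0.1], [0.2], [5.0]]

tiny = bu_weights(batch, 1e-3)      # kernel far narrower than any gap
moderate = bu_weights(batch, 0.5)   # kernel on the scale of the cluster
huge = bu_weights(batch, 1e3)       # kernel far wider than the data

for name, w in [("tiny", tiny), ("moderate", moderate), ("huge", huge)]:
    print(name, [round(v, 3) for v in w])
```

Only the intermediate bandwidth gives the isolated sample a noticeably larger weight, which is why an unspecified bandwidth is listed here as a free parameter.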

axioms (1)

domain assumption: Kernel density estimation on individual mini-batches provides a sufficiently accurate estimate of the underlying sample density for weighting purposes.
Invoked to compute the reciprocal weights used in the proposed loss.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 2 internal anchors

[1] INTRODUCTION. Since anomalies might indicate mistakes or malicious activities, prompt detection of anomalies may prevent such problems. The use of microphones as sensors for anomaly detection, called anomaly detection in sounds (ADS) or acoustic condition monitoring, has been adopted in many applications such as audio surveillance [1–4], product inspection...

[2] Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds. CONVENTIONAL METHOD. 2.1. Unsupervised anomaly detection in sounds. ADS is an identification problem of determining whether the state of the target is a normal or an anomaly from the sound emitted from the target. Here, we define X = {x_t ∈ R^D}_{t=1}^{T} is a time-series of acoustic features extracted from the observed sound. Here, T is the number of time-frames co...

[3] PROPOSED METHOD. Figure 1 shows an overview of the proposed method, batch uniformization. The difference between the conventional and proposed methods is that the proposed method uses the weight calculation. In this section, we describe the basic principle and implementation of the weight calculation. 3.1. Basic principle. To avoid FP detection, we need t...

[4] EXPERIMENTS. We conducted a verification experiment and an objective experiment. Batch uniformization (BU) was compared with two conventional methods: reconstruction error (RE) and simplified Neyman–Pearson cost (SNP) [11], which are described in Sec. 2.2. Here, BU, RE, and SNP were trained using J_θ^BU, J_θ^RE, and J_θ^SNP, respectively. 4.1. Verificat...

[5] Figures 2 and 3 show q_θ(x) and A(x), respectively. These scores were calculated on each grid point in −3 ≤ x1, x2 ≤ 3. Table 1 shows the KLD from p(x) and U(x). The PDF of RE has high probability for both normal and anomaly, thus the anomalous samples were also reconstructed and their anomaly scores became small. Meanwhile, in the case of SNP, the probab...

[6] CONCLUSIONS. In this paper, we proposed batch uniformization, a training method for unsupervised-anomaly detection in sounds (ADS). The weighted average of the anomaly score was minimized, and the weight was defined as the reciprocal of the probabilistic density of each sample. We estimated it by using the kernel density estimation on each mini-batch. Ve...

[7] C. Clavel, T. Ehrette, and G. Richard, “Events Detection for an Audio-Based Surveillance System,” in Proc. of IEEE International Conference on Multimedia and Expo (ICME), 2005.

[8] G. Valenzise, L. Gerosa, M. Tagliasacchi, F. Antonacci, and A. Sarti, “Scream and Gunshot Detection and Localization for Audio-Surveillance Systems,” in Proc. of IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), 2007.

[9] S. Ntalampiras, I. Potamitis, and N. Fakotakis, “Probabilistic Novelty Detection for Acoustic Surveillance Under Real-World Conditions,” IEEE Transactions on Multimedia, pp. 713–719, 2011.

[10] P. Foggia, N. Petkov, A. Saggese, N. Strisciuglio, and M. Vento, “Audio Surveillance of Roads: A System for Detecting Anomalous Sounds,” IEEE Transactions on Intelligent Transportation Systems, pp. 279–288, 2016.

[11] A. Yamashita, T. Hara, and T. Kaneko, “Inspection of Visible and Invisible Features of Objects with Image and Sound Signal Processing,” in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006.

[12] S. Heinicke, A. K. Kalan, O. J. J. Wagner, R. Mundry, H. Lukashevich, and H. S. Kühl, “Assessing the Performance of a Semi-Automated Acoustic Monitoring System for Primates,” Methods in Ecology and Evolution, pp. 753–763, 2015.

[13] Y. Koizumi, S. Saito, H. Uematsu, and N. Harada, “Optimizing Acoustic Feature Extractor for Anomalous Sound Detection Based on Neyman-Pearson Lemma,” in Proc. of European Signal Processing Conference (EUSIPCO), 2017.

[14] Y. Koizumi, S. Murata, N. Harada, S. Saito, and H. Uematsu, “SNIPER: Few-shot Learning for Anomaly Detection to Minimize False-Negative Rate with Ensured True-Positive Rate,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[15] A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen, “DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline system,” in Proc. of Detection and Classification of Acoustic Scenes and Events challenge (DCASE), 2017.

[16] A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley, “Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge,” IEEE/ACM Transactions on Audio Speech and Language Processing, 2018.

[17] Y. Koizumi, S. Saito, H. Uematsu, Y. Kawachi, and N. Harada, “Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma,” IEEE/ACM Transactions on Audio Speech and Language Processing, 2019.

[18] V. J. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Review, pp. 85–126, 2004.

[19] A. Patcha and J. M. Park, “An Overview of Anomaly Detection Techniques: Existing Solutions and Latest Technological Trends,” Journal Computer Networks, pp. 3448–3470, 2007.

[20] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Computing Surveys, 2009.

[21] R. Chalapathy and S. Chawla, “Deep Learning for Anomaly Detection: A Survey,” arXiv preprint, arXiv:1901.03407, 2019.

[22] E. Marchi, F. Vesperini, F. Eyben, S. Squartini, and B. Schuller, “A Novel Approach for Automatic Acoustic Novelty Detection using a Denoising Autoencoder with Bidirectional LSTM Neural Networks,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015.

[23] T. Tagawa, Y. Tadokoro, and T. Yairi, “Structured Denoising Autoencoder for Fault Detection and Analysis,” Proceedings of Machine Learning Research, pp. 96–111, 2015.

[24] E. Marchi, F. Vesperini, F. Weninger, F. Eyben, S. Squartini, and B. Schuller, “Non-Linear Prediction with LSTM Recurrent Neural Networks for Acoustic Novelty Detection,” in Proc. of International Joint Conference on Neural Networks (IJCNN), 2015.

[25] Y. Kawaguchi and T. Endo, “How Can We Detect Anomalies from Subsampled Audio Signals?,” in Proc. of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.

[26] J. An and S. Cho, “Variational Autoencoder based Anomaly Detection using Reconstruction Probability,” Technical Report, SNU Data Mining Center, pp. 1–18, 2015.

[27] Y. Kawachi, Y. Koizumi, and N. Harada, “Complementary Set Variational Autoencoder for Supervised Anomaly Detection,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[28] Y. Kawachi, Y. Koizumi, S. Murata, and N. Harada, “A Two-Class Hyper-Spherical Autoencoder for Supervised Anomaly Detection,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[29] M. Yamaguchi, Y. Koizumi, and N. Harada, “AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Transition,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[30] C. Zhang, H. Kjellstrom, and S. Mandt, “Determinantal Point Processes for Mini-batch Diversification,” in Proc. of Uncertainty in Artificial Intelligence (UAI), 2017.

[31] C. Zhang, C. Oztireli, S. Mandt, and G. Salvi, “Active Mini-Batch Sampling using Repulsive Point Processes,” in Proc. of The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

[32] X. Glorot and Y. Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks,” in Proc. of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

[33] S. J. Reddi, C. Oztireli, S. Kale, and S. Kumar, “On the Convergence of Adam and Beyond,” in Proc. of International Conference on Learning Representations (ICLR), 2018.

[34] E. Fonseca, M. Plakal, F. Font, D. P. W. Ellis, X. Favory, J. Pons, and X. Serra, “General-Purpose Tagging of Freesound Audio with Audioset Labels: Task Description, Dataset, and Baseline,” in Proc. of Detection and Classification of Acoustic Scenes and Events (DCASE), 2018.

[1] [1]

unknown” anomalous sounds by utilizing only given normal sound, in contrast to supervised “Detection and Classiﬁcation of Acoustic Scenes and Events

INTRODUCTION Since anomalies might indicate mistakes or malicious activities, prompt detection of anomalies may prevent such problems. The use of microphones as sensors for anomaly detection, called anomaly detection in sounds (ADS) or acoustic condition monitoring, has been adopted in many applications such as audio surveillance [1–4], product inspection...

work page

[2] [2]

Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

CONVENTIONAL METHOD 2.1. Unsupervised anomaly detection in sounds ADS is an identiﬁcation problem of determining whether the state of the target is a normal or an anomaly from the sound emitted from the target. Here, we deﬁne X = {xt ∈ RD}T t=1 is a time-series of acoustic features extracted from the observed sound. Here, T is the number of time-frames co...

work page internal anchor Pith review Pith/arXiv arXiv 1907

[3] [3]

m/JDJYUHsHaIb6NRiCgr9Gzq7Tg=

PROPOSED METHOD Figure 1 shows an overview of the proposed method, batch uni- formization. The difference between the conventional and proposed methods is that the proposed method uses the weight calculation. In this section, we describe the basic principle and implementation of the weight calculation. 3.1. Basic principle To avoid FP detection, we need t...

work page 2019

[4] [4]

l6BfYdy4qHCm7z8FfoaSKYRXs6E=

EXPERIMENTS We conducted a veriﬁcation experiment and an objective experi- ment. Batch uniformization (BU) was compared with two conven- tional methods: reconstruction error (RE) and simpliﬁed Neyman– Peason cost (SNP) [11] which are described in Sec. 2.2. Here, BU, RE, and SNP were trained using J BU θ , J RE θ , and J SNP θ , respectively. 4.1. Veriﬁcat...

work page

[5] [5]

These scores were calculated on each grid point in −3 ≤ x1, x2, ≤ 3

Figures 2 and 3 show qθ(x) and A(x), respectively. These scores were calculated on each grid point in −3 ≤ x1, x2, ≤ 3. Table 1 shows the KLD from p(x) and U(x). The PDF of RE has high probability for both normal and anomaly, thus the anomalous samples were also reconstructed and their anomaly scores became small. Meanwhile, in the case of SNP, the probab...

work page 2019

[6] [6]

The weighted average of the anomaly score was minimized, and the weight was deﬁned as the reciprocal of the probabilistic density of each sam- ple

CONCLUSIONS In this paper, we proposed batch uniformization, a training method for unsupervised- anomaly detection in sounds (ADS). The weighted average of the anomaly score was minimized, and the weight was deﬁned as the reciprocal of the probabilistic density of each sam- ple. We estimated it by using the kernel density estimation on each mini-batch. Ve...

work page 2019

[7] [7]

Events Detection for an Audio-Based Surveillance System,

C. Clavel, T. Ehrette, and G. Richard “Events Detection for an Audio-Based Surveillance System,” in Proc. of IEEE Interna- tional Conference on Multimedia and Expo (ICME), 2005

work page 2005

[8] [8]

Scream and Gunshot Detection and Localization for Audio-Surveillance Systems,

G. Valenzise, L. Gerosa, M. Tagliasacchi, F. Antonacci, and A. Sarti, “Scream and Gunshot Detection and Localization for Audio-Surveillance Systems,” in Proc. of IEEE International Conference on Advanced Video and Signal-based Surveillance (A VSS), 2007

work page 2007

[9] [9]

Proba- bilistic Novelty Detection for Acoustic Surveillance Under Real-World Conditions,

S. Ntalampiras, I. Potamitis, and N. Fakotakis “Proba- bilistic Novelty Detection for Acoustic Surveillance Under Real-World Conditions,” IEEE Transactions on Multimedia , pp.713–719, 2011

work page 2011

[10] [10]

Audio Surveillance of Roads: A System for De- tecting Anomalous Sounds,

P. Foggia, N. Petkov, A. Saggese, N. Strisciuglio, and M. Vento, “Audio Surveillance of Roads: A System for De- tecting Anomalous Sounds,”IEEE Transactions on Intelligent Transportation Systems, pp.279–288, 2016

work page 2016

[11] [11]

Inspection of Visi- ble and Invisible Features of Objects with Image and Sound Signal Processing,

A. Yamashita, T. Hara, and T. Kaneko, “Inspection of Visi- ble and Invisible Features of Objects with Image and Sound Signal Processing,” in Proc. of IEEE/RSJ International Con- ference on Intelligent Robots and Systems, (IROS), 2006

work page 2006

[12] [12]

Assessing the Performance of a Semi-Automated Acoustic Monitoring System for Pri- mates,

S. Heinicke, A. K. Kalan, O. J. J. Wagner, R. Mundry, H. Lukashevich and H. S. K ¨uhl, “Assessing the Performance of a Semi-Automated Acoustic Monitoring System for Pri- mates,” Methods in Ecology and Evolutionpp.753–763, 2015

work page 2015

[13] [13]

Optimiz- ing Acoustic Feature Extractor for Anomalous Sound Detec- tion Based on Neyman-Pearson Lemma,

Y . Koizumi, S. Saito, H. Uematsu, and N. Harada, “Optimiz- ing Acoustic Feature Extractor for Anomalous Sound Detec- tion Based on Neyman-Pearson Lemma,” in Proc. of Euro- pean Signal Processing Conference (EUSIPCO), 2017

work page 2017

[14] [14]

SNIPER: Few-shot Learning for Anomaly Detection to Min- imize False-Negative Rate with Ensured True-Positive Rate,

Y . Koizumi, S. Murata, N. Harada, S. Saito, and H. Uematsu, “SNIPER: Few-shot Learning for Anomaly Detection to Min- imize False-Negative Rate with Ensured True-Positive Rate,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

work page 2019

[15] A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen, “DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System,” in Proc. of Detection and Classification of Acoustic Scenes and Events challenge (DCASE), 2017

[16] A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley, “Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018

[17] Y. Koizumi, S. Saito, H. Uematsu, Y. Kawachi, and N. Harada, “Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

[18] V. J. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Review, pp. 85–126, 2004

[19] A. Patcha and J. M. Park, “An Overview of Anomaly Detection Techniques: Existing Solutions and Latest Technological Trends,” Computer Networks, pp. 3448–3470, 2007

[20] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Computing Surveys, 2009

[21] R. Chalapathy and S. Chawla, “Deep Learning for Anomaly Detection: A Survey,” arXiv preprint arXiv:1901.03407, 2019

[22] E. Marchi, F. Vesperini, F. Eyben, S. Squartini, and B. Schuller, “A Novel Approach for Automatic Acoustic Novelty Detection using a Denoising Autoencoder with Bidirectional LSTM Neural Networks,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015

[23] T. Tagawa, Y. Tadokoro, and T. Yairi, “Structured Denoising Autoencoder for Fault Detection and Analysis,” Proceedings of Machine Learning Research, pp. 96–111, 2015

[24] E. Marchi, F. Vesperini, F. Weninger, F. Eyben, S. Squartini, and B. Schuller, “Non-Linear Prediction with LSTM Recurrent Neural Networks for Acoustic Novelty Detection,” in Proc. of International Joint Conference on Neural Networks (IJCNN), 2015

[25] Y. Kawaguchi and T. Endo, “How Can We Detect Anomalies from Subsampled Audio Signals?,” in Proc. of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017

[26] J. An and S. Cho, “Variational Autoencoder based Anomaly Detection using Reconstruction Probability,” Technical Report, SNU Data Mining Center, pp. 1–18, 2015

[27] Y. Kawachi, Y. Koizumi, and N. Harada, “Complementary Set Variational Autoencoder for Supervised Anomaly Detection,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018

[28] Y. Kawachi, Y. Koizumi, S. Murata, and N. Harada, “A Two-Class Hyper-Spherical Autoencoder for Supervised Anomaly Detection,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

[29] M. Yamaguchi, Y. Koizumi, and N. Harada, “AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Transition,” in Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

[30] C. Zhang, H. Kjellstrom, and S. Mandt, “Determinantal Point Processes for Mini-batch Diversification,” in Proc. of Uncertainty in Artificial Intelligence (UAI), 2017

[31] C. Zhang, C. Oztireli, S. Mandt, and G. Salvi, “Active Mini-Batch Sampling using Repulsive Point Processes,” in Proc. of The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019

[32] X. Glorot and Y. Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks,” in Proc. of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010

[33] S. J. Reddi, S. Kale, and S. Kumar, “On the Convergence of Adam and Beyond,” in Proc. of International Conference on Learning Representations (ICLR), 2018

[34] E. Fonseca, M. Plakal, F. Font, D. P. W. Ellis, X. Favory, J. Pons, and X. Serra, “General-Purpose Tagging of Freesound Audio with Audioset Labels: Task Description, Dataset, and Baseline,” in Proc. of Detection and Classification of Acoustic Scenes and Events (DCASE), 2018