
arxiv: 2605.02918 · v1 · submitted 2026-04-09 · 💻 cs.LG · cs.AI

Mitigating the reconstruction-detection trade-off in VAE-based unsupervised anomaly detection

Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords variational autoencoder · unsupervised anomaly detection · beta-VAE · reconstruction quality · latent space constraint · sparse VAE · beta scheduling

The pith

β-VAE models for unsupervised anomaly detection exhibit a trade-off where stronger latent constraints improve detection but degrade reconstruction of normal data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that in variational autoencoders used for anomaly detection without labels, increasing the beta weighting on latent regularization produces models that score anomalies more accurately yet reconstruct normal samples less faithfully. This pattern stems from how tighter latent spaces separate the distributions of normal and abnormal inputs, which directly aids detection when scoring relies on reconstruction error. Variability across random seeds tracks the same latent separation distance. The authors test two adjustments, gradually ramping beta during training and switching to a Sparse VAE, and find that the Sparse VAE in particular preserves high reconstruction quality while lifting detection metrics.

Core claim

Among β-VAE models trained for unsupervised anomaly detection, stronger latent-space constraints raise anomaly detection scores while lowering reconstruction quality on normal data. The distance between normal and abnormal latent distributions explains both the detection gains and the performance differences across random seeds. Beta scheduling and the Sparse VAE reduce the trade-off; the Sparse VAE in particular yields better detection without sacrificing reconstruction fidelity.

What carries the argument

The β-VAE latent regularization parameter that controls the separation between normal and abnormal latent distributions, which in turn governs both reconstruction fidelity and reconstruction-based anomaly scoring.
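As a rough illustration of the parameter in question, the sketch below writes out a per-sample β-VAE objective: a reconstruction term plus a β-weighted KL term against a standard normal prior. It assumes a diagonal Gaussian posterior and an identity-covariance Gaussian decoder with additive constants dropped. The function names and the toy vectors are hypothetical and are not taken from the paper's code.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_hat, mu, log_var, beta):
    """Half squared error (Gaussian decoder, constants dropped) plus beta * KL."""
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, log_var)

# Toy values chosen only to show how beta reweights the two terms.
x = [0.2, 0.8, 0.5]
x_hat = [0.25, 0.7, 0.5]
mu, log_var = [0.5, -0.3], [0.0, -1.0]
for beta in (0.1, 1.0, 10.0, 100.0):
    print(beta, round(beta_vae_loss(x, x_hat, mu, log_var, beta), 4))
```

With a fixed reconstruction, a larger β makes any departure of the posterior from the prior more expensive, which is the sense in which β constrains the latent space.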

If this is right

  • Model selection based solely on reconstruction error of normal samples will often produce suboptimal anomaly detectors.
  • Sparse VAE architectures can deliver improved detection metrics while retaining reconstruction quality comparable to standard VAEs.
  • Performance differences across random seeds arise from variation in how well normal and abnormal latent distributions separate.
  • Gradual beta scheduling during training offers one practical route to balance reconstruction and detection objectives.
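One way to picture the beta scheduling mentioned in the last point is a cyclical ramp: within each cycle β rises linearly from zero, then holds at its maximum. The sketch below is a minimal version under assumed settings. The number of cycles, the ramp fraction, and the maximum β are hypothetical defaults, and the exact schedule used in the paper is not given in this summary.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5, beta_max=1.0):
    """Beta at a training step: linear ramp over the first part of each cycle, then constant."""
    cycle_len = total_steps / n_cycles
    position = (step % cycle_len) / cycle_len  # progress within the current cycle, in [0, 1)
    return beta_max * min(1.0, position / ramp_fraction)

# Print a coarse trace of the schedule over 100 steps.
for step in range(0, 100, 5):
    print(step, round(cyclical_beta(step, total_steps=100, beta_max=10.0), 2))
```

In a training loop, the returned value would multiply the KL term at each step. A single ramp with no restart is the special case `n_cycles=1`.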

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar latent-constraint effects may appear in other generative models for anomaly detection and could be diagnosed by measuring distribution distances.
  • In deployment, monitoring both reconstruction error and latent separation statistics might provide a more reliable unsupervised proxy for selecting detectors.
  • The Sparse VAE approach could be combined with explicit distribution-separation losses to further decouple the two objectives.
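A latent separation statistic of the kind suggested above could be computed as follows. This sketch fits a diagonal Gaussian to each set of latent codes and returns the squared 2-Wasserstein distance between the two fits. That choice of distance is an assumption made for illustration, since this summary does not say which distance the paper uses, and the sketch takes no position on whether a larger or smaller value is preferable. The synthetic codes stand in for encoder outputs.

```python
import math
import random

def fit_diag_gaussian(codes):
    """Per-dimension mean and standard deviation of a list of latent vectors."""
    n, d = len(codes), len(codes[0])
    mean = [sum(c[j] for c in codes) / n for j in range(d)]
    std = [math.sqrt(sum((c[j] - mean[j]) ** 2 for c in codes) / n) for j in range(d)]
    return mean, std

def squared_w2_diag(codes_a, codes_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians fitted to each set."""
    mean_a, std_a = fit_diag_gaussian(codes_a)
    mean_b, std_b = fit_diag_gaussian(codes_b)
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    std_term = sum((a - b) ** 2 for a, b in zip(std_a, std_b))
    return mean_term + std_term

# Synthetic stand-ins for encoder outputs; not data from the paper.
rng = random.Random(0)
normal_codes = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
abnormal_codes = [[rng.gauss(0.5, 1.2) for _ in range(4)] for _ in range(200)]
print(round(squared_w2_diag(normal_codes, abnormal_codes), 3))
```

Note that computing this statistic requires some abnormal samples, so it is a diagnostic for analysis rather than a fully unsupervised selection criterion.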

Load-bearing premise

The observed trade-off between reconstruction quality and detection performance, along with the benefits of beta scheduling and Sparse VAE, holds for the specific models, datasets, and random seeds examined.

What would settle it

On a fresh dataset, training a series of β-VAEs with increasing beta values and finding that the model with the lowest normal-sample reconstruction error also achieves the highest anomaly detection AUC would falsify the claimed trade-off.
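The check described above can be written down directly. The sketch below computes AUC from anomaly scores by pairwise comparison, then asks whether the model with the lowest normal-sample reconstruction error is also the one with the highest AUC. The dictionary keys and the numbers in `results` are hypothetical placeholders chosen to exercise the function; they are not results from the paper.

```python
def auc(normal_scores, abnormal_scores):
    """Probability that a random abnormal sample scores above a random normal one (ties count 0.5)."""
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            wins += 1.0 if a > n else 0.5 if a == n else 0.0
    return wins / (len(normal_scores) * len(abnormal_scores))

def best_recon_is_best_detector(results):
    """True if the lowest-reconstruction-error model also has the highest AUC."""
    best_recon = min(results, key=lambda r: r["recon_error"])
    best_auc = max(results, key=lambda r: r["auc"])
    return best_recon["beta"] == best_auc["beta"]

# Made-up illustrative values, one entry per trained model.
results = [
    {"beta": 0.1, "recon_error": 0.010, "auc": 0.60},
    {"beta": 1.0, "recon_error": 0.012, "auc": 0.65},
    {"beta": 10.0, "recon_error": 0.020, "auc": 0.72},
]
print(best_recon_is_best_detector(results))
```

A `True` result on a fresh dataset would be the outcome that counts against the trade-off. A single run says little on its own, so repeating the check over several seeds would be the natural extension.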

Original abstract

Variational autoencoders are widely used for unsupervised anomaly detection. Model selection, however, remains an open question: to remain fully unsupervised, hyperparameters are often chosen to minimize the reconstruction error on normal samples. In this paper, we reveal a trade-off between reconstruction quality and anomaly detection among $\beta$-VAE models. Models with constrained latent space reach higher detection metrics but lower reconstruction quality. We also assess the performance variability across random seeds and show it is linked to the distance between normal and abnormal latent distributions. From this analysis, we justify and investigate two methods to mitigate the reconstruction-detection trade-off: beta-scheduling and the Sparse VAE. The latter especially shows an improvement in detection while maintaining high reconstruction quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that β-VAE models for unsupervised anomaly detection exhibit a trade-off where constraining the latent space yields higher anomaly detection metrics at the expense of reconstruction quality. It further reports that performance variability across random seeds correlates with distances between normal and abnormal latent distributions, and proposes beta-scheduling together with a Sparse VAE variant as mitigations, with the Sparse VAE claimed to improve detection while preserving high reconstruction quality.

Significance. If the empirical patterns hold beyond the tested regimes, the work would provide practical guidance for hyperparameter selection in fully unsupervised VAE anomaly detection and clarify the role of latent-space constraints. The observed link between latent-distribution distances and seed variability offers a useful diagnostic, and the Sparse VAE mitigation constitutes a concrete, potentially adoptable contribution if its benefits prove robust.

major comments (2)
  1. [Experiments] Experiments section: the reported benefits of beta-scheduling and Sparse VAE are demonstrated only on the specific datasets, architectures, and anomaly types evaluated; without additional trials on out-of-distribution data regimes, larger model scales, or different anomaly characteristics, the central claim that these methods mitigate the trade-off in general remains unverified.
  2. [Analysis of variability] Analysis of seed variability: the link between performance variability and distances between normal/abnormal latent distributions is presented as correlational evidence, but the manuscript provides no controls or causal interventions to establish that this distance is the driving factor rather than a byproduct of the chosen training dynamics.
minor comments (2)
  1. [Abstract] The abstract refers to 'beta-scheduling' without specifying the exact schedule (e.g., linear, cosine) or the range of β values; this detail should appear in the methods section to ensure reproducibility.
  2. [Method] Notation for the sparsity level in the Sparse VAE is introduced without an explicit equation or hyperparameter table entry; adding a clear definition would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will make to strengthen the work.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: the reported benefits of beta-scheduling and Sparse VAE are demonstrated only on the specific datasets, architectures, and anomaly types evaluated; without additional trials on out-of-distribution data regimes, larger model scales, or different anomaly characteristics, the central claim that these methods mitigate the trade-off in general remains unverified.

    Authors: We agree that the current experiments are confined to the reported datasets, architectures, and anomaly types, and that this limits strong claims of generality. The observed trade-off and the effectiveness of the mitigations are demonstrated consistently within these regimes, which include both image and non-image data. In the revised manuscript we will add a dedicated limitations paragraph, temper the language around generality, and include new experiments on an additional out-of-distribution regime and a larger model scale to provide further support for the proposed methods. revision: yes

  2. Referee: [Analysis of variability] Analysis of seed variability: the link between performance variability and distances between normal/abnormal latent distributions is presented as correlational evidence, but the manuscript provides no controls or causal interventions to establish that this distance is the driving factor rather than a byproduct of the chosen training dynamics.

    Authors: The manuscript presents the relationship as a correlation observed across random seeds and datasets; we have now clarified this wording explicitly. In the revision we add controls by re-training under varied optimization hyperparameters and regularization strengths while tracking the latent-distribution distances. We acknowledge that a direct causal intervention (for example, artificially constraining the distance) lies outside the present experimental design and would constitute a separate study; we therefore treat the distance metric as a useful diagnostic rather than a proven causal driver. revision: partial
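The correlational reading discussed in this exchange amounts to computing, across seeds, a correlation coefficient between a latent-distance value and a detection metric. The sketch below does this with a plain Pearson coefficient. The per-seed values are synthetic, and the sign and strength of the relationship built into them are arbitrary. Which coefficient the paper reports is not stated in this summary, so Pearson is an assumption.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic per-seed values for 30 seeds; not measurements from the paper.
rng = random.Random(1)
distances = [rng.uniform(0.5, 2.0) for _ in range(30)]
detection = [0.9 - 0.1 * d + rng.gauss(0.0, 0.02) for d in distances]
print(round(pearson(distances, detection), 3))
```

A high absolute coefficient here would support the diagnostic use the authors describe, while leaving the causal question raised by the referee open.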

Circularity Check

0 steps flagged

No circularity in empirical VAE trade-off analysis

full rationale

The paper's central claims rest on direct experimental evaluations of reconstruction error and anomaly detection metrics across beta-VAE variants, random seeds, and datasets. The observed trade-off, variability linked to latent distribution distances, and benefits of beta-scheduling plus Sparse VAE are reported from these comparisons without any mathematical derivation, fitted parameter renamed as prediction, or self-referential definition. No load-bearing self-citations or uniqueness theorems are invoked to force conclusions; results are presented as empirical observations open to generalization checks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work rests on standard VAE assumptions for anomaly detection and introduces beta and sparsity as controllable factors.

free parameters (2)
  • beta
    Controls the weighting between reconstruction loss and KL divergence in β-VAE; its value or schedule directly affects the reported trade-off.
  • sparsity level
    Hyperparameter in Sparse VAE that determines how many latent dimensions are encouraged to be inactive.
axioms (1)
  • domain assumption: Reconstruction error on normal samples can serve as a proxy for anomaly scoring in unsupervised settings.
    Core premise of VAE-based anomaly detection invoked throughout the abstract.
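To make the premise concrete, the sketch below scores each sample by the mean squared error between input and reconstruction, and flags samples whose score exceeds a quantile of the scores seen on normal data. The quantile threshold is an assumption added for illustration, as this summary does not describe any thresholding step, and all function names are hypothetical.

```python
def mse(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def threshold_from_normals(normal_scores, quantile=0.95):
    """Score at the given empirical quantile of the normal-sample scores."""
    ordered = sorted(normal_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]

def flag_anomalies(scores, threshold):
    """True for every score strictly above the threshold."""
    return [s > threshold for s in scores]

# Toy inputs and reconstructions, used only to exercise the functions.
normal_scores = [mse([0.5, 0.5], [0.5 + e, 0.5 - e]) for e in (0.01, 0.02, 0.03, 0.04)]
test_scores = [mse([0.5, 0.5], [0.51, 0.49]), mse([0.9, 0.1], [0.5, 0.5])]
print(flag_anomalies(test_scores, threshold_from_normals(normal_scores)))
```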

pith-pipeline@v0.9.0 · 5459 in / 1187 out tokens · 41131 ms · 2026-05-10T18:09:45.130884+00:00 · methodology



Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

  1. [1]

    INTRODUCTION Unsupervised anomaly detection (UAD) methods based on deep generative models are used to detect rare events, such as lesions in medical images, by learning a model on healthy data only. The limited availability of labeled data motivates the use of UAD methods, along with the potential that these methods enable the detection of various types o...

  2. [2]

    MATERIALS 2.1. Variational Autoencoder In the β-VAE framework [11, 12], we aim to learn an encoder qϕ(z|x) to map an observation x from a set of normal observations X to a lower-dimensional variable z ∈ R^d, and a decoder pθ(x|z) to map latent codes back to observations. Classically, we set qϕ(z|x) = N(µ(x), σ(x)) and pθ(x|z) = N(D(z), I), where ...

  3. [3]

    We train 12 models with β ∈ [0.1, 1.0, 10, 100] and d ∈ [20, 64, 256]

    TRADE-OFF BETWEEN RECONSTRUCTION QUALITY AND ANOMALY DETECTION We explore the impact of constraining the latent space of β-VAEs through decreasing the latent dimension d or increasing the β weight. We train 12 models with β ∈ [0.1, 1.0, 10, 100] and d ∈ [20, 64, 256]. In Figure 1, we report the anomaly detection performance (measured through the MSE ra...

  4. [4]

    STABILITY WITH RESPECT TO SEEDS For this baseline model (d = 64, β = 10), we study the variability of its performance across 30 random seeds and attempt to explain it by studying the distance between latent distributions of normal and abnormal images. In particular, we model the latent codes’ distributions for test-AD-30 and healthy test samples (tes...

  5. [5]

    BUILDING MORE ROBUST VAE We explore two methods for improving the smoothness of the model and mitigate the trade-off observed in Section 3. 5.1. Cyclical beta-scheduling during training The simplest way to improve the smoothness of the decoder is to increase the β weight on the KL regularization. Indeed, for diagonal posteriors qϕ(z|x) = N(µ(x), diag(σ(...

  6. [6]

    Overall, this model shows an improvement over the trade-off of β-VAEs as displayed in Figure 1

    With more severe anomalies (test-AD-50) the Sparse VAE outperforms all other models on all metrics. Overall, this model shows an improvement over the trade-off of β-VAEs as displayed in Figure 1. In Figure 2, we further analyze this performance gain by showing that it comes with a smaller distance between normal and abnormal latent distributions. The...

  7. [7]

    Second, we analyze the performance variability across different training seeds and find that it is linked with the distance between healthy and abnormal latent distributions

    CONCLUSION AND FUTURE WORKS We conduct an extensive empirical analysis of β-VAE models to highlight the trade-off between reconstruction quality and anomaly detection. Second, we analyze the performance variability across different training seeds and find that it is linked with the distance between healthy and abnormal latent distributions. This confi...

  8. [8]

    France 2030

    ACKNOWLEDGEMENTS This project was partly funded by the French government’s Agence Nationale de la Recherche under the “France 2030” program (ANR-23-IACL-0008, PRAIRIE-PSAI), the ANO- NEURO project (ANR-23-CE45-0005-01), the “Investissements d’avenir” program (ANR-19-P3IA-0001); by the European Union’s Horizon Europe Framework Programme (grant 10113660...

  9. [9]

    Unsupervised abnormality detection in medical images with deep generative methods,

    X. Chen and E. Konukoglu, “Unsupervised abnormality detection in medical images with deep generative methods,” in Biomedical Image Synthesis and Simulation, pp. 303–324. Elsevier, 2022

  10. [10]

    Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study,

    C. Baur, S. Denner, B. Wiestler, N. Navab, and S. Albarqouni, “Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study,” MedIA, vol. 69, pp. 101952, 2021-04

  11. [11]

    Regularized siamese neural network for unsupervised outlier detection on brain multiparametric magnetic resonance imaging: Application to epilepsy lesion screening,

    Z. Alaverdyan, J. Jung, R. Bouet, and C. Lartizien, “Regularized siamese neural network for unsupervised outlier detection on brain multiparametric magnetic resonance imaging: Application to epilepsy lesion screening,” MedIA, vol. 60, pp. 101618, 2020

  12. [12]

    Rethinking autoencoders for medical anomaly detection from a theoretical perspective,

    Y. Cai, H. Chen, and K.-T. Cheng, “Rethinking autoencoders for medical anomaly detection from a theoretical perspective,” in MICCAI, 2024, vol. LNCS 15011, pp. 544–554

  13. [13]

    Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection,

    Y. Zhou, “Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection,” in CVPR, 2022, pp. 7369–7377, IEEE

  14. [14]

    Enhancing reconstruction-based out-of-distribution detection in brain MRI with model and metric ensembles,

    E. M. C. Huijben, S. Amirrajab, and J. P. W. Pluim, “Enhancing reconstruction-based out-of-distribution detection in brain MRI with model and metric ensembles,” Comp. Meth. and Prog. in Biomed., vol. 272, pp. 109045, 2025

  15. [15]

    Unsupervised anomaly detection in brain FDG PET with deep generative models: An experimental analysis of model variability and mitigation strategies,

    M. Solal, P. André, and N. Burgos, “Unsupervised anomaly detection in brain FDG PET with deep generative models: An experimental analysis of model variability and mitigation strategies,” in SPIE Medical Imaging, 2026

  16. [16]

    Benchmarking 3D generative autoencoders for pseudo-healthy reconstruction of brain 18F-FDG PET,

    R. Hassanaly, M. Solal, O. Colliot, N. Burgos, and ADNI, “Benchmarking 3D generative autoencoders for pseudo-healthy reconstruction of brain 18F-FDG PET,” JMI, vol. 12, no. 05, 2025

  17. [17]

    Cyclical annealing schedule: A simple approach to mitigating KL vanishing,

    H. Fu, C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin, “Cyclical annealing schedule: A simple approach to mitigating KL vanishing,” in NAACL, 2019, pp. 240–250

  18. [18]

    Sparse multi-channel variational autoencoder for the joint analysis of heterogeneous data,

    L. Antelmi, N. Ayache, P. Robert, and M. Lorenzi, “Sparse multi-channel variational autoencoder for the joint analysis of heterogeneous data,” in ICML, 2019, vol. 97, pp. 302–311, PMLR

  19. [19]

    Auto-Encoding Variational Bayes,

    D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” 2013

  20. [20]

    Beta-VAE: Learning basic visual concepts with a constrained variational framework,

    I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “Beta-VAE: Learning basic visual concepts with a constrained variational framework,” in ICLR, 2017

  21. [21]

    MultiVae: A Python package for multimodal variational autoencoders on partial datasets,

    A. Senellart, C. Chadebec, and S. Allassonnière, “MultiVae: A Python package for multimodal variational autoencoders on partial datasets,” JOSS, vol. 10, no. 110, pp. 7996, 2025

  22. [22]

    Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI),

    S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. R. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, and L. Beckett, “Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI),” Alzheimer’s & Dementia, vol. 1, no. 1, pp. 55–66, 2005

  23. [23]

    Clinica: An open-source software platform for reproducible clinical neuroscience studies,

    A. Routier, N. Burgos, (...), M.-O. Habert, S. Durrleman, and O. Colliot, “Clinica: An open-source software platform for reproducible clinical neuroscience studies,” Front. Neuroinform., vol. 15, pp. 689675, 2021

  24. [24]

    Image quality assessment: From error visibility to structural similarity,

    Zhou Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Process., vol. 13, no. 4, pp. 600–612, 2004

  25. [25]

    Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET,

    R. Hassanaly, C. Brianceau, M. Solal, O. Colliot, and N. Burgos, “Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET,” MELBA, vol. 2, pp. 611–656, 2024

  26. [26]

    Contractive auto-encoders: explicit invariance during feature extraction,

    S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: explicit invariance during feature extraction,” in ICML, 2011, pp. 833–840, Omnipress

  27. [27]

    Research on denoising sparse autoencoder,

    L. Meng, S. Ding, and Y. Xue, “Research on denoising sparse autoencoder,” Int. J. Mach. Learn. & Cyber., vol. 8, no. 5, pp. 1719–1729, 2017

  28. [28]

    A robust variational autoencoder using beta divergence,

    H. Akrami, A. A. Joshi, J. Li, S. Aydöre, and R. M. Leahy, “A robust variational autoencoder using beta divergence,” Knowledge-Based Systems, vol. 238, pp. 107886, 2022