Mitigating the reconstruction-detection trade-off in VAE-based unsupervised anomaly detection
Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3
The pith
β-VAE models for unsupervised anomaly detection exhibit a trade-off where stronger latent constraints improve detection but degrade reconstruction of normal data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Among β-VAE models trained for unsupervised anomaly detection, stronger latent-space constraints raise anomaly detection scores while lowering reconstruction quality on normal data. The distance between normal and abnormal latent distributions explains both the detection gains and the performance differences across random seeds. Beta scheduling and the Sparse VAE reduce the trade-off; the Sparse VAE in particular yields better detection without sacrificing reconstruction fidelity.
What carries the argument
The β-VAE latent regularization parameter that controls the separation between normal and abnormal latent distributions, which in turn governs both reconstruction fidelity and reconstruction-based anomaly scoring.
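For concreteness, the objective that β regulates can be sketched as a negative ELBO with a β-weighted KL term. This minimal Python sketch assumes the Gaussian encoder and unit-variance Gaussian decoder described in the paper's Materials section (entry [2] of the reference graph). The log-variance parameterization and the function names are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form, summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(x: Sequence[float], x_hat: Sequence[float],
                  mu: Sequence[float], log_var: Sequence[float], beta: float) -> float:
    """Negative ELBO with a beta-weighted KL term for a single sample.

    With a unit-variance Gaussian decoder p(x|z) = N(D(z), I), the negative
    log-likelihood is 0.5 * ||x - D(z)||^2 up to an additive constant, so
    x_hat stands for the decoder output D(z).
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, log_var)


# Same sample, two regularization strengths: only the KL term is rescaled.
print(beta_vae_loss([1.0, 0.0], [0.5, 0.0], [1.0, 0.0], [0.0, 0.0], beta=1.0))
print(beta_vae_loss([1.0, 0.0], [0.5, 0.0], [1.0, 0.0], [0.0, 0.0], beta=10.0))
```

Raising β pulls every posterior toward N(0, I); this is the latent constraint whose effect on reconstruction and detection the paper measures.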
If this is right
- Model selection based solely on reconstruction error of normal samples will often produce suboptimal anomaly detectors.
- Sparse VAE architectures can deliver improved detection metrics while retaining reconstruction quality comparable to standard VAEs.
- Performance differences across random seeds are linked to variation in the distance between normal and abnormal latent distributions.
- Cyclical beta scheduling during training offers one practical route to balance reconstruction and detection objectives.
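A cyclical schedule in the style of Fu et al. [17] can be sketched as follows: within each cycle, β ramps linearly from 0 to a maximum over the first fraction of the cycle, then stays flat until the next cycle restarts it. The cycle count, ramp ratio, and maximum β below are placeholder defaults; the excerpt does not state the values the paper used.

```python
import math


def cyclical_beta(step: int, total_steps: int, n_cycles: int = 4,
                  ramp_ratio: float = 0.5, beta_max: float = 1.0) -> float:
    """Cyclical annealing (after Fu et al., 2019) for a 0-indexed training step.

    Training is split into n_cycles equal cycles. Within each cycle, beta rises
    linearly from 0 to beta_max over the first ramp_ratio fraction of the cycle,
    then stays at beta_max until the next cycle restarts it at 0.
    """
    period = total_steps / n_cycles
    tau = (step % math.ceil(period)) / period  # position inside the current cycle
    if tau <= ramp_ratio:
        return beta_max * tau / ramp_ratio
    return beta_max


# Print the schedule at a few steps of a hypothetical 100-step run.
for s in (0, 6, 12, 13, 24, 25, 50):
    print(s, cyclical_beta(s, total_steps=100, beta_max=10.0))
```

Each restart briefly relaxes the KL pressure, letting the decoder refit reconstructions before the latent constraint is tightened again.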
Where Pith is reading between the lines
- Similar latent-constraint effects may appear in other generative models for anomaly detection and could be diagnosed by measuring distribution distances.
- In deployment, monitoring both reconstruction error and latent separation statistics might provide a more reliable unsupervised proxy for selecting detectors.
- The Sparse VAE approach could be combined with explicit distribution-separation losses to further decouple the two objectives.
Load-bearing premise
The observed trade-off between reconstruction quality and detection performance, along with the benefits of beta scheduling and Sparse VAE, holds for the specific models, datasets, and random seeds examined.
What would settle it
On a fresh dataset, training a series of β-VAEs with increasing beta values and finding that the model with the lowest normal-sample reconstruction error also achieves the highest anomaly detection AUC would falsify the claimed trade-off.
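The falsification criterion above reduces to comparing an argmin and an argmax over a β sweep. In this sketch, the `results` mapping layout and the numbers in `sweep` are hypothetical, invented only to exercise the function; they are not results from the paper.

```python
from typing import Dict, Tuple


def tradeoff_falsified(results: Dict[float, Tuple[float, float]]) -> bool:
    """Check the falsification criterion on a beta sweep.

    results maps beta -> (reconstruction error on normal samples, anomaly-detection AUC).
    Returns True when the lowest-error model is also the highest-AUC model,
    the outcome that would contradict the claimed trade-off.
    """
    best_recon = min(results, key=lambda b: results[b][0])
    best_auc = max(results, key=lambda b: results[b][1])
    return best_recon == best_auc


# Hypothetical numbers for illustration only; not results from the paper.
sweep = {0.1: (0.010, 0.71), 1.0: (0.012, 0.78), 10.0: (0.015, 0.84), 100.0: (0.021, 0.80)}
print(tradeoff_falsified(sweep))
```

With several seeds per β, each pair would be averaged over seeds before calling the function, since the paper reports substantial seed-to-seed variability.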
Original abstract
Variational autoencoders are widely used for unsupervised anomaly detection. Model selection, however, remains an open question: to remain fully unsupervised, hyperparameters are often chosen to minimize the reconstruction error on normal samples. In this paper, we reveal a trade-off between reconstruction quality and anomaly detection among $\beta$-VAE models. Models with constrained latent space reach higher detection metrics but lower reconstruction quality. We also assess the performance variability across random seeds and show it is linked to the distance between normal and abnormal latent distributions. From this analysis, we justify and investigate two methods to mitigate the reconstruction-detection trade-off: beta-scheduling and the Sparse VAE. The latter especially shows an improvement in detection while maintaining high reconstruction quality.
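The fully unsupervised selection rule the abstract describes can be sketched over the 12-configuration grid reported in the paper's trade-off section (β ∈ {0.1, 1, 10, 100}, d ∈ {20, 64, 256}). The function name and the shape of its input are assumptions for illustration. Because constrained models reconstruct normal data less well, this rule would tend to pick weakly constrained configurations, which is where the paper reports weaker detection.

```python
from itertools import product
from typing import Dict, Tuple

Config = Tuple[float, int]  # (beta, latent dimension d)

BETAS = [0.1, 1.0, 10.0, 100.0]
LATENT_DIMS = [20, 64, 256]
GRID = list(product(BETAS, LATENT_DIMS))  # the 12 (beta, d) configurations


def select_unsupervised(val_recon_error: Dict[Config, float]) -> Config:
    """Return the config with the lowest reconstruction error on normal validation data.

    Only normal samples enter this criterion, which is what keeps selection
    fully unsupervised; anomaly-detection performance is never consulted.
    """
    return min(val_recon_error, key=val_recon_error.get)
```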
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that β-VAE models for unsupervised anomaly detection exhibit a trade-off where constraining the latent space yields higher anomaly detection metrics at the expense of reconstruction quality. It further reports that performance variability across random seeds correlates with distances between normal and abnormal latent distributions, and proposes beta-scheduling together with a Sparse VAE variant as mitigations, with the Sparse VAE claimed to improve detection while preserving high reconstruction quality.
Significance. If the empirical patterns hold beyond the tested regimes, the work would provide practical guidance for hyperparameter selection in fully unsupervised VAE anomaly detection and clarify the role of latent-space constraints. The observed link between latent-distribution distances and seed variability offers a useful diagnostic, and the Sparse VAE mitigation constitutes a concrete, potentially adoptable contribution if its benefits prove robust.
major comments (2)
- [Experiments] Experiments section: the reported benefits of beta-scheduling and Sparse VAE are demonstrated only on the specific datasets, architectures, and anomaly types evaluated; without additional trials on out-of-distribution data regimes, larger model scales, or different anomaly characteristics, the central claim that these methods mitigate the trade-off in general remains unverified.
- [Analysis of variability] Analysis of seed variability: the link between performance variability and distances between normal/abnormal latent distributions is presented as correlational evidence, but the manuscript provides no controls or causal interventions to establish that this distance is the driving factor rather than a byproduct of the chosen training dynamics.
minor comments (2)
- [Abstract] The abstract refers to 'beta-scheduling' without specifying the exact schedule (e.g., linear, cosine) or the range of β values; this detail should appear in the methods section to ensure reproducibility.
- [Method] Notation for the sparsity level in the Sparse VAE is introduced without an explicit equation or hyperparameter table entry; adding a clear definition would improve clarity.
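One way to make the sparsity level explicit, assuming the Sparse VAE follows the variational-dropout formulation of Antelmi et al. [18], is a per-dimension dropout parameter α with posterior N(µ, αµ²), the KL approximation of Molchanov et al. (2017, "Variational Dropout Sparsifies Deep Neural Networks"), and a threshold on log α that decides which latent dimensions stay active. The threshold value and the helper names are hypothetical; the paper's exact definition is not given in this excerpt.

```python
import math
from typing import List, Sequence

# Constants of the Molchanov et al. (2017) approximation to the variational-dropout KL.
K1, K2, K3 = 0.63576, 1.87320, 1.48695


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sparse_kl(log_alpha: Sequence[float]) -> float:
    """Approximate KL term of a posterior N(mu, alpha * mu^2), summed over latent dims.

    -KL ~= K1 * sigmoid(K2 + K3 * log_alpha) - 0.5 * log(1 + 1/alpha) - K1,
    so the KL shrinks toward 0 as log_alpha grows: the penalty favors
    switching dimensions off.
    """
    total = 0.0
    for la in log_alpha:
        neg_kl = K1 * _sigmoid(K2 + K3 * la) - 0.5 * _softplus(-la) - K1
        total -= neg_kl
    return total


def active_dims(log_alpha: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Indices of latent dims kept active (hypothetical rule: log_alpha < threshold).

    With threshold = 0 this keeps dims whose dropout rate alpha / (1 + alpha) is below 0.5.
    """
    return [i for i, la in enumerate(log_alpha) if la < threshold]


def sparsity_level(log_alpha: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of latent dims switched off under the same hypothetical rule."""
    return 1.0 - len(active_dims(log_alpha, threshold)) / len(log_alpha)
```

Under this reading, the "sparsity level" free parameter would be the threshold (or the resulting fraction of pruned dimensions), which is the kind of explicit definition the minor comment asks for.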
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will make to strengthen the work.
Point-by-point responses
- Referee: [Experiments] Experiments section: the reported benefits of beta-scheduling and Sparse VAE are demonstrated only on the specific datasets, architectures, and anomaly types evaluated; without additional trials on out-of-distribution data regimes, larger model scales, or different anomaly characteristics, the central claim that these methods mitigate the trade-off in general remains unverified.
Authors: We agree that the current experiments are confined to the reported datasets, architectures, and anomaly types, and that this limits strong claims of generality. The observed trade-off and the effectiveness of the mitigations are demonstrated consistently within these regimes, which include both image and non-image data. In the revised manuscript we will add a dedicated limitations paragraph, temper the language around generality, and include new experiments on an additional out-of-distribution regime and a larger model scale to provide further support for the proposed methods. revision: yes
- Referee: [Analysis of variability] Analysis of seed variability: the link between performance variability and distances between normal/abnormal latent distributions is presented as correlational evidence, but the manuscript provides no controls or causal interventions to establish that this distance is the driving factor rather than a byproduct of the chosen training dynamics.
Authors: The manuscript presents the relationship as a correlation observed across random seeds and datasets, and we will make this wording explicit in the revision. We will also add controls by re-training under varied optimization hyperparameters and regularization strengths while tracking the latent-distribution distances. We acknowledge that a direct causal intervention (for example, artificially constraining the distance) lies outside the present experimental design and would constitute a separate study; we therefore treat the distance metric as a useful diagnostic rather than a proven causal driver. revision: partial
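Tracking the latent-distribution distance per seed can be sketched as follows. The paper's exact distance is cut off in excerpt [4] of the reference graph, so this sketch swaps in the closed-form 2-Wasserstein distance between diagonal Gaussians fitted to normal and abnormal latent codes, then correlates that distance with a detection metric across seeds. As the rebuttal notes, a high correlation would be diagnostic, not causal.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_diag_gaussian(codes: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) standard deviation of a set of latent codes."""
    dims = list(zip(*codes))
    return [statistics.fmean(d) for d in dims], [statistics.pstdev(d) for d in dims]


def w2_diag(mu1, sd1, mu2, sd2) -> float:
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians."""
    sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    sq += sum((a - b) ** 2 for a, b in zip(sd1, sd2))
    return math.sqrt(sq)


def latent_distance(normal_codes, abnormal_codes) -> float:
    """Distance between Gaussians fitted to normal and abnormal latent codes."""
    return w2_diag(*fit_diag_gaussian(normal_codes), *fit_diag_gaussian(abnormal_codes))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between per-seed distances and per-seed detection scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Per seed, one would call `latent_distance` on latent codes (for example the encoder means µ(x)) of the healthy and test-AD-30 samples, then pass the 30 distances and the 30 detection scores to `pearson`.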
Circularity Check
No circularity in empirical VAE trade-off analysis
The paper's central claims rest on direct experimental evaluations of reconstruction error and anomaly detection metrics across beta-VAE variants, random seeds, and datasets. The observed trade-off, variability linked to latent distribution distances, and benefits of beta-scheduling plus Sparse VAE are reported from these comparisons without any mathematical derivation, fitted parameter renamed as prediction, or self-referential definition. No load-bearing self-citations or uniqueness theorems are invoked to force conclusions; results are presented as empirical observations open to generalization checks.
Axiom & Free-Parameter Ledger
free parameters (2)
- beta
- sparsity level
axioms (1)
- domain assumption: the reconstruction error of a model trained only on normal samples can serve as an anomaly score in unsupervised settings.
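The domain assumption above corresponds to scoring each test image by its reconstruction error and measuring detection with ROC AUC. Excerpt [3] of the reference graph says detection is "measured through the MSE ra..." and is truncated, so the exact metric is not recoverable; this sketch uses a plain per-image MSE score and the Mann-Whitney form of AUC, both assumptions.

```python
from typing import Sequence


def mse_score(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Anomaly score: mean squared error between an image (flattened) and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def auc(normal_scores: Sequence[float], abnormal_scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that an abnormal
    sample scores higher than a normal one, counting ties as 1/2."""
    wins = 0.0
    for s_ab in abnormal_scores:
        for s_n in normal_scores:
            if s_ab > s_n:
                wins += 1.0
            elif s_ab == s_n:
                wins += 0.5
    return wins / (len(normal_scores) * len(abnormal_scores))
```

The quadratic loop is fine for a sketch; for large test sets a rank-based O(n log n) computation gives the same value.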
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost uniqueness, Aczél classification), theorem washburn_uniqueness_aczel. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "Models with constrained latent space reach higher detection metrics but lower reconstruction quality... beta-scheduling and the Sparse VAE"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean (LogicNat orbit, embed injectivity), theorem embed_injective. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "performance variability across random seeds... linked to the distance between normal and abnormal latent distributions"
What the tags mean:
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] INTRODUCTION. Unsupervised anomaly detection (UAD) methods based on deep generative models are used to detect rare events, such as lesions in medical images, by learning a model on healthy data only. The limited availability of labeled data motivates the use of UAD methods, along with the potential that these methods enable the detection of various types o...
- [2] MATERIALS. 2.1. Variational Autoencoder. In the β-VAE framework [11, 12], we aim to learn an encoder qϕ(z|x) to map an observation x from a set of normal observations X to a lower dimensional variable z ∈ ℝ^d, and a decoder pθ(x|z) to map latent codes back to observations. Classically, we set qϕ(z|x) = N(µ(x), σ(x)) and pθ(x|z) = N(D(z), I), where ...
- [3] TRADE-OFF BETWEEN RECONSTRUCTION QUALITY AND ANOMALY DETECTION. We explore the impact of constraining the latent space of β-VAEs through decreasing the latent dimension d or increasing the β weight. We train 12 models with β ∈ [0.1, 1.0, 10, 100] and d ∈ [20, 64, 256]. In Figure 1, we report the anomaly detection performance (measured through the MSE ra...
- [4] STABILITY WITH RESPECT TO SEEDS. For this baseline model (d = 64, β = 10), we study the variability of its performance across 30 random seeds and attempt to explain it by studying the distance between latent distributions of normal and abnormal images. In particular, we model the latent codes' distributions for test-AD-30 and healthy test samples (tes...
- [5] BUILDING MORE ROBUST VAE. We explore two methods for improving the smoothness of the model and mitigating the trade-off observed in Section 3. 5.1. Cyclical beta-scheduling during training. The simplest way to improve the smoothness of the decoder is to increase the β weight on the KL regularization. Indeed, for diagonal posteriors qϕ(z|x) = N(µ(x), diag(σ(...
- [6] With more severe anomalies (test-AD-50), the Sparse VAE outperforms all other models on all metrics. Overall, this model shows an improvement over the trade-off of β-VAEs as displayed in Figure 1. In Figure 2, we further analyze this performance gain by showing that it comes with a smaller distance between normal and abnormal latent distributions. The...
- [7] CONCLUSION AND FUTURE WORKS. We conduct an extensive empirical analysis of β-VAE models to highlight the trade-off between reconstruction quality and anomaly detection. Second, we analyze the performance variability across different training seeds and find that it is linked with the distance between healthy and abnormal latent distributions. This confi...
- [8] ACKNOWLEDGEMENTS. This project was partly funded by the French government's Agence Nationale de la Recherche under the "France 2030" program (ANR-23-IACL-0008, PRAIRIE-PSAI), the ANO-NEURO project (ANR-23-CE45-0005-01), the "Investissements d'avenir" program (ANR-19-P3IA-0001); by the European Union's Horizon Europe Framework Programme (grant 10113660...
- [9] X. Chen and E. Konukoglu, “Unsupervised abnormality detection in medical images with deep generative methods,” in Biomedical Image Synthesis and Simulation, pp. 303–324. Elsevier, 2022.
- [10] C. Baur, S. Denner, B. Wiestler, N. Navab, and S. Albarqouni, “Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study,” MedIA, vol. 69, pp. 101952, Apr. 2021.
- [11] Z. Alaverdyan, J. Jung, R. Bouet, and C. Lartizien, “Regularized siamese neural network for unsupervised outlier detection on brain multiparametric magnetic resonance imaging: Application to epilepsy lesion screening,” MedIA, vol. 60, pp. 101618, 2020.
- [12] Y. Cai, H. Chen, and K.-T. Cheng, “Rethinking autoencoders for medical anomaly detection from a theoretical perspective,” in MICCAI, 2024, vol. LNCS 15011, pp. 544–554.
- [13] Y. Zhou, “Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection,” in CVPR, 2022, pp. 7369–7377, IEEE.
- [14] E. M. C. Huijben, S. Amirrajab, and J. P. W. Pluim, “Enhancing reconstruction-based out-of-distribution detection in brain MRI with model and metric ensembles,” Comp. Meth. and Prog. in Biomed., vol. 272, pp. 109045, 2025.
- [15] M. Solal, P. André, and N. Burgos, “Unsupervised anomaly detection in brain FDG PET with deep generative models: An experimental analysis of model variability and mitigation strategies,” in SPIE Medical Imaging, 2026.
- [16] R. Hassanaly, M. Solal, O. Colliot, N. Burgos, and ADNI, “Benchmarking 3D generative autoencoders for pseudo-healthy reconstruction of brain 18F-FDG PET,” JMI, vol. 12, no. 05, 2025.
- [17] H. Fu, C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin, “Cyclical annealing schedule: A simple approach to mitigating KL vanishing,” in NAACL, 2019, pp. 240–250.
- [18] L. Antelmi, N. Ayache, P. Robert, and M. Lorenzi, “Sparse multi-channel variational autoencoder for the joint analysis of heterogeneous data,” in ICML, 2019, vol. 97, pp. 302–311, PMLR.
- [19] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” 2013.
- [20] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “Beta-VAE: Learning basic visual concepts with a constrained variational framework,” in ICLR, 2017.
- [21] A. Senellart, C. Chadebec, and S. Allassonnière, “MultiVae: A Python package for multimodal variational autoencoders on partial datasets,” JOSS, vol. 10, no. 110, pp. 7996, 2025.
- [22] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. R. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, and L. Beckett, “Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI),” Alzheimer’s & Dementia, vol. 1, no. 1, pp. 55–66, 2005.
- [23] A. Routier, N. Burgos, (...), M.-O. Habert, S. Durrleman, and O. Colliot, “Clinica: An open-source software platform for reproducible clinical neuroscience studies,” Front. Neuroinform., vol. 15, pp. 689675, 2021.
- [24] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Process., vol. 13, no. 4, pp. 600–612, 2004.
- [25] R. Hassanaly, C. Brianceau, M. Solal, O. Colliot, and N. Burgos, “Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET,” MELBA, vol. 2, pp. 611–656, 2024.
- [26] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: explicit invariance during feature extraction,” in ICML, 2011, pp. 833–840, Omnipress.
- [27] L. Meng, S. Ding, and Y. Xue, “Research on denoising sparse autoencoder,” Int. J. Mach. Learn. & Cyber., vol. 8, no. 5, pp. 1719–1729, 2017.
- [28] H. Akrami, A. A. Joshi, J. Li, S. Aydöre, and R. M. Leahy, “A robust variational autoencoder using beta divergence,” Knowledge-Based Systems, vol. 238, pp. 107886, 2022.