
arxiv: 2606.00635 · v1 · pith:V6OZ4GRV · submitted 2026-05-30 · 💻 cs.LG

How Neural Losses Shape VAE Latents

Pith reviewed 2026-06-28 18:52 UTC · model grok-4.3

classification 💻 cs.LG
keywords: VAE · latent space geometry · perceptual loss · adversarial loss · rate-distortion tradeoff · posterior variance · isotropic representations · neural reconstruction

The pith

Augmenting VAE reconstruction with perceptual and adversarial losses reduces information stored in the latent representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the choice of reconstruction loss in VAEs fundamentally alters the rate-distortion optimization. Adding neural terms such as perceptual and adversarial objectives leads to lower information content in the latents compared to pointwise likelihood alone. These losses also reshape the latent geometry, producing more isotropic representations where uncertainty is spread more evenly across dimensions. A reader would care because this explains observed differences in VAE behavior that are not visible from output quality or standard rate-distortion analysis.

Core claim

Augmenting pointwise reconstruction with neural terms reduces the amount of information stored in the latent representations. Neural reconstruction losses systematically change the geometry of the latent space: they make representations more isotropic and distribute uncertainty more evenly across latent dimensions, producing different posterior variance profiles. These findings highlight that the rate-distortion tradeoff is not a comprehensive lens for understanding VAE behavior.

What carries the argument

The rate-distortion optimization problem, reshaped when the distortion metric changes from pointwise to neural reconstruction losses.
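A minimal sketch of the kind of objective under discussion, assuming a diagonal Gaussian posterior $\mathcal{N}(\mu, \operatorname{diag}(s))$ and a standard normal prior. The rate is the closed-form KL, split into a mean part $\tfrac12\|\mu\|^2$ and a per-dimension variance part $g(s) = \tfrac12(s - 1 - \log s)$. The additive weighting `pixel + lam * neural` is our assumption (the paper may interpolate the two terms instead), and the "neural" distortion is a hypothetical stand-in (squared error after a fixed random linear feature map), not LPIPS, DINOv2 features, or a discriminator.

```python
import numpy as np


def gaussian_kl_parts(mu, s):
    """KL(N(mu, diag(s)) || N(0, I)), split into mean and variance parts."""
    mu = np.asarray(mu, dtype=float)
    s = np.asarray(s, dtype=float)
    mean_part = 0.5 * np.sum(mu ** 2)
    # g(s) = 0.5 * (s - 1 - log s) is zero at s = 1 and positive elsewhere
    var_part = 0.5 * np.sum(s - 1.0 - np.log(s))
    return mean_part, var_part


def augmented_objective(x, x_hat, mu, s, beta, lam, feature_map):
    """Pixel SSE + lam * neural distortion + beta * KL rate, for one sample."""
    pixel = np.sum((x - x_hat) ** 2)
    # Hypothetical neural term: squared error measured in a feature space
    neural = np.sum((feature_map(x) - feature_map(x_hat)) ** 2)
    mean_part, var_part = gaussian_kl_parts(mu, s)
    return pixel + lam * neural + beta * (mean_part + var_part)


rng = np.random.default_rng(0)
A = rng.normal(size=(8, 32))  # stand-in "feature extractor" (assumption)


def phi(v):
    return A @ v


x = rng.normal(size=32)
x_hat = x + 0.1 * rng.normal(size=32)
mu, s = rng.normal(size=4), np.full(4, 0.5)

for lam in (0.0, 0.5, 1.0):
    print(lam, augmented_objective(x, x_hat, mu, s, beta=1.0, lam=lam, feature_map=phi))
```

Because the neural term is non-negative, raising `lam` can only increase the distortion charged for a fixed reconstruction; how a trained encoder then rebalances that cost against the KL is the question the paper studies.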

If this is right

  • Neural losses produce different posterior variance profiles than pointwise reconstruction.
  • The standard rate-distortion lens fails to capture how distortion metric choice affects learned representations.
  • A mechanistic investigation of how each distortion metric reshapes the optimization is required instead.
  • Latent space properties can be steered by loss choice without changing the model architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This suggests that practitioners could select losses to achieve desired latent properties like isotropy for downstream tasks such as interpolation.
  • Similar effects may appear in other generative models that combine reconstruction with perceptual objectives.
  • The findings motivate experiments that isolate the contribution of each neural loss term to the observed geometry changes.

Load-bearing premise

The observed changes in information content and latent geometry are caused by the neural losses altering the rate-distortion problem rather than by optimizer dynamics, regularization schedules, or other training details.

What would settle it

Training identical VAE architectures with neural losses but under controlled optimizer and schedule conditions that match the pointwise baseline, then measuring whether the reduction in latent information and increase in isotropy still occur.

Figures

Figures reproduced from arXiv: 2606.00635 by Emanuele Rodolà, Giorgio Strano, Luca Cerovaz, Michele Mancusi, Tommaso Mencattini.

Figure 1: Geometric decomposition of the VAE KL term. The gray wireframe is the prior …
Figure 2: Rate-distortion curve traced by β. If the decoder likelihood is Gaussian with fixed variance, the distortion term reduces to a scaled squared error $\|x - p(x|z)\|_2^2$. More broadly, a chosen likelihood family induces a particular reconstruction loss. In modern practice, this term is often augmented or replaced with perceptual or adversarial losses, breaking the standard pixel-squared-error assumption that Secti…
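The fixed-variance case in the Figure 2 caption can be checked directly: the negative log-likelihood of a Gaussian decoder with variance $\sigma^2$ is the pixel sum of squared errors divided by $2\sigma^2$, plus a constant that does not depend on the reconstruction. The sketch below uses an arbitrary $\sigma$ and synthetic data (both our assumptions) to show the two quantities moving in lockstep.

```python
import numpy as np


def gaussian_nll(x, mean, sigma):
    """-log N(x; mean, sigma^2 I), summed over all pixels."""
    sse = np.sum((x - mean) ** 2)
    return sse / (2.0 * sigma ** 2) + 0.5 * x.size * np.log(2.0 * np.pi * sigma ** 2)


rng = np.random.default_rng(1)
x = rng.normal(size=64)  # a flattened toy "image"
sigma = 0.3               # fixed decoder standard deviation (assumption)
const = 0.5 * x.size * np.log(2.0 * np.pi * sigma ** 2)

for noise in (0.05, 0.1, 0.2):
    x_hat = x + noise * rng.normal(size=x.size)
    sse = np.sum((x - x_hat) ** 2)
    # Removing the constant leaves exactly the scaled squared error
    print(gaussian_nll(x, x_hat, sigma) - const, sse / (2.0 * sigma ** 2))
```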
Figure 3: Experimental results supporting the claim in Section 3 for the …
Figure 4: Rate reached at convergence as a function of λ for the pythae VAE trained on CelebA with adversarial loss. To test robustness, we vary model architecture (a traditional VAE from pythae [6] and AutoencoderKL from diffusers [33]), dataset (CelebA [24] and Tiny-ImageNet [12]), and the family of neural distortions (LPIPS [36] and DINOv2 features [28] as perceptual losses, and a PatchGAN hinge loss with featu…
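For readers unfamiliar with the adversarial term, the common hinge formulation is sketched below for patch-level discriminator outputs of the kind a PatchGAN produces. This is the generic hinge loss, not the paper's exact variant; the caption is cut off before describing the additional feature term, which is not reproduced here. The logits are made-up numbers.

```python
import numpy as np


def hinge_d_loss(real_logits, fake_logits):
    """Discriminator hinge loss, averaged over every patch in the batch."""
    return (np.mean(np.maximum(0.0, 1.0 - real_logits))
            + np.mean(np.maximum(0.0, 1.0 + fake_logits)))


def hinge_g_loss(fake_logits):
    """Generator (decoder) side of the hinge loss."""
    return -np.mean(fake_logits)


# Patch logits shaped (batch, patches_high, patches_wide)
real = np.array([[[2.0, 0.5], [1.0, -0.5]]])
fake = np.array([[[-2.0, 0.0], [-1.0, 0.5]]])
print(hinge_d_loss(real, fake), hinge_g_loss(fake))
```

Patches the discriminator already scores beyond the margin (logit ≥ 1 for real, ≤ −1 for reconstructions) contribute nothing to its loss.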
Figure 5: Rate-matched training of the pythae VAE on CelebA, using the perceptual loss LPIPS …
Figure 6: Equivalence classes under pixel vs. perceptual distortion. Dashed curves enclose equivalent perturbations under each loss. We repeat the experiment across two dataset-model combinations, two neural losses (perceptual and discriminative), and four fixed target rate values. For each combination, we sweep λ and measure the average per-sample posterior anisotropy $A_{\mathrm{post}}$ …
Figure 7: Toy model at matched rate: posterior standard deviation per latent dimension. The mechanism behind this counter-intuitive direction can be read off the water-filling formula (8) combined with how neural losses act on pixel space. Natural-image datasets are highly anisotropic in pixel space: a small number of principal components capture most of the dataset's variance. Since pixel SSE penalizes every pixel …
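The paper's water-filling formula (8) is not reproduced on this page. For orientation only, the textbook analogue (Cover and Thomas [10]) is reverse water-filling for independent Gaussian components under a total squared-error budget: each component receives distortion $D_i = \min(\theta, \sigma_i^2)$ and rate $\tfrac12\log_2(\sigma_i^2/D_i)$, so components whose variance falls below the water level $\theta$ get no rate at all. The sketch assumes positive variances and a made-up, strongly anisotropic spectrum; it illustrates the classical result, not necessarily the paper's formula (8).

```python
import numpy as np


def reverse_water_filling(variances, total_distortion, tol=1e-12):
    """Classic reverse water-filling for independent Gaussian components.

    Returns per-component distortions D_i = min(theta, var_i) with
    sum(D_i) = total_distortion, and the total rate in bits.
    """
    v = np.asarray(variances, dtype=float)
    if not 0.0 < total_distortion <= v.sum():
        raise ValueError("total_distortion must lie in (0, sum(variances)]")
    lo, hi = 0.0, v.max()
    while hi - lo > tol:  # bisection on the water level theta
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, v).sum() < total_distortion:
            lo = theta
        else:
            hi = theta
    d = np.minimum(0.5 * (lo + hi), v)
    rate_bits = 0.5 * np.sum(np.log2(v / d))
    return d, rate_bits


# A few components carry most of the variance, as for natural images
spectrum = np.array([4.0, 1.0, 0.25, 0.05])
d, r = reverse_water_filling(spectrum, total_distortion=1.3)
print(d, r)
```

With this spectrum the water level settles at 0.5, so only the two largest components receive any rate: under plain squared error, rate concentrates on the few high-variance directions.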
Figure 8: Experimental results supporting the claim in Section 3.2 for the AutoencoderKL architecture, …
Figure 9: Experimental results supporting the claim in Section 3 for the AutoencoderKL architecture …
Figure 10: Experimental results supporting the claim in Section 3 for the AutoencoderKL architecture …
Figure 11: Rate-matched training of the AutoencoderKL architecture on Tiny-ImageNet using the …
Figure 12: Reconstructions of the same input as the weight …
Figure 13: Toy linear-Gaussian model: per-dimension certainty at matched rate. We instantiate the construction of Section D.2 with $D = 16$, $W = I_D$ and compare the optimal posterior variances $s^\star_i(M, \beta)$ obtained for the isotropic metric $M_{\mathrm{iso}}$ and the anisotropic metric $M_{\mathrm{aniso}}$ at a common variance-KL operating point, i.e., for $\beta_{\mathrm{iso}}$ and $\beta_{\mathrm{aniso}}$ such that $\sum_i g(s^\star_i)$ matches the same target level. For the isotropic met…
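The matched-rate comparison can be reproduced in a few lines. The sketch assumes $W = I_D$, so the metric enters only through per-dimension coefficients $c_i$. It uses the closed-form optimum $s^*_i = \beta/(\beta + 2c_i)$ and the variance-KL $g(s) = \tfrac12(s - 1 - \log s)$ from the appendix excerpt. The anisotropic coefficients, the target of 5 nats, and the anisotropy proxy (standard deviation of $\log s_i$) are our own choices, because the paper's $A_{\mathrm{post}}$ definition is not reproduced on this page. $\beta$ is found by bisection. This works because each $s^*_i$ rises toward 1 as $\beta$ grows, so the variance-KL falls monotonically, which is consistent with the surjectivity result (Theorem 41) quoted in the appendix excerpt.

```python
import numpy as np


def g(s):
    """Variance part of the Gaussian KL for one latent dimension."""
    return 0.5 * (s - 1.0 - np.log(s))


def optimal_variances(c, beta):
    """Closed form s*_i = beta / (beta + 2 c_i) for the toy model."""
    return beta / (beta + 2.0 * np.asarray(c, dtype=float))


def variance_kl(c, beta):
    """Variance part of the KL at the optimum, sum_i g(s*_i)."""
    return float(np.sum(g(optimal_variances(c, beta))))


def beta_for_target(c, target, lo=1e-8, hi=1e8, iters=200):
    """Bisection in log(beta); the variance-KL falls as beta grows."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if variance_kl(c, mid) > target:
            lo = mid  # too much variance-KL: increase beta
        else:
            hi = mid
    return float(np.sqrt(lo * hi))


def anisotropy(s):
    """Hypothetical A_post proxy: std of log-variances (0 iff all equal)."""
    return float(np.std(np.log(s)))


D = 16
c_iso = np.ones(D)                     # isotropic metric: equal coefficients
c_aniso = np.geomspace(10.0, 0.1, D)   # anisotropic metric (our choice)
target = 5.0                           # common variance-KL level, in nats

for name, c in (("iso", c_iso), ("aniso", c_aniso)):
    beta = beta_for_target(c, target)
    s = optimal_variances(c, beta)
    print(name, beta, variance_kl(c, beta), anisotropy(s))
    print("  posterior std per dim:", np.round(np.sqrt(s), 3))
```

At the same variance-KL, the isotropic metric gives every dimension the same posterior variance (proxy exactly 0). The anisotropic metric spreads the variances out, with the most heavily penalized dimensions receiving the smallest variance.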
Abstract

Modern VAEs are rarely trained with the pointwise likelihood implied by the standard $\beta$-VAE objective. In practice, pointwise reconstruction is often combined with perceptual and adversarial losses, despite a lack of understanding of how this changes the latent dynamics of the model. We show that the choice of reconstruction loss reshapes the rate-distortion problem itself, altering both the information content and the geometry of the learned latent space in ways that may be invisible from reconstructions alone. First, we prove and verify empirically that augmenting pointwise reconstruction with neural terms, such as perceptual and adversarial objectives, reduces the amount of information stored in the latent representations. Second, we show that neural reconstruction losses systematically change the geometry of the latent space: they make representations more isotropic and distribute uncertainty more evenly across latent dimensions, producing different posterior variance profiles. These findings highlight how the rate-distortion tradeoff is not a comprehensive lens to understand the behavior of VAEs, and we propose a more mechanistic approach to investigate how the choice of a distortion metric reshapes the optimization problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that replacing or augmenting pointwise reconstruction in VAEs with neural losses (perceptual, adversarial) reshapes the underlying rate-distortion objective. It proves that such augmentation reduces the mutual information stored in the latent variables and empirically demonstrates that the resulting posteriors become more isotropic with flatter variance profiles across dimensions. The work concludes that the standard rate-distortion lens is insufficient and advocates a mechanistic view of how the distortion metric alters optimization.

Significance. If the theoretical reduction in latent information and the geometric effects are robustly isolated from training artifacts, the result would be significant: it supplies both a proof and concrete empirical signatures (isotropy, variance profiles) showing that widely used perceptual/adversarial objectives change VAE latents in ways invisible to reconstruction metrics alone. This would motivate new analysis tools beyond β-VAE theory and affect how reconstruction losses are chosen in practice.

major comments (2)
  1. [Empirical verification sections] The central empirical claim—that observed isotropy and even uncertainty distribution arise from rate-distortion reshaping rather than optimizer dynamics, regularization schedules, or implementation details—lacks the necessary isolation experiments. No description of matched hyperparameter sweeps, fixed-optimizer ablations, or controlled training procedures is referenced, leaving open the possibility that the geometry changes are artifacts of those factors rather than the modified objective.
  2. [Theoretical proof section] The proof that neural augmentation of the distortion term reduces latent mutual information is load-bearing for the first claim. Without the explicit derivation steps, assumptions on the form of the neural loss, and verification that the reduction holds independently of the variational family or optimization path, it is impossible to assess whether the result is parameter-free or relies on implicit regularizers introduced by the neural terms.
minor comments (1)
  1. Notation for the augmented distortion term and the precise definition of 'neural reconstruction loss' should be introduced early and used consistently to avoid ambiguity between perceptual, adversarial, and other neural objectives.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the empirical isolation and theoretical presentation. We address each major comment below and will incorporate revisions to improve clarity and robustness.

  1. Referee: [Empirical verification sections] The central empirical claim—that observed isotropy and even uncertainty distribution arise from rate-distortion reshaping rather than optimizer dynamics, regularization schedules, or implementation details—lacks the necessary isolation experiments. No description of matched hyperparameter sweeps, fixed-optimizer ablations, or controlled training procedures is referenced, leaving open the possibility that the geometry changes are artifacts of those factors rather than the modified objective.

    Authors: We agree that additional controls are needed to more rigorously isolate the contribution of the modified distortion metric. In the revised version, we will add a dedicated subsection detailing matched hyperparameter sweeps (e.g., identical learning rates, batch sizes, and optimizer settings across loss variants), fixed-optimizer ablations, and explicit descriptions of the controlled training procedures used. These will demonstrate that the isotropy and variance profile changes persist under matched conditions. revision: yes

  2. Referee: [Theoretical proof section] The proof that neural augmentation of the distortion term reduces latent mutual information is load-bearing for the first claim. Without the explicit derivation steps, assumptions on the form of the neural loss, and verification that the reduction holds independently of the variational family or optimization path, it is impossible to assess whether the result is parameter-free or relies on implicit regularizers introduced by the neural terms.

    Authors: We acknowledge that the proof section would benefit from greater explicitness. The manuscript currently presents a high-level argument; the revision will include the full step-by-step derivation, state the assumptions on the neural loss (specifically that it depends on the reconstruction output but introduces no direct latent dependence beyond the decoder), and add a short verification argument showing the mutual information reduction holds under the variational bound independently of the optimization trajectory. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims derived from modified objective and independent verification


The paper derives its central results from the standard VAE rate-distortion objective after augmenting the distortion term with neural losses, proving reduced mutual information directly from the modified objective and verifying geometry changes empirically. No load-bearing steps reduce by construction to fitted parameters, self-citations, or renamed inputs; the proof and observations are self-contained against standard VAE theory without circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard VAE theory and information-theoretic concepts without introducing new free parameters, axioms beyond background math, or invented entities.

axioms (1)
  • standard math: The standard β-VAE ELBO and rate-distortion formulation serve as the baseline for comparison.
    The abstract frames all claims relative to the pointwise reconstruction implied by the beta-VAE objective.

pith-pipeline@v0.9.1-grok · 5721 in / 1225 out tokens · 22337 ms · 2026-06-28T18:52:40.238542+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 22 canonical work pages · 13 internal anchors

  [1] Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, and Kevin Murphy. Fixing a broken ELBO. In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pa...

  [2] Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, and Roger Grosse. Multi-rate VAE: Train once, get the full rate-distortion curve, 2023. URL https://arxiv.org/abs/2212.03905

  [3] Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6228–6237, 2018. doi: 10.1109/CVPR.2018.00652

  [4] Yochai Blau and Tomer Michaeli. Rethinking lossy compression: The rate-distortion-perception tradeoff, 2019. URL https://arxiv.org/abs/1901.07821

  [5] Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in β-VAE, 2018. URL https://arxiv.org/abs/1804.03599

  [6] Clément Chadebec, Louis Vincent, and Stephanie Allassonniere. Pythae: Unifying generative autoencoders in python - a benchmarking use case. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 21575–21589. Curran Associates, Inc., 2022

  [7] Hao Chen, Yujin Han, Fangyi Chen, Xiang Li, Yidong Wang, Jindong Wang, Ze Wang, Zicheng Liu, Difan Zou, and Bhiksha Raj. Masked autoencoders are effective tokenizers for diffusion models. In Forty-second International Conference on Machine Learning, 2025

  [8] Ricky T. Q. Chen, Xuechen Li, Roger Grosse, and David Duvenaud. Isolating sources of disentanglement in variational autoencoders, 2019. URL https://arxiv.org/abs/1802.04942

  [9] Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, and Pieter Abbeel. Variational lossy autoencoder, 2017. URL https://arxiv.org/abs/1611.02731

  [10] Thomas Cover and Joy Thomas. Elements of Information Theory. Wiley, 2nd edition, 2009

  [11] Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. High fidelity neural audio compression. arXiv preprint arXiv:2210.13438, 2022

  [12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009

  [13] Sander Dieleman. Generative modelling in latent space, 2025. URL https://sander.ai/2025/04/15/latents.html

  [14] Leo D'Amato, Gian Luca Lancia, and Giovanni Pezzulo. The geometry of efficient codes: How rate-distortion trade-offs distort the latent representations of generative models. PLOS Computational Biology, 21(5):1–30, 05 2025. doi: 10.1371/journal.pcbi.1012952. URL https://doi.org/10.1371/journal.pcbi.1012952

  [15] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations. URL https://openreview.net/forum?id=Sy2fzU9gl

  [16] Xianxu Hou, Linlin Shen, Ke Sun, and Guoping Qiu. Deep feature consistent variational autoencoder. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1133–1141, 2017. doi: 10.1109/WACV.2017.131

  [17] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks, 2018. URL https://arxiv.org/abs/1611.07004

  [18] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution, 2016. URL https://arxiv.org/abs/1603.08155

  [19] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014. URL http://arxiv.org/abs/1312.6114

  [20] Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, and Nikos Komodakis. EQ-VAE: Equivariance regularized latent space for improved generative image modeling, 2025. URL https://arxiv.org/abs/2502.09509

  [21] Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1kG7GZAW

  [22] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and ...

  [23] Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, and Liang Zheng. REPA-E: Unlocking VAE for end-to-end tuning of latent diffusion transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18262–18272, 2025

  [24] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015

  [25] Emile Mathieu, Tom Rainforth, N. Siddharth, and Yee Whye Teh. Disentangling disentanglement in variational autoencoders, 2019. URL https://arxiv.org/abs/1812.02833

  [26] Fabian Mentzer, George Toderici, Michael Tschannen, and Eirikur Agustsson. High-fidelity generative image compression, 2020. URL https://arxiv.org/abs/2006.09965

  [27] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks, 2018. URL https://arxiv.org/abs/1802.05957

  [28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023

  [29] Danilo Jimenez Rezende and Fabio Viola. Taming VAEs. CoRR, abs/1810.00597, 2018. URL http://arxiv.org/abs/1810.00597

  [30] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022. URL https://arxiv.org/abs/2112.10752

  [31] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948. doi: 10.1002/j.1538-7305.1948.tb01338.x

  [32] Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, and Aliaksandr Siarohin. Improving the diffusability of autoencoders, 2025. URL https://arxiv.org/abs/2502.14831

  [33] Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers, 2022

  [34] Jingfeng Yao, Bin Yang, and Xinggang Wang. Reconstruction vs. generation: Taming optimization dilemma in latent diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 15703–15712, 2025

  [35] Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, and Saining Xie. Representation alignment for generation: Training diffusion transformers is easier than you think. arXiv preprint arXiv:2410.06940, 2024

  [36] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric, 2018. URL https://arxiv.org/abs/1801.03924

Appendix excerpts

A. Additional plots: [plot of Rate (KL divergence, 0–140) vs. λ (perceptual loss weight, 0–1), colored by λ ∈ {0, 0.25, 0.5, 0.75, 1} …]

Multiply by $w_\ell$ and sum over $\ell \in \mathcal{L}$. This pointwise domination implies that any reconstruction rule meeting a pixel-MSE budget also meets an appropriately rescaled feature-matching budget. The corresponding RD ordering is an immediate application of Theorem 11.

Corollary 28 (RD ordering for feature matching vs. pixel MSE). Under Assumption 26, for all $\Delta \ge 0$, RdF...
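The domination step can be illustrated under an assumption we supply, since Assumption 26 itself is cut off. Suppose every feature layer $\phi_\ell$ is $L_\ell$-Lipschitz. Then $\sum_\ell w_\ell \|\phi_\ell(x) - \phi_\ell(y)\|^2 \le \big(\sum_\ell w_\ell L_\ell^2\big)\|x - y\|^2$, so a pixel squared-error budget $\Delta$ implies a feature-matching budget $\big(\sum_\ell w_\ell L_\ell^2\big)\Delta$. The sketch uses random linear layers as hypothetical feature maps; for these, $L_\ell$ is the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32

# Hypothetical linear feature layers phi_l(x) = A_l @ x with weights w_l
layers = [rng.normal(size=(k, dim)) / np.sqrt(dim) for k in (16, 8, 4)]
weights = np.array([1.0, 0.5, 0.25])

# For a linear map, the Lipschitz constant is the largest singular value
lipschitz = np.array([np.linalg.norm(A, ord=2) for A in layers])
K = float(np.sum(weights * lipschitz ** 2))  # rescaling factor for the budget


def feature_distortion(x, y):
    """Weighted feature-matching distortion sum_l w_l ||phi_l(x) - phi_l(y)||^2."""
    return float(sum(w * np.sum((A @ (x - y)) ** 2) for w, A in zip(weights, layers)))


def pixel_sse(x, y):
    return float(np.sum((x - y) ** 2))


# The feature/pixel ratio never exceeds K: pointwise domination
worst = 0.0
for _ in range(1000):
    x, y = rng.normal(size=dim), rng.normal(size=dim)
    worst = max(worst, feature_distortion(x, y) / pixel_sse(x, y))
print(worst, K)
```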

For $Z \sim \mathcal{N}(\mu, \operatorname{diag}(s))$,

$$\mathbb{E}[d_M(x, Z)] = (x - W\mu)^\top G (x - W\mu) + \sum_{i=1}^{D} c_i s_i = \mathrm{const}(x, \mu) + \sum_{i=1}^{D} c_i s_i. \qquad (44)$$

The objective can be written as

$$L_{M,\beta}(s) = \mathrm{const}(x, \mu) + \sum_{i=1}^{D} \ell_i(s_i), \qquad \ell_i(s) := c_i s + \beta g(s). \qquad (45)$$

Each $\ell_i$ is strictly convex on $(0, \infty)$, hence $L_{M,\beta}$ has a unique minimizer $s^*(M, \beta) \in (0, \infty)^D$.

The minimizer is given in closed form by

$$s^*_i(M, \beta) = \frac{1}{1 + 2c_i/\beta} = \frac{\beta}{\beta + 2c_i}, \qquad i = 1, \dots, D. \qquad (46)$$

Proof. Write $Z = \mu + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \operatorname{diag}(s))$ and expand $d_M(x, Z) = (x - W\mu - W\varepsilon)^\top G (x - W\mu - W\varepsilon)$. The cross term vanishes in expectation, and $\mathbb{E}[\varepsilon^\top B \varepsilon] = \operatorname{tr}\big(B \operatorname{diag}(s)\big) = \sum_i c_i s_i$, which yields (44) and the separable form (45). Since $g''(s) = 1/(2s^2) > 0$ for $s > 0$, each $\ell_i$ is ...
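Both steps of the proof can be checked numerically on a small instance. The decoder matrix $W$, the positive-definite $G$, the posterior parameters and $\beta$ below are arbitrary choices of ours. $c_i$ is taken as the diagonal of $B = W^\top G W$, which is what expanding $\varepsilon^\top W^\top G W \varepsilon$ gives. $g(s) = \tfrac12(s - 1 - \log s)$ is assumed to be the variance part of the Gaussian KL; it has $g''(s) = 1/(2s^2)$, as the excerpt states.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_pix = 4, 6

# Hypothetical instance: decoder matrix W and positive-definite metric G
W = rng.normal(size=(n_pix, D))
L = rng.normal(size=(n_pix, n_pix))
G = L @ L.T + 1e-3 * np.eye(n_pix)
B = W.T @ G @ W
c = np.diag(B)  # c_i = B_ii

x, mu = rng.normal(size=n_pix), rng.normal(size=D)
s = np.array([0.2, 0.5, 1.0, 1.5])

# (44): Monte Carlo estimate of E[d_M(x, Z)] for Z ~ N(mu, diag(s))
Z = mu + rng.normal(size=(200_000, D)) * np.sqrt(s)
R = x - Z @ W.T  # one residual x - W z per row
mc = np.mean(np.sum((R @ G) * R, axis=1))
r0 = x - W @ mu
closed = r0 @ G @ r0 + np.sum(c * s)
print(mc, closed)

# (46): closed-form minimizer vs. brute-force grid search over l_i(s)
beta = 0.7


def g(t):
    return 0.5 * (t - 1.0 - np.log(t))


grid = np.linspace(1e-4, 1.0, 200_001)
for ci in c:
    brute = grid[np.argmin(ci * grid + beta * g(grid))]
    print(brute, beta / (beta + 2.0 * ci))
```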

If all $c_i$ are equal, then all $s^*_i(M, \beta)$ coincide and $A_{\mathrm{post}}\big(s^*(M, \beta)\big) = 0$ (isotropic posterior).

If the coefficients $c_i$ are not all equal, then the entries of $s^*(M, \beta)$ are not all equal and $A_{\mathrm{post}}\big(s^*(M, \beta)\big) > 0$ (anisotropic posterior). We now show that, for a fixed distortion $M$, every positive target value of the variance part of the KL can be obtained by a suitable choice of $\beta$.

Theorem 41 (Surjectivity of the variance-KL map). Fix $W$, $M$ and assume at least on...