pith. machine review for the scientific record.

arxiv: 2604.14603 · v1 · submitted 2026-04-16 · 💻 cs.IT · cs.LG · eess.SP · math.IT

Recognition: unknown

A Synonymous Variational Perspective on the Rate-Distortion-Perception Tradeoff

Authors on Pith no claims yet

Pith reviewed 2026-05-10 10:57 UTC · model grok-4.3

classification 💻 cs.IT · cs.LG · eess.SP · math.IT

keywords rate-distortion-perception tradeoff · synonymous source coding · variational inference · perceptual quality · semantic information · source coding · distributional divergence · synset

The pith

Reformulating perceptual reconstruction via synonymous sets derives the rate-distortion-perception tradeoff directly from the reconstruction objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that perceptual quality can be modeled as recovering any admissible sample from an ideal synonymous set tied to the source, rather than matching the source exactly. This leads to a synonymous source coding scheme whose variational analysis produces a rate-distortion-perception tradeoff in which the distributional divergence term emerges naturally instead of being imposed from outside. The approach also yields a consistency result linking optimal semantic synonymity to perceptual quality, and it demonstrates compatibility with classical rate-distortion theory. A reader would care because the reformulation supplies a semantic grounding for perception measures that previously lacked a clear theoretical origin, potentially guiding more principled codec design.

Core claim

Motivated by a synonymity-based semantic information view, the authors define perceptual reconstruction as recovering any member of an ideal synonymous set associated with the source. They introduce a synonymous source coding architecture and a synonymous variational inference framework equipped with a synonymous variational lower bound. From this they derive a synonymity-perception consistency principle and prove the synonymous rate-distortion-perception tradeoff, establishing that the divergence term is a direct consequence of the synset reconstruction goal and that the resulting formulation is compatible with existing rate-distortion-perception models and classical rate-distortion theory.
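For orientation, prior RDP work (e.g., Blau and Michaeli's rate-distortion-perception function, which the paper builds on) imposes the divergence constraint directly; in our notation, not the paper's:

```latex
R(D, P) \;=\; \min_{p_{\hat{X}\mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P
```

Here Δ is a distortion measure and d a divergence between the source and reconstruction distributions. The paper's claim is that the constraint on d(p_X, p_X̂) need not be posited: it emerges once reconstruction targets any member of the synset rather than the source sample itself.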

What carries the argument

The synonymous variational lower bound obtained from synonymous variational inference on synset-oriented compression, which enables the derivation of the synonymous rate-distortion-perception tradeoff and the synonymity-perception consistency principle.
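The review does not reproduce the SVLBO itself, but its classical ancestor — the variational lower bound (ELBO) on log-evidence that the SVLBO generalizes to synsets — can be sketched numerically for a toy discrete latent-variable model. All numbers below are invented for illustration; none come from the paper:

```python
import math

# Toy discrete latent-variable model (values invented for illustration).
p_z = [0.5, 0.5]          # prior over latent z in {0, 1}
p_x_given_z = [0.9, 0.2]  # likelihood p(x=1 | z) for a single observation x=1

def elbo(q):
    """Variational lower bound: E_q[log p(x|z)] - KL(q(z) || p(z))."""
    recon = sum(q[z] * math.log(p_x_given_z[z]) for z in range(2))
    kl = sum(q[z] * math.log(q[z] / p_z[z]) for z in range(2) if q[z] > 0)
    return recon - kl

# Exact log-evidence log p(x=1), the quantity the bound sits under.
evidence = math.log(sum(p_z[z] * p_x_given_z[z] for z in range(2)))

# True posterior p(z | x=1) attains the bound with equality.
post = [p_z[z] * p_x_given_z[z] for z in range(2)]
post = [w / sum(post) for w in post]

assert elbo([0.7, 0.3]) <= evidence + 1e-12  # any q gives a lower bound
assert abs(elbo(post) - evidence) < 1e-9     # tight at the true posterior
```

The gap between the bound and the evidence is exactly KL(q ‖ p(z|x)), which is why maximizing a variational bound amounts to posterior identification — the structural fact the paper's synonymity-perception consistency principle leans on.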

If this is right

  • The distributional divergence term in rate-distortion-perception formulations arises directly from the synset-based reconstruction objective.
  • Optimal semantic information identification is theoretically consistent with perceptual optimization.
  • Synonymous source coding is compatible with both classical rate-distortion theory and prior rate-distortion-perception models.
  • The synonymous variational inference framework supplies a tractable analytic tool for synset-oriented compression problems.
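The compatibility claim in the third bullet is checkable against closed-form results. A minimal Blahut-Arimoto iteration (the standard rate-distortion algorithm, not anything from the paper) recovers the binary-source curve R(D) = 1 - h₂(D) under Hamming distortion, the special case the synonymous theory must reduce to when synsets collapse to singletons:

```python
import math

def blahut_arimoto_rd(p_x, dist, beta, iters=500):
    """Blahut-Arimoto for the rate-distortion function at Lagrange
    multiplier beta. Returns (rate in bits, expected distortion)."""
    n, m = len(p_x), len(dist[0])
    q = [1.0 / m] * m  # output marginal q(xhat)
    for _ in range(iters):
        # Optimal test channel given q: p(xhat|x) proportional to
        # q(xhat) * exp(-beta * d(x, xhat)).
        cond = []
        for i in range(n):
            w = [q[j] * math.exp(-beta * dist[i][j]) for j in range(m)]
            s = sum(w)
            cond.append([v / s for v in w])
        # Re-estimate the output marginal from the channel.
        q = [sum(p_x[i] * cond[i][j] for i in range(n)) for j in range(m)]
    rate = sum(p_x[i] * cond[i][j] * math.log2(cond[i][j] / q[j])
               for i in range(n) for j in range(m) if cond[i][j] > 0)
    distortion = sum(p_x[i] * cond[i][j] * dist[i][j]
                     for i in range(n) for j in range(m))
    return rate, distortion

# Binary uniform source, Hamming distortion: closed form R(D) = 1 - h2(D).
h2 = lambda p: -p * math.log2(p) - (1 - p) * math.log2(1 - p)
rate, D = blahut_arimoto_rd([0.5, 0.5], [[0, 1], [1, 0]], beta=2.0)
assert abs(rate - (1 - h2(D))) < 1e-6
```

Any synonymous RDP expression that claims compatibility with classical RD theory should match curves like this one in the singleton-synset limit.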

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Codecs could be designed to operate explicitly on semantic equivalence classes to achieve target perceptual quality at lower rates than pixel-level methods.
  • The perspective may extend to semantic communication systems where preserving meaning across equivalent signals is the primary goal.
  • Controlled experiments on image or video data could check whether synset-aware coding measurably reduces the rate needed for a given perceptual score.

Load-bearing premise

An ideal synonymous set exists for each source sample such that any member of the set counts as a perceptually perfect reconstruction.

What would settle it

Empirical rate-perception curves measured on a dataset fail to match the divergence term predicted by the derived synonymous tradeoff at comparable rates and distortion levels.

Figures

Figures reproduced from arXiv: 2604.14603 by Changshuo Wang, Jin Xu, Kai Niu, Ping Zhang, Zijian Liang.

Figure 1. Synonymous source coding architecture for semantic-preserving signal compression and reconstruction.
Figure 2. An illustration of the optimization criterion of synonymous variational inference.
Figure 3. Comparison of optimization in classical variational inference and synonymous variational inference.
Figure 4. Illustration of single-point likelihood analysis in the synonymous likelihood term.
Figure 5. Illustration of the equivalence for minimizing the distribution ratio term in synonymous likelihood optimization.
Figure 6. Compatibility of synonymous source coding theory with existing compression theories.
Original abstract

The fundamental limit of natural signal compression has traditionally been characterized by classical rate-distortion (RD) theory through the tradeoff between coding rate and reconstruction distortion, while the rate-distortion-perception (RDP) framework introduces a divergence-based measure of perceptual quality as a modeling principle rather than a theoretically-derived principle, leaving its theoretical origin unclear. In this paper, motivated by a synonymity-based semantic information perspective, we reformulate perceptual reconstruction as recovering any admissible sample within an ideal synonymous set (synset) associated with the source, rather than the source sample itself, and correspondingly establish a synonymous source coding architecture. On this basis, we develop a synonymous variational inference (SVI) analysis framework with a synonymous variational lower bound (SVLBO) for tractable analysis of synset-oriented compression. Within this framework, we establish a synonymity-perception consistency principle, showing that optimal identification of semantic information is theoretically consistent with perceptual optimization. Based on its derivation result, we prove a synonymous RDP tradeoff for the proposed synonymous source coding. These analytical results show that the distributional divergence term arises naturally from the synset-based reconstruction objective, clarify its compatibility with existing RDP formulations and classical RD theory, and suggest the potential advantages of synonymous source coding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes a synonymous variational perspective on the rate-distortion-perception (RDP) tradeoff. Motivated by a synonymity-based semantic information view, it reformulates perceptual reconstruction as recovering any admissible sample from an ideal synonymous set (synset) associated with each source sample rather than the sample itself. It introduces a synonymous source coding architecture, develops synonymous variational inference (SVI) with a synonymous variational lower bound (SVLBO), establishes a synonymity-perception consistency principle, and proves a synonymous RDP tradeoff. The distributional divergence term is shown to arise directly from the synset-based objective, with compatibility to existing RDP formulations and classical RD theory recovered as the special case when synsets collapse to singletons.

Significance. If the derivations hold, the work supplies a first-principles theoretical origin for the perceptual divergence term in RDP frameworks, derived from semantic synonymity rather than posited as a modeling principle. It unifies RDP with classical rate-distortion theory and introduces a new synonymous source coding architecture whose analytical properties (SVLBO and consistency principle) are strengths. The explicit recovery of the standard RD function as a limiting case and the natural emergence of the divergence term provide falsifiable predictions that could guide semantic communication system design.

minor comments (2)
  1. §1 and §2: the definitions of 'ideal synset' and 'synonymous source coding architecture' are introduced without a compact formal notation (e.g., a set-valued mapping S(x) or diagram); adding one early would improve readability for readers outside the immediate subfield.
  2. The abstract asserts 'we prove a synonymous RDP tradeoff' and 'these analytical results show…'; a single sentence in the abstract or introduction that names the key theorem or equation number would help readers locate the central result immediately.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work on the synonymous variational perspective for the rate-distortion-perception tradeoff. We appreciate the recommendation for minor revision. As the report raises no specific major comments, we will prepare a revised manuscript addressing any minor editorial or clarification points while preserving the core derivations.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from synset objective

full rationale

The paper defines a new synonymous reconstruction objective via ideal synsets, derives the SVLBO directly from variational analysis of recovering any element in the synset, and obtains the synonymous RDP tradeoff as a consequence of that bound. The distributional divergence term is shown to emerge from the synset formulation itself, with classical RD recovered as the special case of singleton synsets. No step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an imported uniqueness theorem; the central claims remain independent of the inputs once the synset model is adopted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

Abstract-only review limits visibility into explicit parameters; the central claims rest on the introduced synset concept and variational bound whose concrete parameterization is not shown.

axioms (2)
  • domain assumption synonymity-based semantic information perspective exists and can be used to define admissible synonymous sets
    Invoked in the opening motivation to reformulate perceptual reconstruction.
  • ad hoc to paper an ideal synonymous set (synset) is associated with every source sample
    Central modeling choice that enables the synonymous reconstruction objective.
invented entities (3)
  • synset no independent evidence
    purpose: ideal synonymous set for reconstruction instead of exact source sample
    Core modeling device introduced to ground perceptual quality.
  • synonymous source coding architecture no independent evidence
    purpose: new coding structure based on synset recovery
    Proposed architecture on which the SVI analysis is built.
  • synonymous variational lower bound (SVLBO) no independent evidence
    purpose: tractable analysis tool for synset-oriented compression
    Developed within the SVI framework to enable the tradeoff derivation.

pith-pipeline@v0.9.0 · 5535 in / 1476 out tokens · 45914 ms · 2026-05-10T10:57:50.354611+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 4 canonical work pages · 1 internal anchor

  1. Z. Liang, K. Niu, C. Wang, J. Xu, and P. Zhang, "Synonymous variational inference for perceptual image compression," in Proceedings of the 42nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 267. PMLR, Jul. 2025, pp. 37339–37369. Available: https://proceedings.mlr.press/v267/liang25m.html
  2. C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
  3. C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE Nat. Conv. Rec., vol. 4, no. 142-163, p. 1, 1959.
  4. T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 2006.
  5. K. Brandenburg, "MP3 and AAC explained," in AES 17th International Conference on High-Quality Audio Coding, 1999, pp. 1–12.
  6. M. W. Marcellin, M. J. Gormish, A. Bilgin, and M. P. Boliek, "An overview of JPEG-2000," in Proceedings DCC 2000, Data Compression Conference. IEEE, 2000, pp. 523–541.
  7. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, 2003.
  8. Y. Blau and T. Michaeli, "The perception-distortion tradeoff," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  9. Y. Blau and T. Michaeli, "Rethinking lossy compression: The rate-distortion-perception tradeoff," in Proceedings of the 36th International Conference on Machine Learning (ICML). PMLR, 2019, pp. 675–685.
  10. L. Theis and A. B. Wagner, "A coding theorem for the rate-distortion-perception function," in Neural Compression: From Information Theory to Applications, Workshop @ ICLR 2021, 2021. Available: https://openreview.net/forum?id=BzUaLGtKecs
  11. J. Chen, L. Yu, J. Wang, W. Shi, Y. Ge, and W. Tong, "On the rate-distortion-perception function," IEEE Journal on Selected Areas in Information Theory, vol. 3, no. 4, pp. 664–673, 2022.
  12. S. Salehkalaibar, J. Chen, A. Khisti, and W. Yu, "Rate-distortion-perception tradeoff based on the conditional-distribution perception measure," IEEE Transactions on Information Theory, vol. 70, no. 12, pp. 8432–8454, 2024.
  13. G. Serra, P. A. Stavrou, and M. Kountouris, "Computation of rate-distortion-perception function under f-divergence perception constraints," in 2023 IEEE International Symposium on Information Theory (ISIT). IEEE, 2023, pp. 531–536.
  14. Y. Wang, Y. Wu, S. Ma, and Y.-J. A. Zhang, "Lossy compression with data, perception, and classification constraints," in 2024 IEEE Information Theory Workshop (ITW). IEEE, 2024, pp. 366–371.
  15. E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. V. Gool, "Generative adversarial networks for extreme learned image compression," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 221–231.
  16. F. Mentzer, G. D. Toderici, M. Tschannen, and E. Agustsson, "High-fidelity generative image compression," Advances in Neural Information Processing Systems, vol. 33, pp. 11913–11924, 2020.
  17. G. Zhang, J. Qian, J. Chen, and A. Khisti, "Universal rate-distortion-perception representations for lossy compression," Advances in Neural Information Processing Systems, vol. 34, pp. 11517–11529, 2021.
  18. M. J. Muckley, A. El-Nouby, K. Ullrich, H. Jegou, and J. Verbeek, "Improving statistical fidelity for neural image compression with implicit local likelihood models," in International Conference on Machine Learning. PMLR, 2023, pp. 25426–25443.
  19. Z. Jia, J. Li, B. Li, H. Li, and Y. Lu, "Generative latent coding for ultra-low bitrate image compression," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 26088–26098.
  20. N. Zeghidour, A. Luebs, A. Omran, J. Skoglund, and M. Tagliasacchi, "SoundStream: An end-to-end neural audio codec," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 495–507, 2021.
  21. A. Polyak, Y. Adi, J. Copet, E. Kharitonov, K. Lakhotia, W.-N. Hsu, A. Mohamed, and E. Dupoux, "Speech resynthesis from discrete disentangled self-supervised representations," arXiv preprint arXiv:2104.00355, 2021.
  22. T.-C. Wang, A. Mallya, and M.-Y. Liu, "One-shot free-view neural talking-head synthesis for video conferencing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10039–10049.
  23. R. Yang, R. Timofte, and L. Van Gool, "Perceptual learned video compression with recurrent conditional GAN," in IJCAI, 2022, pp. 1537–1544.
  24. M. Careil, M. J. Muckley, J. Verbeek, and S. Lathuilière, "Towards image compression with perfect realism at ultra-low bitrates," in The Twelfth International Conference on Learning Representations, 2023.
  25. W. Ma and Z. Chen, "Diffusion-based perceptual neural video compression with temporal diffusion information reuse," ACM Transactions on Multimedia Computing, Communications and Applications, vol. 21, no. 12, pp. 1–22, 2025.
  26. M. Li, J. Klejsa, and W. B. Kleijn, "On distribution preserving quantization," arXiv preprint arXiv:1108.3728, 2011.
  27. N. Saldi, T. Linder, and S. Yüksel, "Output constrained lossy source coding with limited common randomness," IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4984–4998, 2015.
  28. K. Niu and P. Zhang, "A mathematical theory of semantic communication," Journal on Communications, vol. 45, no. 6, pp. 7–59, 2024. Available: https://www.joconline.com.cn/en/article/doi/10.11959/j.issn.1000-436x.2024111/
  29. K. Niu and P. Zhang, The Mathematical Theory of Semantic Communication. Springer Nature, 2025. Available: https://link.springer.com/book/10.1007/978-981-96-5132-0
  30. D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
  31. J. Ballé, V. Laparra, and E. P. Simoncelli, "End-to-end optimized image compression," in International Conference on Learning Representations (ICLR), 2017.
  32. J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, "Variational image compression with a scale hyperprior," in International Conference on Learning Representations (ICLR), 2018.
  33. P. Zhang, K. Niu, Z. Liang, C. Wang, J. Wu, Y. Liu, W. Xu, N. Ma, X. Xu, and R. Zhang, "Beyond Shannon: Semantic information theory and methodology," IEEE Transactions on Network Science and Engineering, vol. 13, pp. 8062–8079, 2026.