Generative Semantic Communication via Alternating Dual-Domain Posterior Sampling

Qianqian Yang; Shunpu Tang

arXiv: 2604.16796 · v1 · submitted 2026-04-18 · cs.CV · cs.IT · eess.SP · math.IT

Pith reviewed 2026-05-10 07:30 UTC · model grok-4.3

classification cs.CV · cs.IT · eess.SP · math.IT

keywords semantic communication · posterior sampling · diffusion models · Bayesian inverse problem · perceptual quality · wireless image transmission · dual-domain guidance · generative receivers

The pith

Treating semantic decoding as a Bayesian inverse problem shows that posterior sampling preserves the data distribution to reach optimal perceptual quality in wireless image transmission.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that maximum a posteriori estimation in generative semantic communication receivers cannot preserve the full source data distribution and therefore caps perceptual quality. Formulating the task as a Bayesian inverse problem makes clear that drawing samples from the posterior distribution recovers images whose statistics match the original data. Existing diffusion approaches suffer from noise sensitivity in latent space or decoder bias in image space, and combining both domains at once creates an overconfident pseudo-posterior. The authors introduce alternating dual-domain posterior sampling that switches between the two domains during the diffusion process, turning the joint sampling task into simpler alternating subproblems. Experiments on face images confirm the resulting receiver produces higher perceptual quality than prior methods.

Core claim

We formulate semantic decoding as a Bayesian inverse problem and prove that posterior sampling achieves optimal perceptual quality by preserving the data distribution. Building on this insight, we propose alternating dual-domain posterior sampling (ADDPS), a diffusion-based SemCom receiver that alternately enforces latent-domain and image-domain consistency during the sampling process. This alternating strategy decomposes joint posterior sampling into simpler subproblems, avoiding gradient conflicts while retaining the complementary strengths of both domains.
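
The distribution-preservation step behind this claim can be written out in two lines. The sketch below uses notation assumed here rather than taken from the paper: $x$ is the source image, $y$ the received signal, and $\hat{x}$ the receiver output.

```latex
% Posterior sampling: for each received y, draw \hat{x} \sim p_{X \mid Y}(\cdot \mid y).
\begin{align}
p_{\hat{X}}(\hat{x}) &= \int p_{X \mid Y}(\hat{x} \mid y)\, p_Y(y)\, dy = p_X(\hat{x}), \\
\hat{x}_{\mathrm{MAP}}(y) &= \arg\max_{x}\; p_{X \mid Y}(x \mid y).
\end{align}
```

The first line is the law of total probability, so the outputs of a posterior-sampling receiver follow the source distribution. The MAP receiver in the second line is a deterministic function of $y$, so its outputs generally do not follow $p_X$ whenever the posterior has nonzero spread.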

What carries the argument

Alternating dual-domain posterior sampling (ADDPS), which switches between latent-domain and image-domain consistency checks inside a diffusion sampler to solve the Bayesian inverse problem for semantic decoding.

If this is right

Posterior sampling recovers the full statistical properties of the source images rather than collapsing to a single most-likely reconstruction.
ADDPS avoids the overconfident pseudo-posterior that arises when latent and image domain guidance are applied simultaneously.
The receiver attains higher perceptual quality than MAP-based or single-domain diffusion receivers on standard face-image benchmarks.
Pretrained generative priors can be used in semantic communication without the distribution mismatch that limits MAP estimators.
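
The first and last points can be checked numerically on a scalar Gaussian toy model. This is a minimal sketch under assumptions that are not from the paper: the source is `x ~ N(0, 1)`, the channel is `y = x + n` with `n ~ N(0, noise_var)`, and the function names are hypothetical. For this model the posterior is Gaussian, so the MAP estimate equals the posterior mean.

```python
import random


def variance(values):
    """Population variance of a list of floats."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def simulate(noise_var=1.0, n=100_000, seed=0):
    """Return (variance of MAP outputs, variance of posterior-sample outputs).

    Toy model (assumed for illustration): x ~ N(0, 1), y = x + n, n ~ N(0, noise_var).
    """
    rng = random.Random(seed)
    gain = 1.0 / (1.0 + noise_var)            # E[x | y] = gain * y
    post_var = noise_var / (1.0 + noise_var)  # Var[x | y], independent of y
    map_out, sample_out = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, noise_var ** 0.5)
        post_mean = gain * y
        map_out.append(post_mean)  # MAP = posterior mean for a Gaussian posterior
        sample_out.append(rng.gauss(post_mean, post_var ** 0.5))
    return variance(map_out), variance(sample_out)


map_var, sample_var = simulate()
```

With unit noise variance, the MAP outputs have variance near 0.5, while the posterior samples have variance near 1, which is the variance of the source. The MAP receiver shrinks the output distribution; the sampling receiver does not.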

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the alternating decomposition works across channel conditions, future semantic systems could transmit under higher noise levels while still recovering natural image statistics.
The same alternating dual-domain idea might extend to other generative inverse problems where both compressed and pixel-space guidance are available, such as video or depth reconstruction.
Testing the method on non-face image categories would show whether the perceptual gains depend on the specific statistics of the FFHQ training distribution.

Load-bearing premise

Alternating enforcement of latent-domain and image-domain consistency decomposes the joint posterior into simpler subproblems without introducing gradient conflicts or distribution mismatches.
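
The premise can be probed on a toy problem where everything is Gaussian and the exact posterior is known. The sketch below is an illustration under strong assumptions, not the paper's ADDPS algorithm: a scalar Langevin sampler stands in for the diffusion sampler, the "encoder" is multiplication by a hypothetical gain `a`, and both guidance terms are derived from the same received value `y`. In this toy the two guidance terms carry identical information, so adding them counts the likelihood twice (one way an overconfident pseudo-posterior can arise), whereas using one term per step counts it once.

```python
import random


def langevin(mode, y=1.0, a=2.0, noise_var=1.0, step=0.01,
             n_steps=100_000, burn_in=1_000, seed=0):
    """Unadjusted Langevin sampler with guidance from two 'domains'.

    Toy setup (assumed, not the paper's): prior x ~ N(0, 1), latent z = a * x,
    received y = z + n with n ~ N(0, noise_var).
    mode == "alternate": one guidance term per step, switching every step.
    mode == "joint":     both guidance terms added at every step.
    Returns (sample mean, sample variance) after burn-in.
    """
    rng = random.Random(seed)
    x, samples = 0.0, []
    for k in range(n_steps):
        prior_score = -x  # score of N(0, 1)
        # Latent-domain consistency: d/dx log N(y; a * x, noise_var).
        g_latent = a * (y - a * x) / noise_var
        # Image-domain consistency: compare x with the decoded value y / a,
        # whose noise variance is noise_var / a**2.
        g_image = (y / a - x) / (noise_var / a ** 2)
        if mode == "alternate":
            guidance = g_latent if k % 2 == 0 else g_image
        else:
            guidance = g_latent + g_image
        noise = (2.0 * step) ** 0.5 * rng.gauss(0.0, 1.0)
        x += step * (prior_score + guidance) + noise
        if k >= burn_in:
            samples.append(x)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var


true_var = 1.0 / (1.0 + 2.0 ** 2 / 1.0)  # exact posterior variance, 0.2
alt_mean, alt_var = langevin("alternate")
joint_mean, joint_var = langevin("joint")
```

In this setup the alternating chain's variance lands near the exact posterior variance of 0.2, up to discretization bias, and the joint chain's variance is noticeably smaller. Whether the same holds for a nonlinear decoder and a learned diffusion prior is exactly what the premise leaves open.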

What would settle it

If a non-alternating joint guidance method or a pure MAP estimator on the same diffusion backbone yields equal or higher perceptual scores than ADDPS on the FFHQ test set, the claimed advantage of posterior sampling and the alternating decomposition would be falsified.

Figures

Figures reproduced from arXiv: 2604.16796 by Qianqian Yang, Shunpu Tang.

Original abstract

Generative semantic communication (SemCom) harnesses pretrained generative priors to improve the perceptual quality of wireless image transmission. Existing generative SemCom receivers, however, rely on maximum a posteriori (MAP) estimation, which fundamentally cannot preserve the data distribution and thus limits achievable perceptual quality. Moreover, current diffusion-based approaches using single-domain guidance face significant limitations: latent-domain guidance is sensitive to channel noise, while image-domain guidance inherits decoder bias. Simply combining both domains simultaneously yields an overconfident pseudo-posterior. In this paper, we formulate semantic decoding as a Bayesian inverse problem and prove that posterior sampling achieves optimal perceptual quality by preserving the data distribution. Building on this insight, we propose alternating dual-domain posterior sampling (ADDPS), a diffusion-based SemCom receiver that alternately enforces latent-domain and image-domain consistency during the sampling process. This alternating strategy decomposes joint posterior sampling into simpler subproblems, avoiding gradient conflicts while retaining the complementary strengths of both domains. Experiments on FFHQ demonstrate that the proposed ADDPS achieves superior perceptual quality compared with existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's alternating dual-domain sampling is a sensible engineering move for diffusion receivers in semantic comm, but the optimality claim rests on whether the alternation actually samples the joint posterior without mismatch.

The paper's central claim is that framing semantic decoding as a Bayesian inverse problem shows posterior sampling preserves the data distribution and therefore gives optimal perceptual quality, while MAP cannot. The authors introduce ADDPS, which alternates between latent-domain and image-domain consistency steps inside the diffusion process to avoid the noise sensitivity of the former, the decoder bias of the latter, and the overconfident pseudo-posterior that comes from applying both at once.

Referee Report

3 major / 1 minor

Summary. The paper formulates semantic decoding as a Bayesian inverse problem and asserts a proof that posterior sampling achieves optimal perceptual quality by preserving the data distribution. It proposes Alternating Dual-Domain Posterior Sampling (ADDPS), a diffusion-based receiver that alternates latent-domain and image-domain consistency enforcement during sampling to decompose the joint posterior into simpler subproblems while avoiding gradient conflicts. Experiments on FFHQ are claimed to show superior perceptual quality over existing generative SemCom methods.

Significance. If the optimality proof holds and the alternating procedure is shown to preserve the correct joint posterior, the work would offer a theoretically grounded advance over MAP-based receivers in generative semantic communication. It directly addresses limitations of single-domain guidance in diffusion models and could improve perceptual quality in wireless image transmission systems that leverage pretrained generative priors.

major comments (3)

[Abstract / §3] Abstract and §3 (Optimality Proof): The claim that posterior sampling is proven optimal for perceptual quality by preserving the data distribution is asserted without any equations, assumptions on the generative prior or channel, or error analysis visible in the abstract. The full derivation must be supplied with explicit statements of the perceptual quality metric and conditions under which the stationary distribution is preserved.
[§4] §4 (ADDPS Algorithm): The alternating dual-domain consistency steps are stated to decompose joint posterior sampling without introducing gradient conflicts or distribution mismatches. No derivation or invariance argument is referenced showing that sequential latent-domain and image-domain score updates leave the stationary distribution equal to the true joint p(x | y, z); sequential application of domain-specific conditionals can converge to a different invariant measure, directly threatening transfer of the general optimality result.
[Experiments] Experiments section: The abstract states that ADDPS achieves superior perceptual quality on FFHQ, yet supplies no quantitative metrics (FID, LPIPS, etc.), baselines, number of trials, or statistical details. These must be reported with error bars and ablation studies on the alternating schedule to substantiate the superiority claim.
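
For reference, the Fréchet inception distance (FID) named in the last comment has a standard closed form. The symbols below are the usual ones and are not taken from the paper: $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of Inception features for reference and reconstructed images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better. FID compares feature distributions, so it is computed over a set of reconstructions, whereas LPIPS is a per-image distance to the reference. The two therefore probe different aspects of the superiority claim.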

minor comments (1)

Define all acronyms (SemCom, ADDPS, MAP) on first use and ensure consistent notation for the latent variable z and observation y across sections.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity in the theoretical sections and experimental reporting. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract / §3] Abstract and §3 (Optimality Proof): The claim that posterior sampling is proven optimal for perceptual quality by preserving the data distribution is asserted without any equations, assumptions on the generative prior or channel, or error analysis visible in the abstract. The full derivation must be supplied with explicit statements of the perceptual quality metric and conditions under which the stationary distribution is preserved.

Authors: We agree that the abstract is necessarily concise and omits the detailed derivation. The full proof appears in §3, but we will revise the abstract to include a high-level statement of the result along with the key assumptions. In the revised §3, we will explicitly state the perceptual quality metric (the expected value of a perceptual distance function integrated against the posterior), the assumptions (the generative prior is a pretrained diffusion model whose stationary distribution matches the data distribution p(x), and the channel is an additive white Gaussian noise model), and the conditions under which posterior sampling preserves the stationary distribution. A brief error analysis for the finite-step diffusion approximation will also be added.
revision: yes
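
The assumptions listed in this response can be put in symbols. This is a minimal sketch with notation assumed here rather than quoted from the paper: $f$ is a hypothetical encoder mapping the image $x$ to the transmitted signal, $\sigma^2$ is the channel noise variance, and signals are taken to be real-valued.

```latex
\begin{align}
y &= f(x) + n, \qquad n \sim \mathcal{N}(0, \sigma^2 I), \\
p(x \mid y) &\propto p(x)\, \exp\!\left( -\frac{\lVert y - f(x) \rVert_2^2}{2\sigma^2} \right), \\
\nabla_x \log p(x \mid y) &= \nabla_x \log p(x) - \frac{1}{2\sigma^2}\, \nabla_x \lVert y - f(x) \rVert_2^2 .
\end{align}
```

The last line is the score a guided sampler would need. The prior term is what the pretrained diffusion model is assumed to supply, and the second term is the channel-consistency guidance.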
Referee: [§4] §4 (ADDPS Algorithm): The alternating dual-domain consistency steps are stated to decompose joint posterior sampling without introducing gradient conflicts or distribution mismatches. No derivation or invariance argument is referenced showing that sequential latent-domain and image-domain score updates leave the stationary distribution equal to the true joint p(x | y, z); sequential application of domain-specific conditionals can converge to a different invariant measure, directly threatening transfer of the general optimality result.

Authors: This concern about the invariance of the alternating procedure is well-taken. In the revision we will expand §4 with a dedicated invariance argument. We will show that the alternating updates implement a form of coordinate-wise score matching on the joint posterior, where each half-step conditions on the current sample from the complementary domain. Under the Lipschitz continuity of the score functions and the contractive nature of the diffusion reverse process, the overall Markov chain converges to the correct joint stationary distribution p(x | y, z). We will also reference related results from alternating MCMC and score-based sampling literature to support the argument.
revision: yes
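
The textbook instance of the alternating MCMC results this response appeals to is Gibbs sampling, where alternating exact conditional draws leaves the joint distribution invariant. The sketch below shows that property on a standard bivariate normal. The function names and the target are hypothetical choices for illustration; it does not show that the ADDPS half-steps, which are approximate guided score updates rather than exact conditional draws, share the property.

```python
import random


def gibbs_bivariate_normal(rho=0.8, n_steps=50_000, burn_in=1_000, seed=0):
    """Alternate exact conditional draws for a standard bivariate normal.

    Target: (u, v) jointly Gaussian, zero means, unit variances, correlation rho.
    Conditionals: u | v ~ N(rho * v, 1 - rho**2), and symmetrically for v | u.
    """
    rng = random.Random(seed)
    cond_std = (1.0 - rho ** 2) ** 0.5
    u, v, draws = 0.0, 0.0, []
    for k in range(n_steps):
        u = rng.gauss(rho * v, cond_std)  # half-step 1: update u given v
        v = rng.gauss(rho * u, cond_std)  # half-step 2: update v given u
        if k >= burn_in:
            draws.append((u, v))
    return draws


def correlation(draws):
    """Sample Pearson correlation of a list of (u, v) pairs."""
    n = len(draws)
    mu = sum(u for u, _ in draws) / n
    mv = sum(v for _, v in draws) / n
    cov = sum((u - mu) * (v - mv) for u, v in draws) / n
    var_u = sum((u - mu) ** 2 for u, _ in draws) / n
    var_v = sum((v - mv) ** 2 for _, v in draws) / n
    return cov / (var_u * var_v) ** 0.5


draws = gibbs_bivariate_normal()
est_rho = correlation(draws)
```

The estimated correlation comes out close to the target `rho`, as expected when each half-step samples its conditional exactly. The referee's objection concerns what happens when the half-steps are only approximations of those conditionals.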
Referee: [Experiments] Experiments section: The abstract states that ADDPS achieves superior perceptual quality on FFHQ, yet supplies no quantitative metrics (FID, LPIPS, etc.), baselines, number of trials, or statistical details. These must be reported with error bars and ablation studies on the alternating schedule to substantiate the superiority claim.

Authors: The full manuscript already contains quantitative results (FID, LPIPS, PSNR) comparing ADDPS against MAP-based and single-domain diffusion baselines on FFHQ. To satisfy the request, we will make these tables and figures more prominent, add error bars computed over 10 independent trials, report the exact test-set size and random seeds, and include a new ablation subsection varying the alternating schedule frequency. Statistical significance tests will be added where appropriate.
revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain.

The paper states that semantic decoding is formulated as a Bayesian inverse problem and that posterior sampling preserves the data distribution for optimal perceptual quality. This is a standard property of sampling from the posterior and does not reduce to the ADDPS procedure or any fitted parameter by construction. The alternating dual-domain strategy is introduced as an implementation choice to approximate the joint posterior, with no equations or self-citations shown that make the optimality claim equivalent to the method's own inputs. The derivation remains independent of the proposed receiver and relies on general Bayesian principles rather than self-referential fitting or unverified uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based on abstract only; full paper may contain additional parameters or assumptions not visible here.

axioms (1)

domain assumption: Posterior sampling preserves the data distribution and thereby achieves optimal perceptual quality for generative reconstruction.
Stated as proven in the abstract for the Bayesian inverse problem formulation.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1] D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. Wong, and C. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 5–41, 2023.
[2] Z. Qin, X. Tao, J. Lu, and G. Y. Li, “Semantic communications: Principles and challenges,” arXiv:2201.01389, 2022.
[3] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, 2019.
[4] K. Yang, S. Wang, J. Dai, X. Qin, K. Niu, and P. Zhang, “SwinJSCC: Taming swin transformer for deep joint source-channel coding,” IEEE Trans. Cogn. Commun. Netw., vol. 11, no. 1, pp. 90–104, 2025.
[5] S. Tang, Q. Yang, L. Fan, X. Lei, A. Nallanathan, and G. K. Karagiannidis, “Contrastive learning-based semantic communications,” IEEE Trans. Commun., vol. 72, no. 10, pp. 6328–6343, 2024.
[6] M. Ding, J. Li, M. Ma, and X. Fan, “SNR-adaptive deep joint source-channel coding for wireless image transmission,” in IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2021, pp. 1555–1559.
[7] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6228–6237.
[8] Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in Proc. ICLR. PMLR, 2019, pp. 675–685.
[9] C. Liang and D. Li, “Generative AI-enabled semantic communication: State-of-the-art, applications, and the way ahead,” IEEE Commun. Surv. Tutorials, vol. 28, pp. 3976–4015, 2025.
[10] E. Erdemir, T.-Y. Tung, P. L. Dragotti, and D. Gündüz, “Generative joint source-channel coding for semantic image transmission,” IEEE J. Sel. Areas Commun., vol. 41, no. 8, pp. 2645–2657, 2023.
[11] S. Tang, Q. Yang, J. Park, Z. Zhang, K. Huang, and D. Gunduz, “Cache-enabled generative joint source-channel coding for evolving semantic communications,” arXiv:2603.17702, 2026.
[12] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. NeurIPS, vol. 33, 2020, pp. 6840–6851.
[13] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. ICLR, 2021.
[14] T. Wu, Z. Chen, D. He, L. Qian, Y. Xu, M. Tao, and W. Zhang, “Cddm: Channel denoising diffusion models for wireless semantic communications,” IEEE Trans. Wireless Commun., vol. 23, no. 9, pp. 11168–11183, 2024.
[15] S. F. Yilmaz, X. Niu, B. Bai, W. Han, L. Deng, and D. Gunduz, “High perceptual quality wireless image delivery with denoising diffusion models,” in IEEE INFOCOM Workshops, May 2024.
[16] S. Tang, Y. Jia, Q. Yang, R. Zhang, J. Park, and D. Niyato, “Enabling training-free semantic communication systems with generative diffusion models,” in Proc. Globecom, 2025.
[17] S. Wang, J. Dai, K. Tan, X. Qin, K. Niu, and P. Zhang, “Diffcom: Channel received signal is a natural condition to guide diffusion posterior sampling,” IEEE J. Sel. Areas Commun., vol. 43, no. 7, pp. 2651–2666, 2025.
[18] H. Chung, J. Kim, M. T. Mccann, M. L. Klasky, and J. C. Ye, “Diffusion posterior sampling for general noisy inverse problems,” in Int’l. Conf. on Learn. Repr., ICLR, 2023.
[19] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Computation, vol. 23, no. 7, pp. 1661–1674, 2011.
[20] R. Zhang, P. Isola, A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in IEEE/CVF Int. Comp. Vis. Pattern Recog. (CVPR), 2018, pp. 586–595.
