pith. sign in

arxiv: 2405.09866 · v2 · submitted 2024-05-16 · 📡 eess.SP · cs.LG

Training-Free Multi-User Generative Semantic Communications via Null-Space Diffusion Sampling

Pith reviewed 2026-05-24 01:36 UTC · model grok-4.3

classification 📡 eess.SP cs.LG
keywords generative semantic communicationsdiffusion modelsmulti-usernull-space samplingOFDMAtraining-freesemantic regeneration
0
0 comments X

The pith

Multi-user OFDMA transmits only minimal bits for diffusion models to regenerate semantic content at receivers without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a generative semantic communication framework for multiple users sharing channels. Instead of transmitting complete data, the transmitter sends a small set of bits that allow diffusion models at each receiver to semantically regenerate the missing information. This operates without any training or fine-tuning of the models for the specific communication scenario. The approach relies on null-space diffusion sampling to guide the recovery process while assigning channel resources across users. If correct, multi-user systems can prioritize semantic essentials over full bit delivery.

Core claim

The paper claims that null-space diffusion sampling enables a training-free multi-user generative semantic communication system. By constraining the diffusion process to the null space of the transmitted signal, each receiver's model generates the lost semantic content from only the minimal guiding bits sent over the shared OFDMA channel.

What carries the argument

Null-space diffusion sampling: the mechanism that performs diffusion-based generation inside the null space of the channel to recover missing semantic information from partial transmissions.

If this is right

  • Transmitters allocate resources knowing generative recovery will handle losses rather than aiming for complete delivery.
  • OFDMA systems can operate with significantly reduced transmitted data volume per user.
  • The framework supports multiple users simultaneously without dedicated model training for each scenario.
  • Experimental results indicate effective content regeneration from compressed semantic information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Resource allocation in wireless networks could shift toward sending semantic cues instead of error-free full data.
  • The same null-space principle may extend to other generative models in communication tasks.
  • Performance in dense networks with high user counts could be tested by varying interference levels.

Load-bearing premise

A pre-trained diffusion model can reliably regenerate the semantically missing information from only the minimal transmitted bits in a multi-user setting without scenario-specific training.

What would settle it

A measurement showing low semantic similarity between original and regenerated content when only the minimal bits are transmitted through the multi-user channel.

Figures

Figures reproduced from arXiv: 2405.09866 by Danilo Comminiello, Eleonora Grassucci, Giordano Cicchetti, Jihong Park, Jinho Choi, Riccardo F. Gramaccioni.

Figure 1
Figure 1. Figure 1: Sample results of our method on out-of-dataset images for two-user [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Schematic representation of the proposed generative semantic com [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of diffusion model forward process, utilized in training, diffusion model standard reverse process for sampling, and ours null-space diffusion [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Random samples from the ImageNet test set regenerated by our method. We consider two users and three different channel assignment scenarios, where [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison for different N/M subcarriers ratio, where N < M and M = 256, for K = 2 users during transmission, evaluated with different metrics, namely SSIM↑, PSNR↑, FID↓, LPIPS↓ on the CelebA-HQ dataset. The proposed method (in red line) far exceeds any other method according to all the four metrics in each scenario. FID = ∥µr − µg∥ 2 + T r  Σr + Σg − 2 (ΣrΣg) 1/2  , (31) whereby, µ represents the mean a… view at source ↗
Figure 6
Figure 6. Figure 6: Visual comparison of different methods under extremely low channel noise SNR values ( [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of the methods against different channel SNR, evaluated [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: Performance of the proposed method in extreme scenarios with very [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
Figure 8
Figure 8. Figure 8: Results for the experiments with joint N < M subcarriers and different channel noise SNR values. We consider two scenarios for the N/M ratio equal to 0.7 and 0.6 in the first and second columns, respectively. For both the scenarios we consider an SNR equally spaced from −10 to 10. Our method, in red, largely outperforms all other comparisons in each of the scenarios considered. AWGN channel noise, and most… view at source ↗
read the original abstract

In recent years, novel communication strategies have emerged to face the challenges that the increased number of connected devices and the higher quality of transmitted information are posing. Among them, semantic communication obtained promising results especially when combined with state-of-the-art deep generative models, such as large language or diffusion models, able to regenerate content from extremely compressed semantic information. However, most of these approaches focus on single-user scenarios processing the received content at the receiver on top of conventional communication systems. In this paper, we propose to go beyond these methods by developing a novel generative semantic communication framework tailored for multi-user scenarios. This system assigns the channel to users knowing that the lost information can be filled in with a diffusion model at the receivers. Under this innovative perspective, OFDMA systems should not aim to transmit the largest part of information, but solely the bits necessary to the generative model to semantically regenerate the missing ones. The thorough experimental evaluation shows the capabilities of the novel diffusion model and the effectiveness of the proposed framework, leading towards a GenAI-based next generation of communications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a training-free framework for multi-user generative semantic communications over OFDMA channels. It uses null-space projection at the transmitter to allocate resources such that each user receives only the minimal conditioning bits required by a fixed, pre-trained diffusion model at the receiver to semantically regenerate the omitted content.

Significance. If the central claim is validated, the work offers a route to substantially lower transmitted rates in multi-user settings by delegating semantic completion to off-the-shelf generative models. The explicit training-free design is a concrete strength that distinguishes it from fine-tuning-based semantic schemes and could be directly applicable to existing diffusion backbones.

major comments (3)
  1. [§3.2] §3.2 (Null-Space Sampling): the derivation assumes that the projected bits isolate precisely the information needed by the diffusion prior, yet no argument or bound is given showing that channel-induced distortions or residual inter-user leakage remain inside the model's training support; this is load-bearing for the no-adaptation claim.
  2. [§5] §5 (Experimental Evaluation): results are reported for a fixed number of users and a single operating point of transmitted bits; the absence of ablations on user count or on the fraction of information retained prevents isolation of whether the diffusion regeneration succeeds because of, or despite, the multi-user null-space mechanism.
  3. [§5.3] §5.3 (Multi-user Results): the reported semantic metrics do not include controls that vary channel estimation error or inter-user interference power, leaving open whether performance degrades when the received conditioning vector falls outside the diffusion model's training distribution.
minor comments (2)
  1. [§2] Notation for the null-space projector and the conditioning vector should be introduced with an explicit equation reference in §2 to aid readers unfamiliar with the diffusion literature.
  2. [Figure 3] Figure 3 caption does not state the number of Monte-Carlo channel realizations used to generate the plotted curves.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below, indicating revisions where the manuscript will be updated to strengthen the claims.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Null-Space Sampling): the derivation assumes that the projected bits isolate precisely the information needed by the diffusion prior, yet no argument or bound is given showing that channel-induced distortions or residual inter-user leakage remain inside the model's training support; this is load-bearing for the no-adaptation claim.

    Authors: We agree that an explicit supporting argument would reinforce the training-free claim. In the revision we will add a lemma in §3.2 that bounds the deviation of the received conditioning vector from the diffusion model's training distribution under bounded channel estimation error and residual leakage permitted by OFDMA orthogonality, thereby showing the projected signal remains inside the model's support. revision: yes

  2. Referee: [§5] §5 (Experimental Evaluation): results are reported for a fixed number of users and a single operating point of transmitted bits; the absence of ablations on user count or on the fraction of information retained prevents isolation of whether the diffusion regeneration succeeds because of, or despite, the multi-user null-space mechanism.

    Authors: The present experiments validate the end-to-end system at representative points. To isolate the null-space contribution we will add ablations in the revised §5 that sweep user count (K=2,4,8) and retained-bit fraction, reporting semantic metrics for each to separate the effect of the projection from the generative prior. revision: yes

  3. Referee: [§5.3] §5.3 (Multi-user Results): the reported semantic metrics do not include controls that vary channel estimation error or inter-user interference power, leaving open whether performance degrades when the received conditioning vector falls outside the diffusion model's training distribution.

    Authors: We concur that robustness checks are needed. The revision will extend §5.3 with new curves that vary channel estimation error variance and residual interference power, plotting semantic similarity versus these parameters to confirm graceful degradation while the conditioning vector stays inside the training support. revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework relies on external pre-trained diffusion models

full rationale

The paper proposes a training-free multi-user semantic communication system that assigns channels via null-space projection and relies on pre-existing diffusion models to regenerate missing semantic content at receivers. No derivation chain reduces by construction to fitted parameters, self-defined quantities, or self-citation load-bearing steps. The central claim depends on the independent capabilities of external generative models rather than internal fits or renamings. This is the common case of a self-contained proposal against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies insufficient technical detail to enumerate free parameters, axioms, or invented entities; the diffusion model is referenced as an external component.

pith-pipeline@v0.9.0 · 5733 in / 1049 out tokens · 30683 ms · 2026-05-24T01:36:32.601443+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 1 internal anchor

  1. [1]

    Ai empowered wireless communications: From bits to semantics,

    Z. Qin, L. Liang, Z. Wang, S. Jin, X. Tao, W. Tong, and G. Y . Li, “Ai empowered wireless communications: From bits to semantics,” Proceedings of the IEEE, vol. 112, no. 7, pp. 621–652, 2024

  2. [2]

    Efficient explainable metric for semantic communication of images using image captioning,

    A. O. Mahgoub and E. Yaacoub, “Efficient explainable metric for semantic communication of images using image captioning,”IEEE Access, vol. 14, pp. 8757–8772, 2026

  3. [3]

    Physical-layer adversarial robustness for deep learning-based semantic communications,

    G. Nan, Z. Li, J. Zhai, Q. Cui, G. Chen, X. Du, X. Zhang, X. Tao, Z. Han, and T. Q. S. Quek, “Physical-layer adversarial robustness for deep learning-based semantic communications,”IEEE Journal on Selected Areas in Comm., vol. 41, no. 8, pp. 2592–2608, 2023

  4. [4]

    Communication beyond transmitting bits: Semantics-guided source and channel coding,

    J. Dai, P. Zhang, K. Niu, S. Wang, Z. Si, and X. Qin, “Communication beyond transmitting bits: Semantics-guided source and channel coding,” IEEE Wireless Comm., 2022

  5. [5]

    Semantic com- munication challenges: Understanding dos and avoiding don’ts,

    J. Choi, J. Park, E. Grassucci, and D. Comminiello, “Semantic com- munication challenges: Understanding dos and avoiding don’ts,”IEEE V eichular and Tech. Conf. (VTC) Spring, 2024

  6. [6]

    Enhancing semantic communication with deep generative models: An overview,

    E. Grassucci, Y . Mitsufuji, P. Zhang, and D. Comminiello, “Enhancing semantic communication with deep generative models: An overview,” in IEEE Int. Conf. on Audio, Speech, and Signal Process. (ICASSP), 2024

  7. [7]

    CDDM: Channel denoising diffusion models for wireless semantic communications,

    T. Wu, Z. Chen, D. He, L. Qian, Y . Xu, and M. Tao, “CDDM: Channel denoising diffusion models for wireless semantic communications,” IEEE Trans. on Wireless Comm., no. 9, pp. 11168–11183, 2024

  8. [8]

    Generative AI meets semantic communication: Evolution and revolution of communication tasks,

    E. Grassucci, J. Park, S. Barbarossa, S.-L. Kim, J. Choi, and D. Com- miniello, “Generative AI meets semantic communication: Evolution and revolution of communication tasks,”ArXiv preprint: arXiv:2401.06803, 2024

  9. [9]

    Diffusion- driven semantic communication for generative models with bandwidth constraints,

    L. Guo, W. Chen, Y . Sun, N. P. B. Ai, and T. Q. S. Quek, “Diffusion- driven semantic communication for generative models with bandwidth constraints,”IEEE Trans. on Wireless Comm., no. 8, pp. 6490–6503, 2025

  10. [10]

    Lightweight and adaptive deep coding for wireless image transmission in semantic communication,

    Y . Sun, J. Wang, L. Wei, H. Chen, S. Dang, and X. Li, “Lightweight and adaptive deep coding for wireless image transmission in semantic communication,”IEEE Access, vol. 13, pp. 158285–158301, 2025

  11. [11]

    Gen- erative semantic communications with foundation models: Perception- error analysis and semantic-aware power allocation,

    C. Xu, M. B. Mashhadi, Y . Ma, R. Tafazolli, and J. Wang, “Gen- erative semantic communications with foundation models: Perception- error analysis and semantic-aware power allocation,”IEEE Journal on Selected Areas in Comm., 2025

  12. [12]

    Generative Semantic Communication: Diffusion Models Beyond Bit Recovery

    E. Grassucci, S. Barbarossa, and D. Comminiello, “Generative semantic communication: Diffusion models beyond bit recovery,”ArXiv preprint: arXiv:2306.04321, 2023

  13. [13]

    Semantic successive refinement: A generative ai-aided semantic communication framework,

    K. Zhang, L. Li, W. Lin, Y . Yan, R. Li, W. Cheng, and Z. Han, “Semantic successive refinement: A generative ai-aided semantic communication framework,”IEEE Trans. on Cognitive Comm. and Networking, vol. 11, no. 2, pp. 687–699, 2025

  14. [14]

    Diffu- sion models for audio semantic communication,

    E. Grassucci, C. Marinoni, A. Rodriguez, and D. Comminiello, “Diffu- sion models for audio semantic communication,” inIEEE Int. Conf. on Audio, Speech, and Signal Process. (ICASSP), 2024

  15. [15]

    Semantic-preserved communication system for highly efficient speech transmission,

    T. Han, Q. Yang, Z. Shi, S. He, and Z. Zhang, “Semantic-preserved communication system for highly efficient speech transmission,”IEEE Journal on Selected Areas in Comm., vol. 41, pp. 245–259, 2022

  16. [16]

    Personalized neural speech codec,

    I. Jang, H. Yang, W. Lim, S. Beack, , and M. Kim, “Personalized neural speech codec,” inIEEE Int. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), 2024

  17. [17]

    Language-oriented communication with semantic coding and knowledge distillation for text- to-image generation,

    H. Nam, J. Park, J. Choi, M. Bennis, and S.-L. Kim, “Language-oriented communication with semantic coding and knowledge distillation for text- to-image generation,” inIEEE Int. Conf. on Audio, Speech, and Signal Process. (ICASSP), 2024

  18. [18]

    Sequential semantic gener- ative communication for progressive text-to-image generation,

    H. Nam, J. Park, J. Choi, and S.-L. Kim, “Sequential semantic gener- ative communication for progressive text-to-image generation,” in20th Annual IEEE Int. Conf. on Sensing, Comm., and Netw. (SECON), pp. 91– 94, 2023

  19. [19]

    VQ-V AE empowered wireless commu- nication for joint source-channel coding and beyond,

    M. Nemati, J. Park, and J. Choi, “VQ-V AE empowered wireless commu- nication for joint source-channel coding and beyond,” inIEEE Global Comm. Conf. (GLOBECOM), pp. 1–6, 2023

  20. [20]

    Latent diffusion model-enabled low-latency semantic communication in the presence of semantic ambiguities and wireless channel noises,

    J. Pei, C. Feng, P. Wang, H. Tabassum, and D. Shi, “Latent diffusion model-enabled low-latency semantic communication in the presence of semantic ambiguities and wireless channel noises,”IEEE Trans. on Wireless Comm., 2025

  21. [21]

    Task-oriented multi-user semantic communications,

    H. Xie, Z. Qin, X. Tao, and K. B. Letaief, “Task-oriented multi-user semantic communications,”IEEE Journal on Selected Areas in Comm., vol. 40, no. 9, pp. 2584–2597, 2022

  22. [22]

    Swin transformer-based dynamic semantic communication for multi-user with different computing capacity,

    X. N. Loc, T. L. Ye, T. Y . Kyaw, N. N. H. Minh, C. Zhang, H. Zhu, and C. S. Hong, “Swin transformer-based dynamic semantic communication for multi-user with different computing capacity,”IEEE Trans. on V ehicular Technology, pp. 1–16, 2024

  23. [23]

    Multi-user matching and resource allocation in vision aided communications,

    W. Xu, F. Gao, Y . Zhang, C. Pan, and G. Liu, “Multi-user matching and resource allocation in vision aided communications,”IEEE Trans. on Comm., vol. 71, no. 8, pp. 4528–4543, 2023

  24. [24]

    Solving audio inverse problems with a diffusion model,

    E. Moliner, J. Lehtinen, and V . V ¨alim¨aki, “Solving audio inverse problems with a diffusion model,”IEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), pp. 1–5, 2022

  25. [25]

    GibbsDDRM: A partially collapsed gibbs sampler for solving blind inverse problems with denoising diffusion restoration,

    N. Murata, K. Saito, C.-H. Lai, Y . Takida, T. Uesaka, Y . Mitsufuji, and S. Ermon, “GibbsDDRM: A partially collapsed gibbs sampler for solving blind inverse problems with denoising diffusion restoration,” in Int. Conf. on Machine Learning (ICML), 2023

  26. [26]

    SDEdit: Guided image synthesis and editing with stochastic differential equations,

    C. Meng, Y . He, Y . Song, J. Song, J. Wu, J.-Y . Zhu, and S. Ermon, “SDEdit: Guided image synthesis and editing with stochastic differential equations,” inInt. Conf. on Learning Repr ., 2021

  27. [27]

    Deep null space learning for inverse problems: convergence analysis and rates,

    J. Schwab, S. Antholzer, and M. Haltmeier, “Deep null space learning for inverse problems: convergence analysis and rates,”Inverse Problems, vol. 35, jan 2019

  28. [28]

    Semantic communications: Overview, open issues, and future research directions,

    X. Luo, H.-H. Chen, and Q. Guo, “Semantic communications: Overview, open issues, and future research directions,”IEEE Wireless Comm., vol. 29, no. 1, pp. 210–219, 2022

  29. [29]

    Joint task and data oriented semantic communications: A deep separate source-channel coding scheme,

    J. Huang, D. Li, C. H. Xiu, X. Qin, and W. Zhang, “Joint task and data oriented semantic communications: A deep separate source-channel coding scheme,”ArXiv preprint: arXiv:2302.13580, 2023

  30. [30]

    Semantic communications: Principles and challenges,

    Z. Qin, X. Tao, J. Lu, and G. Y . Li, “Semantic communications: Principles and challenges,”ArXiv preprint: arXiv:2201.01389, 2021

  31. [31]

    Goal-oriented and semantic commu- nication in 6G AI-native networks: The 6G-GOALS approach,

    E. Calvanese Strinati and et al., “Goal-oriented and semantic commu- nication in 6G AI-native networks: The 6G-GOALS approach,”ArXiv preprint: arXiv:2402.07573, 2024

  32. [32]

    Semantic-preserving image compression,

    N. Patwa, N. A. Ahuja, S. Somayazulu, O. Tickoo, S. Varadarajan, and S. G. Koolagudi, “Semantic-preserving image compression,”IEEE International Conference on Image Processing (ICIP), pp. 1281–1285, 2020

  33. [33]

    An end-to-end deep learning image compression framework based on semantic analysis,

    C. Wang, Y . Han, and W. Wang, “An end-to-end deep learning image compression framework based on semantic analysis,”Applied Sciences, 2019

  34. [34]

    Semantic communications for speech signals,

    Z. Weng, Z. Qin, and G. Y . Li, “Semantic communications for speech signals,” inIEEE Int. Conf. on Comm. (ICC), 2021

  35. [35]

    Wireless deep speech semantic transmission,

    Z. Xiao, S. Yao, J. Dai, S. Wang, K. Niu, and P. Zhang, “Wireless deep speech semantic transmission,” inIEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2023

  36. [36]

    Wireless semantic commu- nications for video conferencing,

    P. Jiang, C.-K. Wen, S. Jin, and G. Y . Li, “Wireless semantic commu- nications for video conferencing,”IEEE Journal on Selected Areas in Comm., vol. 41, pp. 230–244, 2022

  37. [37]

    VideoQA-SC: Adaptive semantic communication for video question answering,

    J. Guo, W. Chen, Y . Sun, J. Xu, and B. Ai, “VideoQA-SC: Adaptive semantic communication for video question answering,”IEEE Journal on Selected Areas in Comm., 2025

  38. [38]

    Adaptive semantic token selection for ai-native goal-oriented commu- nications,

    A. Devoto, J. Pomponi, S. Petruzzi, P. Di Lorenzo, and S. Scardapane, “Adaptive semantic token selection for ai-native goal-oriented commu- nications,” inIEEE Globecom Workshops (GC Wkshps), 2024

  39. [39]

    Token-domain mul- tiple access: Exploiting semantic orthogonality for collision mitigation,

    L. Qiao, M. B. Mashhadi, Z. Gao, and D. G ¨und¨uz, “Token-domain mul- tiple access: Exploiting semantic orthogonality for collision mitigation,” inIEEE INFOCOM Workshops 2025, 2025

  40. [40]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” inAdvances in Neural Information Processing Systems (NeurIPS), pp. 6840–6851, 2020. 12

  41. [41]

    Photorealistic text-to-image diffusion models with deep language understanding,

    C. Saharia, C. W., S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, R. Gontijo-Lopes, B. K. Ayan, T. Salimans, J. Ho, D. J. Fleet, and M. Norouzi, “Photorealistic text-to-image diffusion models with deep language understanding,” inAdvances in Neural Information Processing Systems (NeurIPS), 2022

  42. [42]

    High- resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), p. 10674–10685, 2021

  43. [43]

    AudioLDM 2: learning holis- tic audio generation with self-supervised pretraining,

    H. Liu, Q. Tian, Y . Yuan, X. Liu, X. Mei, Q. Kong, Y . Wang, W. Wang, Y . Wang, and M. D. Plumbley, “AudioLDM 2: learning holis- tic audio generation with self-supervised pretraining,”ArXiv preprint: arXiv:2308.05734, 2023

  44. [44]

    Text-to-audio generation using instruction-tuned LLM and latent diffusion model,

    D. Ghosal, N. Majumder, A. Mehrish, and S. Poria, “Text-to-audio generation using instruction-tuned LLM and latent diffusion model,” ArXiv preprint: arXiv:2304.13731, 2023

  45. [45]

    Plan-X: Instruct video generation via semantic planning,

    L. Huang, Y . u Xie, H. Xu, T. Gu, C. Zhang, G. Song, Z. Li, X. Zhao, L. Luo, and G. Sapiro, “Plan-X: Instruct video generation via semantic planning,”ArXiv preprint: arXiv:2511.17986, 2025

  46. [46]

    Text2Performer: Text-driven human video generation,

    Y . Jiang, S. Yang, T. K. Liang, W. Wu, C. L. Change, and Z. Liu, “Text2Performer: Text-driven human video generation,”ArXiv preprint: ArXiv:2304.08483, 2023

  47. [47]

    Diffusion models beat GANs on image synthesis,

    P. Dhariwal and A. Q. Nichol, “Diffusion models beat GANs on image synthesis,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 34, 2021

  48. [48]

    Diffusion models in vision: A survey,

    F.-A. Croitoru, V . Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–20, 2023

  49. [49]

    Optimizing resource allocation for multi-modal semantic communication in mobile aigc networks: A diffusion-based game approach,

    J. Liu, M. Xiao, J. Wen, J. Kang, R. Zhang, T. Zhang, D. Niyato, W. Zhang, and Y . Liu, “Optimizing resource allocation for multi-modal semantic communication in mobile aigc networks: A diffusion-based game approach,”IEEE Trans. on Cognitive Comm. and Networking, vol. 11, no. 5, pp. 3346–3360, 2025

  50. [50]

    SG2SC: a generative semantic communication framework for scene understanding- oriented image transmission,

    M. Yang, D. Gao, F. Xie, J. Li, X. Song, and G. Shi, “SG2SC: a generative semantic communication framework for scene understanding- oriented image transmission,” inIEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), pp. 13486–13490, 2024

  51. [51]

    DMCE: Diffusion model channel enhancer for multi-user semantic communication systems,

    Y . Zeng, X. He, X. Chen, H. Tong, Z. Yang, Y . Guo, and J. Hao, “DMCE: Diffusion model channel enhancer for multi-user semantic communication systems,” 2024

  52. [52]

    Generative joint source-channel coding for semantic image transmission,

    E. Erdemir, T.-Y . Tung, P. L. Dragotti, and D. Gunduz, “Generative joint source-channel coding for semantic image transmission,”ArXiv preprint: arXiv:2211.13772, 2022

  53. [53]

    V AE for joint source-channel coding of distributed gaussian sources over AWGN MAC,

    Y . Malur Saidutta, A. Abdi, and F. Fekri, “V AE for joint source-channel coding of distributed gaussian sources over AWGN MAC,” inIEEE Int. Workshop on Signal Processing Advances in Wireless Comm. (SPA WC), pp. 1–5, 2020

  54. [54]

    A variational auto-encoder approach for im- age transmission in wireless channel,

    A. H. Estiri, M. R. Sabramooz, A. Banaei, A. H. Dehghan, B. Jamialah- madi, and M. J. Siavoshani, “A variational auto-encoder approach for im- age transmission in wireless channel,”arXiv preprint: arXiv:2010.03967, 2020

  55. [55]

    Generative ai-enabled semantic communication: State-of-the-art, applications, and the way ahead,

    C. Liang and D. Li, “Generative ai-enabled semantic communication: State-of-the-art, applications, and the way ahead,”IEEE Communica- tions Surveys & Tutorials, vol. 28, pp. 3976–4015, 2026

  56. [56]

    Goldsmith,Wireless Communications

    A. Goldsmith,Wireless Communications. Cambridge University Press, 2005

  57. [57]

    Zero-shot image restoration using denoising diffusion null-space model,

    Y . Wang, J. Yu, and J. Zhang, “Zero-shot image restoration using denoising diffusion null-space model,”Int. Conf. on Learning Repr . (ICLR), 2023

  58. [58]

    Tse and P

    D. Tse and P. Viswanath,Fundamentals of Wireless Communication. Cambridge University Press, 2005

  59. [59]

    Deep joint source channel coding for wirelessimage transmission with ofdm,

    M. Yang, C. Bian, and H.-S. Kim, “Deep joint source channel coding for wirelessimage transmission with ofdm,” inInt. Conf. on Comm. (ICC), 2021

  60. [60]

    The unreasonable effectiveness of deep features as a perceptual metric,

    R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2018

  61. [61]

    Extreme image compression using fine-tuned VQGANs,

    Q. Mao, T. Yang, Y . Zhang, Z. Wang, M. Wang, S. Wang, and S. Ma, “Extreme image compression using fine-tuned VQGANs,” 2023

  62. [62]

    SDEdit: Guided image synthesis and editing with stochastic differential equations,

    C. Meng, Y . He, Y . Song, J. Song, J. Wu, J.-Y . Zhu, and S. Ermon, “SDEdit: Guided image synthesis and editing with stochastic differential equations,” inInt. Conf. on Learning Repr . (ICLR), 2022

  63. [63]

    Large scale GAN training for high fidelity natural image synthesis,

    A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” 2019

  64. [64]

    Score-based generative modeling through stochastic differ- ential equations,

    Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differ- ential equations,” inInt. Conf. on Learning Repr . (ICLR), 2021