TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems
Pith reviewed 2026-05-22 00:37 UTC · model grok-4.3
The pith
A token-centric wireless framework protects the most task-relevant tokens and repairs unreliable ones with a receiver completion model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TONIC converts each source sample into tokens and estimates token-level task relevance at the transmitter, which drives utility-aware unequal error protection. At the receiver, token confidence gates unreliable outputs into recoverable erasures, which a Transformer-based completion model restores before final task inference. The reported result is higher accuracy than baselines under matched communication budgets.
What carries the argument
Utility-aware unequal error protection at the transmitter combined with confidence-aware gating and Transformer-based token completion at the receiver.
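A minimal sketch of what utility-aware unequal error protection could look like under a fixed channel-use budget. The paper's actual allocation rule is not given in this summary, so the proportional rule with largest-remainder rounding, the function name `allocate_channel_uses`, and the `min_uses` floor are all assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def allocate_channel_uses(relevance, budget, min_uses=1):
    """Split `budget` channel uses across tokens in proportion to relevance.

    Hypothetical helper: proportional allocation with largest-remainder
    rounding is an assumption, not the rule used in the paper.
    """
    n = len(relevance)
    if n == 0:
        return []
    if budget < n * min_uses:
        raise ValueError("budget too small for the per-token floor")
    if any(r < 0 for r in relevance):
        raise ValueError("relevance scores must be non-negative")

    spare = budget - n * min_uses            # uses left after the floor
    weights = [Fraction(r) for r in relevance]  # exact arithmetic avoids rounding drift
    total = sum(weights)
    if total == 0:                           # no relevance signal: fall back to uniform
        weights, total = [Fraction(1)] * n, Fraction(n)

    shares = [spare * w / total for w in weights]
    base = [floor(s) for s in shares]
    leftover = spare - sum(base)             # always between 0 and n - 1

    # Hand the remaining uses to the largest fractional parts (ties: lower index).
    order = sorted(range(n), key=lambda i: shares[i] - base[i], reverse=True)
    alloc = [min_uses + b for b in base]
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    relevance = [5, 3, 1, 1]                 # illustrative scores only
    print(allocate_channel_uses(relevance, budget=24, min_uses=2))
```

The allocation always sums to the budget, so a comparison against uniform allocation stays at matched channel uses. How a per-token count of channel uses maps to a code rate or repetition factor is left open here.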
If this is right
- Task accuracy rises when protection strength tracks token utility instead of treating all tokens equally.
- Receiver gating converts likely errors into correctable erasures before completion occurs.
- The modular split between protection, gating, and completion supports separate tuning of each part.
- Gains persist over AWGN, Rayleigh, and Rician channels when total channel uses are held constant.
- The approach outperforms both classical separation methods and end-to-end pixel or token baselines on image classification.
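The gating step in the second bullet can be sketched as follows. This is an illustration under stated assumptions: the `MASK` symbol, the function name `gate_tokens`, and the linear relevance-dependent threshold are hypothetical, and the paper's actual gating rule is not reproduced in this summary.

```python
MASK = None  # hypothetical erasure symbol handed to the completion model


def gate_tokens(tokens, confidences, relevance, base_threshold=0.6, slope=0.3):
    """Replace low-confidence token decisions with MASK.

    Assumption: the threshold rises linearly with relevance (clipped to
    [0, 1]), so high-utility tokens must be decoded more confidently to be
    kept. This is one plausible utility-aware rule, not the paper's.
    """
    if not (len(tokens) == len(confidences) == len(relevance)):
        raise ValueError("inputs must have equal length")

    gated = []
    for tok, conf, rel in zip(tokens, confidences, relevance):
        threshold = min(1.0, max(0.0, base_threshold + slope * rel))
        # Keep the decision only if the receiver is confident enough;
        # otherwise turn a possible substitution into an explicit erasure.
        gated.append(tok if conf >= threshold else MASK)
    return gated


if __name__ == "__main__":
    tokens = [17, 4, 99, 23]                 # illustrative token indices
    confidences = [0.95, 0.55, 0.80, 0.70]   # receiver-side confidence per token
    relevance = [1.0, 0.0, 0.9, 0.2]         # transmitter-side relevance per token
    print(gate_tokens(tokens, confidences, relevance))
```

The masked positions would then be passed to the completion model, which is outside the scope of this sketch.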
Where Pith is reading between the lines
- The same token-relevance idea could be tested on text or multimodal tasks where foundation models already operate on discrete units.
- Lowering the need for perfect bit recovery might reduce transmit power in energy-constrained edge devices.
- A direct test would compare end-to-end latency when the completion model runs on-device versus when it is offloaded.
Load-bearing premise
That token-level task relevance can be estimated accurately enough at the transmitter and that the Transformer completion model can reliably restore the masked tokens after gating.
What would settle it
An experiment in which replacing the relevance-based protection with uniform allocation or removing the completion model causes the performance advantage to vanish across the tested channels.
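The experiment described above amounts to a small ablation grid. The sketch below only enumerates the runs; the class name `AblationRun`, the field names, and the string labels are hypothetical, and every run is assumed to use the same total channel-use budget.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AblationRun:
    protection: str   # "utility_aware" or "uniform"
    completion: bool  # whether the receiver completion model is used
    channel: str      # "awgn", "rayleigh", or "rician"


def ablation_grid():
    """Enumerate every combination of the two ablated factors and the channels."""
    protections = ("utility_aware", "uniform")
    completions = (True, False)
    channels = ("awgn", "rayleigh", "rician")
    return [
        AblationRun(p, c, ch)
        for p, c, ch in product(protections, completions, channels)
    ]


if __name__ == "__main__":
    for run in ablation_grid():
        print(run)
```

If the accuracy gap between the full system and the ablated variants closes on every channel, the advantage can be attributed to the ablated component.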
Original abstract
Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch between what is transmitted reliably and what downstream models actually consume. This mismatch calls for a communication design that directly accounts for token-level task relevance and downstream model requirements, rather than treating all transmitted bits as equally important. In this paper, we propose TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts each source sample into a sequence of tokens, estimates token-level task relevance, and allocates protection through utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence is used to gate unreliable decisions, turning harmful substitutions into recoverable erasures before a Transformer-based completion model restores the masked tokens for final task inference. Our framework combines transmitter-side semantic-aware protection with receiver-side confidence-aware gating in a modular and interpretable architecture, rather than relying solely on fully black-box end-to-end learning. We further establish a utility-aware Bayes-risk interpretation for the receiver-side gating rule and study its interaction with unequal protection and completion. Experimental results on image classification show that TONIC consistently outperforms separation-based schemes, the pixel-domain DeepJSCC baseline, and token-domain baselines under matched communication budgets over AWGN, Rayleigh, and Rician channels.
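For readers unfamiliar with the three channel models named in the abstract, the sketch below simulates a single channel use for each, using the textbook flat-fading model y = h·x + n with unit average channel power so that the SNR is comparable across models. The function names and the Rician K-factor of 3 are assumptions; the paper's simulation settings are not stated in this summary.

```python
import math
import random


def complex_gaussian(rng, variance=1.0):
    """Draw a circularly symmetric complex Gaussian sample CN(0, variance)."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def channel_gain(model, rng, k_factor=3.0):
    """Return a fading coefficient h normalized so that E[|h|^2] = 1."""
    if model == "awgn":
        return 1 + 0j
    if model == "rayleigh":
        return complex_gaussian(rng)
    if model == "rician":
        # Line-of-sight part plus scattered part; k_factor is their power ratio.
        los = math.sqrt(k_factor / (k_factor + 1.0))
        scattered = complex_gaussian(rng, 1.0 / (k_factor + 1.0))
        return los + scattered
    raise ValueError(f"unknown channel model: {model}")


def transmit(symbol, model, snr_db, rng, k_factor=3.0):
    """Send one unit-power symbol; return the received sample and the gain."""
    h = channel_gain(model, rng, k_factor)
    noise_var = 10 ** (-snr_db / 10.0)       # noise power for unit signal power
    return h * symbol + complex_gaussian(rng, noise_var), h


if __name__ == "__main__":
    rng = random.Random(0)
    for model in ("awgn", "rayleigh", "rician"):
        y, h = transmit(1 + 0j, model, snr_db=10.0, rng=rng)
        print(model, y, h)
```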
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts source samples into token sequences, estimates token-level task relevance, and applies utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence gates unreliable decisions (converting substitutions to erasures), after which a Transformer-based completion model restores masked tokens for downstream task inference. The framework is presented as modular and interpretable, with a utility-aware Bayes-risk interpretation of the gating rule. Experiments on image classification tasks report that TONIC consistently outperforms separation-based schemes, pixel-domain DeepJSCC, and token-domain baselines under matched budgets over AWGN, Rayleigh, and Rician channels.
Significance. If the empirical results hold, TONIC advances semantic communication by aligning transmission protection with token-level task relevance and downstream model needs, offering a modular alternative to fully end-to-end learned systems. The explicit utility-aware Bayes-risk interpretation for gating and its interaction with unequal protection provide theoretical grounding that could aid reproducibility and extension. Strengths include the interpretable architecture and evaluation across multiple channel models with matched communication budgets.
major comments (2)
- [§5] §5 (Experimental Results): The central claim of consistent outperformance over baselines is load-bearing, yet the provided description remains high-level without specific quantitative metrics (e.g., accuracy deltas, SNR points), error bars, dataset details, or ablation results on the gating and completion modules. This prevents verification of the magnitude and robustness of gains.
- [Receiver-side gating] Receiver-side gating and Bayes-risk interpretation (around the utility-aware rule): The interpretation is presented as grounding the gating decision, but it is unclear whether the derivation accounts for estimation errors in transmitter-side token relevance or assumes perfect relevance knowledge; if the latter, this could undermine optimality under realistic channel and estimation conditions.
minor comments (2)
- [Abstract] Abstract: While concise, it could briefly note the specific image classification datasets and task metrics used to give readers immediate context for the reported outperformance.
- [Notation] Notation and figures: Ensure consistent symbols for token relevance scores, confidence thresholds, and channel-use budgets across text, equations, and diagrams to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the potential of TONIC to advance semantic communication through its token-centric and modular design. We address each major comment below and indicate the corresponding revisions.
- Referee: [§5] §5 (Experimental Results): The central claim of consistent outperformance over baselines is load-bearing, yet the provided description remains high-level without specific quantitative metrics (e.g., accuracy deltas, SNR points), error bars, dataset details, or ablation results on the gating and completion modules. This prevents verification of the magnitude and robustness of gains.
  Authors: We agree that the current presentation of results in §5 is high-level and that additional quantitative detail is required for verification. In the revised manuscript we will expand §5 to report concrete accuracy deltas (e.g., percentage-point gains over each baseline at representative SNR values), error bars obtained from repeated trials, full dataset specifications, and ablation studies that isolate the contributions of the gating rule and the completion model.
  Revision: yes
- Referee: [Receiver-side gating] Receiver-side gating and Bayes-risk interpretation (around the utility-aware rule): The interpretation is presented as grounding the gating decision, but it is unclear whether the derivation accounts for estimation errors in transmitter-side token relevance or assumes perfect relevance knowledge; if the latter, this could undermine optimality under realistic channel and estimation conditions.
  Authors: The Bayes-risk derivation is presented under the modeling assumption that token relevance is known when analyzing the optimality of the gating threshold. In the implemented system, relevance is estimated from the source. We will revise the manuscript to explicitly state this modeling assumption, discuss its implications for realistic estimation error, and add a brief sensitivity analysis or performance bound that quantifies degradation under imperfect relevance estimates.
  Revision: partial
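One way to make the known-relevance assumption in this exchange concrete is a two-action Bayes decision per token. The symbols below ($u_i$, $p_i$, $C_s$, $C_e$, $\tau_i$) are illustrative assumptions and do not reproduce the paper's notation or derivation.

```latex
% Token i: relevance u_i, decoded value \hat{t}_i, posterior confidence
% p_i = \Pr(t_i = \hat{t}_i \mid y). The substitution cost C_s(u_i) > 0 and the
% erasure cost C_e(u_i) (expected loss after completion) are assumed to be
% known functions of u_i.
\begin{align}
R_{\text{keep}}(i)  &= (1 - p_i)\, C_s(u_i), \\
R_{\text{erase}}(i) &= C_e(u_i), \\
\text{erase token } i \iff R_{\text{erase}}(i) < R_{\text{keep}}(i)
  &\iff p_i < \tau_i := 1 - \frac{C_e(u_i)}{C_s(u_i)} .
\end{align}
```

Under this sketch, if an estimate $\hat{u}_i$ replaces $u_i$, the receiver applies $\hat{\tau}_i = 1 - C_e(\hat{u}_i)/C_s(\hat{u}_i)$. Its decision then differs from the known-relevance rule only for tokens whose confidence $p_i$ falls between $\tau_i$ and $\hat{\tau}_i$, which is the region a sensitivity analysis would need to quantify.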
Circularity Check
No significant circularity
The paper presents a modular token-centric semantic communication framework consisting of transmitter-side token relevance estimation with unequal error protection and receiver-side confidence gating plus Transformer-based token completion. No equations, derivations, or parameter-fitting steps are described in the abstract or framework overview that reduce by construction to the inputs or to self-citations. Experimental claims of outperformance are based on comparisons against baselines under matched budgets across channels, with no evidence that results are forced by definition or by load-bearing self-citation chains. The architecture is presented as interpretable and independent of fully end-to-end black-box learning.