Evolving Token Communication with Parametric Memory Network

Qianqian Yang; Weixuan Chen

arXiv: 2605.01869 · v1 · submitted 2026-05-03 · 💻 cs.IT · cs.CV · math.IT

Pith reviewed 2026-05-09 16:04 UTC · model grok-4.3

classification 💻 cs.IT · cs.CV · math.IT

keywords semantic token communication · parametric memory network · MIMO fading channels · token codebook · kNN teacher distributions · GPT-2 recovery module · evolving system · PSNR improvement

The pith

Transmitting only prefixes of semantic tokens and recovering the rest with a parametric memory network reduces overhead in MIMO wireless communication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that semantic communication efficiency can be improved by sending equal-length prefixes of tokens instead of full ones, with a receiver-side parametric memory network reconstructing the missing parts. The network uses a fine-tuned GPT-2 model guided by kNN distributions from a token codebook to infer complete tokens. An online evolution updates the system periodically for better adaptation. This matters because it lowers communication costs over fading channels while achieving higher reconstruction quality than prior methods.
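As a rough illustration of the prefix idea, the sketch below truncates a batch of tokens to equal-length prefixes and reports the fraction of entries that would be sent. It assumes tokens are real-valued vectors of a common dimension; the array shapes and the names `truncate_tokens` and `transmitted_fraction` are hypothetical and are not taken from the paper.

```python
import numpy as np

def truncate_tokens(tokens: np.ndarray, prefix_len: int) -> np.ndarray:
    """Keep the first `prefix_len` entries of every token (equal-length prefixes)."""
    if not 0 < prefix_len <= tokens.shape[1]:
        raise ValueError("prefix_len must be in (0, token_dim]")
    return tokens[:, :prefix_len]

def transmitted_fraction(token_dim: int, prefix_len: int) -> float:
    """Fraction of each token's entries that is actually sent."""
    return prefix_len / token_dim

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16))      # 8 hypothetical tokens of dimension 16
prefixes = truncate_tokens(tokens, prefix_len=4)
print(prefixes.shape, transmitted_fraction(16, 4))
```

Because every prefix has the same length, the receiver sees a fixed input shape regardless of which token was sent, which is the "consistent token structure" the abstract refers to.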

Core claim

The authors show that by creating a codebook of full semantic tokens, pairing truncated versions with their labels, and fine-tuning a pretrained GPT-2 recovery module using kNN-based teacher distributions, the parametric memory network can reliably recover complete tokens from prefixes transmitted over MIMO fading channels. Combined with an online evolution strategy that updates the network using new samples, this yields consistent outperformance of existing evolving memory benchmarks across channel conditions and bandwidth ratios, including up to 1.09 dB PSNR gains.
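One way to picture the token-label pairing and the kNN teacher signal is sketched below. The abstract does not say how the codebook is built, which distance is used, or how neighbours are weighted, so the nearest-codeword labelling, the squared Euclidean distance, the exponential weighting, and the parameters `k` and `temperature` are all assumptions made for illustration.

```python
import numpy as np

def assign_labels(full_tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Label each full token with the index of its nearest codeword (assumed rule)."""
    d = ((full_tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def knn_teacher(query_prefix, bank_prefixes, bank_labels, num_codewords,
                k=3, temperature=1.0):
    """Soft distribution over codeword labels built from the k nearest stored prefixes."""
    dist = ((bank_prefixes - query_prefix) ** 2).sum(axis=1)
    nearest = np.argsort(dist)[:k]
    # Closer neighbours get larger weight; subtracting the minimum keeps exp() stable.
    weights = np.exp(-(dist[nearest] - dist[nearest].min()) / temperature)
    teacher = np.zeros(num_codewords)
    np.add.at(teacher, bank_labels[nearest], weights)  # accumulate per label
    return teacher / teacher.sum()

rng = np.random.default_rng(0)
full_tokens = rng.standard_normal((50, 8))
codebook = full_tokens[:5].copy()          # placeholder codebook, not the paper's
labels = assign_labels(full_tokens, codebook)
prefixes = full_tokens[:, :3]              # truncated tokens paired with `labels`
teacher = knn_teacher(prefixes[0], prefixes, labels, num_codewords=5)
print(teacher)
```

The teacher vector is what a recovery module would be trained to match: it spreads probability over the codewords whose truncated forms look similar to the query, rather than committing to a single hard label.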

What carries the argument

A parametric memory network that implicitly stores semantic memory in its parameters, fine-tuned to predict the distribution of full tokens from their prefixes using kNN teacher signals.

Load-bearing premise

The GPT-2-based recovery module, when fine-tuned with kNN teacher distributions from the token codebook, can accurately infer missing suffix information from received prefixes even as channel conditions vary.
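The premise can be read as a distillation setup: the module outputs a distribution over codewords and is pulled toward the kNN teacher. The sketch below stands in for that with plain NumPy. The abstract names neither the loss nor the decision rule, so the KL divergence and the argmax lookup are assumptions, and a bare logit vector replaces the GPT-2 forward pass.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_to_teacher(teacher: np.ndarray, logits: np.ndarray, eps: float = 1e-12) -> float:
    """KL(teacher || student) for one truncated token (assumed training loss)."""
    student = softmax(logits)
    mask = teacher > 0                     # terms with zero teacher mass contribute 0
    return float(np.sum(teacher[mask] *
                        (np.log(teacher[mask]) - np.log(student[mask] + eps))))

def recover_token(logits: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return the full codeword with the highest predicted score (assumed rule)."""
    return codebook[int(np.argmax(logits))]

teacher = np.array([0.5, 0.5, 0.0])
print(kl_to_teacher(teacher, np.array([0.0, 0.0, -50.0])))   # student close to teacher
print(kl_to_teacher(teacher, np.array([5.0, 0.0, 0.0])))     # student far from teacher
```

If the premise fails under channel variation, it would show up here as a predicted distribution that drifts away from the teacher once the prefix is noisy, and as `recover_token` selecting the wrong codeword.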

What would settle it

A test showing that, under high-variance MIMO fading or at low bandwidth ratios, the proposed system's PSNR is at or below that of the existing benchmark would falsify the claim of consistent outperformance.
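Such a check comes down to computing PSNR per test condition and comparing the two systems. A minimal sketch follows, using the standard definition PSNR = 10·log10(MAX² / MSE). The helper `claim_survives` and the numbers in the usage lines are hypothetical placeholders, not results from the paper.

```python
import numpy as np

def psnr(reference, reconstruction, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for signals scaled to [0, max_value]."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

def claim_survives(proposed_db, benchmark_db) -> bool:
    """True only if the proposed PSNR is strictly higher in every tested condition."""
    return all(p > b for p, b in zip(proposed_db, benchmark_db))

# Placeholder values for two hypothetical channel conditions (not from the paper).
print(claim_survives([30.0, 28.5], [29.0, 28.0]))
print(claim_survives([30.0, 28.0], [29.0, 28.0]))
```

A tie counts against the claim here because "consistently outperforms" is read as strictly better in every condition.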

Figures

Figures reproduced from arXiv: 2605.01869 by Qianqian Yang, Weixuan Chen.

**Figure 1.** Overall framework of the proposed evolving semantic token communication system with a parametric memory network.

**Figure 2.** PSNR comparison of different approaches under …

**Figure 3.** PSNR comparison of different approaches under …

Original abstract

Token communication has emerged as a promising framework for efficient wireless transmission by representing source data as compact semantic tokens. However, transmitting full semantic tokens still incurs considerable communication overhead. In this paper, we propose an evolving semantic token communication system with a parametric memory network over MIMO fading channels. Specifically, only an equal-length prefix of each semantic token is transmitted, which reduces transmission cost while preserving a consistent token structure for receiver-side recovery. At the receiver, a parametric memory network is introduced to reconstruct the missing suffix information from the received token prefixes, where semantic memory is stored implicitly in the network parameters. To realize this design, full semantic tokens are first organized into a codebook, and truncated tokens are paired with the codeword labels of their corresponding full tokens. Based on these token-label pairs, kNN-based teacher distributions are constructed to fine-tune a pretrained GPT-2-based recovery module, which learns to infer the codeword distribution of each incomplete token and recover the corresponding complete semantic token. In addition, an online evolution strategy is developed to periodically update the parametric memory network and the entire system using newly observed test samples, thereby improving adaptability under distribution shifts. Experimental results demonstrate that the proposed method consistently outperforms the existing evolving memory benchmark under different channel conditions and channel bandwidth ratios, with up to 1.09 dB PSNR improvement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's new angle is prefix-only token transmission plus a GPT-2 parametric memory network for suffix recovery, with online evolution on test samples, but the 1.09 dB PSNR claim needs checking against the clean-vs-noisy training mismatch.

The core idea is to send only equal-length prefixes of semantic tokens over MIMO fading channels, then let a fine-tuned GPT-2 recover the missing suffixes by treating the network weights as the memory store. They build a codebook of full tokens, create kNN teacher distributions from truncated pairs, fine-tune the GPT-2 on those, and add periodic updates using new test samples to handle shifts. This setup keeps token structure intact while cutting transmission length, which is the main practical hook for wireless token systems.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes an evolving semantic token communication system over MIMO fading channels in which only equal-length prefixes of semantic tokens are transmitted to reduce overhead. A parametric memory network, realized by fine-tuning a pretrained GPT-2 model on kNN-derived teacher distributions from clean truncated token-label pairs, is used at the receiver to recover missing suffixes. An online evolution strategy periodically updates the network with new test samples. The abstract asserts that the method consistently outperforms an existing evolving-memory benchmark under varied channel conditions and bandwidth ratios, with a maximum PSNR gain of 1.09 dB.

Significance. If the performance claims are substantiated with proper experimental controls, the work would offer a concrete mechanism for trading transmission rate against learned semantic recovery in token-based communication, potentially improving efficiency in bandwidth-limited wireless settings while maintaining adaptability via parameter updates.

major comments (2)

Abstract: the central claim of consistent outperformance with up to 1.09 dB PSNR improvement is stated without any description of the experimental setup, baselines, number of trials, error bars, or statistical tests, rendering the quantitative result unverifiable from the provided text.
Abstract: the GPT-2 recovery module is fine-tuned exclusively on clean truncated tokens paired with full-token codeword labels via kNN teacher distributions, yet the receiver input consists of equal-length prefixes that have passed through MIMO fading and additive noise. No indication is given that the channel model (fading coefficients, SNR, bandwidth ratio) is reflected in the training distribution, which directly threatens the attribution of any observed gains to the parametric memory network itself.
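To make the second comment concrete, the sketch below passes clean prefixes through a simulated MIMO link and measures how far the received prefixes drift from the clean ones the recovery module was trained on. Every modelling choice is an assumption: Rayleigh flat fading held constant over the block, perfect channel knowledge with zero-forcing equalization, SNR defined as average symbol power over noise variance, and real entries paired into complex symbols. None of these is confirmed by the text reviewed here.

```python
import numpy as np

def mimo_rayleigh_zf(symbols, n_rx, snr_db, rng):
    """Pass complex symbols (n_tx, T) through y = Hx + n, then zero-force equalize."""
    n_tx, num_slots = symbols.shape
    # i.i.d. CN(0, 1) channel entries (assumed fading model).
    H = (rng.standard_normal((n_rx, n_tx))
         + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
    noise_var = np.mean(np.abs(symbols) ** 2) / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal((n_rx, num_slots))
        + 1j * rng.standard_normal((n_rx, num_slots))
    )
    y = H @ symbols + noise
    return np.linalg.pinv(H) @ y           # assumes perfect CSI at the receiver

def prefixes_through_channel(prefixes, n_tx=2, n_rx=2, snr_db=10.0, seed=0):
    """Map real prefix entries to complex symbols, send them, and map back."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(prefixes, dtype=float).reshape(-1)
    if flat.size % (2 * n_tx) != 0:
        raise ValueError("sketch assumes the entry count is divisible by 2 * n_tx")
    symbols = (flat[0::2] + 1j * flat[1::2]).reshape(n_tx, -1)
    x_hat = mimo_rayleigh_zf(symbols, n_rx, snr_db, rng).reshape(-1)
    out = np.empty_like(flat)
    out[0::2], out[1::2] = x_hat.real, x_hat.imag
    return out.reshape(np.shape(prefixes))

clean = np.random.default_rng(1).standard_normal((8, 4))
for snr in (0.0, 20.0):
    noisy = prefixes_through_channel(clean, snr_db=snr)
    print(snr, float(np.mean((noisy - clean) ** 2)))
```

The gap between `clean` and `noisy` is the train-test mismatch the referee describes: a module fitted only to `clean` inputs is evaluated on `noisy` ones unless the training distribution includes the channel.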

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate clarifications and expansions into the revised version to improve verifiability and transparency.

Referee: Abstract: the central claim of consistent outperformance with up to 1.09 dB PSNR improvement is stated without any description of the experimental setup, baselines, number of trials, error bars, or statistical tests, rendering the quantitative result unverifiable from the provided text.

Authors: We agree that the abstract would benefit from additional context to make the performance claims verifiable at a glance. In the revised manuscript, we will expand the abstract to briefly describe the MIMO fading channel model, the considered SNR ranges and bandwidth ratios, the evolving-memory benchmark used for comparison, and note that results are averaged over multiple independent trials with reported standard deviations. The full experimental configuration, including any statistical details, remains in Section IV. This change will strengthen the abstract without exceeding length constraints.

Revision: yes

Referee: Abstract: the GPT-2 recovery module is fine-tuned exclusively on clean truncated tokens paired with full-token codeword labels via kNN teacher distributions, yet the receiver input consists of equal-length prefixes that have passed through MIMO fading and additive noise. No indication is given that the channel model (fading coefficients, SNR, bandwidth ratio) is reflected in the training distribution, which directly threatens the attribution of any observed gains to the parametric memory network itself.

Authors: We acknowledge the distinction between training and inference distributions. The parametric memory network is intentionally trained on clean truncated token pairs to learn robust semantic mappings via kNN teacher distributions, allowing the GPT-2 module to capture the underlying token structure independently of channel effects. At inference, it processes noisy received prefixes, with the online evolution strategy enabling periodic updates using actual test samples that incorporate the MIMO channel, fading, and noise. We will revise the manuscript to explicitly clarify this design choice and the end-to-end evaluation protocol, ensuring the attribution of gains to the learned semantic recovery is transparent. If space allows, we will also add a short discussion on potential benefits of noisy training data.

Revision: partial
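The online evolution step mentioned in the rebuttal can be pictured as a buffer that triggers an update once enough new samples have arrived. The abstract says only that updates are periodic and use newly observed test samples, so the fixed period, the buffer-then-clear policy, and the `update_fn` callback below are hypothetical.

```python
from typing import Any, Callable, List

class OnlineEvolver:
    """Collect observed samples and call `update_fn` every `period` samples."""

    def __init__(self, period: int, update_fn: Callable[[List[Any]], None]):
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.update_fn = update_fn         # e.g. a fine-tuning step (assumed)
        self.buffer: List[Any] = []
        self.num_updates = 0

    def observe(self, sample: Any) -> bool:
        """Store one received sample; return True if an update was triggered."""
        self.buffer.append(sample)
        if len(self.buffer) < self.period:
            return False
        self.update_fn(list(self.buffer))  # pass a copy so the callee can keep it
        self.buffer.clear()
        self.num_updates += 1
        return True

batches: List[List[int]] = []
evolver = OnlineEvolver(period=3, update_fn=batches.append)
flags = [evolver.observe(i) for i in range(7)]
print(flags, batches)
```

Under this reading, the samples handed to `update_fn` have already passed through the channel, which is the mechanism the authors rely on to expose the network to fading and noise after deployment.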

Circularity Check

0 steps flagged

No circularity: empirical system design with external benchmarking

The paper presents an engineering proposal for an evolving semantic token communication system using a parametric memory network and GPT-2 fine-tuning. The core steps—codebook construction from full tokens, kNN teacher distribution generation from truncated token-label pairs, fine-tuning of a pretrained recovery module, and online evolution on observed samples—are described as a practical pipeline whose performance is measured experimentally against an external evolving-memory benchmark. No derivation, prediction, or uniqueness claim is asserted that reduces by construction to its own fitted inputs or self-citations. The reported PSNR gains are empirical outcomes, not logically forced by the method's definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the unstated premise that the parametric memory network can generalize suffix reconstruction from the constructed teacher distributions; no free parameters, axioms, or invented entities are explicitly quantified in the abstract.

invented entities (1)

parametric memory network · no independent evidence
purpose: implicit storage of semantic memory in network parameters for suffix reconstruction
Introduced as the core recovery mechanism without external validation or falsifiable prediction outside the paper.

pith-pipeline@v0.9.0 · 5532 in / 1062 out tokens · 38894 ms · 2026-05-09T16:04:25.334811+00:00
