pith. machine review for the scientific record.

arxiv: 2604.23691 · v1 · submitted 2026-04-26 · 📡 eess.SP


Intention-Aware Semantic Agent Communications for AI Glasses


Pith reviewed 2026-05-08 05:29 UTC · model grok-4.3

classification 📡 eess.SP
keywords intention-aware communication · semantic transmission · AI glasses · vision-language models · bandwidth reduction · edge computing · wearable AI · semantic communications

The pith

AI glasses use intention inference to cut bandwidth by over 50% while preserving task performance

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that guiding wireless transmission by inferred user intention rather than raw image fidelity allows AI glasses to send far less data to cloud vision-language models. The glasses act as an edge agent that adaptively keeps only textual content, document layout, or object semantics based on what the server deduces the user wants from current content and prior prompts. A sympathetic reader would care because continuous high-resolution visual offloading quickly exhausts battery and spectrum on wearables, making always-available AI assistance impractical today. If the claim holds, communication resources become task-aligned instead of pixel-aligned, enabling longer operation and more responsive interactions. The authors demonstrate this through three scenarios where intention-aware preprocessing yields the reported savings and shows resilience when signal quality drops.

Core claim

The central claim is that an intention-aware semantic agent communication framework lets AI glasses preprocess and transmit visual data selectively according to server-inferred user intentions. The server-side VLM deduces intent from the partially transmitted content plus historical prompts, prompting the glasses to preserve only the relevant semantics (text, layout, or objects) via lightweight local tools. Across three representative scenarios this approach delivers more than 50% bandwidth reduction while task performance stays intact, and the semantic stream degrades gracefully rather than failing under low signal-to-noise ratios.
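
For a sense of scale, a back-of-envelope comparison; the payload sizes below are illustrative assumptions, not measurements from the paper, which reports only the aggregate >50% figure.

```python
# Illustrative payload accounting. All sizes are assumed for the sake of
# the example, not the paper's measured values.
full_frame_kb = 350.0   # assumed JPEG size of one high-resolution frame
ocr_text_kb   = 2.0     # assumed size of extracted text (text intention)
layout_kb     = 40.0    # assumed size of a binarized document layout

for mode, kb in [("text-only", ocr_text_kb), ("layout", layout_kb)]:
    saving = 1.0 - kb / full_frame_kb
    print(f"{mode}: {saving:.0%} saved vs. full frame")
# Even the heavier layout mode comfortably clears a 50% reduction here.
```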

What carries the argument

Intention inference plus adaptive semantic preservation, in which the server VLM identifies the needed elements from context and the glasses apply lightweight filters to retain only those elements before transmission.
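
To make the mechanism concrete, here is a minimal sketch of the edge-side dispatch, assuming three hypothetical stand-ins for the paper's lightweight preprocessing tools; the function names, intention labels, and filter logic are ours, not the authors'.

```python
# Minimal sketch of intention-conditioned preprocessing on the glasses.
# The tool functions are hypothetical stand-ins for the paper's
# lightweight text-, layout-, and object-oriented preprocessors.
from typing import Callable
import numpy as np

def extract_text(frame: np.ndarray) -> bytes:
    """Stand-in: run lightweight OCR and send recognized text only."""
    return b"...recognized text..."

def preserve_layout(frame: np.ndarray) -> bytes:
    """Stand-in: keep a binarized, downsampled document layout."""
    small = frame[::4, ::4]                       # crude 4x downsample
    return (small > 128).astype(np.uint8).tobytes()

def mask_objects(frame: np.ndarray) -> bytes:
    """Stand-in: crop to detected object regions, drop the background."""
    return frame[100:200, 100:200].tobytes()      # placeholder for detector crops

# The server-inferred intention selects which semantics survive the uplink.
TOOLS: dict[str, Callable[[np.ndarray], bytes]] = {
    "read_text": extract_text,
    "understand_document": preserve_layout,
    "identify_objects": mask_objects,
}

def preprocess_for_uplink(frame: np.ndarray, intention: str) -> bytes:
    """Apply only the tool matching the currently inferred intention."""
    return TOOLS.get(intention, extract_text)(frame)
```

The design point is that the intention arrives from the server, so the glasses never run heavy inference locally; they only execute the one filter the current task needs.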

If this is right

  • Task-specific bandwidth consumption falls by more than half depending on the application.
  • Downstream VLM performance holds steady even though far fewer pixels are sent.
  • Transmission quality declines gradually with worsening channel conditions instead of collapsing.
  • The same principle supports continuous real-time assistance on power- and spectrum-limited wearables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could extend to audio or depth sensors by inferring intent across multiple modalities.
  • Sending less raw visual data may reduce privacy exposure of complete scenes.
  • Pairing the method with lightweight on-device intention models could lower cloud dependency further.

Load-bearing premise

That the server VLM can accurately infer the user's current intention from the partially transmitted content and past prompts, and that keeping only the selected semantic elements is sufficient to maintain downstream task accuracy.

What would settle it

An experiment in which the inferred intention repeatedly causes the wrong semantic elements to be preserved, resulting in clear drops in VLM task accuracy relative to full-image baselines, would falsify the central claim.
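
A hedged sketch of how such a test could be scored, assuming a hypothetical vlm_answer(image, question) interface and a dataset of (frame, true intention, question, gold answer) tuples:

```python
# Sketch of the falsifying experiment: deliberately preserve the wrong
# semantic elements and measure the accuracy gap against a full-image
# baseline. All interfaces here are hypothetical.

# Map each intention to a deliberately mismatched one.
WRONG = {
    "read_text": "identify_objects",
    "understand_document": "read_text",
    "identify_objects": "understand_document",
}

def accuracy_gap(dataset, vlm_answer, preprocess_for_uplink):
    """Return full-image accuracy minus wrong-intention accuracy."""
    base = mis = 0
    for frame, intention, question, gold in dataset:
        base += vlm_answer(frame, question) == gold
        wrong_payload = preprocess_for_uplink(frame, WRONG[intention])
        mis += vlm_answer(wrong_payload, question) == gold
    n = len(dataset)
    # A persistently large positive gap under repeated intention errors
    # would falsify the central claim.
    return base / n - mis / n
```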

Figures

Figures reproduced from arXiv: 2604.23691 by Chao-Kai Wen, Fangyu Liu, Jiajia Guo, Jun Zhang, Peiwen Jiang, Shi Jin.

Figure 1: Wireless communication framework between the AI glasses and the cloud server. The voice input or intention-aware …
Figure 3: Three preprocessing tools for different scenarios.
Figure 4: Robust semantic encoder-decoder in a fully convolutional architecture, where the hidden channel dimensions are 128.
Figure 5: Failed example in Case I, where the low-resolution …
Figure 6: (a) Bandwidth comparison of the different document transmission methods. (b) SR under different preprocessing and …
Figure 7: Examples under different preprocessing or transmission methods, where the semantic methods can still provide task …
Figure 9: Task performance and bandwidth cost of the proposed …
Figure 10: Examples of intention-aware preprocessed image and the competing ones.
Original abstract

Smart glasses are emerging as a promising interface between humans and artificial intelligence (AI) agents, enabling first-person perception, contextual awareness, and real-time assistance. However, continuous offloading of visual data from wearable devices to cloud-based vision-language models (VLMs) is fundamentally constrained by limited wireless bandwidth and energy resources. This paper proposes an intention-aware semantic agent communication framework for AI glasses, where data transmission is guided by user intention rather than raw pixel fidelity. In the proposed architecture, AI glasses act as an edge semantic agent while a server-side VLM executes high-level cognition and reasoning. The user intention can be inferred by the server-side VLM through the current transmitted content and the historical prompts. Driven by specific user intentions, the glasses adaptively preserve textual content, document layout, or object semantics before transmission. We evaluate three representative scenarios with different lightweight preprocessing tools on the AI glasses. Simulation results demonstrate that intention-aware preprocessing significantly achieves more than 50% bandwidth reduction depending on the current task while maintaining task performance. Moreover, semantic transmission exhibits graceful degradation under low SNRs. The findings demonstrate that aligning communication resources with user intention is essential for robust and efficient wearable AI agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an intention-aware semantic communication framework for AI glasses, where glasses act as edge agents performing adaptive preprocessing (preserving text, layout, or object semantics) based on user intentions inferred by a server-side VLM from the transmitted content plus historical prompts. Simulations in three scenarios demonstrate >50% bandwidth reduction while preserving task performance, plus graceful degradation under low SNR.

Significance. If the central claims hold after addressing protocol details, the work offers a practical step toward bandwidth-efficient wearable AI agents by aligning transmission with inferred intent rather than pixel-level fidelity. The multi-scenario simulation evaluation provides initial evidence of task-dependent savings, which could inform semantic comms designs for resource-limited devices.

major comments (2)
  1. [Architecture/Protocol] Architecture description (likely §2-3): The server infers intention from the already-adapted transmitted content, yet the glasses require intention knowledge to perform the adaptation step beforehand. No bootstrap procedure, separate low-rate intention signaling channel, or iterative exchange protocol is described, creating an unresolved ordering inconsistency that is load-bearing for the framework's feasibility.
  2. [Evaluation/Simulations] Evaluation section (likely §4): The >50% bandwidth reduction claim and graceful degradation result rest on simulations whose setup is insufficiently detailed (no explicit baselines, preprocessing tool parameters, trial counts, error bars, or intention-inference accuracy metrics). Performance under imperfect or erroneous intention inference is not reported, weakening the robustness assertion.
minor comments (2)
  1. [Abstract] Abstract: The three scenarios and lightweight preprocessing tools are referenced but not named or characterized, reducing clarity for readers.
  2. [Throughout] Notation and figures: Ensure consistent use of terms such as 'semantic agent' and 'intention-aware preprocessing' across text and diagrams; add axis labels or legends if any simulation plots lack them.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's insightful comments on our manuscript. We have carefully considered the points raised and provide point-by-point responses below. We agree that additional clarifications and details are needed and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Architecture/Protocol] Architecture description (likely §2-3): The server infers intention from the already-adapted transmitted content, yet the glasses require intention knowledge to perform the adaptation step beforehand. No bootstrap procedure, separate low-rate intention signaling channel, or iterative exchange protocol is described, creating an unresolved ordering inconsistency that is load-bearing for the framework's feasibility.

    Authors: We thank the referee for highlighting this important architectural detail. Upon re-examination, we acknowledge that the current description in Sections 2-3 does not explicitly address the initialization of the intention inference process. The framework relies on historical prompts for initial intention estimation, with the current transmitted content used for refinement. However, to resolve the potential ordering issue, we will revise the manuscript to include a detailed bootstrap procedure: the glasses initially transmit a low-resolution or default-preprocessed version of the content (e.g., using a fixed semantic preservation mode based on the last known intention from history), allowing the server to infer the initial intention. This intention is then used for subsequent adaptive preprocessing in an iterative manner if needed. We will also add a figure or pseudocode to illustrate the protocol flow. This revision will be made in the next version. revision: yes

  2. Referee: [Evaluation/Simulations] Evaluation section (likely §4): The >50% bandwidth reduction claim and graceful degradation result rest on simulations whose setup is insufficiently detailed (no explicit baselines, preprocessing tool parameters, trial counts, error bars, or intention-inference accuracy metrics). Performance under imperfect or erroneous intention inference is not reported, weakening the robustness assertion.

    Authors: We agree with the referee that the simulation setup in Section 4 requires more detailed description to ensure reproducibility and to strengthen the claims. In the revised manuscript, we will expand the evaluation section to include: explicit description of the baselines used (e.g., standard semantic communication without intention awareness, raw transmission, etc.), specific parameters for the preprocessing tools (such as OCR thresholds, object detection models, compression ratios), number of trials conducted, error bars on the reported bandwidth reduction and performance metrics, and metrics for intention-inference accuracy (e.g., precision/recall of the VLM's intention prediction). Additionally, we will include new simulation results showing performance under imperfect intention inference, such as cases with 10-20% inference error rates, to demonstrate robustness. These additions will support the >50% bandwidth reduction and graceful degradation claims more rigorously. revision: yes
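
The bootstrap procedure promised in response 1 could look roughly like the following; the session objects, method names, and message flow are our assumptions, not the authors' protocol.

```python
# Sketch of the bootstrap/iterative protocol described in response 1.
# All interfaces (glasses, server, history) are hypothetical.

def bootstrap(glasses, server, history):
    """Round 0: no fresh intention yet, so transmit a default-preprocessed
    frame and let the server infer intent from it plus historical prompts."""
    frame = glasses.capture()
    fallback = history.last_intention() or "default"
    return server.infer_intention(glasses.preprocess(frame, fallback),
                                  history.prompts())

def session_loop(glasses, server, history, max_rounds=3):
    intention = bootstrap(glasses, server, history)
    reply = None
    for _ in range(max_rounds):
        payload = glasses.preprocess(glasses.capture(), intention)
        reply, refined = server.respond_and_refine(payload, history.prompts())
        if refined == intention:      # intent stable: stop iterating
            break
        intention = refined           # re-preprocess under the new intent
    return reply
```

And the imperfect-inference experiment promised in response 2 might be wired up as below; the 10-20% error rates come from the rebuttal, while the corruption model and evaluation hooks are assumed.

```python
# Sketch of the robustness experiment from response 2: corrupt the
# inferred intention at a fixed rate and track task accuracy.
import random

def corrupt(intention, error_rate, intentions, rng):
    """With probability error_rate, swap in a uniformly wrong intention."""
    if rng.random() < error_rate:
        return rng.choice([i for i in intentions if i != intention])
    return intention

def robustness_curve(dataset, evaluate, intentions, rates=(0.0, 0.1, 0.2)):
    rng = random.Random(0)            # fixed seed for repeatability
    curve = {}
    for rate in rates:
        correct = sum(
            evaluate(frame, corrupt(intent, rate, intentions, rng), q) == gold
            for frame, intent, q, gold in dataset)
        curve[rate] = correct / len(dataset)
    return curve                      # accuracy vs. intention-error rate
```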

Circularity Check

0 steps flagged

No significant circularity; framework is a simulation-evaluated proposal without self-referential derivations

Full rationale

The paper describes an architectural proposal for intention-aware semantic communications in AI glasses, with claims supported by simulation results on bandwidth reduction and graceful degradation. No mathematical derivation chain, equations, or fitted parameters are presented that reduce by construction to the inputs. The intention inference mechanism is stated as using transmitted content plus historical prompts, but this does not create a self-definitional loop or fitted-input prediction; the work remains an engineering framework evaluated externally via simulation rather than internally forced. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way from the provided text. The claims are checked against external benchmarks rather than resting on a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about reliable intention inference and semantic preservation; no free parameters or invented physical entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption User intention can be reliably inferred by the server VLM from the current transmitted content and historical prompts
    Directly stated as the mechanism driving adaptive preprocessing on the glasses.
  • domain assumption Lightweight preprocessing on the edge can selectively preserve task-relevant semantics without degrading downstream VLM performance
    Required for the claimed bandwidth savings while maintaining task performance.

pith-pipeline@v0.9.0 · 5517 in / 1423 out tokens · 27480 ms · 2026-05-08T05:29:47.471816+00:00 · methodology


Reference graph

Works this paper leans on

75 extracted references · 21 canonical work pages · 3 internal anchors
