Talk Less, Fly Lighter: Autonomous Semantic Compression for UAV Swarm Communication via LLMs

Fei Lin; Jun Huang; Naiqi Wu; Qinghua Ni; Siji Ma; Tengchao Zhang; Yisheng Lv; Yonglin Tian

arxiv: 2508.12043 · v1 · pith:BYKR6I6Jnew · submitted 2025-08-16 · 💻 cs.RO

Talk Less, Fly Lighter: Autonomous Semantic Compression for UAV Swarm Communication via LLMs

Fei Lin , Tengchao Zhang , Qinghua Ni , Jun Huang , Siji Ma , Yonglin Tian , Yisheng Lv , Naiqi Wu This is my paper

Pith reviewed 2026-05-25 07:35 UTC · model grok-4.3

classification 💻 cs.RO

keywords UAV swarm communicationsemantic compressionlarge language modelsbandwidth constrained networksmulti-agent systemsautonomous agentssimulation evaluation

0 comments

The pith

Large language models enable autonomous semantic compression for UAV swarms, cutting communication needs while retaining task semantics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs can help UAV swarms compress semantic information on their own to ease bandwidth pressure during frequent interactions. Four 2D simulation scenarios with increasing complexity were built, along with a pipeline that combines system and task prompts to guide the LLMs. Nine different LLMs were evaluated for how well they preserve essential meaning after compression, with checks on how swarm size and scene difficulty affect results. If this works, swarms could coordinate more effectively in settings where data links are limited or multi-hop.

Core claim

By designing a prompt-based communication-execution pipeline and testing it in four simulation scenarios, the authors show that LLMs can autonomously compress semantic content for UAV swarm communication, achieving efficient collaboration under bandwidth constraints and multi-hop conditions while maintaining critical task information.

What carries the argument

The prompt-based communication-execution pipeline that combines system prompts with task instruction prompts to drive LLM semantic compression in UAV swarms.

Load-bearing premise

The four 2D simulation scenarios and the prompt-based pipeline capture the essential semantic preservation and communication constraints faced by real UAV swarms.

What would settle it

A field experiment with physical UAVs transmitting compressed messages over actual wireless links and measuring both data volume and task completion success rates compared to uncompressed baselines.

Figures

Figures reproduced from arXiv: 2508.12043 by Fei Lin, Jun Huang, Naiqi Wu, Qinghua Ni, Siji Ma, Tengchao Zhang, Yisheng Lv, Yonglin Tian.

**Figure 2.** Figure 2: 2D visualization of the Extreme scenario with hierarchical UAV [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 4.** Figure 4: 3D Surface Analysis of UAV Swarm Size under Extreme [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 3.** Figure 3: Heatmap of Performance Metrics across Environment Complexity [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

read the original abstract

The rapid adoption of Large Language Models (LLMs) in unmanned systems has significantly enhanced the semantic understanding and autonomous task execution capabilities of Unmanned Aerial Vehicle (UAV) swarms. However, limited communication bandwidth and the need for high-frequency interactions pose severe challenges to semantic information transmission within the swarm. This paper explores the feasibility of LLM-driven UAV swarms for autonomous semantic compression communication, aiming to reduce communication load while preserving critical task semantics. To this end, we construct four types of 2D simulation scenarios with different levels of environmental complexity and design a communication-execution pipeline that integrates system prompts with task instruction prompts. On this basis, we systematically evaluate the semantic compression performance of nine mainstream LLMs in different scenarios and analyze their adaptability and stability through ablation studies on environmental complexity and swarm size. Experimental results demonstrate that LLM-based UAV swarms have the potential to achieve efficient collaborative communication under bandwidth-constrained and multi-hop link conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper evaluates nine LLMs on prompt-driven semantic tasks in four 2D UAV scenarios with ablations, but never models bandwidth limits, channel effects, or multi-hop routing, so the core claim about constrained communication is untested.

read the letter

The new piece is a side-by-side run of nine mainstream LLMs inside a prompt-based pipeline across four constructed 2D scenarios that vary in complexity, plus ablations on swarm size. That gives a clean empirical snapshot of how different models handle task semantics in those specific setups, and the choice to test multiple models plus the two ablation axes is a reasonable way to map basic behavior. Credit for keeping the evaluation experimental and parameter-free rather than fitting anything post hoc. The soft spot is the mismatch with the abstract's claim. The work states results under bandwidth-constrained and multi-hop conditions, yet the described pipeline stays inside unconstrained prompts with no channel model, no packet-size caps, no latency or loss, and no topology or routing that would force compression to matter. Without those elements the measured outputs do not show whether the semantics would survive actual transmission or relay hops. The 2D scenarios themselves are simple by design, which is fine for an initial check but leaves open how well they capture real semantic preservation needs once radio constraints enter. This is the sort of incremental LLM-robotics experiment that might interest a narrow group already working on prompt engineering for agents, but the gap between the stated goal and the implemented test makes the results hard to use for anyone focused on swarm communication. I would not send it to peer review; the central promise is not addressed by the setup.

Referee Report

3 major / 2 minor

Summary. The paper explores LLM-driven autonomous semantic compression for UAV swarm communication to reduce bandwidth load while preserving task semantics. It constructs four 2D simulation scenarios of varying environmental complexity, designs a prompt-based communication-execution pipeline integrating system and task prompts, evaluates semantic compression performance across nine mainstream LLMs, and performs ablation studies on environmental complexity and swarm size. The central claim is that these experiments demonstrate the potential for efficient collaborative communication under bandwidth-constrained and multi-hop link conditions.

Significance. If the central claim were supported by experiments that actually enforce bandwidth limits and multi-hop routing, the work could offer a novel direction for resource-efficient UAV swarm coordination by leveraging LLMs for semantic understanding, with potential applications in bandwidth-scarce environments such as disaster response or remote operations.

major comments (3)

[Abstract] Abstract: the claim that 'experimental results demonstrate that LLM-based UAV swarms have the potential to achieve efficient collaborative communication under bandwidth-constrained and multi-hop link conditions' is not supported by the described setup. The four 2D scenarios and prompt-based pipeline contain no channel models, message-size caps, latency constraints, packet loss, routing tables, or topology that would enforce those conditions; performance is therefore measured only in an unconstrained prompt environment.
[Experimental evaluation] Experimental evaluation section (implied by abstract description): no quantitative metrics, compression ratios, error bars, or failure cases are reported, leaving the 'potential' claim at a high-level qualitative level without falsifiable evidence of semantic preservation under transmission limits.
[Ablation studies] Ablation studies on environmental complexity and swarm size: these vary only prompt-based scenario parameters and do not test whether LLM outputs survive actual bandwidth caps or relay hops, so they do not address the load-bearing assumption that the pipeline represents real communication constraints.

minor comments (2)

[Methods] The description of how semantic compression performance is scored (e.g., exact metrics for 'preserving critical task semantics') is not detailed enough to allow reproduction or comparison with baseline compression methods.
[Discussion] No discussion of LLM inference latency or onboard computational cost is provided, which is relevant for real-time UAV swarm deployment even if bandwidth is the primary focus.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We agree that the abstract overstates the direct support from our simulations for bandwidth-constrained and multi-hop conditions, as the experiments rely on prompt-based semantic compression in 2D scenarios without explicit channel or routing models. We will revise claims, add quantitative metrics, and clarify limitations.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'experimental results demonstrate that LLM-based UAV swarms have the potential to achieve efficient collaborative communication under bandwidth-constrained and multi-hop link conditions' is not supported by the described setup. The four 2D scenarios and prompt-based pipeline contain no channel models, message-size caps, latency constraints, packet loss, routing tables, or topology that would enforce those conditions; performance is therefore measured only in an unconstrained prompt environment.

Authors: We acknowledge the referee's observation. Our simulations evaluate LLM-driven semantic compression through integrated system and task prompts to reduce information volume for collaborative tasks, but do not incorporate physical-layer channel models or enforced constraints. We will revise the abstract to indicate that the results show potential for reduced communication load via semantic understanding, with the bandwidth-constrained and multi-hop aspects presented as motivating context rather than directly demonstrated outcomes. revision: yes
Referee: [Experimental evaluation] Experimental evaluation section (implied by abstract description): no quantitative metrics, compression ratios, error bars, or failure cases are reported, leaving the 'potential' claim at a high-level qualitative level without falsifiable evidence of semantic preservation under transmission limits.

Authors: The evaluations across nine LLMs include scenario-based performance comparisons, but we agree that explicit quantitative metrics are needed for rigor. We will add tables reporting compression ratios (defined as the reduction in descriptive tokens while preserving task semantics), task success rates, and observed failure cases such as incomplete semantic capture in complex scenarios. revision: yes
Referee: [Ablation studies] Ablation studies on environmental complexity and swarm size: these vary only prompt-based scenario parameters and do not test whether LLM outputs survive actual bandwidth caps or relay hops, so they do not address the load-bearing assumption that the pipeline represents real communication constraints.

Authors: The ablations test LLM adaptability by varying prompt complexity and agent count to assess stability of semantic compression. We agree these do not simulate bandwidth caps or multi-hop relays. We will revise the text to explicitly state the scope of the ablations and add a dedicated limitations paragraph noting that physical transmission effects remain for future integration with communication simulators. revision: partial

Circularity Check

0 steps flagged

No circularity: purely experimental evaluation with no derivations

full rationale

The paper describes construction of 2D simulation scenarios, a prompt-based communication-execution pipeline, and empirical evaluation of nine LLMs across scenarios and ablation studies on complexity and swarm size. No equations, mathematical derivations, fitted parameters, or self-citation chains are present in the provided text that could reduce any claimed result to its inputs by construction. The central claim rests on observed LLM outputs in unconstrained prompt environments, which is an independent empirical finding rather than a self-referential reduction. This matches the default expectation for non-circular experimental work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities identified from abstract; work is empirical demonstration rather than theoretical construction.

pith-pipeline@v0.9.0 · 5715 in / 854 out tokens · 17895 ms · 2026-05-25T07:35:55.543930+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 6 internal anchors

[1]

State-of-the-art and future research challenges in uav swarms,

S. Javed, A. Hassan, R. Ahmad, W. Ahmed, R. Ahmed, A. Saadat, and M. Guizani, “State-of-the-art and future research challenges in uav swarms,” IEEE Internet of Things Journal , vol. 11, no. 11, pp. 19 023–19 045, 2024

work page 2024
[2]

Chatnav: Leveraging llm to zero-shot semantic reasoning in object navigation,

Y . Zhu, Z. Wen, X. Li, X. Shi, X. Wu, H. Dong, and J. Chen, “Chatnav: Leveraging llm to zero-shot semantic reasoning in object navigation,” IEEE Transactions on Circuits and Systems for Video Technology , 2024

work page 2024
[3]

Visual embodied brain: Let multimodal large language models see, think, and control in spaces,

G. Luo, G. Yang, Z. Gong, G. Chen, H. Duan, E. Cui, R. Tong, Z. Hou, T. Zhang, Z. Chen et al., “Visual embodied brain: Let multimodal large language models see, think, and control in spaces,” arXiv preprint arXiv:2506.00123, 2025

work page arXiv 2025
[4]

Learning from observer gaze: Zero-shot attention prediction oriented by human-object interaction recognition,

Y . Zhou, L. Liu, and C. Gou, “Learning from observer gaze: Zero-shot attention prediction oriented by human-object interaction recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 28 390–28 400

work page 2024
[5]

Space-10: A comprehensive benchmark for multimodal large language models in compositional spatial intelligence,

Z. Gong, W. Li, O. Ma, S. Li, J. Ji, X. Yang, G. Luo, J. Yan, and R. Ji, “Space-10: A comprehensive benchmark for multimodal large language models in compositional spatial intelligence,” arXiv preprint arXiv:2506.07966, 2025

work page arXiv 2025
[6]

Where, what, why: Towards explainable driver attention prediction,

Y . Zhou, J. Tang, X. Xiao, Y . Lin, L. Liu, Z. Guo, H. Fei, X. Xia, and C. Gou, “Where, what, why: Towards explainable driver attention prediction,” arXiv preprint arXiv:2506.23088 , 2025

work page arXiv 2025
[7]

Deer-vla: Dynamic inference of multimodal large language models for efficient robot execution,

Y . Yue, Y . Wang, B. Kang, Y . Han, S. Wang, S. Song, J. Feng, and G. Huang, “Deer-vla: Dynamic inference of multimodal large language models for efficient robot execution,” Advances in Neural Information Processing Systems, vol. 37, pp. 56 619–56 643, 2024

work page 2024
[8]

Uavs meet llms: Overviews and perspectives towards agentic low-altitude mobility,

Y . Tian, F. Lin, Y . Li, T. Zhang, Q. Zhang, X. Fu, J. Huang, X. Dai, Y . Wang, C. Tian, B. Li, Y . Lv, L. Kovács, and F.-Y . Wang, “Uavs meet llms: Overviews and perspectives towards agentic low-altitude mobility,” Information Fusion, vol. 122, p. 103158, 2025

work page 2025
[9]

Tasf: Terrestrial- aerial synergistic framework for a new generation of intelligent trans- portation,

C. Wang, D. Nie, R. Zhang, S. Teng, and L. Chen, “Tasf: Terrestrial- aerial synergistic framework for a new generation of intelligent trans- portation,” in 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC) . IEEE, 2024, pp. 4256–4263

work page 2024
[10]

AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding

F. Lin, Y . Tian, T. Zhang, J. Huang, S. Guan, and F.-Y . Wang, “Airvista-ii: An agentic system for embodied uavs toward dynamic scene semantic understanding,” arXiv preprint arXiv:2504.09583 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[11]

Logisticsvln: Vision-language navigation for low- altitude terminal delivery based on agentic uavs,

X. Zhang, Y . Tian, F. Lin, Y . Liu, J. Ma, K. S. Szatmáry, and F.-Y . Wang, “Logisticsvln: Vision-language navigation for low- altitude terminal delivery based on agentic uavs,” arXiv preprint arXiv:2505.03460, 2025

work page arXiv 2025
[12]

Coordfield: Coordination field for agentic uav task allocation in low- altitude urban scenarios,

T. Zhang, Y . Tian, F. Lin, J. Huang, P. P. Süli, R. Qin, and F.-Y . Wang, “Coordfield: Coordination field for agentic uav task allocation in low- altitude urban scenarios,” arXiv preprint arXiv:2505.00091 , 2025

work page arXiv 2025
[13]

Uav swarm commu- nication and control architectures: a review,

M. Campion, P. Ranganathan, and S. Faruque, “Uav swarm commu- nication and control architectures: a review,” Journal of Unmanned Vehicle Systems, vol. 7, no. 2, pp. 93–106, 2018

work page 2018
[14]

Multimodal large language models- enabled uav swarm: Towards efficient and intelligent autonomous aerial systems,

Y . Ping, T. Liang, H. Ding, G. Lei, J. Wu, X. Zou, K. Shi, R. Shao, C. Zhang, W. Zhang et al. , “Multimodal large language models- enabled uav swarm: Towards efficient and intelligent autonomous aerial systems,” arXiv preprint arXiv:2506.12710 , 2025

work page arXiv 2025
[15]

Longllmlingua: Accelerating and enhancing llms in long context scenarios via prompt compression,

H. Jiang, Q. Wu, X. Luo, D. Li, C.-Y . Lin, Y . Yang, and L. Qiu, “Longllmlingua: Accelerating and enhancing llms in long context scenarios via prompt compression,” arXiv preprint arXiv:2310.06839, 2023

work page arXiv 2023
[16]

Prompt compression with context-aware sentence encoding for fast and improved llm inference,

B. Liskavets, M. Ushakov, S. Roy, M. Klibanov, A. Etemad, and S. K. Luke, “Prompt compression with context-aware sentence encoding for fast and improved llm inference,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 23, 2025, pp. 24 595– 24 604

work page 2025
[17]

From large ai models to agentic ai: A tutorial on future intelligent communications,

F. Jiang, C. Pan, L. Dong, K. Wang, O. A. Dobre, and M. Debbah, “From large ai models to agentic ai: A tutorial on future intelligent communications,” arXiv preprint arXiv:2505.22311 , 2025

work page arXiv 2025
[18]

Emerging trends in uavs: From placement, semantic com- munications to generative ai for mission-critical networks,

Z. Kaleem, F. A. Orakzai, W. Ishaq, K. Latif, J. Zhao, and A. Ja- malipour, “Emerging trends in uavs: From placement, semantic com- munications to generative ai for mission-critical networks,” IEEE Transactions on Consumer Electronics , 2024

work page 2024
[19]

Scalable uav multi-hop networking via multi-agent re- inforcement learning with large language models,

Y . Xu, W. Hong, J. Zha, G. Chen, J. Zheng, C.-C. Hsia, and X. Chen, “Scalable uav multi-hop networking via multi-agent re- inforcement learning with large language models,” arXiv preprint arXiv:2505.08448, 2025

work page arXiv 2025
[20]

Hierarchical and col- laborative llm-based control for multi-uav motion and communication in integrated terrestrial and non-terrestrial networks,

Z. Yan, H. Zhou, J. Pei, and H. Tabassum, “Hierarchical and col- laborative llm-based control for multi-uav motion and communication in integrated terrestrial and non-terrestrial networks,” arXiv preprint arXiv:2506.06532, 2025

work page arXiv 2025
[21]

Wireless agentic ai with retrieval-augmented multimodal semantic perception,

G. Liu, Y . Liu, R. Zhang, H. Du, D. Niyato, Z. Xiong, S. Sun, and A. Jamalipour, “Wireless agentic ai with retrieval-augmented multimodal semantic perception,” arXiv preprint arXiv:2505.23275 , 2025

work page arXiv 2025
[22]

Semantic compression with large language models,

H. Gilbert, M. Sandborn, D. C. Schmidt, J. Spencer-Smith, and J. White, “Semantic compression with large language models,” in 2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS) . IEEE, 2023, pp. 1–8

work page 2023
[23]

Llm-enabled data transmission in end-to-end semantic communication,

S. Salehi, M. Erol-Kantarci, and D. Niyato, “Llm-enabled data transmission in end-to-end semantic communication,” arXiv preprint arXiv:2504.07431, 2025

work page arXiv 2025
[24]

Large- language-model-enabled text semantic communication systems,

Z. Wang, L. Zou, S. Wei, K. Li, F. Liao, H. Mi, and R. Lai, “Large- language-model-enabled text semantic communication systems,” Ap- plied Sciences, vol. 15, no. 13, p. 7227, 2025

work page 2025
[25]

Mul- timodal llm integrated semantic communications for 6g immersive experiences,

Y . Zhang, Y . Sun, L. Guo, W. Chen, B. Ai, and D. Gunduz, “Mul- timodal llm integrated semantic communications for 6g immersive experiences,” arXiv preprint arXiv:2507.04621 , 2025

work page arXiv 2025
[26]

M4sc: An mllm-based multi-modal, multi-task and multi-user semantic commu- nication system,

F. Jiang, S. Tu, L. Dong, K. Wang, K. Yang, and C. Pan, “M4sc: An mllm-based multi-modal, multi-task and multi-user semantic commu- nication system,” arXiv preprint arXiv:2502.16418 , 2025

work page arXiv 2025
[27]

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

P. He, J. Gao, and W. Chen, “Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding shar- ing,” arXiv preprint arXiv:2111.09543 , 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[28]

The llama 4 herd: The beginning of a new era of natively multimodal ai innovation,

MetaAI, “The llama 4 herd: The beginning of a new era of natively multimodal ai innovation,” https://ai.meta.com/blog/ llama-4-multimodal-intelligence/, 2025, accessed: 2025-07-10

work page 2025
[29]

Introducing o3 and o4-mini,

OpenAI, “Introducing o3 and o4-mini,” https://openai.com/blog/ introducing-o3-and-o4-mini, 2024, accessed: 2025-07-05

work page 2024
[30]

Introducing claude 4,

Anthropic, “Introducing claude 4,” https://www.anthropic.com/news/ claude-4, 2025, accessed: 2025-07-05

work page 2025
[31]

GPT-4o System Card

A. Hurst and A. e. a. Lerer, “Gpt-4o system card,” arXiv preprint arXiv:2410.21276, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[32]

xAI, “Grok 4,” https://x.ai/news/grok-4, 2025, accessed: 2025-08-06

work page 2025
[33]

DeepSeek-V3 Technical Report

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al. , “Deepseek-v3 technical report,” arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[35]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv et al. , “Qwen3 technical report,” arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

State-of-the-art and future research challenges in uav swarms,

S. Javed, A. Hassan, R. Ahmad, W. Ahmed, R. Ahmed, A. Saadat, and M. Guizani, “State-of-the-art and future research challenges in uav swarms,” IEEE Internet of Things Journal , vol. 11, no. 11, pp. 19 023–19 045, 2024

work page 2024

[2] [2]

Chatnav: Leveraging llm to zero-shot semantic reasoning in object navigation,

Y . Zhu, Z. Wen, X. Li, X. Shi, X. Wu, H. Dong, and J. Chen, “Chatnav: Leveraging llm to zero-shot semantic reasoning in object navigation,” IEEE Transactions on Circuits and Systems for Video Technology , 2024

work page 2024

[3] [3]

Visual embodied brain: Let multimodal large language models see, think, and control in spaces,

G. Luo, G. Yang, Z. Gong, G. Chen, H. Duan, E. Cui, R. Tong, Z. Hou, T. Zhang, Z. Chen et al., “Visual embodied brain: Let multimodal large language models see, think, and control in spaces,” arXiv preprint arXiv:2506.00123, 2025

work page arXiv 2025

[4] [4]

Learning from observer gaze: Zero-shot attention prediction oriented by human-object interaction recognition,

Y . Zhou, L. Liu, and C. Gou, “Learning from observer gaze: Zero-shot attention prediction oriented by human-object interaction recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 28 390–28 400

work page 2024

[5] [5]

Space-10: A comprehensive benchmark for multimodal large language models in compositional spatial intelligence,

Z. Gong, W. Li, O. Ma, S. Li, J. Ji, X. Yang, G. Luo, J. Yan, and R. Ji, “Space-10: A comprehensive benchmark for multimodal large language models in compositional spatial intelligence,” arXiv preprint arXiv:2506.07966, 2025

work page arXiv 2025

[6] [6]

Where, what, why: Towards explainable driver attention prediction,

Y . Zhou, J. Tang, X. Xiao, Y . Lin, L. Liu, Z. Guo, H. Fei, X. Xia, and C. Gou, “Where, what, why: Towards explainable driver attention prediction,” arXiv preprint arXiv:2506.23088 , 2025

work page arXiv 2025

[7] [7]

Deer-vla: Dynamic inference of multimodal large language models for efficient robot execution,

Y . Yue, Y . Wang, B. Kang, Y . Han, S. Wang, S. Song, J. Feng, and G. Huang, “Deer-vla: Dynamic inference of multimodal large language models for efficient robot execution,” Advances in Neural Information Processing Systems, vol. 37, pp. 56 619–56 643, 2024

work page 2024

[8] [8]

Uavs meet llms: Overviews and perspectives towards agentic low-altitude mobility,

Y . Tian, F. Lin, Y . Li, T. Zhang, Q. Zhang, X. Fu, J. Huang, X. Dai, Y . Wang, C. Tian, B. Li, Y . Lv, L. Kovács, and F.-Y . Wang, “Uavs meet llms: Overviews and perspectives towards agentic low-altitude mobility,” Information Fusion, vol. 122, p. 103158, 2025

work page 2025

[9] [9]

Tasf: Terrestrial- aerial synergistic framework for a new generation of intelligent trans- portation,

C. Wang, D. Nie, R. Zhang, S. Teng, and L. Chen, “Tasf: Terrestrial- aerial synergistic framework for a new generation of intelligent trans- portation,” in 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC) . IEEE, 2024, pp. 4256–4263

work page 2024

[10] [10]

AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding

F. Lin, Y . Tian, T. Zhang, J. Huang, S. Guan, and F.-Y . Wang, “Airvista-ii: An agentic system for embodied uavs toward dynamic scene semantic understanding,” arXiv preprint arXiv:2504.09583 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[11] [11]

Logisticsvln: Vision-language navigation for low- altitude terminal delivery based on agentic uavs,

X. Zhang, Y . Tian, F. Lin, Y . Liu, J. Ma, K. S. Szatmáry, and F.-Y . Wang, “Logisticsvln: Vision-language navigation for low- altitude terminal delivery based on agentic uavs,” arXiv preprint arXiv:2505.03460, 2025

work page arXiv 2025

[12] [12]

Coordfield: Coordination field for agentic uav task allocation in low- altitude urban scenarios,

T. Zhang, Y . Tian, F. Lin, J. Huang, P. P. Süli, R. Qin, and F.-Y . Wang, “Coordfield: Coordination field for agentic uav task allocation in low- altitude urban scenarios,” arXiv preprint arXiv:2505.00091 , 2025

work page arXiv 2025

[13] [13]

Uav swarm commu- nication and control architectures: a review,

M. Campion, P. Ranganathan, and S. Faruque, “Uav swarm commu- nication and control architectures: a review,” Journal of Unmanned Vehicle Systems, vol. 7, no. 2, pp. 93–106, 2018

work page 2018

[14] [14]

Multimodal large language models- enabled uav swarm: Towards efficient and intelligent autonomous aerial systems,

Y . Ping, T. Liang, H. Ding, G. Lei, J. Wu, X. Zou, K. Shi, R. Shao, C. Zhang, W. Zhang et al. , “Multimodal large language models- enabled uav swarm: Towards efficient and intelligent autonomous aerial systems,” arXiv preprint arXiv:2506.12710 , 2025

work page arXiv 2025

[15] [15]

Longllmlingua: Accelerating and enhancing llms in long context scenarios via prompt compression,

H. Jiang, Q. Wu, X. Luo, D. Li, C.-Y . Lin, Y . Yang, and L. Qiu, “Longllmlingua: Accelerating and enhancing llms in long context scenarios via prompt compression,” arXiv preprint arXiv:2310.06839, 2023

work page arXiv 2023

[16] [16]

Prompt compression with context-aware sentence encoding for fast and improved llm inference,

B. Liskavets, M. Ushakov, S. Roy, M. Klibanov, A. Etemad, and S. K. Luke, “Prompt compression with context-aware sentence encoding for fast and improved llm inference,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 23, 2025, pp. 24 595– 24 604

work page 2025

[17] [17]

From large ai models to agentic ai: A tutorial on future intelligent communications,

F. Jiang, C. Pan, L. Dong, K. Wang, O. A. Dobre, and M. Debbah, “From large ai models to agentic ai: A tutorial on future intelligent communications,” arXiv preprint arXiv:2505.22311 , 2025

work page arXiv 2025

[18] [18]

Emerging trends in uavs: From placement, semantic com- munications to generative ai for mission-critical networks,

Z. Kaleem, F. A. Orakzai, W. Ishaq, K. Latif, J. Zhao, and A. Ja- malipour, “Emerging trends in uavs: From placement, semantic com- munications to generative ai for mission-critical networks,” IEEE Transactions on Consumer Electronics , 2024

work page 2024

[19] [19]

Scalable uav multi-hop networking via multi-agent re- inforcement learning with large language models,

Y . Xu, W. Hong, J. Zha, G. Chen, J. Zheng, C.-C. Hsia, and X. Chen, “Scalable uav multi-hop networking via multi-agent re- inforcement learning with large language models,” arXiv preprint arXiv:2505.08448, 2025

work page arXiv 2025

[20] [20]

Hierarchical and col- laborative llm-based control for multi-uav motion and communication in integrated terrestrial and non-terrestrial networks,

Z. Yan, H. Zhou, J. Pei, and H. Tabassum, “Hierarchical and col- laborative llm-based control for multi-uav motion and communication in integrated terrestrial and non-terrestrial networks,” arXiv preprint arXiv:2506.06532, 2025

work page arXiv 2025

[21] [21]

Wireless agentic ai with retrieval-augmented multimodal semantic perception,

G. Liu, Y . Liu, R. Zhang, H. Du, D. Niyato, Z. Xiong, S. Sun, and A. Jamalipour, “Wireless agentic ai with retrieval-augmented multimodal semantic perception,” arXiv preprint arXiv:2505.23275 , 2025

work page arXiv 2025

[22] [22]

Semantic compression with large language models,

H. Gilbert, M. Sandborn, D. C. Schmidt, J. Spencer-Smith, and J. White, “Semantic compression with large language models,” in 2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS) . IEEE, 2023, pp. 1–8

work page 2023

[23] [23]

Llm-enabled data transmission in end-to-end semantic communication,

S. Salehi, M. Erol-Kantarci, and D. Niyato, “Llm-enabled data transmission in end-to-end semantic communication,” arXiv preprint arXiv:2504.07431, 2025

work page arXiv 2025

[24] [24]

Large- language-model-enabled text semantic communication systems,

Z. Wang, L. Zou, S. Wei, K. Li, F. Liao, H. Mi, and R. Lai, “Large- language-model-enabled text semantic communication systems,” Ap- plied Sciences, vol. 15, no. 13, p. 7227, 2025

work page 2025

[25] [25]

Mul- timodal llm integrated semantic communications for 6g immersive experiences,

Y . Zhang, Y . Sun, L. Guo, W. Chen, B. Ai, and D. Gunduz, “Mul- timodal llm integrated semantic communications for 6g immersive experiences,” arXiv preprint arXiv:2507.04621 , 2025

work page arXiv 2025

[26] [26]

M4sc: An mllm-based multi-modal, multi-task and multi-user semantic commu- nication system,

F. Jiang, S. Tu, L. Dong, K. Wang, K. Yang, and C. Pan, “M4sc: An mllm-based multi-modal, multi-task and multi-user semantic commu- nication system,” arXiv preprint arXiv:2502.16418 , 2025

work page arXiv 2025

[27] [27]

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

P. He, J. Gao, and W. Chen, “Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding shar- ing,” arXiv preprint arXiv:2111.09543 , 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[28] [28]

The llama 4 herd: The beginning of a new era of natively multimodal ai innovation,

MetaAI, “The llama 4 herd: The beginning of a new era of natively multimodal ai innovation,” https://ai.meta.com/blog/ llama-4-multimodal-intelligence/, 2025, accessed: 2025-07-10

work page 2025

[29] [29]

Introducing o3 and o4-mini,

OpenAI, “Introducing o3 and o4-mini,” https://openai.com/blog/ introducing-o3-and-o4-mini, 2024, accessed: 2025-07-05

work page 2024

[30] [30]

Introducing claude 4,

Anthropic, “Introducing claude 4,” https://www.anthropic.com/news/ claude-4, 2025, accessed: 2025-07-05

work page 2025

[31] [31]

GPT-4o System Card

A. Hurst and A. e. a. Lerer, “Gpt-4o system card,” arXiv preprint arXiv:2410.21276, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[32] [32]

xAI, “Grok 4,” https://x.ai/news/grok-4, 2025, accessed: 2025-08-06

work page 2025

[33] [33]

DeepSeek-V3 Technical Report

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al. , “Deepseek-v3 technical report,” arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[34] [34]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[35] [35]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv et al. , “Qwen3 technical report,” arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025