
arxiv: 2605.24972 · v1 · pith:YPBWMUU6new · submitted 2026-05-24 · 💻 cs.IT · math.IT

Integrated Sensing, Communication, and Computing for NR-V2X: A Cross-Layer Resource Allocation Framework Using Multi-Agent Reinforcement Learning

Pith reviewed 2026-06-30 00:08 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords ISCC · NR-V2X · Multi-agent reinforcement learning · SB-SPS · Resource allocation · Sensing accuracy · MEC offloading · Cross-layer design

The pith

MAPPO-SPS uses multi-agent RL to jointly adapt SB-SPS reservations, radio partitioning, and MEC offloading in NR-V2X for balanced sensing, reliability, throughput, energy, and delay.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops MAPPO-SPS, an ISCC-aware scheduler for NR-V2X Mode 2 that addresses limitations of standard SB-SPS by accounting for sensing demands and computation offloading. It models the joint decision problem as a cooperative partially observable Markov game and solves it with MAPPO under centralized training and decentralized execution. Simulations indicate that the method yields balanced performance across CRLB-based sensing accuracy, packet reception ratio, effective throughput, energy consumption, and end-to-end delay. A sympathetic reader would care because future vehicular networks need distributed resource selection that handles environment perception, safety messaging, and latency-sensitive tasks without centralized coordination.

Core claim

The paper establishes that MAPPO-SPS, by jointly adapting SB-SPS reservation, radio-resource partitioning, and overflow-driven computation-offloading decisions at control epochs in a cooperative partially observable Markov game solved with MAPPO under CTDE, achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio, effective throughput, energy consumption, and end-to-end delay.

What carries the argument

MAPPO-SPS: multi-agent proximal policy optimization applied to sensing-based semi-persistent scheduling, performing joint adaptation of reservations, partitioning, and offloading via centralized training and decentralized execution.
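
For readers unfamiliar with the optimizer, the clipped surrogate objective from the PPO paper (Schulman et al., 2017), which MAPPO applies to each agent's policy, can be sketched as below. This is a minimal illustration and not the paper's implementation: the function name `ppo_clip_objective`, the clip range of 0.2, and the input values are assumptions chosen for the example.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to push r outside the clip range.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)

# Hypothetical single-sample checks:
same_policy = ppo_clip_objective([0.0], [0.0], [1.5])           # ratio = 1
big_step_up = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])  # ratio = 2, clipped
```

With a positive advantage, a ratio of 2 is clipped to 1.2, so the update gains nothing from moving the policy further; with a negative advantage the unclipped (more pessimistic) term is kept.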

If this is right

  • The scheduler enables distributed autonomous resource selection that explicitly accounts for sensing-resource demand and MEC-induced latency.
  • Joint adaptation at control epochs produces simultaneous gains across sensing accuracy, communication reliability, and computation metrics.
  • The CTDE structure supports deployment where each vehicle acts on local observations while benefiting from centralized training.
  • Performance is evaluated against standard SB-SPS to quantify the benefit of including sensing and computation objectives.
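
The CTDE split mentioned in the list above can be sketched structurally: each vehicle's actor sees only its own observation, while a critic used only during training sees all observations at once. The sketch below is an assumption-laden toy, not the paper's network design; the sizes `N_AGENTS`, `OBS_DIM`, `N_ACTIONS`, the linear scoring, and the greedy action choice are all hypothetical.

```python
import random

N_AGENTS = 3    # hypothetical number of vehicles
OBS_DIM = 4     # hypothetical local observation size
N_ACTIONS = 5   # hypothetical number of discrete scheduling actions

class Actor:
    """Decentralized policy: uses only its own vehicle's observation."""
    def __init__(self, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(OBS_DIM)]
                  for _ in range(N_ACTIONS)]

    def act(self, local_obs):
        scores = [sum(w * x for w, x in zip(row, local_obs)) for row in self.w]
        return max(range(N_ACTIONS), key=lambda a: scores[a])  # greedy, for brevity

class CentralCritic:
    """Training-only value estimate computed from all agents' observations."""
    def __init__(self, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(N_AGENTS * OBS_DIM)]

    def value(self, all_obs):
        joint = [x for obs in all_obs for x in obs]  # concatenate observations
        return sum(w * x for w, x in zip(self.w, joint))

rng = random.Random(0)
actors = [Actor(rng) for _ in range(N_AGENTS)]
critic = CentralCritic(rng)
observations = [[rng.random() for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]

# Execution: each action depends on the local observation only.
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]
# Training: the critic scores the joint observation to form advantages.
joint_value = critic.value(observations)
```

At deployment only the `Actor` objects are needed, which is what lets each vehicle act without exchanging observations.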

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may apply to other distributed sensing-communication systems beyond vehicular networks if similar partially observable Markov game structures hold.
  • Reward function choices could be tested for robustness by varying vehicle density or task arrival rates in follow-on simulations.
  • Integration with existing NR-V2X protocol stacks would require mapping the learned policies to actual control signaling intervals.
  • The framework suggests potential for hybrid RL-rule-based methods when full decentralization is required in safety-critical settings.

Load-bearing premise

The simulation environment and reward function accurately capture real-world NR-V2X channel dynamics, sensing measurement errors, MEC offloading latencies, and vehicle mobility patterns without requiring post-hoc tuning that would not generalize.

What would settle it

A hardware-in-the-loop test or trace-driven simulation using measured real-world NR-V2X channel data and mobility patterns that shows whether the reported tradeoffs in CRLB accuracy, PRR, throughput, energy, and delay persist or degrade relative to baseline SB-SPS.

Figures

Figures reproduced from arXiv: 2605.24972 by Indulekha K. P., T. G. Venkatesh.

Figure 1: NR-V2X Mode-2 ISCC architecture with joint sensing, sidelink communication, and MEC-assisted computation.
Adjacent text (Section III, System Model): We consider an NR-V2X sidelink network operating under 3GPP Release 16 Mode 2, with N vehicles U = {u1, u2, ..., uN} moving along a road segment, as shown in …
Figure 2: Proposed MAPPO-based ISCC architecture.
Adjacent text (Table II, Simulation Parameters; truncated in the source):
  • Carrier frequency, f_c: 5.9 GHz
  • System bandwidth, B: 10 MHz
  • Modulation/coding: MCS 9
  • Packet size, Z: 190 bytes
  • Message rate: 10, 20, 50 Hz
  • Tx power, p_i^tx: 23 dBm
  • RSRP threshold: −128 dBm
  • SNR threshold: 8 dB
  • ρ_SI, RCS: 10^−7, 10 dBsm
  • Antenna gain: 3 dB
  • Max. speed: 70 km/h
  • Vehicle density, ρ: 20–100 veh/km
  • T_sen: 1000 ms
  • T_sel: 100 ms
  • CRC(TR…
Figure 3: Training evolution of communication, sensing, computation, and reward metrics at ρ = 80 veh/km.
Adjacent text: … different random seeds. MAPPO-SPS achieves faster PRR improvement and a higher steady-state reliability than MA-A2C-SPS, as shown in Fig. 3a, indicating stronger sidelink resource adaptation under contention. It also reduces the Root-CRLB for range estimation more rapidly in Fig. 3b, showing that sensing resourc…
Figure 4: Velocity root-CRLB vs. ρ.
Plot residue from the same page: Root-CRLB for Range (m) versus Vehicle Density (veh/km); curves: MAPPO-SPS, MA-A2C-SPS, SCG-SPS, CCG-SPS.
Figure 6: Range root-CRLB vs. distance.
Plot residue from the same page: PRR versus Distance, d (m); curves: MAPPO-SPS, MA-A2C-SPS, SCG-SPS, CCG-SPS.
Figure 8: Effective throughput versus vehicle density.
Plot residue from the same page: Channel Busy Ratio versus Vehicle Density (veh/km); curves: MAPPO-SPS, MA-A2C-SPS, SCG-SPS, CCG-SPS.
Figure 10: Maximum reliable distance versus vehicle density.
Plot residue from the same page: Average E2E Delay (ms) versus Vehicle Density (veh/km); curves: MAPPO-SPS, MA-A2C-SPS, SCG-SPS, CCG-SPS.
Figure 12: Average MEC queueing delay versus number of offloading vehicles.
Plot residue from the same page: Average Energy Consumption (mJ) versus Vehicle Density (veh/km); curves: MAPPO-SPS, MA-A2C-SPS, SCG-SPS, CCG-SPS.
Original abstract

Integrated sensing, communication, and computation (ISCC) is emerging as a unified design paradigm for future vehicular networks that require joint environment perception, safety-critical information exchange, and latency-sensitive task processing. In New Radio Vehicle-to-Everything (NR-V2X) Mode 2, autonomous resource selection is performed through sensing-based semi-persistent scheduling (SB-SPS), which is effective for distributed communication resource reservation but does not explicitly consider sensing-resource demand, task-induced computation workload, and the additional latency introduced by mobile edge computing (MEC) offloading. This paper develops multi-agent proximal policy optimization-based SB-SPS (MAPPO-SPS), an ISCC-aware cross-layer scheduler that jointly adapts SB-SPS reservation, radio-resource partitioning, and overflow-driven computation-offloading decisions at control epochs. The scheduling problem is formulated as a cooperative partially observable Markov game and solved using MAPPO with centralized training and decentralized execution (CTDE). Simulation results show that MAPPO-SPS achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio (PRR), effective throughput, energy consumption, and end-to-end delay.
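
To make the baseline concrete, the sensing-based exclusion step that SB-SPS relies on can be sketched roughly as follows. This is a heavily simplified illustration, not the 3GPP procedure or the paper's simulator: the function name `select_resource`, the dictionary input format, the 20% minimum candidate fraction, and the 3 dB relaxation step are assumptions for the example, and the default RSRP threshold of −128 dBm is taken from the simulation parameters listed with the figures.

```python
import random

def select_resource(sensed_rsrp_dbm, rsrp_threshold_dbm=-128.0,
                    min_fraction=0.2, step_db=3.0, rng=None):
    """Pick one candidate resource; return (resource index, final threshold).

    sensed_rsrp_dbm maps a resource index to the RSRP (dBm) of a reservation
    detected during the sensing window, or None if nothing was detected.
    min_fraction and step_db are assumed values for this sketch.
    """
    if not sensed_rsrp_dbm:
        raise ValueError("no resources to choose from")
    rng = rng or random.Random()
    threshold = rsrp_threshold_dbm
    while True:
        # Exclude resources reserved by others with RSRP above the threshold.
        candidates = [r for r, rsrp in sensed_rsrp_dbm.items()
                      if rsrp is None or rsrp <= threshold]
        if len(candidates) >= min_fraction * len(sensed_rsrp_dbm):
            return rng.choice(candidates), threshold
        threshold += step_db  # too few candidates: relax the rule and retry

# Hypothetical sensing outcome for five resources.
sensed = {0: -100.0, 1: -110.0, 2: None, 3: -130.0, 4: -90.0}
choice, final_threshold = select_resource(sensed, rng=random.Random(1))
```

Nothing in this step looks at sensing-resource demand or pending computation, which is the gap the abstract says MAPPO-SPS is meant to fill.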

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes MAPPO-SPS, a multi-agent proximal policy optimization algorithm for integrated sensing, communication, and computing (ISCC) in NR-V2X Mode 2. It formulates cross-layer resource allocation (SB-SPS reservation, radio partitioning, and MEC offloading) as a cooperative partially observable Markov game solved via MAPPO with centralized training and decentralized execution (CTDE). Simulation results are presented to show that the approach achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio, effective throughput, energy consumption, and end-to-end delay.

Significance. If the simulation fidelity and generalizability hold, the work could advance cross-layer ISCC design for vehicular networks by showing how MARL can jointly handle sensing demands, communication reservations, and computation offloading in distributed NR-V2X Mode 2. The formulation as a partially observable Markov game (POMG) is standard, and the significance is currently limited by the lack of external validation or parameter-independent benchmarks.

major comments (2)
  1. [Simulation Results] Simulation Results section: No quantitative baselines (e.g., conventional SB-SPS, single-agent RL, or non-ISCC-aware schedulers), error bars, or statistical significance tests are reported for the claimed tradeoffs among CRLB, PRR, throughput, energy, and delay. This leaves the central performance claims without independent anchors.
  2. [Problem Formulation] Problem Formulation / System Model section: The construction of the Markov game state, partial observations, transition dynamics (including channel models, sensing errors, MEC latencies, and vehicle mobility), and reward function weights are not specified in sufficient detail to assess whether they accurately reflect NR-V2X Mode 2 without post-hoc tuning that would not generalize.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly to improve the presentation of results and formulation details.

Point-by-point responses
  1. Referee: [Simulation Results] Simulation Results section: No quantitative baselines (e.g., conventional SB-SPS, single-agent RL, or non-ISCC-aware schedulers), error bars, or statistical significance tests are reported for the claimed tradeoffs among CRLB, PRR, throughput, energy, and delay. This leaves the central performance claims without independent anchors.

    Authors: We agree that the simulation results section would benefit from explicit quantitative baselines and statistical measures. In the revised manuscript, we will add comparisons against conventional SB-SPS, single-agent PPO, and non-ISCC-aware schedulers. We will also report results with error bars (standard deviation over multiple runs) and include statistical significance tests (e.g., paired t-tests) to anchor the claimed tradeoffs among CRLB, PRR, throughput, energy, and delay. revision: yes

  2. Referee: [Problem Formulation] Problem Formulation / System Model section: The construction of the Markov game state, partial observations, transition dynamics (including channel models, sensing errors, MEC latencies, and vehicle mobility), and reward function weights are not specified in sufficient detail to assess whether they accurately reflect NR-V2X Mode 2 without post-hoc tuning that would not generalize.

    Authors: We concur that greater specificity is needed for reproducibility and to demonstrate alignment with NR-V2X Mode 2. The revised manuscript will expand the System Model and Problem Formulation sections with explicit definitions of the joint state, per-agent observation functions, transition dynamics (including 3GPP channel models, sensing error models, MEC latency expressions, and mobility traces), and the precise reward function with all weighting coefficients. These will be justified using standard 3GPP parameters to address generalizability concerns. revision: yes
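
The statistical reporting promised in the first response (standard deviation over runs and a paired t-test) could look like the sketch below. It uses only the Python standard library and computes the t statistic, which would then be compared against a Student-t critical value with n − 1 degrees of freedom. The function name `paired_t_statistic` is hypothetical, and the per-seed PRR values are placeholders invented for illustration, not results from the paper.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (mean difference, sample std of differences, t statistic).

    a and b hold one metric value per run, paired by random seed.
    Assumes the differences are not all identical (nonzero std).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    std_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (std_d / math.sqrt(len(diffs)))
    return mean_d, std_d, t

# Placeholder PRR per seed for two schedulers; NOT data from the paper.
prr_method_a = [0.91, 0.93, 0.92, 0.94, 0.90]
prr_method_b = [0.88, 0.90, 0.91, 0.90, 0.87]
mean_d, std_d, t_stat = paired_t_statistic(prr_method_a, prr_method_b)
```

Pairing by seed matters here because both schedulers would face the same mobility and traffic realization in each run, so the per-seed difference removes that shared variation.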

Circularity Check

1 step flagged

Simulation tradeoffs are produced by training MAPPO policy on author-defined reward that directly encodes the reported metrics

specific steps
  1. fitted input called prediction [Abstract (simulation results paragraph)]
    "Simulation results show that MAPPO-SPS achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio (PRR), effective throughput, energy consumption, and end-to-end delay."

    The MAPPO reward function is defined to include weighted terms for exactly these quantities (sensing accuracy via CRLB, PRR, throughput, energy, delay). Training the policy to maximize this reward and then reporting the resulting 'balanced tradeoff' makes the reported outcome statistically forced by the choice of reward weights and simulator parameters; the numbers are outputs of the same optimization that was set up to produce them.

full rationale

The paper's central empirical claim rests on simulation results from an RL agent whose reward function and environment model are constructed by the authors to optimize precisely the listed objectives (CRLB sensing, PRR, throughput, energy, delay). This makes the 'balanced tradeoff' a direct consequence of the optimization setup rather than an independent prediction. The POMG formulation and MAPPO-CTDE solver are standard and non-circular, but the load-bearing performance numbers reduce to the fitted policy by construction. No external benchmark, closed-form derivation, or hardware validation is indicated to break the loop.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on a simulation model whose channel, sensing, and computing parameters are not derived from first principles or external measurements; multiple reward weights and policy hyperparameters are necessarily tuned to obtain the reported balanced tradeoff.

free parameters (2)
  • reward weights for sensing accuracy, PRR, throughput, energy, delay
    These scalar weights must be chosen to produce the claimed balanced tradeoff; they are not fixed by any external measurement or uniqueness theorem.
  • MAPPO hyperparameters (learning rate, clip ratio, number of agents, control epoch length)
    Standard RL training parameters that are adjusted until the policy exhibits the desired simulation behavior.
axioms (2)
  • domain assumption: The cooperative partially observable Markov game formulation captures all relevant interactions among sensing, communication, and computing decisions.
    Invoked when the scheduling problem is cast as a POMG solved by MAPPO.
  • domain assumption: CRLB-based sensing accuracy, PRR, effective throughput, energy, and end-to-end delay are the only metrics that matter and can be traded off linearly via the reward function.
    Used to define the objective that the learned policy optimizes.
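
The ledger's point about reward weights can be illustrated with a toy weighted-sum reward: which policy counts as "better" depends on the weights. This sketch is not the paper's reward function. The function name `scalar_reward`, the metric names, the normalized metric values, and both weight sets are hypothetical.

```python
def scalar_reward(metrics, weights):
    """Weighted sum of normalized metrics; cost terms carry negative weights."""
    return sum(weights[name] * metrics[name] for name in weights)

# Hypothetical normalized metrics in [0, 1] for two candidate policies.
policy_a = {"sensing": 0.9, "prr": 0.7, "throughput": 0.6, "energy": 0.5, "delay": 0.4}
policy_b = {"sensing": 0.6, "prr": 0.9, "throughput": 0.7, "energy": 0.4, "delay": 0.4}

# Two hypothetical weightings; energy and delay are penalized in both.
sensing_heavy = {"sensing": 2.0, "prr": 1.0, "throughput": 1.0, "energy": -1.0, "delay": -1.0}
comm_heavy = {"sensing": 0.5, "prr": 2.0, "throughput": 1.0, "energy": -1.0, "delay": -1.0}

a_wins_sensing = scalar_reward(policy_a, sensing_heavy) > scalar_reward(policy_b, sensing_heavy)
b_wins_comm = scalar_reward(policy_b, comm_heavy) > scalar_reward(policy_a, comm_heavy)
```

Under the first weighting policy A scores higher, under the second policy B does, which is why the audit treats the weights as free parameters that shape the reported tradeoff.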

pith-pipeline@v0.9.1-grok · 5744 in / 1586 out tokens · 26573 ms · 2026-06-30T00:08:30.405683+00:00 · methodology

