Integrated Sensing, Communication, and Computing for NR-V2X: A Cross-Layer Resource Allocation Framework Using Multi-Agent Reinforcement Learning
Pith reviewed 2026-06-30 00:08 UTC · model grok-4.3
The pith
MAPPO-SPS uses multi-agent RL to jointly adapt SB-SPS reservations, radio partitioning, and MEC offloading in NR-V2X for balanced sensing, reliability, throughput, energy, and delay.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper's core claim is that MAPPO-SPS achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio, effective throughput, energy consumption, and end-to-end delay. It does so by jointly adapting SB-SPS reservation, radio-resource partitioning, and overflow-driven computation-offloading decisions at control epochs. The scheduling problem is cast as a cooperative partially observable Markov game and solved with MAPPO under CTDE.
What carries the argument
MAPPO-SPS: multi-agent proximal policy optimization applied to sensing-based semi-persistent scheduling, performing joint adaptation of reservations, partitioning, and offloading via centralized training and decentralized execution.
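For readers unfamiliar with the optimizer, the clipped surrogate objective at the heart of PPO (and therefore of MAPPO) can be sketched per sample as below. This is a minimal illustration of the published PPO formula, not the paper's implementation; the function name and the default `clip_eps = 0.2` are assumptions chosen for illustration.

```python
def ppo_clip_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- new-policy probability divided by old-policy probability
    advantage -- advantage estimate for the sampled action
    clip_eps  -- clipping range (0.2 is an assumed, commonly used default)
    """
    if ratio < 0.0 or clip_eps < 0.0:
        raise ValueError("ratio and clip_eps must be non-negative")
    # Restrict the probability ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes the incentive to move the policy far from the old one.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In training, this quantity would be averaged over a batch and maximized; in MAPPO the advantage would typically come from a centralized critic.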
If this is right
- The scheduler enables distributed autonomous resource selection that explicitly accounts for sensing-resource demand and MEC-induced latency.
- Joint adaptation at control epochs produces simultaneous gains across sensing accuracy, communication reliability, and computation metrics.
- The CTDE structure supports deployment where each vehicle acts on local observations while benefiting from centralized training.
- Performance is evaluated against standard SB-SPS to quantify the benefit of including sensing and computation objectives.
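The CTDE structure mentioned in the list above can be illustrated with a toy sketch: each actor sees only its own local observation, while a single critic, used only during training, sees the observations of all agents. Everything here is hypothetical scaffolding (class names, linear scoring in place of neural networks, greedy action choice in place of sampling) and is not taken from the paper.

```python
import random
from typing import List, Sequence


class Actor:
    """Decentralized policy: maps ONE agent's local observation to an action index."""

    def __init__(self, obs_dim: int, n_actions: int, rng: random.Random) -> None:
        # One weight row per action; a stand-in for a policy network.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
                        for _ in range(n_actions)]

    def act(self, local_obs: Sequence[float]) -> int:
        scores = [sum(w * o for w, o in zip(row, local_obs)) for row in self.weights]
        # Greedy choice for simplicity; a real PPO actor would sample from a distribution.
        return max(range(len(scores)), key=scores.__getitem__)


class CentralCritic:
    """Centralized value estimate: sees every agent's observation, used only in training."""

    def __init__(self, joint_dim: int, rng: random.Random) -> None:
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(joint_dim)]

    def value(self, all_obs: Sequence[Sequence[float]]) -> float:
        joint: List[float] = [x for obs in all_obs for x in obs]
        if len(joint) != len(self.weights):
            raise ValueError("joint observation has unexpected dimension")
        return float(sum(w * x for w, x in zip(self.weights, joint)))
```

At execution time only the `Actor` objects would run on the vehicles; the `CentralCritic` exists solely to provide a training signal.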
Where Pith is reading between the lines
- The approach may apply to other distributed sensing-communication systems beyond vehicular networks if similar partially observable Markov game structures hold.
- Reward function choices could be tested for robustness by varying vehicle density or task arrival rates in follow-on simulations.
- Integration with existing NR-V2X protocol stacks would require mapping the learned policies to actual control signaling intervals.
- The framework suggests potential for hybrid RL-rule-based methods when full decentralization is required in safety-critical settings.
Load-bearing premise
The simulation environment and reward function accurately capture real-world NR-V2X channel dynamics, sensing measurement errors, MEC offloading latencies, and vehicle mobility patterns without requiring post-hoc tuning that would not generalize.
What would settle it
A hardware-in-the-loop test or trace-driven simulation using measured real-world NR-V2X channel data and mobility patterns that shows whether the reported tradeoffs in CRLB accuracy, PRR, throughput, energy, and delay persist or degrade relative to baseline SB-SPS.
Original abstract
Integrated sensing, communication, and computation (ISCC) is emerging as a unified design paradigm for future vehicular networks that require joint environment perception, safety-critical information exchange, and latency-sensitive task processing. In New Radio Vehicle-to-Everything (NR-V2X) Mode 2, autonomous resource selection is performed through sensing-based semi-persistent scheduling (SB-SPS), which is effective for distributed communication resource reservation but does not explicitly consider sensing-resource demand, task-induced computation workload, and the additional latency introduced by mobile edge computing (MEC) offloading. This paper develops multi-agent proximal policy optimization-based SB-SPS (MAPPO-SPS), an ISCC-aware cross-layer scheduler that jointly adapts SB-SPS reservation, radio-resource partitioning, and overflow-driven computation-offloading decisions at control epochs. The scheduling problem is formulated as a cooperative partially observable Markov game and solved using MAPPO with centralized training and decentralized execution (CTDE). Simulation results show that MAPPO-SPS achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio (PRR), effective throughput, energy consumption, and end-to-end delay.
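To make the baseline concrete, the sensing-based resource selection step that SB-SPS builds on can be sketched roughly as follows: exclude candidate resources whose sensed power exceeds a threshold, relax the threshold if too few candidates remain, then pick one at random. This is a heavily simplified illustration, not a 3GPP-conformant procedure and not the paper's simulator; the function name, the 20% minimum candidate fraction, and the 3 dB relaxation step are assumptions for illustration.

```python
import random
from typing import Optional, Sequence


def select_resource(rsrp_dbm: Sequence[float],
                    threshold_dbm: float,
                    min_fraction: float = 0.2,
                    step_db: float = 3.0,
                    rng: Optional[random.Random] = None) -> int:
    """Return the index of a randomly chosen candidate resource.

    rsrp_dbm      -- sensed power per candidate resource (finite values, dBm)
    threshold_dbm -- resources sensed above this level are excluded
    min_fraction  -- assumed minimum share of resources that must remain
    step_db       -- assumed amount by which the threshold is relaxed per retry
    """
    if not rsrp_dbm:
        raise ValueError("no candidate resources")
    if not 0.0 <= min_fraction <= 1.0 or step_db <= 0.0:
        raise ValueError("invalid min_fraction or step_db")
    rng = rng or random.Random()
    while True:
        candidates = [i for i, p in enumerate(rsrp_dbm) if p <= threshold_dbm]
        if candidates and len(candidates) >= min_fraction * len(rsrp_dbm):
            return rng.choice(candidates)
        # Too few idle-looking resources: relax the exclusion threshold and retry.
        threshold_dbm += step_db
```

A scheduler like the one the paper describes would adapt quantities around this step (for example the reservation behaviour) rather than replace the sensing procedure itself.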
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MAPPO-SPS, a multi-agent proximal policy optimization algorithm for integrated sensing, communication, and computing (ISCC) in NR-V2X Mode 2. It formulates cross-layer resource allocation (SB-SPS reservation, radio partitioning, and MEC offloading) as a cooperative partially observable Markov game solved via MAPPO with centralized training and decentralized execution (CTDE). Simulation results are presented to show that the approach achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio, effective throughput, energy consumption, and end-to-end delay.
Significance. If the simulation fidelity and generalizability hold, the work could advance cross-layer ISCC design for vehicular networks by showing how multi-agent reinforcement learning (MARL) can jointly handle sensing demands, communication reservations, and computation offloading in distributed NR-V2X Mode 2. The formulation as a partially observable Markov game (POMG) is standard, and the significance is currently limited by the lack of external validation or parameter-independent benchmarks.
major comments (2)
- [Simulation Results] Simulation Results section: No quantitative baselines (e.g., conventional SB-SPS, single-agent RL, or non-ISCC-aware schedulers), error bars, or statistical significance tests are reported for the claimed tradeoffs among CRLB, PRR, throughput, energy, and delay. This leaves the central performance claims without independent anchors.
- [Problem Formulation] Problem Formulation / System Model section: The construction of the Markov game state, partial observations, transition dynamics (including channel models, sensing errors, MEC latencies, and vehicle mobility), and reward function weights are not specified in sufficient detail to assess whether they accurately reflect NR-V2X Mode 2 without post-hoc tuning that would not generalize.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly to improve the presentation of results and formulation details.
- Referee: [Simulation Results] Simulation Results section: No quantitative baselines (e.g., conventional SB-SPS, single-agent RL, or non-ISCC-aware schedulers), error bars, or statistical significance tests are reported for the claimed tradeoffs among CRLB, PRR, throughput, energy, and delay. This leaves the central performance claims without independent anchors.
  Authors: We agree that the simulation results section would benefit from explicit quantitative baselines and statistical measures. In the revised manuscript, we will add comparisons against conventional SB-SPS, single-agent PPO, and non-ISCC-aware schedulers. We will also report results with error bars (standard deviation over multiple runs) and include statistical significance tests (e.g., paired t-tests) to anchor the claimed tradeoffs among CRLB, PRR, throughput, energy, and delay.
  Revision: yes
- Referee: [Problem Formulation] Problem Formulation / System Model section: The construction of the Markov game state, partial observations, transition dynamics (including channel models, sensing errors, MEC latencies, and vehicle mobility), and reward function weights are not specified in sufficient detail to assess whether they accurately reflect NR-V2X Mode 2 without post-hoc tuning that would not generalize.
  Authors: We concur that greater specificity is needed for reproducibility and to demonstrate alignment with NR-V2X Mode 2. The revised manuscript will expand the System Model and Problem Formulation sections with explicit definitions of the joint state, per-agent observation functions, transition dynamics (including 3GPP channel models, sensing error models, MEC latency expressions, and mobility traces), and the precise reward function with all weighting coefficients. These will be justified using standard 3GPP parameters to address generalizability concerns.
  Revision: yes
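The statistical reporting promised in the first response (standard deviation over multiple runs and paired t-tests) could be computed along the following lines. This is a minimal standard-library sketch, assuming each scheduler is evaluated on the same set of random seeds so that runs can be paired; the function names are hypothetical and no numbers from the paper are used.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_std(samples: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, e.g. for error bars over repeated runs."""
    return statistics.mean(samples), statistics.stdev(samples)


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-seed metrics a[i] vs b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two runs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance; t statistic undefined")
    # t = mean difference divided by its standard error.
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The returned statistic would then be compared against a t-distribution with the given degrees of freedom to obtain a p-value, for instance with a statistics package or a t-table.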
Circularity Check
Simulation tradeoffs are produced by training MAPPO policy on author-defined reward that directly encodes the reported metrics
specific steps
- fitted input called prediction
  [Abstract (simulation results paragraph)]
  "Simulation results show that MAPPO-SPS achieves a balanced tradeoff among CRLB-based sensing accuracy, packet reception ratio (PRR), effective throughput, energy consumption, and end-to-end delay."
  The MAPPO reward function is defined to include weighted terms for exactly these quantities (sensing accuracy via CRLB, PRR, throughput, energy, delay). Training the policy to maximize this reward and then reporting the resulting 'balanced tradeoff' makes the reported outcome statistically forced by the choice of reward weights and simulator parameters; the numbers are outputs of the same optimization that was set up to produce them.
full rationale
The paper's central empirical claim rests on simulation results from an RL agent whose reward function and environment model are constructed by the authors to optimize precisely the listed objectives (CRLB sensing, PRR, throughput, energy, delay). This makes the 'balanced tradeoff' a direct consequence of the optimization setup rather than an independent prediction. The POMG formulation and MAPPO-CTDE solver are standard and non-circular, but the load-bearing performance numbers reduce to the fitted policy by construction. No external benchmark, closed-form derivation, or hardware validation is indicated to break the loop.
Axiom & Free-Parameter Ledger
free parameters (2)
- reward weights for sensing accuracy, PRR, throughput, energy, delay
- MAPPO hyperparameters (learning rate, clip ratio, number of agents, control epoch length)
axioms (2)
- domain assumption: The cooperative partially observable Markov game formulation captures all relevant interactions among sensing, communication, and computing decisions.
- domain assumption: CRLB-based sensing accuracy, PRR, effective throughput, energy, and end-to-end delay are the only metrics that matter and can be traded off linearly via the reward function.
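The linear tradeoff assumed in the ledger above can be illustrated with a weighted-sum reward. This is a hedged sketch only: the paper's actual reward is not specified in the text reviewed here, so the field names, the normalization of every metric to [0, 1], and the sign convention (benefits added, costs subtracted) are all assumptions.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical free parameters: one non-negative weight per metric."""
    sensing: float
    prr: float
    throughput: float
    energy: float
    delay: float


def scalar_reward(metrics: Mapping[str, float], w: RewardWeights) -> float:
    """Collapse five metrics (assumed normalized to [0, 1]) into one scalar."""
    return (w.sensing * metrics["sensing_accuracy"]
            + w.prr * metrics["prr"]
            + w.throughput * metrics["throughput"]
            - w.energy * metrics["energy"]   # cost term
            - w.delay * metrics["delay"])    # cost term
```

Because the scalar depends on the weights, two operating points can swap rank when the weights change, which is why the weights are listed as free parameters and why the reported balance depends on how they were chosen.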