CD-TWINSAFE: A ROS-enabled Digital Twin for Scene Understanding and Safety Emerging V2I Technology
Pith reviewed 2026-05-21 15:04 UTC · model grok-4.3
The pith
A vehicle sends stereo camera perception data over 4G to keep an infrastructure digital twin updated and return safety alerts in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The CD-TWINSAFE architecture runs an on-board stack that obtains ego-vehicle pose from sensors and processes 20-fps stereo images through object detection and feature extraction to obtain velocity, yaw, time-to-collision and time-headway. These data are packaged into custom ROS2 messages and transmitted over UDP on a 4G modem to the infrastructure side. There a digital twin replica in Unreal Engine 5 updates the positions of the ego vehicle and detected objects and returns safety alerts to the cockpit. Tests across multiple driving scenarios confirm that the combined system maintains real-time response.
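The summary above does not state how the paper computes the two safety metrics. As a minimal sketch, assuming the common textbook definitions (time-to-collision as gap divided by closing speed, time-headway as gap divided by ego speed), the calculation might look like the following. The function and parameter names are illustrative and are not taken from the paper.

```python
from typing import Optional


def time_to_collision(gap_m: float, ego_speed_mps: float,
                      lead_speed_mps: float) -> Optional[float]:
    """Return TTC in seconds, or None when the ego vehicle is not closing the gap."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0.0:
        return None  # gap is constant or growing, so no collision is predicted
    return gap_m / closing_speed


def time_headway(gap_m: float, ego_speed_mps: float) -> Optional[float]:
    """Return THW in seconds, or None when the ego vehicle is stationary."""
    if ego_speed_mps <= 0.0:
        return None
    return gap_m / ego_speed_mps


# Example: 30 m gap, ego at 20 m/s, lead object at 15 m/s.
ttc = time_to_collision(gap_m=30.0, ego_speed_mps=20.0, lead_speed_mps=15.0)
thw = time_headway(gap_m=30.0, ego_speed_mps=20.0)
```

Both quantities divide by a speed estimate, so any error in the perceived velocity feeds directly into the alert timing.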
What carries the argument
ROS-enabled V2I communication that carries localization and perception outputs from the vehicle to update the Unreal Engine 5 scene replica in real time.
If this is right
- Safety alerts generated from the shared digital twin can reach the vehicle cockpit in addition to on-board calculations.
- Multiple driving scenarios can be tested to verify that the V2I link keeps the twin synchronized with the real scene.
- The architecture supports emerging vehicle-to-infrastructure technology by separating perception computation from scene visualization.
- Real-time updates allow the infrastructure to monitor the environment beyond the range of any single vehicle sensor.
Where Pith is reading between the lines
- Similar links could let one digital twin serve several nearby vehicles at once for coordinated hazard warnings.
- Adding other sensor types to the on-board stack might reduce reliance on stereo vision alone in low-light or adverse weather conditions.
- The same message format could support logging for later safety analysis or regulatory review.
Load-bearing premise
The perception module running at 20 fps on stereo images accurately extracts object velocity, yaw, time-to-collision, and time-headway without significant errors or latency that would invalidate the safety alerts.
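One way to see why this premise is fragile: if velocity and yaw were estimated by differencing object positions between consecutive frames (an assumption; the page does not say how the paper does it), the 50 ms frame interval at 20 fps amplifies position noise considerably. The sketch below uses hypothetical helper names.

```python
import math
from typing import Tuple

FRAME_DT_S = 1.0 / 20.0  # 20 fps, as stated for the perception module


def velocity_and_yaw(prev_xy: Tuple[float, float],
                     curr_xy: Tuple[float, float],
                     dt_s: float = FRAME_DT_S) -> Tuple[float, float]:
    """Finite-difference speed (m/s) and heading (rad) between two frames."""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    speed = math.hypot(dx, dy) / dt_s
    yaw = math.atan2(dy, dx)
    return speed, yaw


def speed_noise_std(position_std_m: float, dt_s: float = FRAME_DT_S) -> float:
    """Per-axis speed noise when both positions carry independent errors."""
    return math.sqrt(2.0) * position_std_m / dt_s


speed, yaw = velocity_and_yaw((0.0, 0.0), (1.0, 0.0))
noise = speed_noise_std(0.1)  # 10 cm of position noise per frame
```

Under these assumptions, 10 cm of independent position error per frame translates into roughly 2.8 m/s of per-axis speed noise, which is why filtering and an error analysis matter for the downstream safety metrics.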
What would settle it
A recorded drive in which the digital twin either misses an imminent collision or issues an alert at the wrong moment because the 20-fps perception data arrived late or contained an error in velocity or time-to-collision.
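A replay of such a recording could flag late data with a check along these lines. This is a sketch under stated assumptions: the two-frame staleness threshold and the function names are hypothetical, and the age calculation presumes that the vehicle and infrastructure clocks are synchronized.

```python
FRAME_PERIOD_S = 1.0 / 20.0  # one perception frame at 20 fps


def message_age_s(sent_stamp_s: float, received_stamp_s: float) -> float:
    """Age of a message on arrival; assumes synchronized clocks."""
    return received_stamp_s - sent_stamp_s


def is_stale(age_s: float, max_age_frames: float = 2.0) -> bool:
    """True when the message is older than the allowed number of frames."""
    return age_s > max_age_frames * FRAME_PERIOD_S


def distance_travelled_m(speed_mps: float, age_s: float) -> float:
    """Distance an object covers while its message is in flight."""
    return speed_mps * age_s


age = message_age_s(sent_stamp_s=10.0, received_stamp_s=10.15)
stale = is_stale(age)
drift = distance_travelled_m(speed_mps=20.0, age_s=age)
```

In this example a 150 ms delay exceeds the two-frame budget of 100 ms, and an object moving at 20 m/s would have covered about 3 m that the twin has not yet seen.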
Original abstract
In this paper, the CD-TWINSAFE is introduced, a V2I-based digital twin for Autonomous Vehicles. The proposed architecture is composed of two stacks running simultaneously, an on-board driving stack that includes a stereo camera for scene understanding, and a digital twin stack that runs an Unreal Engine 5 replica of the scene viewed by the camera as well as returning safety alerts to the cockpit. The on-board stack is implemented on the vehicle side including 2 main autonomous modules; localization and perception. The position and orientation of the ego vehicle are obtained using on-board sensors. Furthermore, the perception module is responsible for processing 20-fps images from stereo camera and understands the scene through two complementary pipelines. The pipeline are working on object detection and feature extraction including object velocity, yaw and the safety metrics time-to-collision and time-headway. The collected data form the driving stack are sent to the infrastructure side through the ROS-enabled architecture in the form of custom ROS2 messages and sent over UDP links that ride a 4G modem for V2I communication. The environment is monitored via the digital twin through the shared messages which update the information of the spawned ego vehicle and detected objects based on the real-time localization and perception data. Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CD-TWINSAFE, a V2I-based digital twin for autonomous vehicles consisting of two simultaneously running stacks: an on-board driving stack using stereo cameras for localization and perception (object detection, velocity, yaw, time-to-collision, and time-headway extracted at 20 fps), and an infrastructure-side digital twin in Unreal Engine 5 that receives ROS2 messages over UDP/4G, updates the virtual scene, and returns safety alerts. The authors state that tests in various driving scenarios confirm the architecture's validity and real-time response.
Significance. If the unquantified performance claims are substantiated, the work could offer a practical demonstration of integrating on-board stereo perception with a ROS-enabled digital twin for V2I safety monitoring. The choice of ROS2 for message passing and Unreal Engine 5 for the twin provides a concrete, implementable example that may guide future V2I systems. The absence of any measured latencies, accuracy statistics, or failure modes currently limits its contribution to the literature on real-time scene understanding.
major comments (2)
- [Abstract] The statement that 'Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture' supplies no quantitative results (latency, perception error rates for velocity/yaw/TTC/THW, or ground-truth comparisons). This directly undermines the central claim that the 20 fps on-board perception and 4G UDP V2I link enable reliable real-time safety alerts.
- [Perception module description] The pipeline is asserted to extract object velocity, yaw, time-to-collision, and time-headway from 20-fps stereo images, yet no algorithm details, accuracy metrics, or error analysis are provided. Because the safety-alert validity rests on these quantities being sufficiently accurate and low-latency, the lack of evaluation is load-bearing for the architecture's claimed utility.
minor comments (2)
- The text contains a grammatical error ('The pipeline are working' should read 'The pipelines are working').
- A system diagram illustrating the two stacks, ROS2 message definitions, and data flow between vehicle and infrastructure would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We agree that the current description would benefit from quantitative support to strengthen the real-time performance claims and will revise the paper to address these points directly.
- Referee: [Abstract] The statement that 'Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture' supplies no quantitative results (latency, perception error rates for velocity/yaw/TTC/THW, or ground-truth comparisons). This directly undermines the central claim that the 20 fps on-board perception and 4G UDP V2I link enable reliable real-time safety alerts.
  Authors: We agree that quantitative results are required to substantiate the real-time claims. In the revised manuscript we will add a new evaluation subsection reporting measured end-to-end latencies for the on-board perception pipeline and the 4G UDP V2I link, together with accuracy statistics and ground-truth comparisons for velocity, yaw, TTC and THW extracted from the stereo images across the tested scenarios.
  Revision: yes
- Referee: [Perception module description] The pipeline is asserted to extract object velocity, yaw, time-to-collision, and time-headway from 20-fps stereo images, yet no algorithm details, accuracy metrics, or error analysis are provided. Because the safety-alert validity rests on these quantities being sufficiently accurate and low-latency, the lack of evaluation is load-bearing for the architecture's claimed utility.
  Authors: We accept that additional detail is needed. The revised perception-module section will describe the concrete algorithms employed for velocity and yaw estimation (including the stereo-based feature tracking and filtering steps) and will include quantitative accuracy metrics and error analysis obtained by comparing the extracted TTC and THW values against ground-truth data available in the Unreal Engine 5 environment.
  Revision: yes
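The error analysis promised in the rebuttal could be summarized with standard metrics such as root-mean-square error and mean absolute error. The sketch below assumes paired sequences of estimated and ground-truth values (for example TTC in seconds); the sample numbers are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def _check(estimated: Sequence[float], reference: Sequence[float]) -> None:
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between estimates and ground truth."""
    _check(estimated, reference)
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mean_abs_error(estimated: Sequence[float],
                   reference: Sequence[float]) -> float:
    """Mean absolute error between estimates and ground truth."""
    _check(estimated, reference)
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Illustrative values only: estimated vs. ground-truth TTC in seconds.
ttc_estimated = [1.0, 2.0, 3.0]
ttc_reference = [1.0, 2.0, 5.0]
ttc_rmse = rmse(ttc_estimated, ttc_reference)
ttc_mae = mean_abs_error(ttc_estimated, ttc_reference)
```

Reporting both metrics per scenario would show whether a few large outliers (which inflate RMSE more than MAE) dominate the error.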
Circularity Check
No circularity: purely descriptive systems architecture paper
The manuscript describes a ROS2/UDP V2I architecture linking an on-board stereo perception stack (object detection, velocity/yaw/TTC/THW extraction at 20 fps) with an Unreal Engine 5 digital twin that receives custom messages to update ego and object states and issue safety alerts. No equations, fitted parameters, predictions, or derivation steps appear anywhere in the text. Validation is limited to qualitative scenario tests whose outcomes are not shown to reduce to any input quantities by construction. The contribution is therefore self-contained as an engineering description; no load-bearing claim collapses into a self-definition or self-citation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption On-board sensors supply accurate ego-vehicle position and orientation.
- domain assumption The two perception pipelines correctly compute object velocity, yaw, time-to-collision, and time-headway from 20-fps stereo images.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "The perception module is responsible for processing 20-fps images from stereo camera and understands the scene through two complementary pipelines... object velocity, yaw and the safety metrics time-to-collision and time-headway."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "The collected data form the driving stack are sent to the infrastructure side through the ROS-enabled architecture in the form of custom ROS2 messages and sent over UDP links that ride a 4G modem for V2I communication."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.