
arXiv: 2601.12373 · v1 · pith:JSGENODTnew · submitted 2026-01-18 · 💻 cs.CV · cs.HC · cs.RO

CD-TWINSAFE: A ROS-enabled Digital Twin for Scene Understanding and Safety Emerging V2I Technology

Pith reviewed 2026-05-21 15:04 UTC · model grok-4.3

classification 💻 cs.CV · cs.HC · cs.RO
keywords digital twin, V2I communication, autonomous vehicles, scene understanding, ROS, safety alerts, stereo camera, Unreal Engine

The pith

A vehicle sends stereo camera perception data over 4G to keep an infrastructure digital twin updated and return safety alerts in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an architecture that runs two stacks at once for autonomous vehicles. One stack sits on the vehicle and uses a stereo camera at 20 frames per second to detect objects, measure their speed and yaw, and calculate time-to-collision and time-headway. These results travel as custom messages across a ROS-enabled 4G link to an infrastructure side that maintains a matching scene inside Unreal Engine 5. The digital twin then issues safety alerts back to the vehicle cockpit. A reader would care because the setup shows how infrastructure can supplement limited on-board sensing to support safer driving decisions.

Core claim

The CD-TWINSAFE architecture runs an on-board stack that obtains ego-vehicle pose from sensors and processes 20-fps stereo images through object detection and feature extraction to obtain velocity, yaw, time-to-collision and time-headway. These data are packaged into custom ROS2 messages and transmitted over UDP on a 4G modem to the infrastructure side. There a digital twin replica in Unreal Engine 5 updates the positions of the ego vehicle and detected objects and returns safety alerts to the cockpit. Tests across multiple driving scenarios confirm that the combined system maintains real-time response.
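
The two safety metrics named above can be illustrated with their textbook definitions. This is a minimal sketch, not the paper's implementation: the summary does not state which formulas the authors use, and the function names, argument names, and the handling of the non-closing case are assumptions made for illustration.

```python
import math


def time_to_collision(gap_m: float, ego_speed_mps: float, lead_speed_mps: float) -> float:
    """Seconds until the gap closes at the current closing speed.

    Returns infinity when the ego vehicle is not closing in on the object.
    """
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def time_headway(gap_m: float, ego_speed_mps: float) -> float:
    """Seconds the ego vehicle needs to cover the current gap at its own speed."""
    if ego_speed_mps <= 0.0:
        return math.inf
    return gap_m / ego_speed_mps


# Example: 30 m gap, ego at 15 m/s, lead object at 10 m/s.
ttc_s = time_to_collision(30.0, 15.0, 10.0)
thw_s = time_headway(30.0, 15.0)
```

Time-to-collision depends on the relative speed, so it is only as good as the per-object velocity estimate, while time-headway depends only on the gap and the ego speed.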

What carries the argument

ROS-enabled V2I communication that carries localization and perception outputs from the vehicle to update the Unreal Engine 5 scene replica in real time.
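
As a rough illustration of how perception outputs might be byte-encoded for a UDP link, the sketch below packs one detected object into a fixed-size binary record with Python's standard `struct` module. The field layout, `OBJECT_FORMAT`, and all function names are hypothetical; the paper's actual message format (its Figure 3a) is not reproduced in this summary.

```python
import socket
import struct

# Hypothetical layout (little-endian, no padding):
# object id (uint32), timestamp in seconds (float64),
# then x, y, speed, yaw, TTC, THW as float32 values.
OBJECT_FORMAT = "<Id6f"
OBJECT_SIZE = struct.calcsize(OBJECT_FORMAT)  # 4 + 8 + 6 * 4 = 36 bytes


def encode_object(obj_id, stamp_s, x_m, y_m, speed_mps, yaw_rad, ttc_s, thw_s) -> bytes:
    """Pack one detected object into a fixed-size byte record."""
    return struct.pack(OBJECT_FORMAT, obj_id, stamp_s,
                       x_m, y_m, speed_mps, yaw_rad, ttc_s, thw_s)


def decode_object(payload: bytes) -> tuple:
    """Unpack a record produced by encode_object."""
    return struct.unpack(OBJECT_FORMAT, payload)


def send_object(payload: bytes, host: str, port: int) -> None:
    """Send one record as a single UDP datagram (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


record = encode_object(7, 1000.25, 12.5, -3.0, 8.25, 0.5, 4.0, 1.5)
```

Because UDP does not retransmit lost datagrams, a receiver built this way would have to tolerate missing updates, which is one reason a timestamp field is included in the assumed layout.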

If this is right

  • Safety alerts generated from the shared digital twin can reach the vehicle cockpit in addition to on-board calculations.
  • Multiple driving scenarios can be tested to verify that the V2I link keeps the twin synchronized with the real scene.
  • The architecture supports emerging vehicle-to-infrastructure technology by separating perception computation from scene visualization.
  • Real-time updates allow the infrastructure to monitor the environment beyond the range of any single vehicle sensor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar links could let one digital twin serve several nearby vehicles at once for coordinated hazard warnings.
  • Adding other sensor types to the on-board stack might reduce reliance on stereo vision alone in low-light or weather conditions.
  • The same message format could support logging for later safety analysis or regulatory review.

Load-bearing premise

The perception module running at 20 fps on stereo images accurately extracts object velocity, yaw, time-to-collision, and time-headway without significant errors or latency that would invalidate the safety alerts.

What would settle it

A recorded drive in which the digital twin either misses an imminent collision or issues an alert at the wrong moment because the 20-fps perception data arrived late or contained an error in velocity or time-to-collision.
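
One way to reason about the late-data failure case is to discount a reported time-to-collision by the age of the message that carried it. The sketch below is an illustration only: it assumes a constant closing speed, synchronized clocks on both sides, and a hypothetical 2-second alert threshold, none of which come from the paper.

```python
def remaining_ttc(ttc_s: float, message_age_s: float) -> float:
    """TTC left once the message's age is subtracted, floored at zero."""
    return max(ttc_s - message_age_s, 0.0)


def alert_is_due(ttc_s: float, message_age_s: float, threshold_s: float = 2.0) -> bool:
    """True when the age-corrected TTC is at or below the alert threshold.

    threshold_s is a hypothetical value chosen for illustration.
    """
    return remaining_ttc(ttc_s, message_age_s) <= threshold_s


# A TTC of 2.5 s looks safe when fresh, but not when it is 0.5 s old.
fresh = alert_is_due(2.5, 0.0)
stale = alert_is_due(2.5, 0.5)
```

A recorded drive with logged send and receive timestamps would show how often this correction changes the alert decision.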

Figures

Figures reproduced from arXiv: 2601.12373 by Amro Khaled, Catherine M. Elias, Farah Khaled, Omar Riad.

Figure 1. The CD-TWINSAFE Architecture Overview. The obtained point depth is further calibrated to compensate for the 15° tilt of the camera mounted on the vehicle. Moreover, regarding the implementation of the fourth stage (Object Tracking & Kinematic Estimation), an object tracker class is created to store the history of each object detected for a few frames, even after its disappearance from the camera view, to enhance …
Figure 2. On-board UI showing the state as background color.
Figure 3. (a) Message Format Used for Byte Encoding, (b) Implemented Class Structure of Digital Twin.
Figure 4. Snapshot of Digital Twin UI.
Figure 5. Values of perception pipeline 1 and 2 in comparison: (a) distance values, (b) TTC values.
Figure 7. Driving stack alert system responding to detections,
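
The Figure 1 excerpt mentions compensating depth for the camera's 15° tilt. A minimal sketch of one such correction follows, assuming the tilt is a downward pitch and a camera frame with x to the right, y down, and z along the optical axis. The tilt direction, the axis convention, and the function name are assumptions; the excerpt does not describe the authors' calibration procedure.

```python
import math

CAMERA_TILT_DEG = 15.0  # tilt angle reported in the paper's text


def level_point(x_m: float, y_m: float, z_m: float,
                tilt_deg: float = CAMERA_TILT_DEG) -> tuple:
    """Rotate a camera-frame point into a level frame.

    Assumes the camera is pitched downward by tilt_deg about its x axis.
    Returns (right, down, forward) in metres, where forward is horizontal.
    """
    tilt = math.radians(tilt_deg)
    forward = z_m * math.cos(tilt) - y_m * math.sin(tilt)
    down = y_m * math.cos(tilt) + z_m * math.sin(tilt)
    return x_m, down, forward


# A point 10 m along the optical axis is closer than 10 m horizontally.
right_m, down_m, forward_m = level_point(0.0, 0.0, 10.0)
```

Under these assumptions, a raw depth of 10 m along the optical axis corresponds to a horizontal distance of about 9.66 m, so an uncorrected depth would overstate the gap used for time-to-collision.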
Original abstract

In this paper, the CD-TWINSAFE is introduced, a V2I-based digital twin for Autonomous Vehicles. The proposed architecture is composed of two stacks running simultaneously, an on-board driving stack that includes a stereo camera for scene understanding, and a digital twin stack that runs an Unreal Engine 5 replica of the scene viewed by the camera as well as returning safety alerts to the cockpit. The on-board stack is implemented on the vehicle side including 2 main autonomous modules; localization and perception. The position and orientation of the ego vehicle are obtained using on-board sensors. Furthermore, the perception module is responsible for processing 20-fps images from stereo camera and understands the scene through two complementary pipelines. The pipeline are working on object detection and feature extraction including object velocity, yaw and the safety metrics time-to-collision and time-headway. The collected data form the driving stack are sent to the infrastructure side through the ROS-enabled architecture in the form of custom ROS2 messages and sent over UDP links that ride a 4G modem for V2I communication. The environment is monitored via the digital twin through the shared messages which update the information of the spawned ego vehicle and detected objects based on the real-time localization and perception data. Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CD-TWINSAFE, a V2I-based digital twin for autonomous vehicles consisting of two simultaneously running stacks: an on-board driving stack that performs localization from on-board sensors and stereo-camera perception (object detection, velocity, yaw, time-to-collision, and time-headway extracted from 20-fps images), and an infrastructure-side digital twin in Unreal Engine 5 that receives ROS2 messages over UDP/4G, updates the virtual scene, and returns safety alerts. The authors state that tests in various driving scenarios confirm the architecture's validity and real-time response.

Significance. If the unquantified performance claims are substantiated, the work could offer a practical demonstration of integrating on-board stereo perception with a ROS-enabled digital twin for V2I safety monitoring. The choice of ROS2 for message passing and Unreal Engine 5 for the twin provides a concrete, implementable example that may guide future V2I systems. The absence of any measured latencies, accuracy statistics, or failure modes currently limits its contribution to the literature on real-time scene understanding.

major comments (2)
  1. [Abstract] Abstract: the statement that 'Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture' supplies no quantitative results (latency, perception error rates for velocity/yaw/TTC/THW, or ground-truth comparisons). This directly undermines the central claim that the 20 fps on-board perception and 4G UDP V2I link enable reliable real-time safety alerts.
  2. [Perception module description] Perception module description: the pipeline is asserted to extract object velocity, yaw, time-to-collision, and time-headway from 20-fps stereo images, yet no algorithm details, accuracy metrics, or error analysis are provided. Because the safety-alert validity rests on these quantities being sufficiently accurate and low-latency, the lack of evaluation is load-bearing for the architecture's claimed utility.
minor comments (2)
  1. The text contains a grammatical error ('The pipeline are working' should read 'The pipelines are working').
  2. A system diagram illustrating the two stacks, ROS2 message definitions, and data flow between vehicle and infrastructure would improve clarity.
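
The quantities requested in the major comments could be summarized with simple statistics. The sketch below shows one possible form, a root-mean-square error against ground truth and a latency summary with a nearest-rank 95th percentile. The function names and the sample values are hypothetical and are not results from the paper.

```python
import math


def rmse(estimates, ground_truth) -> float:
    """Root-mean-square error between paired estimates and ground truth."""
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - g) ** 2 for e, g in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


def latency_summary(latencies_ms) -> dict:
    """Mean, nearest-rank 95th percentile, and maximum of measured latencies."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed in integer arithmetic.
    rank = -(-95 * len(ordered) // 100)
    return {
        "mean": sum(ordered) / len(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical samples for illustration only.
ttc_error_s = rmse([2.0, 2.0], [0.0, 0.0])
summary = latency_summary(list(range(10, 210, 10)))
```

Reporting a tail percentile alongside the mean matters for a 4G link, where occasional long delays are the cases most likely to produce a late alert.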

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that the current description would benefit from quantitative support to strengthen the real-time performance claims and will revise the paper to address these points directly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the statement that 'Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture' supplies no quantitative results (latency, perception error rates for velocity/yaw/TTC/THW, or ground-truth comparisons). This directly undermines the central claim that the 20 fps on-board perception and 4G UDP V2I link enable reliable real-time safety alerts.

    Authors: We agree that quantitative results are required to substantiate the real-time claims. In the revised manuscript we will add a new evaluation subsection reporting measured end-to-end latencies for the on-board perception pipeline and the 4G UDP V2I link, together with accuracy statistics and ground-truth comparisons for velocity, yaw, TTC and THW extracted from the stereo images across the tested scenarios.

    revision: yes

  2. Referee: [Perception module description] Perception module description: the pipeline is asserted to extract object velocity, yaw, time-to-collision, and time-headway from 20-fps stereo images, yet no algorithm details, accuracy metrics, or error analysis are provided. Because the safety-alert validity rests on these quantities being sufficiently accurate and low-latency, the lack of evaluation is load-bearing for the architecture's claimed utility.

    Authors: We accept that additional detail is needed. The revised perception-module section will describe the concrete algorithms employed for velocity and yaw estimation (including the stereo-based feature tracking and filtering steps) and will include quantitative accuracy metrics and error analysis obtained by comparing the extracted TTC and THW values against ground-truth data available in the Unreal Engine 5 environment.

    revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive systems architecture paper

full rationale

The manuscript describes a ROS2/UDP V2I architecture linking an on-board stereo perception stack (object detection, velocity/yaw/TTC/THW extraction at 20 fps) with an Unreal Engine 5 digital twin that receives custom messages to update ego and object states and issue safety alerts. No equations, fitted parameters, predictions, or derivation steps appear anywhere in the text. Validation is limited to qualitative scenario tests whose outcomes are not shown to reduce to any input quantities by construction. The contribution is therefore self-contained as an engineering description; no load-bearing claim collapses into a self-definition or self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The architecture rests on common robotics assumptions about sensor accuracy and perception reliability without introducing new free parameters or postulated entities.

axioms (2)
  • domain assumption On-board sensors supply accurate ego-vehicle position and orientation.
    Invoked in the localization module description.
  • domain assumption The two perception pipelines correctly compute object velocity, yaw, time-to-collision, and time-headway from 20-fps stereo images.
    Required for the safety metrics that feed the digital twin.

pith-pipeline@v0.9.0 · 5785 in / 1435 out tokens · 62792 ms · 2026-05-21T15:04:09.505813+00:00 · methodology


