arxiv: 2605.11972 · v1 · submitted 2026-05-12 · 💻 cs.RO · cs.AI· cs.ET· cs.SY· eess.SY

Recognition: no theorem link

Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation

Mohammad Khoshkdahan , John Pravin Arockiasamy , Andy Flores Comeca , Alexey Vinel

Authors on Pith no claims yet

Pith reviewed 2026-05-13 05:01 UTC · model grok-4.3

classification 💻 cs.RO cs.AIcs.ETcs.SYeess.SY

keywords cooperative roboticscollective perceptionV2X communicationnon-line-of-sight intersectionstraffic safetyhumanoid robotsensor fusiontraffic moderation

0 comments

The pith

A humanoid robot fuses camera and V2X data to detect hazards and physically block unsafe merges at non-line-of-sight intersections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Collisions at non-line-of-sight intersections remain dangerous because drivers lack visibility of approaching traffic and many vehicles lack V2X equipment or ignore digital alerts. This paper proposes adding a cooperative humanoid robot that receives collective perception messages from roadside cameras and cooperative awareness messages from connected vehicles. These streams are fused to maintain a continuous view of the main road, a zone of danger is defined to predict collision risks for any merging vehicle, and the robot responds with a STOP gesture plus physical blocking until the hazard clears. Experiments confirm that the dual perception pathways enable early detection and reliable intervention in real-world non-line-of-sight settings. The approach directly influences unconnected drivers that digital warnings alone cannot reach.

Core claim

The paper claims that a cooperative humanoid robot reinforced by collective perception can maintain a robust real-time view of approaching vehicles at non-line-of-sight intersections by fusing dual-camera infrastructure data transmitted as collective perception messages with V2X cooperative awareness and decentralized environmental notification messages, define a zone of danger to predict whether a merging vehicle faces an imminent collision risk, and intervene with a human-like STOP gesture and physical blocking of the merge path until the hazard passes, as validated through real-world deployment and testing.

What carries the argument

The dual perception pathways (vision-based collective perception messages from infrastructure cameras and V2X cooperative awareness messages) combined in a fusion module, together with the zone of danger definition that triggers the robot's STOP gesture and physical blocking action.

If this is right

The robot can extend safety coverage to vehicles without V2X equipment by direct physical intervention.
The system can relay decentralized environmental notification messages from other road segments.
Parallel vision and V2X pathways provide redundancy that single-channel alerts lack.
Early hazard prediction via the zone of danger enables proactive rather than reactive moderation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Robots of this type could reduce the required penetration rate of V2X-equipped vehicles for intersection safety.
Similar physical moderators might be applied at construction zones or school crossings where visibility is limited.
Integration with existing traffic signals could create layered moderation that combines digital, visual, and physical cues.
Longer-term tests in dense traffic or poor visibility would be required to establish whether false-negative rates remain low outside the reported conditions.

Load-bearing premise

The premise that drivers will reliably obey the robot's STOP gesture and physical blocking rather than ignore it or collide with the robot itself, and that the fused perception will produce no critical false negatives across varying weather and traffic conditions.

What would settle it

A recorded case in which the fused perception misses an approaching vehicle or the driver proceeds through the merge despite the robot's visible STOP gesture and blocking position.

Figures

Figures reproduced from arXiv: 2605.11972 by Alexey Vinel, Andy Flores Comeca, John Pravin Arockiasamy, Mohammad Khoshkdahan.

**Figure 2.** Figure 2: Front and rear views of the ARI robot with its sensing, computation [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Detection output of the dual camera infrastructure system showing [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Experimental setup at the FMP. (a) Test site map showing the main [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

read the original abstract

Collisions at non-line-of-sight (NLOS) intersections remain a major safety concern because drivers have limited visibility of approaching traffic. V2X based warnings can reduce these risks, yet many vehicles are not equipped with V2X and drivers may ignore in vehicle alerts. Collective perception (CP) can compensate for low V2X penetration by extending the awareness of connected vehicles, but it cannot influence unconnected vehicles. To fill this gap, our work introduces a complementary concept that adds a cooperative humanoid robot as an active traffic moderator capable of physically stopping a vehicle that attempts to merge into an unseen traffic stream. The system operates on two parallel perception pathways. A dual camera infrastructure unit detects the position, speed and motion of approaching vehicles and transmits this information to the robot as a collective perception message (CPM). The robot also receives cooperative awareness messages (CAM) from connected vehicles through its onboard V2X unit and can act as a relay for decentralized environmental notification messages (DENM) when safety events originate elsewhere along the road. A fusion module combines these streams to maintain a robust real time view of the main road. A Zone of Danger (ZoD) is defined and used to predict whether an approaching vehicle creates a collision risk for a merging road user. When such a risk is detected, the robot issues a human-like STOP gesture and blocks the merging path until the hazard disappears. The full system was deployed at the Future Mobility Park (FMP) in Rotterdam. Experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably and prevent unsafe merges in real world NLOS conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The robot-as-physical-moderator concept is a fresh angle on NLOS safety, but the paper supplies no numbers to show the fused perception or driver compliance actually works.

read the letter

The paper puts a humanoid robot in the role of active traffic moderator at blind intersections. It fuses dual-camera collective perception messages with V2X CAM and DENM data, defines a Zone of Danger, and has the robot issue a STOP gesture plus physical block when a hazard is predicted. They ran the full setup at the Future Mobility Park in Rotterdam. That integration of existing pieces into a physical intervention is the main new element; most prior CP and V2X work stays in the digital warning layer and does not cross into blocking unconnected vehicles directly.

Referee Report

2 major / 1 minor

Summary. The paper introduces a cooperative humanoid robot system that uses dual-camera collective perception messages (CPM) and V2X communications (CAM/DENM) to detect approaching vehicles at non-line-of-sight intersections, define a Zone of Danger (ZoD) for hazard prediction, and physically intervene with a STOP gesture to prevent unsafe merges by unconnected vehicles. The system was deployed at the Future Mobility Park (FMP) in Rotterdam, with the abstract claiming that experiments demonstrate early detection, reliable hazard prediction, and prevention of unsafe merges in real-world NLOS conditions.

Significance. If the unprovided experimental data were to confirm the claims with quantitative evidence of detection accuracy, low false negatives, and effective driver compliance, this work could offer a significant advancement in traffic safety by extending collective perception to active robotic moderation, particularly in scenarios with low V2X penetration rates. It integrates robotics, sensor fusion, and V2X in a practical deployment.

major comments (2)

[Abstract] Abstract: The assertion that 'experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably and prevent unsafe merges in real world NLOS conditions' lacks any supporting quantitative metrics, such as detection rates, latency, false-negative counts, success rates for interventions, or analysis of driver responses to the STOP gesture. This makes the central claims unverifiable based on the provided manuscript.
[Experimental evaluation] The manuscript describes the dual-camera CPM pipeline, V2X relay, fusion module, and ZoD definition but supplies no detection rates, latency figures, false-negative counts, success rates for interventions, or analysis of driver responses, leaving the transition from system deployment to validated prevention of unsafe merges unsupported.

minor comments (1)

[System description] The Zone of Danger (ZoD) concept is central but would benefit from an explicit mathematical definition or parameter specification to support reproducibility and analysis of its hazard prediction logic.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We agree that the current manuscript lacks the quantitative metrics needed to substantiate the claims in the abstract and experimental sections. We will revise the paper to include these details from our deployment at the Future Mobility Park.

read point-by-point responses

Referee: [Abstract] Abstract: The assertion that 'experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably and prevent unsafe merges in real world NLOS conditions' lacks any supporting quantitative metrics, such as detection rates, latency, false-negative counts, success rates for interventions, or analysis of driver responses to the STOP gesture. This makes the central claims unverifiable based on the provided manuscript.

Authors: We agree that the abstract claims require supporting quantitative metrics to be verifiable. In the revised version, we will update the abstract to include key results from the FMP experiments, such as detection rates, average latency, false-negative counts, intervention success rates, and observations on driver compliance with the STOP gesture. revision: yes
Referee: [Experimental evaluation] The manuscript describes the dual-camera CPM pipeline, V2X relay, fusion module, and ZoD definition but supplies no detection rates, latency figures, false-negative counts, success rates for interventions, or analysis of driver responses, leaving the transition from system deployment to validated prevention of unsafe merges unsupported.

Authors: We acknowledge that the experimental evaluation section does not currently provide these quantitative figures. We will add a dedicated results subsection with tables, figures, and analysis reporting detection rates, latency, false-negative counts, intervention success rates, and driver response data to demonstrate the validated prevention of unsafe merges. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely descriptive system deployment without equations or self-referential fits

full rationale

The paper presents an engineering description of a humanoid robot traffic moderator using dual-camera collective perception messages (CPM), V2X CAM/DENM relay, sensor fusion, and a Zone of Danger (ZoD) definition to trigger STOP gestures. No derivation chain, mathematical equations, fitted parameters, or predictive models appear in the abstract or described full text. Claims of 'reliable' hazard prediction and prevention rest on deployment at FMP Rotterdam rather than any self-contained computation that reduces to its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. This matches the expected non-circular outcome for a prototype system paper whose central content is external sensor data and physical testing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review limits visibility into parameters; system introduces ZoD definition and fusion logic without stated numerical thresholds or independent validation.

axioms (1)

domain assumption Drivers will respond to the robot's human-like STOP gesture by stopping their vehicle.
Central to the blocking mechanism described in the abstract.

invented entities (1)

Zone of Danger (ZoD) no independent evidence
purpose: To predict collision risk for merging vehicles based on fused perception data.
Defined in the abstract as the decision trigger for robot action; no independent evidence provided.

pith-pipeline@v0.9.0 · 5622 in / 1271 out tokens · 45439 ms · 2026-05-13T05:01:56.712176+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

[1]

Impact of intersection angle on highway safety,

D. L. Harkey, B. Lan, R. Srinivasan, W. Kumfer, D. Carteret al., “Impact of intersection angle on highway safety,” United States. Federal Highway Administration. Office of Safety Research and . . . , Tech. Rep., 2021

work page 2021
[2]

A hybrid approach for identifying factors affecting driver reaction time using naturalistic driving data,

N. Arbabzadeh, M. Jafari, M. Jalayer, S. Jiang, and M. Kharbeche, “A hybrid approach for identifying factors affecting driver reaction time using naturalistic driving data,”Transportation research part C: emerging technologies, vol. 100, pp. 107–124, 2019

work page 2019
[3]

An analysis of visibility requirements and reaction times of near-field projections,

T. Schl ¨urscheid, A. Stuckert, A. Erkan, and T. Q. Khanh, “An analysis of visibility requirements and reaction times of near-field projections,” Applied Sciences, vol. 14, no. 2, p. 872, 2024

work page 2024
[4]

Analysis and evaluation of information redundancy mitigation for v2x collective perception,

Q. Delooz, A. Willecke, K. Garlichs, A.-C. Hagau, L. Wolf, A. Vinel, and A. Festag, “Analysis and evaluation of information redundancy mitigation for v2x collective perception,”IEEE access, vol. 10, pp. 47 076–47 093, 2022

work page 2022
[5]

Intelligent transport systems (its); vehicular communications; basic set of applications; part 2: Specification of cooperative awareness basic service,

ETSI, “Intelligent transport systems (its); vehicular communications; basic set of applications; part 2: Specification of cooperative awareness basic service,” no. EN 302 637-2 v1.4.1 (2020-05)

work page 2020
[6]

Intelligent transport systems (its); vehicular communications; collective perception service,

“Intelligent transport systems (its); vehicular communications; collective perception service,” no. TS 103 324. ETSI, 2022

work page 2022
[7]

Intelligent transport systems (its); vehicular communications; basic set of applications; part 3: Specifications of decentralized environmental notification basic service,

“Intelligent transport systems (its); vehicular communications; basic set of applications; part 3: Specifications of decentralized environmental notification basic service,” no. EN 302 637-3. ETSI, 2019

work page 2019
[8]

Automated vehicle to vehicle conflict analysis at signalized intersec- tions by camera and lidar sensor fusion,

A. M. Anisha, M. Abdel-Aty, A. Abdelraouf, Z. Islam, and O. Zheng, “Automated vehicle to vehicle conflict analysis at signalized intersec- tions by camera and lidar sensor fusion,”Transportation research record, vol. 2677, no. 5, pp. 117–132, 2023

work page 2023
[9]

Triband-bev: Real-time lidar-only 3d pedestrian detection via height-aware bev and high-resolution feature fusion,

M. Khoshkdahan and A. Vinel, “Triband-bev: Real-time lidar-only 3d pedestrian detection via height-aware bev and high-resolution feature fusion,” inProceedings of the 25th International Conference on Au- tonomous Agents and Multiagent Systems, 2026

work page 2026
[10]

Fair-ped: Fairness evaluation in pedestrian detection using clip,

M. Khoshkdahan, N. Kj ¨ar, and F. B. Flohr, “Fair-ped: Fairness evaluation in pedestrian detection using clip,” in2025 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2025, pp. 1504–1509

work page 2025
[11]

Beyond overall accuracy: Pose-and occlusion-driven fairness analysis in pedes- trian detection for autonomous driving,

M. Khoshkdahan, A. Akbari, A. Akbari, and X. Zhang, “Beyond overall accuracy: Pose-and occlusion-driven fairness analysis in pedes- trian detection for autonomous driving,” inInternational Conference on Intelligent Transportation Systems (ITSC). IEEE, 2025

work page 2025
[12]

A multi-sensor video/lidar system for analyzing intersection safety,

A. Wu, T. Banerjee, K. Chen, A. Rangarajan, and S. Ranka, “A multi-sensor video/lidar system for analyzing intersection safety,” in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), 2023, pp. 1158–1165

work page 2023
[13]

Cooperation of v2i/p2i commu- nication and roadside radar perception for the safety of vulnerable road users,

W. Liu, S. Muramatsu, and Y . Okubo, “Cooperation of v2i/p2i commu- nication and roadside radar perception for the safety of vulnerable road users,” in2018 16th International Conference on Intelligent Transporta- tion Systems Telecommunications (ITST). IEEE, 2018, pp. 1–7

work page 2018
[14]

Longitudinal- scanline-based arterial traffic video analytics with coordinate transforma- tion assisted by 3d infrastructure data,

T. T. Zhang, M. Guo, P. J. Jin, Y . Ge, and J. Gong, “Longitudinal- scanline-based arterial traffic video analytics with coordinate transforma- tion assisted by 3d infrastructure data,”Transportation Research Record, vol. 2675, no. 3, pp. 338–357, 2021

work page 2021
[15]

Arow: V2x- based automated right-of-way algorithm for cooperative intersection management,

G. Shah, D. Tian, E. Moradi-Pari, and Y . P. Fallah, “Arow: V2x- based automated right-of-way algorithm for cooperative intersection management,”IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 9, pp. 10 983–10 999, 2024

work page 2024
[16]

V-fcw: Vector-based forward collision warning algorithm for curved road conflicts using v2x networks,

X. Cai, B. Lv, H. Yao, T. Yang, and H. Dai, “V-fcw: Vector-based forward collision warning algorithm for curved road conflicts using v2x networks,”Accident Analysis & Prevention, vol. 210, p. 107836, 2025

work page 2025
[17]

A hardware-in-the-loop evaluation of the impact of the v2x channel on the traffic-safety versus efficiency trade- offs,

A. Bazzi, T. Blazek, M. Menarini, B. M. Masini, A. Zanella, C. Meck- lenbr¨auker, and G. Ghiaasi, “A hardware-in-the-loop evaluation of the impact of the v2x channel on the traffic-safety versus efficiency trade- offs,” in2020 14th European Conference on Antennas and Propagation (EuCAP). IEEE, 2020, pp. 1–5

work page 2020
[18]

Cooperative collision avoidance in a connected vehicle environment,

S. Y . Gelbal, S. Zhu, G. A. Anantharaman, B. A. Guvenc, and L. Guvenc, “Cooperative collision avoidance in a connected vehicle environment,” arXiv preprint arXiv:2306.01889, 2023

work page arXiv 2023
[19]

Interaction effects of pedestrian behavior, smartphone distraction and external communication of automated vehicles on crossing and gaze behavior,

M. Lanzer, I. Koniakowsky, M. Colley, and M. Baumann, “Interaction effects of pedestrian behavior, smartphone distraction and external communication of automated vehicles on crossing and gaze behavior,” inProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–18

work page 2023
[20]

Towards a smart robot model for traffic signal management in developing countries,

A. Najjar, H. Prakash, I. Tchappi, J. E. Ndamlabin Mboula, and Y . Mualla, “Towards a smart robot model for traffic signal management in developing countries,” inProceedings of the 10th International Conference on Human-Agent Interaction, 2022, pp. 333–336

work page 2022
[21]

Controlling traffic with humanoid social robot,

F. Ghaffar, “Controlling traffic with humanoid social robot,”arXiv preprint arXiv:2204.04240, 2022

work page arXiv 2022
[22]

Robots for safer pedestrian crossing on two-lane roads,

A. L. F. Comeca, N. Masarykova, M. Halinkovic, M. Galinski, P. Laskov, and A. Vinel, “Robots for safer pedestrian crossing on two-lane roads,” in2025 IEEE International Automated Vehicle Validation Conference (IAVVC), 2025, pp. 1–6

work page 2025
[23]

Social robots for road safety: Pedestrian crossing assistance use- case,

——, “Social robots for road safety: Pedestrian crossing assistance use- case,” in2025 International Symposium ELMAR. IEEE, 2025, pp. 53–56

work page 2025
[24]

Design and development of autonomous drone traffic control system,

S. Mahendran, S. Paul, N. Ashraf, B. Anbarasu, S. Seralathanet al., “Design and development of autonomous drone traffic control system,” in2025 3rd International Conference on Artificial Intelligence and Machine Learning Applications Theme: Healthcare and Internet of Things (AIMLA). IEEE, 2025, pp. 1–6

work page 2025
[25]

Uav-based mec-assisted automated traffic management scheme using blockchain,

M. Masuduzzaman, A. Islam, K. Sadia, and S. Y . Shin, “Uav-based mec-assisted automated traffic management scheme using blockchain,” Future Generation Computer Systems, vol. 134, pp. 256–270, 2022

work page 2022
[26]

Urban traffic monitoring and analysis using unmanned aerial vehicles (uavs): A systematic literature review,

E. V . Butil ˘a and R. G. Boboc, “Urban traffic monitoring and analysis using unmanned aerial vehicles (uavs): A systematic literature review,” Remote Sensing, vol. 14, no. 3, p. 620, 2022

work page 2022
[27]

YOLOv12: Attention-Centric Real-Time Object Detectors

Y . Tian, Q. Ye, and D. Doermann, “Yolov12: Attention-centric real-time object detectors,”arXiv preprint arXiv:2502.12524, 2025

work page internal anchor Pith review arXiv 2025
[28]

Vulnerable road users safety in infrastructure assisted intelligent transportation system,

V . R. S. Banjade, S. C. Jha, K. Sivanesan, L. G. Baltar, S. A. Sehra, and S. J. Tan, “Vulnerable road users safety in infrastructure assisted intelligent transportation system,” in2021 IEEE International Smart Cities Conference (ISC2). IEEE, 2021, pp. 1–7

work page 2021