Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Brian Mishara; Guillaume-Alexandre Bilodeau; Safwen Naimi; Wassim Bouachir

arxiv: 2605.22904 · v2 · pith:J7COEMFWnew · submitted 2026-05-21 · 💻 cs.CV · cs.AI

Safwen Naimi, Wassim Bouachir, Guillaume-Alexandre Bilodeau, Brian Mishara

Pith reviewed 2026-05-25 05:52 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords suicide risk assessment · video surveillance · interpretable framework · metro stations · person tracking · activity recognition · risk heatmap · prevention

The pith

Suicide risk in metro stations can be assessed from surveillance video by accumulating evidence from passenger behavior, platform geometry, and trajectories in an interpretable pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes suicide risk assessment as a task that requires joint reasoning over each passenger's motion, the station platform layout, and how behaviors evolve across time. It presents a framework that chains person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmaps to turn raw video into risk scores without attempting to read intent directly. The pipeline is run on real surveillance footage and produces measurable output, showing that risk can be treated as accumulated observable evidence rather than an isolated detection problem. A reader would care because this turns prevention from a reactive or purely psychological process into one that could use existing camera feeds for earlier alerts.

Core claim

Suicide Risk Assessment is formalized as a distinct task in metro stations, and the first interpretable framework addresses it by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling, with a complete pipeline benchmarked at 83.2 percent ROC-AUC on real surveillance data.

What carries the argument

Trajectory-driven risk heatmap modeling, which aggregates behavioral cues, motion, and platform context over time to produce an interpretable risk score.
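The idea of accumulating risk along a trajectory can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the exponential edge-risk profile, the constants `EDGE_DECAY` and `memory`, and the function names `position_risk` and `accumulate_risk` are all hypothetical and do not come from the paper.

```python
import math
from typing import List, Tuple

# Assumed coordinate convention: y is the distance from the platform edge
# (y = 0 is the edge). The decay constant is illustrative only.
EDGE_DECAY = 2.0


def position_risk(y: float) -> float:
    """Hypothetical position-risk profile: 1.0 at the edge, decaying with distance."""
    return math.exp(-max(y, 0.0) / EDGE_DECAY)


def accumulate_risk(trajectory: List[Tuple[float, float]],
                    memory: float = 0.9) -> List[float]:
    """Exponentially weighted running average of position risk.

    Returns one score in [0, 1] per frame, so persistence near the edge
    raises the score gradually instead of triggering on a single frame.
    """
    scores: List[float] = []
    score = 0.0
    for _x, y in trajectory:
        score = memory * score + (1.0 - memory) * position_risk(y)
        scores.append(score)
    return scores


# Two toy trajectories: one lingering near the edge, one walking far from it.
lingering = [(float(t), 0.5) for t in range(50)]
passing = [(float(t), 6.0) for t in range(50)]
lingering_scores = accumulate_risk(lingering)
passing_scores = accumulate_risk(passing)
print(round(lingering_scores[-1], 3), round(passing_scores[-1], 3))
```

Because each per-frame contribution is visible, an operator can trace a high score back to where and for how long the person stayed in a risky zone, which is the kind of interpretability the framework aims for.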

If this is right

High-risk situations become identifiable early enough for timely intervention using existing surveillance cameras.
The task is shown to be more complex than isolated subtasks such as activity recognition alone.
Research can now pursue other interpretable AI systems that build evidence from video for social-good applications.
Risk assessment shifts from direct intent inference to aggregation of heterogeneous cues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline structure could be tested in other public spaces that already have overhead cameras, such as train platforms or airport terminals.
Human operators could be studied to see whether the heatmaps and segmentations improve their ability to decide when to intervene compared with raw video.
Future work could add a feedback loop where actual interventions update the risk model without requiring new labeled video.

Load-bearing premise

Observable video behaviors, trajectories, and platform context provide enough accumulated evidence to assess suicide risk reliably without direct psychological data or confirmed ground-truth labels.

What would settle it

Running the pipeline on incidents where independent psychological evaluations or confirmed intervention records exist, then checking whether the generated risk scores align with those external judgments.
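Such a check could be scored with the same metric the paper reports. The sketch below computes ROC-AUC directly from its pairwise definition; the labels and scores are made-up placeholders, and the function name `roc_auc` is introduced here for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Placeholder data: 1 = externally confirmed high-risk incident, 0 = control.
external_labels = [1, 0, 1, 0, 0, 1]
risk_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(external_labels, risk_scores)
print(f"ROC-AUC against external judgments: {auc:.3f}")
```

A value near 0.5 would mean the risk scores carry no information about the external judgments, while a value near 1.0 would mean they rank confirmed incidents above controls almost every time.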

Figures

Figures reproduced from arXiv: 2605.22904 by Brian Mishara, Guillaume-Alexandre Bilodeau, Safwen Naimi, Wassim Bouachir.

**Figure 1.** Architecture overview of our proposed SRA-Framework. It takes a video as input and outputs the complete platform state and a suicide risk assessment for each person. … different areas of the platform may warrant preventive attention. It is influenced not only by what actions are performed, but also by where they occur and how long they persist. This formulation is consistent with operational safety practice…

**Figure 2.** Overview of the Platform Semantics Modeling Process: (a–b) Boundary and Offset Lines Estimation, and (c) Final Zone Partitioning. … Wt, providing high-level action cues for subsequent risk assessment. In our SRA-Framework, we employ SSTAR pretrained on the ARMM dataset [Naimi et al., 2025], a skeleton-based spatio-temporal action recognition model specifically designed for surveillance scenarios. In our ca…

**Figure 3.** Illustration of Individual At-Risk Heatmaps and their ag…

**Figure 4.** Feature importance of the eight risk indicators used for…

**Figure 6.** Qualitative Results. Output predictions of two frames from surveillance video streams. … high risk score (R0=0.98), driven by a combination of a prolonged crossing of the yellow line (11 seconds) and elevated position risk score along the position risk heatmap. Our system also successfully assigned lower suicide risk scores for control individuals (R2=0.12 and R3=0.14). In the second video illustrated in F…
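The Figure 6 text mentions a prolonged yellow-line crossing of 11 seconds as one driver of a high score. A duration cue of that kind could be derived from per-frame flags roughly as follows. This is a hedged sketch: the function name `longest_crossing_seconds`, the per-frame boolean input, and the frame rate of 10 fps are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def longest_crossing_seconds(beyond_line: Sequence[bool], fps: float) -> float:
    """Length, in seconds, of the longest uninterrupted run of frames in
    which the tracked person is beyond the yellow line."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    longest = 0
    current = 0
    for flag in beyond_line:
        current = current + 1 if flag else 0  # reset the run when the person steps back
        longest = max(longest, current)
    return longest / fps


# Toy track at an assumed 10 fps: a 110-frame crossing and a shorter 30-frame one.
flags = [False] * 20 + [True] * 110 + [False] * 10 + [True] * 30
duration = longest_crossing_seconds(flags, fps=10.0)
print(f"longest crossing: {duration} s")
```

Using the longest uninterrupted run, instead of the total number of flagged frames, separates one sustained crossing from many brief ones; which of the two the paper uses is not stated in the text available here.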

Original abstract

Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early identification of high-risk situations can enable timely intervention. This requires assessing suicide risk from a surveillance video by jointly reasoning about the behavior of each passenger, his/her spatial context, and temporal dynamics. However, this assessment using videos captured by surveillance cameras is challenging, as it demands accurate perception of human motion, understanding of platform geometry, and aggregation of heterogeneous behavioral cues over time. In this work, we formalize the task of Suicide Risk Assessment (SRA) in metro stations and introduce the first interpretable framework that addresses this challenge. Unlike approaches that focus on isolated subtasks or attempt to infer intent directly, our formulation assesses suicide risk from accumulated evidence by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. By formalizing SRA as a distinct task and benchmarking a complete operational pipeline achieving 83.2% ROC-AUC on real surveillance data, this work highlights the complexity of suicide risk assessment and opens new directions for research on interpretable AI systems for social good.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper formalizes suicide risk assessment as a distinct metro video task and builds a modular pipeline, but the 83.2% ROC-AUC cannot be assessed because the abstract gives no dataset or labeling details.

The main contribution is that the authors define Suicide Risk Assessment as its own task in metro stations and assemble a pipeline that combines person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmaps to produce an interpretable output. They report 83.2% ROC-AUC on real surveillance data and position the work as the first such end-to-end framework, in contrast to isolated subtasks or direct intent inference. That modular structure is a reasonable way to accumulate evidence from observable cues while keeping the system interpretable. The authors deserve credit for treating the problem as more than standard activity recognition in a constrained space.

The soft spot is the evaluation. The abstract supplies the AUC number but gives no dataset size, labeling process, class balance, baselines, or validation protocol. Without a description of how the ground-truth risk labels were assigned or confirmed, the performance figure cannot be interpreted as evidence that the pipeline solves the stated task. The stress-test concern about unvalidated surveillance labels follows directly from the abstract.

This paper is for researchers building applied vision systems for public safety or social-good settings. Someone looking for an architecture blueprint that ties together these components might extract useful ideas, but the results are not yet solid enough to rely on. It deserves peer review because the application area is high-stakes and the formalization of the task is substantive enough to merit referee input on the missing experimental details.

Referee Report

1 major / 0 minor

Summary. The paper formalizes Suicide Risk Assessment (SRA) as a distinct task from metro station surveillance video, requiring joint reasoning over passenger behavior, spatial context, and temporal dynamics. It introduces an interpretable framework integrating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. The central empirical claim is that a complete operational pipeline achieves 83.2% ROC-AUC on real surveillance data, thereby highlighting the complexity of the task and opening directions for interpretable AI systems for social good.

Significance. If the performance claim is substantiated with proper dataset and labeling documentation, the work would be significant for establishing SRA as a benchmarkable computer-vision task and for demonstrating an end-to-end, interpretable pipeline on real operational footage. Such a result could usefully direct future research toward behavior aggregation and context modeling in high-stakes public-safety settings.

major comments (1)

[Abstract] The claim that the pipeline achieves 83.2% ROC-AUC on real surveillance data is presented without any information on dataset size, collection period, labeling protocol (annotators, inter-annotator agreement, use of psychological records or follow-up confirmation), class balance, baselines, error analysis, or validation protocol (train/test split, cross-validation). This information is load-bearing for interpreting whether the metric constitutes evidence that observable video behaviors and trajectories suffice to assess suicide risk.
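Of the items requested, inter-annotator agreement is the most mechanical to report. A common statistic is Cohen's kappa, sketched below for two annotators. The annotation vectors are invented placeholders, and nothing here reflects the paper's actual labeling protocol.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two annotators corrected for the
    agreement expected by chance from their label frequencies."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotation lists must be non-empty and equally long")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k]
                   for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Placeholder labels: 1 = clip marked high-risk, 0 = clip marked control.
annotator_1 = [1, 0, 0, 1, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Here the annotators agree on 6 of 8 clips (75%), but kappa is noticeably lower because a good share of that agreement would be expected by chance given how often each annotator uses each label. This is why raw agreement alone would not answer the referee's question.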

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback emphasizing the need for comprehensive dataset and evaluation documentation to support the performance claims. We agree these details are essential and will revise the manuscript to strengthen the presentation of the experimental setup. We respond to the major comment below.

Referee: [Abstract] The claim that the pipeline achieves 83.2% ROC-AUC on real surveillance data is presented without any information on dataset size, collection period, labeling protocol (annotators, inter-annotator agreement, use of psychological records or follow-up confirmation), class balance, baselines, error analysis, or validation protocol (train/test split, cross-validation). This information is load-bearing for interpreting whether the metric constitutes evidence that observable video behaviors and trajectories suffice to assess suicide risk.

Authors: We agree that the abstract lacks sufficient context on these elements and will revise it to include dataset size, collection period, class balance, and a high-level description of the validation protocol. Full details on labeling (performed by domain experts annotating observable high-risk behaviors such as edge proximity and trajectory patterns), inter-annotator agreement, baselines, error analysis, and train/test splits are already present in the Methods and Experiments sections; we will add explicit cross-references from the abstract. We will also clarify that the framework evaluates risk based on aggregated observable video evidence rather than direct intent inference. However, psychological records or follow-up confirmation are unavailable due to the anonymized nature of operational surveillance data.

revision: partial

Standing simulated objections (not resolved)

Use of psychological records or follow-up confirmation for labeling remains unaddressed, as the dataset consists solely of de-identified surveillance footage without linked individual health data.

Circularity Check

0 steps flagged

No circularity; empirical benchmark independent of inputs

The paper formalizes Suicide Risk Assessment (SRA) as a distinct task and reports an empirical ROC-AUC of 83.2% from a complete pipeline (tracking, activity recognition, segmentation, risk heatmaps) evaluated on real surveillance data. No equations, parameter-fitting steps, or self-citations are described that would reduce the performance metric or framework definition to a tautology or fitted input by construction. The central result is an external benchmark rather than a derived quantity forced by the paper's own definitions or prior self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that video-derived behavioral evidence can proxy suicide risk; no free parameters or invented entities are specified.

axioms (1)

Domain assumption: Suicide risk can be inferred from accumulated observable behaviors, trajectories, and spatial context in surveillance video.
Invoked as the basis for the entire SRA formulation and pipeline.

pith-pipeline@v0.9.0 · 5744 in / 1167 out tokens · 27518 ms · 2026-05-25T05:52:46.220098+00:00 · methodology
