
arxiv: 2605.22904 · v2 · pith:J7COEMFWnew · submitted 2026-05-21 · 💻 cs.CV · cs.AI

Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Pith reviewed 2026-05-25 05:52 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords suicide risk assessment · video surveillance · interpretable framework · metro stations · person tracking · activity recognition · risk heatmap · prevention

The pith

Suicide risk in metro stations can be assessed from surveillance video by accumulating evidence from passenger behavior, platform geometry, and trajectories in an interpretable pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes suicide risk assessment as a task that requires joint reasoning over each passenger's motion, the station platform layout, and how behaviors evolve across time. It presents a framework that chains person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmaps to turn raw video into risk scores without attempting to read intent directly. The pipeline is run on real surveillance footage and produces measurable output, showing that risk can be treated as accumulated observable evidence rather than an isolated detection problem. A reader would care because this turns prevention from a reactive or purely psychological process into one that could use existing camera feeds for earlier alerts.

Core claim

Suicide Risk Assessment is formalized as a distinct task in metro stations, and the first interpretable framework addresses it by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling, with a complete pipeline benchmarked at 83.2 percent ROC-AUC on real surveillance data.

What carries the argument

Trajectory-driven risk heatmap modeling, which aggregates behavioral cues, motion, and platform context over time to produce an interpretable risk score.
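To make the idea of accumulating trajectory evidence concrete, here is a minimal Python sketch. It is not the paper's formulation, which this page does not reproduce: the grid discretization, the exponential `decay`, the `ZONE_WEIGHT` values, and both function names are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def accumulate_heatmap(trajectory: Sequence[Tuple[int, int]],
                       grid_w: int, grid_h: int,
                       decay: float = 0.9) -> Grid:
    """Accumulate per-frame (col, row) positions into a decayed visit heatmap.

    Hypothetical helper: older evidence fades by `decay` at every frame,
    so recent positions dominate the map.
    """
    heat: Grid = [[0.0] * grid_w for _ in range(grid_h)]
    for col, row in trajectory:
        # Fade all previously accumulated evidence.
        for r in heat:
            for i in range(grid_w):
                r[i] *= decay
        # Add the current observation if it falls inside the grid.
        if 0 <= col < grid_w and 0 <= row < grid_h:
            heat[row][col] += 1.0
    return heat


def risk_score(heat: Grid, zone_weight: Grid) -> float:
    """Weighted share of accumulated evidence that lies in risky zones.

    With zone weights in [0, 1] the result is also in [0, 1].
    """
    total = sum(sum(r) for r in heat)
    if total == 0.0:
        return 0.0
    weighted = sum(h * w
                   for heat_row, weight_row in zip(heat, zone_weight)
                   for h, w in zip(heat_row, weight_row))
    return weighted / total


# Assumed toy platform: 4 columns x 2 rows, rightmost column is the edge.
ZONE_WEIGHT: Grid = [[0.0, 0.0, 0.5, 1.0],
                     [0.0, 0.0, 0.5, 1.0]]

edge_walker = [(3, 0)] * 10   # stays in the edge column
safe_walker = [(0, 1)] * 10   # stays far from the edge

print(risk_score(accumulate_heatmap(edge_walker, 4, 2), ZONE_WEIGHT))
print(risk_score(accumulate_heatmap(safe_walker, 4, 2), ZONE_WEIGHT))
```

Because every intermediate quantity (the heatmap cells and the zone weights) can be inspected, a score built this way stays traceable to where and for how long a person was observed, which is the sense of "interpretable" used above.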

If this is right

  • High-risk situations become identifiable early enough for timely intervention using existing surveillance cameras.
  • The task is shown to be more complex than isolated subtasks such as activity recognition alone.
  • Research can now pursue other interpretable AI systems that build evidence from video for social-good applications.
  • Risk assessment shifts from direct intent inference to aggregation of heterogeneous cues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pipeline structure could be tested in other public spaces that already have overhead cameras, such as train platforms or airport terminals.
  • Human operators could be studied to see whether the heatmaps and segmentations improve their ability to decide when to intervene compared with raw video.
  • Future work could add a feedback loop where actual interventions update the risk model without requiring new labeled video.

Load-bearing premise

Observable video behaviors, trajectories, and platform context provide enough accumulated evidence to assess suicide risk reliably without direct psychological data or confirmed ground-truth labels.

What would settle it

Running the pipeline on incidents where independent psychological evaluations or confirmed intervention records exist, then checking whether the generated risk scores align with those external judgments.
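A check of this kind reduces to asking how well the risk scores rank externally confirmed cases above the rest, which is what ROC-AUC measures. The sketch below computes it from its pairwise definition in plain Python; the function name and the toy scores and labels are illustrative assumptions, not data from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    Labels are 1 (externally confirmed at-risk) or 0 (control).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values only: pipeline scores paired with hypothetical external judgments.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

The quadratic pairwise loop is chosen for readability; on large evaluation sets a rank-based computation gives the same value more efficiently.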

Figures

Figures reproduced from arXiv: 2605.22904 by Brian Mishara, Guillaume-Alexandre Bilodeau, Safwen Naimi, Wassim Bouachir.

Figure 1. Architecture overview of our proposed SRA-Framework. It takes a video as input and outputs the complete platform state and a suicide risk assessment for each person.
… different areas of the platform may warrant preventive attention. It is influenced not only by what actions are performed, but also by where they occur and how long they persist. This formulation is consistent with operational safety practice…
Figure 2. Overview of the Platform Semantics Modeling Process: (a–b) Boundary and Offset Lines Estimation, and (c) Final Zone Partitioning.
… Wt, providing high-level action cues for subsequent risk assessment. In our SRA-Framework, we employ SSTAR pre-trained on the ARMM dataset [Naimi et al., 2025], a skeleton-based spatio-temporal action recognition model specifically designed for surveillance scenarios. In our ca…
Figure 3. Illustration of Individual At-Risk Heatmaps and their ag…
Figure 4. Feature importance of the eight risk indicators used for…
Figure 6. Qualitative Results. Output predictions of two frames from surveillance video streams.
… high risk score (R0=0.98), driven by a combination of a prolonged crossing of the yellow line (11 seconds) and elevated position risk score along the position risk heatmap. Our system also successfully assigned lower suicide risk scores for control individuals (R2=0.12 and R3=0.14). In the second video illustrated in F…
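The Figure 6 excerpt cites a prolonged crossing of the yellow line (11 seconds) as one driver of a high score. A duration cue of that kind could be derived from a tracked position sequence roughly as sketched below. This is an assumption about how such an indicator might be computed, not the authors' implementation: the one-dimensional coordinate, `line_y`, the frame rate, and both function names are hypothetical.

```python
from typing import List, Sequence


def flags_from_positions(ys: Sequence[float], line_y: float) -> List[bool]:
    """Mark each frame in which the tracked point lies beyond the line.

    Assumes larger y means closer to the platform edge.
    """
    return [y > line_y for y in ys]


def longest_dwell_seconds(beyond_line: Sequence[bool], fps: float) -> float:
    """Length, in seconds, of the longest unbroken run of frames beyond the line."""
    longest = 0
    current = 0
    for flag in beyond_line:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest / fps


# Toy track at an assumed 10 frames per second: a long crossing, then a short one.
toy_ys = [0.2] * 5 + [0.9] * 110 + [0.3] * 3 + [0.8] * 20
flags = flags_from_positions(toy_ys, line_y=0.5)
print(longest_dwell_seconds(flags, fps=10.0))  # 110 frames / 10 fps = 11.0
```

Using the longest unbroken run rather than the total time beyond the line separates one sustained crossing from many brief ones; which of the two a deployed system should use is a design choice this page does not settle.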
Original abstract

Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early identification of high-risk situations can enable timely intervention. This requires assessing suicide risk from a surveillance video by jointly reasoning about the behavior of each passenger, his/her spatial context, and temporal dynamics. However, this assessment using videos captured by surveillance cameras is challenging, as it demands accurate perception of human motion, understanding of platform geometry, and aggregation of heterogeneous behavioral cues over time. In this work, we formalize the task of Suicide Risk Assessment (SRA) in metro stations and introduce the first interpretable framework that addresses this challenge. Unlike approaches that focus on isolated subtasks or attempt to infer intent directly, our formulation assesses suicide risk from accumulated evidence by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. By formalizing SRA as a distinct task and benchmarking a complete operational pipeline achieving 83.2% ROC-AUC on real surveillance data, this work highlights the complexity of suicide risk assessment and opens new directions for research on interpretable AI systems for social good.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper formalizes Suicide Risk Assessment (SRA) as a distinct task from metro station surveillance video, requiring joint reasoning over passenger behavior, spatial context, and temporal dynamics. It introduces an interpretable framework integrating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. The central empirical claim is that a complete operational pipeline achieves 83.2% ROC-AUC on real surveillance data, thereby highlighting the complexity of the task and opening directions for interpretable AI systems for social good.

Significance. If the performance claim is substantiated with proper dataset and labeling documentation, the work would be significant for establishing SRA as a benchmarkable computer-vision task and for demonstrating an end-to-end, interpretable pipeline on real operational footage. Such a result could usefully direct future research toward behavior aggregation and context modeling in high-stakes public-safety settings.

major comments (1)
  1. [Abstract] The claim that the pipeline achieves 83.2% ROC-AUC on real surveillance data is presented without any information on dataset size, collection period, labeling protocol (annotators, inter-annotator agreement, use of psychological records or follow-up confirmation), class balance, baselines, error analysis, or validation protocol (train/test split, cross-validation). This information is load-bearing for interpreting whether the metric constitutes evidence that observable video behaviors and trajectories suffice to assess suicide risk.
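One of the items requested above, inter-annotator agreement, is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below shows the computation for two annotators. The labels are toy values, and nothing here describes the protocol the authors actually used.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k]
                   for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy annotations: 1 = clip marked high-risk, 0 = clip marked control.
annotator_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on 8 of 10 clips, yet kappa is noticeably lower than 0.8 because both label most clips as controls. That is why the comment asks for class balance alongside the agreement figure.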

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback emphasizing the need for comprehensive dataset and evaluation documentation to support the performance claims. We agree these details are essential and will revise the manuscript to strengthen the presentation of the experimental setup. We respond to the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The claim that the pipeline achieves 83.2% ROC-AUC on real surveillance data is presented without any information on dataset size, collection period, labeling protocol (annotators, inter-annotator agreement, use of psychological records or follow-up confirmation), class balance, baselines, error analysis, or validation protocol (train/test split, cross-validation). This information is load-bearing for interpreting whether the metric constitutes evidence that observable video behaviors and trajectories suffice to assess suicide risk.

    Authors: We agree that the abstract lacks sufficient context on these elements and will revise it to include dataset size, collection period, class balance, and a high-level description of the validation protocol. Full details on labeling (performed by domain experts annotating observable high-risk behaviors such as edge proximity and trajectory patterns), inter-annotator agreement, baselines, error analysis, and train/test splits are already present in the Methods and Experiments sections; we will add explicit cross-references from the abstract. We will also clarify that the framework evaluates risk based on aggregated observable video evidence rather than direct intent inference. However, psychological records or follow-up confirmation are unavailable due to the anonymized nature of operational surveillance data.
    Revision: partial

Standing simulated objections (not resolved)
  • Labels are not checked against psychological records or follow-up confirmation, because the dataset consists solely of de-identified surveillance footage without linked individual health data.

Circularity Check

0 steps flagged

No circularity; empirical benchmark independent of inputs

Full rationale

The paper formalizes Suicide Risk Assessment (SRA) as a distinct task and reports an empirical ROC-AUC of 83.2% from a complete pipeline (tracking, activity recognition, segmentation, risk heatmaps) evaluated on real surveillance data. No equations, parameter-fitting steps, or self-citations are described that would reduce the performance metric or framework definition to a tautology or fitted input by construction. The central result is an external benchmark rather than a derived quantity forced by the paper's own definitions or prior self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that video-derived behavioral evidence can proxy suicide risk; no free parameters or invented entities are specified.

axioms (1)
  • Domain assumption: Suicide risk can be inferred from accumulated observable behaviors, trajectories, and spatial context in surveillance video.
    Invoked as the basis for the entire SRA formulation and pipeline.

pith-pipeline@v0.9.0 · 5744 in / 1167 out tokens · 27518 ms · 2026-05-25T05:52:46.220098+00:00 · methodology
