Assessing the Feasibility of a Video-Based Conversational Chatbot Survey for Measuring Perceived Cycling Safety: A Pilot Study in New York City
Pith reviewed 2026-05-10 17:56 UTC · model grok-4.3
The pith
Video-based conversational AI chatbots can feasibly collect in-the-moment perceptions of cycling safety.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study shows that a video-based conversational chatbot survey is feasible for measuring perceived cycling safety. The evidence is threefold: structured interactions with sixteen participants across nine New York City street segments; positive user experience and usability ratings; and the successful application of natural language processing to extract built-environment attributes, cluster reasons and suggestions, and regress safety outcomes on environmental and demographic variables.
What carries the argument
A chatbot built on a modular LLM architecture that integrates prompt engineering, state management, and rule-based control. Together these structure the human-AI conversation so that it captures safety perceptions, and the reasons behind them, during video viewing.
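To make the three components concrete, here is a minimal sketch of how prompt templates, per-segment state, and rule-based control could fit together. It is not the authors' implementation: the stage names, the 1-to-5 rating scale, the prompt wording, and the `rephrase` hook (where an LLM call might reword a message) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class Stage(Enum):
    """Hypothetical dialogue stages for one video segment."""
    ASK_RATING = auto()
    ASK_REASON = auto()
    ASK_SUGGESTION = auto()
    DONE = auto()


@dataclass
class SegmentState:
    """State management: what has been collected so far for one segment."""
    stage: Stage = Stage.ASK_RATING
    rating: Optional[int] = None
    reason: Optional[str] = None
    suggestion: Optional[str] = None


# Prompt engineering stand-in: fixed templates (wording is assumed).
PROMPTS = {
    Stage.ASK_RATING: "From 1 (very unsafe) to 5 (very safe), how safe would you feel cycling here?",
    Stage.ASK_REASON: "What in the video made you feel that way?",
    Stage.ASK_SUGGESTION: "What would you change on this street?",
    Stage.DONE: "Thank you. Moving on to the next video.",
}
RETRY_MESSAGE = "Please answer with one number from 1 to 5."


def parse_rating(text: str) -> Optional[int]:
    """Rule-based control: accept a reply only if it has exactly one number in 1..5."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) == 1 and 1 <= numbers[0] <= 5:
        return numbers[0]
    return None


def step(state: SegmentState, user_text: str,
         rephrase: Callable[[str], str] = lambda message: message) -> str:
    """Consume one user turn, update the state, and return the next bot message."""
    if state.stage is Stage.ASK_RATING:
        rating = parse_rating(user_text)
        if rating is None:
            return rephrase(RETRY_MESSAGE)  # stay in the same stage
        state.rating = rating
        state.stage = Stage.ASK_REASON
    elif state.stage is Stage.ASK_REASON:
        state.reason = user_text.strip()
        state.stage = Stage.ASK_SUGGESTION
    elif state.stage is Stage.ASK_SUGGESTION:
        state.suggestion = user_text.strip()
        state.stage = Stage.DONE
    return rephrase(PROMPTS[state.stage])
```

In this sketch the language model can only reword messages through `rephrase`; it never decides which question comes next, which is one way a rule-based layer could limit dialogue steering.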
If this is right
- Built-environment attributes linked to safety can be extracted directly from open-ended responses using keyword extraction tools.
- Semantic clustering of responses identifies recurring reasons for safety perceptions and user suggestions for improvements.
- Regression models can quantify the influence of street features and rider demographics on perceived safety scores.
- The approach enables collection of data on future visions for transport planning in addition to current perceptions.
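The regression implication above can be sketched in its simplest form: one predictor and ordinary least squares. The paper's model includes several built-environment and demographic variables, so this is a deliberate simplification. The variable names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data, NOT from the study:
# 1 = segment has a bike lane, 0 = it does not; scores are made-up safety ratings.
has_bike_lane = [0, 0, 0, 1, 1, 1]
safety_score = [2, 3, 2, 4, 5, 4]

intercept, slope = fit_simple_ols(has_bike_lane, safety_score)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=2.33, slope=2.00
```

With a binary predictor, the slope equals the difference between the two group means, so it reads directly as "how much higher the average rating is when the feature is present".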
Where Pith is reading between the lines
- The method could be adapted to study perceptions for other transport modes such as walking or public transit.
- It may complement traditional surveys by providing richer, context-specific data that reduces reliance on long-term recall.
- Scaling the chatbot to larger participant groups could support city-wide infrastructure decisions based on aggregated perceptual maps.
Load-bearing premise
The chatbot interactions after watching selected videos produce unbiased, in-the-moment perceptions of cycling safety without meaningful influence from the AI's design or the particular video clips chosen.
What would settle it
A direct comparison in which the same participants rate the same streets twice: once through the chatbot after watching the videos, and once immediately after cycling them in person. Large differences in reported safety levels or reasons would undermine the method.
Original abstract
Bicycle safety is important for bikeability and transportation efficiency. However, conventional surveys often fall short in capturing how people actually perceive cycling environments because they rely heavily on respondents' recall rather than in-the-moment experience. By leveraging large language models (LLMs), this study proposes a new method of combining video-based surveys with a conversational AI chatbot to collect human perceptions of cycling safety and the reasons behind these perceptions. The paper developed the AI chatbot using a modular LLM architecture, integrating prompt engineering, state management, and rule-based control to support the structure of human-AI interaction. This paper evaluates the feasibility of the proposed video-based conversational chatbot using complete responses from sixteen participants to the pilot survey across nine street segments in New York City. The method feasibility was assessed using a seven-point scale rating for user experience (i.e., ease of use, supportiveness, efficiency) and a five-point scale for chatbot usability (i.e., personality, roboticness, friendliness), yielding positive results with mean scores of 5.00 out of 7 (standard deviation = 1.6) and 3.47 out of 5 (standard deviation = 0.43), respectively. The data feasibility was assessed using multiple techniques: (1) Natural language processing (NLP), such as KeyBERT, for overall safety and feature analysis to extract built-environment attributes; (2) K-means clustering for semantic analysis to identify reasons and suggestions; and (3) regression to estimate the effects of built-environment and demographic variables on perceived safety outcomes. The results show the potential of AI chatbots as a novel approach to collecting data on human perception, behavior, and future visions for transport planning.
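Step (2) of the data-feasibility analysis, K-means clustering, can be illustrated with a short sketch of Lloyd's algorithm. The abstract does not say which implementation or text embedding was used, so everything here is an assumption: the 2-D points are hypothetical stand-ins for sentence embeddings of participants' reasons, and the initial centroids are fixed by hand to keep the example deterministic.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid is the coordinate-wise mean of the cluster members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical 2-D "embeddings" of four free-text reasons, NOT study data.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centroids = kmeans(points, init_centroids=[(0.0, 0.0), (5.0, 5.0)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (5.0, 5.5)]
```

In practice the points would be high-dimensional embeddings and the number of clusters would need to be chosen and justified, which is one of the design choices a small sample makes sensitive.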
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes and pilots a video-based conversational chatbot survey using a modular LLM architecture (with prompt engineering, state management, and rule-based control) to capture in-the-moment perceptions of cycling safety. Feasibility is assessed via UX ratings (mean 5/7, SD=1.6) and usability scores (mean 3.47/5, SD=0.43) from N=16 participants across nine NYC street segments, followed by KeyBERT extraction of built-environment attributes, K-means clustering of reasons/suggestions, and regression on perceived safety.
Significance. If the central claim holds after addressing validation gaps, the work demonstrates a promising direction for richer, context-aware data collection in transportation planning that goes beyond recall-based surveys. The modular chatbot design is a concrete implementation strength that could be extended, though the pilot scale keeps immediate impact modest.
Major comments (3)
- [Abstract / Results] Abstract and results on data feasibility: The K-means clustering, KeyBERT analysis, and regression to estimate effects of built-environment and demographic variables on perceived safety are performed on only 16 responses; with high variability (SD=1.6 on the 7-point UX scale), these analyses have low power and are sensitive to outliers or design choices, weakening the claim that the method feasibly yields reliable perceptual data.
- [Methods] Methods section on chatbot implementation: No control arm, non-chatbot survey comparison, or ablation of the prompt/state/rule-based components is reported, so it is impossible to rule out that extracted attributes, clusters, or regression coefficients reflect AI dialogue steering rather than unbiased participant perceptions of cycling safety.
- [Discussion] Discussion or limitations: The feasibility conclusion rests on self-reported UX without external validation against established cycling safety instruments or in-the-moment measures (e.g., think-aloud protocols), leaving open whether the positive scores (5/7 and 3.47/5) indicate genuine data quality or simply acceptable interaction.
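The sample-size concern in the first major comment can be made concrete with a back-of-the-envelope confidence interval for the reported UX mean. This sketch assumes that the SD of 1.6 describes variation across the 16 participants' scores and that a Student's t interval is reasonable; neither is confirmed by the abstract. The function name is hypothetical.

```python
from math import sqrt


def mean_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Reported in the abstract: mean 5.00 out of 7, SD 1.6, N = 16.
# 2.131 is the two-sided 95% critical value of Student's t with 15 degrees of freedom.
low, high = mean_confidence_interval(5.00, 1.6, 16, 2.131)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 4.15 to 5.85
```

Under these assumptions the interval spans roughly 1.7 points on a 7-point scale, which illustrates why estimates from this pilot are best read as exploratory.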
Minor comments (2)
- [Abstract] The abstract reports SD=0.43 for usability but does not specify the scale anchors or provide item-level breakdowns; adding these would improve interpretability of the 3.47/5 mean.
- [Methods] Clarify in the methods how the nine street segments and associated videos were selected and whether they represent a range of safety conditions.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and limitations of this pilot study. We address each major point below and will make targeted revisions to better frame the exploratory nature of the work.
Point-by-point responses
- Referee: [Abstract / Results] Abstract and results on data feasibility: The K-means clustering, KeyBERT analysis, and regression to estimate effects of built-environment and demographic variables on perceived safety are performed on only 16 responses; with high variability (SD=1.6 on the 7-point UX scale), these analyses have low power and are sensitive to outliers or design choices, weakening the claim that the method feasibly yields reliable perceptual data.
  Authors: We agree that N=16 and the observed variability limit the power and robustness of the secondary analyses. As this is a pilot study, the primary aim was to assess the technical and user-experience feasibility of the chatbot approach; the KeyBERT, clustering, and regression results are intended as illustrative examples of extractable data rather than definitive inferences. In revision we will reframe the abstract and results to emphasize the exploratory character of these analyses, and we will add to the limitations section an explicit discussion of the sample size and the sensitivity of these analyses.
  Revision: partial
- Referee: [Methods] Methods section on chatbot implementation: No control arm, non-chatbot survey comparison, or ablation of the prompt/state/rule-based components is reported, so it is impossible to rule out that extracted attributes, clusters, or regression coefficients reflect AI dialogue steering rather than unbiased participant perceptions of cycling safety.
  Authors: The lack of a control arm or component ablation is a genuine limitation of the current pilot, which focused on demonstrating a working modular implementation rather than comparative validation. The rule-based control layer was introduced precisely to constrain dialogue flow and reduce steering, yet without a non-chatbot baseline we cannot empirically isolate its effect. We will expand the methods and limitations sections to describe these design choices more fully and to state clearly that future work must include controlled comparisons to assess potential AI influence on the collected perceptions.
  Revision: partial
- Referee: [Discussion] Discussion or limitations: The feasibility conclusion rests on self-reported UX without external validation against established cycling safety instruments or in-the-moment measures (e.g., think-aloud protocols), leaving open whether the positive scores (5/7 and 3.47/5) indicate genuine data quality or simply acceptable interaction.
  Authors: Self-reported UX is the standard initial metric for feasibility pilots, but we recognize it does not substitute for external validation of data quality. We will revise the discussion and limitations sections to acknowledge this gap explicitly, note that positive UX scores demonstrate acceptable interaction but do not yet confirm perceptual accuracy, and outline plans for future validation against established instruments or think-aloud protocols.
  Revision: partial
Circularity Check
No significant circularity; empirical pilot with direct data collection
Full rationale
The paper presents a pilot study that develops a modular LLM chatbot, collects responses from N=16 participants on video-based cycling safety perceptions, and analyzes them via standard off-the-shelf techniques (KeyBERT for attribute extraction, K-means for clustering reasons, and regression for variable effects). Feasibility is assessed through direct self-reported UX and usability scales with no fitted parameters renamed as predictions, no self-definitional loops, and no load-bearing self-citations or uniqueness theorems. The central claim that the method shows potential for collecting perception data follows from the observed participant scores and extracted patterns rather than reducing to the input design by construction. This is a standard empirical feasibility assessment with independent content.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: The seven-point and five-point scales accurately measure user experience and usability.
- Domain assumption: NLP methods like KeyBERT and K-means can reliably extract safety-related features and reasons from chatbot responses.
Reference graph
Works this paper leans on
- [1] Afzalan, N. and Muller, B. (2014). The role of social media in green infrastructure planning: A case study of neighborhood participation in park siting. Journal of Urban Technology, 21(3):67–83, ISSN: 1466-1853, DOI: 10.1080/10630732.2014.940701, http://dx.doi.org/10.1080/10630732.2014.940701. Al Sayyed, H. and Al-Azhari, W. (2025). Investigating the role of...
- [2] Kwon, J.-H. and Cho, G.-H. (2020). An examination of the intersection environment associated with perceived crash risk among school-aged children: using street-level imagery and computer vision. Accident Analysis & Prevention, 146:105716, ISSN: 0001-4575, DOI: 10.1016/j.aap.2020.105716, http://dx.doi.org/10.1016/j.aap.2020.105716. Lawson, A. R., Pakrashi...
- [3] Nankervis, M. (1999). The effect of weather and climate on bicycle commuting. Transportation Research Part A: Policy and Practice, 33(6):417–431, ISSN: 0965-8564, DOI: 10.1016/s0965-8564(98)00022-6, http://dx.doi.org/10.1016/S0965-8564(98)00022-6. New York City Department of City Planning (2024). Digital city map (dcm). https://www.nyc.gov/site/planning/data-...