A Deep Learning-Based CCTV System for Automatic Smoking Detection in Fire Exit Zones
Pith reviewed 2026-05-18 22:58 UTC · model grok-4.3
The pith
A custom YOLO-based model detects smoking in fire exit zones with 78.9 percent recall and 83.7 percent mAP@50.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a custom object detection model, derived from YOLOv8 with added structures for challenging surveillance contexts, outperforms YOLOv8, YOLOv11, and YOLOv12. Evaluated on 8,124 images from 20 different scenarios plus 2,708 raw low-light samples, the model achieves a recall of 78.90 percent and an mAP@50 of 83.70 percent. The paper also reports inference times of 52 to 97 milliseconds on the Jetson Xavier NX, which it presents as evidence of real-time suitability for public safety monitoring and regulatory compliance.
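As a rough check on the real-time claim, the reported per-inference latencies can be converted to throughput. The sketch below assumes one frame per inference and no overlap between pipeline stages, neither of which the summary states; the function name is illustrative.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latency range reported for the Jetson Xavier NX.
best_fps = latency_ms_to_fps(52)   # fastest reported inference
worst_fps = latency_ms_to_fps(97)  # slowest reported inference
print(f"{worst_fps:.1f} to {best_fps:.1f} frames per second")
```

Under those assumptions the range works out to roughly 10 to 19 frames per second. Whether that counts as real time depends on the camera frame rate and on how quickly a violation must be flagged.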
What carries the argument
Custom YOLOv8-derived object detection model with added structures for challenging surveillance contexts
If this is right
- The system supports real-time automatic monitoring of smoking violations in fire exit areas.
- Edge device performance enables deployment without constant cloud connectivity.
- Automatic detection aids regulatory compliance by logging potential violations.
- The approach provides a base for adapting similar detection to other safety rules in public spaces.
Where Pith is reading between the lines
- The model could connect to alarm systems that notify staff or authorities the moment smoking is detected.
- Similar customizations might apply to spotting other restricted actions such as open flames or blocked exits in the same zones.
Load-bearing premise
The dataset of images from 20 scenarios and low-light areas represents the full range of real-world fire exit zone CCTV conditions and lets the model generalize without overfitting.
What would settle it
Running the custom model on a new collection of live CCTV footage from real fire exit zones, none of it seen during training, and finding that recall drops well below 78.9 percent or mAP@50 falls below 83.7 percent.
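One way such a check could be scored is sketched below: recall at an IoU threshold of 0.5 over ground-truth and predicted boxes for a single frame. This is a simplified illustration, not the paper's evaluation code. It uses greedy one-to-one matching and ignores confidence scores, whereas standard evaluators rank predictions by confidence; the box format and function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(truths: List[Box], preds: List[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched one-to-one with a prediction."""
    unused = list(preds)
    hits = 0
    for t in truths:
        best = max(unused, key=lambda p: iou(t, p), default=None)
        if best is not None and iou(t, best) >= thr:
            hits += 1
            unused.remove(best)  # each prediction can match only one truth
    return hits / len(truths) if truths else 0.0
```

Aggregating hits and ground-truth counts over every frame of the unseen footage, rather than averaging per-frame recall, would give a figure comparable to the reported 78.9 percent.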
Original abstract
A deep learning real-time smoking detection system for CCTV surveillance of fire exit areas is proposed due to critical safety requirements. The dataset contains 8,124 images from 20 different scenarios along with 2,708 raw samples demonstrating low-light areas. We evaluated three advanced object detection models: YOLOv8, YOLOv11, and YOLOv12, followed by development of a custom model derived from YOLOv8 with added structures for challenging surveillance contexts. The proposed model outperformed the others, achieving a recall of 78.90 percent and mAP at 50 of 83.70 percent, delivering optimal object detection across varied environments. Performance evaluation on multiple edge devices using multithreaded operations showed the Jetson Xavier NX processed data at 52 to 97 milliseconds per inference, establishing its suitability for time-sensitive operations. This system offers a robust and adaptable platform for monitoring public safety and enabling automatic regulatory compliance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a real-time deep learning system for detecting smoking in fire exit CCTV zones. It describes a dataset of 8,124 images collected from 20 scenarios plus 2,708 low-light samples, evaluates YOLOv8/v11/v12, and introduces a custom YOLOv8-derived model with added structures. It reports that the custom model achieves 78.90% recall and 83.70% mAP@50, and demonstrates inference on edge devices (Jetson Xavier NX at 52–97 ms per inference).
Significance. If the performance claims hold under proper validation, the work could offer a deployable tool for automated safety monitoring and regulatory compliance in restricted public spaces, with practical emphasis on low-light conditions and edge-device efficiency. The empirical focus on multiple YOLO variants and hardware testing provides a concrete baseline for surveillance applications.
major comments (2)
- [Dataset and Experiments] Dataset description: the claim that the custom model delivers 'optimal object detection across varied environments' rests on 8,124 images from only 20 scenarios plus 2,708 low-light samples, yet no quantitative metrics of scenario diversity (camera angles, densities, partial occlusions, heights) or selection criteria are supplied. This directly undermines the generalization asserted in the abstract and results.
- [Results and Evaluation] Evaluation protocol: no details are given on train/validation/test splits, cross-validation, or leakage prevention. Without these, the headline metrics (recall 78.90%, mAP@50 83.70%) cannot be interpreted as evidence of robustness rather than possible overfitting to the collected data, which is load-bearing for the central performance claim.
minor comments (1)
- [Abstract] Abstract: 'mAP at 50' should be written consistently as mAP@50 or mAP50 to match standard object-detection notation.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions planned for the next version.
Point-by-point responses
- Referee: [Dataset and Experiments] Dataset description: the claim that the custom model delivers 'optimal object detection across varied environments' rests on 8,124 images from only 20 scenarios plus 2,708 low-light samples, yet no quantitative metrics of scenario diversity (camera angles, densities, partial occlusions, heights) or selection criteria are supplied. This directly undermines the generalization asserted in the abstract and results.
  Authors: We agree that additional quantitative details on scenario diversity would strengthen the generalization claims. The current manuscript provides only the high-level count of 20 scenarios and low-light samples without metrics on camera angles, densities, occlusions, or heights, nor explicit selection criteria. In the revised manuscript we will add a table and accompanying text quantifying these attributes across the scenarios and describing the collection protocol and selection criteria.
  revision: yes
- Referee: [Results and Evaluation] Evaluation protocol: no details are given on train/validation/test splits, cross-validation, or leakage prevention. Without these, the headline metrics (recall 78.90%, mAP@50 83.70%) cannot be interpreted as evidence of robustness rather than possible overfitting to the collected data, which is load-bearing for the central performance claim.
  Authors: We acknowledge that the absence of evaluation protocol details is a limitation. The manuscript does not currently describe the train/validation/test splits, any cross-validation procedure, or leakage-prevention steps. We will revise the results section to include these specifics, stating the split ratios, confirming scenario-level separation to avoid leakage, and noting whether cross-validation was performed along with its rationale.
  revision: yes
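The scenario-level separation mentioned in the second response can be illustrated with a minimal sketch that assigns whole scenarios, rather than individual images, to each split. The split fractions, the seed, and the image-to-scenario mapping are hypothetical; the manuscript does not state its actual protocol.

```python
import random
from typing import Dict, List, Tuple


def split_by_scenario(
    image_scenarios: Dict[str, int],
    val_frac: float = 0.15,   # hypothetical fraction of scenarios for validation
    test_frac: float = 0.15,  # hypothetical fraction of scenarios for testing
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Assign whole scenarios to train, validation, or test."""
    scenarios = sorted(set(image_scenarios.values()))
    random.Random(seed).shuffle(scenarios)
    n_test = max(1, round(len(scenarios) * test_frac))
    n_val = max(1, round(len(scenarios) * val_frac))
    test_ids = set(scenarios[:n_test])
    val_ids = set(scenarios[n_test:n_test + n_val])
    train, val, test = [], [], []
    for image, scenario in sorted(image_scenarios.items()):
        if scenario in test_ids:
            test.append(image)
        elif scenario in val_ids:
            val.append(image)
        else:
            train.append(image)
    return train, val, test
```

Because every image from a given scenario lands in the same split, near-duplicate frames from one camera setup cannot appear in both training and test data. With 20 scenarios and these fractions, 14 scenarios go to training and 3 each to validation and test.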
Circularity Check
No circularity: empirical evaluation on collected dataset with standard metrics
Full rationale
The paper describes dataset collection (8,124 images from 20 scenarios plus low-light samples), training of YOLOv8/11/12 and a custom variant, and direct reporting of recall/mAP on test splits. No equations, derivations, or first-principles results exist that could reduce to self-definition or fitted inputs by construction. Central performance claims rest on empirical measurement against held-out data rather than any self-citation chain or ansatz smuggled from prior work. The representativeness assumption affects generalization risk but does not create circularity in the reported results, which remain falsifiable on the described test set.
Axiom & Free-Parameter Ledger
free parameters (1)
- Added structures in custom YOLOv8 model
axioms (1)
- domain assumption: YOLO family models are appropriate base architectures for real-time smoking detection in varied lighting and angles.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: custom model that retains core features of YOLOv8 while introducing enhancements for low-light conditions and varied camera angles... achieved the highest recall (78.90%)... mAP@50 score (83.70%)
- IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: dataset of 8,124 images from 20 different scenarios... augmented by rotating them, adjusting exposure, and injecting a small amount (0.1%) of noise
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.