Two-Stage Framework for Efficient UAV-Based Wildfire Video Analysis with Adaptive Compression and Fire Source Detection
Pith reviewed 2026-05-18 20:50 UTC · model grok-4.3
The pith
A two-stage UAV framework reduces computational costs for wildfire video analysis while preserving accuracy and enabling real-time fire detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper proposes a lightweight two-stage framework for UAV wildfire video analysis. In Stage 1, a policy network identifies and discards redundant clips to lower computational cost; a station point mechanism feeds future frame information into the sequential policy network to improve its predictions, which lets Stage 1 run in near real time. In Stage 2, an improved YOLOv8 model localizes fire sources in real time, and only on the frames Stage 1 classifies as containing fire. According to the abstract, evaluations on FLAME and HMDB51 (Stage 1) and the Fire & Smoke Detection Dataset (Stage 2) show significantly reduced computational cost with maintained classification accuracy in Stage 1, and high detection accuracy with real-time inference in Stage 2.
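As a rough illustration of the control flow in this claim, the following Python sketch gates an expensive detector behind a cheap keep/discard decision. It is not the paper's implementation: `run_two_stage`, `keep_clip`, `detect_fire`, `ClipResult`, and `Box` are hypothetical names, and the two callables merely stand in for the policy network and the improved YOLOv8 detector.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical box format (x1, y1, x2, y2); the paper's format is not stated here.
Box = Tuple[float, float, float, float]


@dataclass
class ClipResult:
    clip_index: int
    kept: bool
    boxes: List[Box]


def run_two_stage(
    clips: Iterable[object],
    keep_clip: Callable[[object], bool],         # stand-in for the Stage 1 policy
    detect_fire: Callable[[object], List[Box]],  # stand-in for the Stage 2 detector
) -> List[ClipResult]:
    """Run the detector only on clips that the gate retains."""
    results: List[ClipResult] = []
    for index, clip in enumerate(clips):
        if keep_clip(clip):
            results.append(ClipResult(index, True, detect_fire(clip)))
        else:
            # Discarded clips never reach the detector, so they cost no Stage 2 compute.
            results.append(ClipResult(index, False, []))
    return results
```

The design point is that `detect_fire` is never invoked on a discarded clip, which is where any compute saving would come from, and also why a wrongly discarded clip cannot be recovered downstream.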
What carries the argument
The station point mechanism within the sequential policy network, which incorporates future frame information to improve the accuracy of decisions on which video clips to discard before passing them to the fire detector.
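The text above does not specify how station points are implemented, so the sketch below is only a generic stand-in for "deciding with future context": each clip's keep/discard decision is deferred until a fixed number of later clip scores have arrived. The function name `lookahead_decisions`, the per-clip `scores`, the `lookahead` window, and the max-over-window rule are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def lookahead_decisions(
    scores: Iterable[float], lookahead: int, threshold: float
) -> Iterator[Tuple[int, bool]]:
    """Yield (clip_index, keep) once `lookahead` later scores are available.

    Hypothetical rule: keep a clip if it, or any clip in its lookahead
    window, scores at or above `threshold`.
    """
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")
    window: Deque[float] = deque()
    next_index = 0
    for score in scores:
        window.append(score)
        # The window now holds the pending clip plus up to `lookahead` future clips.
        if len(window) > lookahead:
            yield next_index, max(window) >= threshold
            window.popleft()
            next_index += 1
    # End of stream: decide the remaining clips with whatever context is left.
    while window:
        yield next_index, max(window) >= threshold
        window.popleft()
        next_index += 1
```

Under this toy rule, a low-scoring clip just before a high-scoring one is retained, which is the kind of onset case the mechanism is meant to help with. The cost is latency: each decision waits for `lookahead` further clips, which matches the "near real time" framing.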
If this is right
- Computational costs are significantly reduced in Stage 1 while classification accuracy is maintained on the FLAME and HMDB51 datasets.
- Stage 2 achieves high fire source detection accuracy with real-time inference on the Fire & Smoke Detection Dataset.
- The framework supports near-real-time operation suitable for onboard UAV disaster response applications.
- Large models can run efficiently on UAVs with limited resources through selective processing of only relevant frames.
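The cost argument behind these points can be made concrete with a back-of-the-envelope model. The sketch below assumes a fixed per-clip cost for the policy network and for the downstream model; the function names and any numbers plugged in are illustrative assumptions, not figures from the paper.

```python
def expected_cost_per_clip(
    policy_cost: float, downstream_cost: float, keep_fraction: float
) -> float:
    """Average cost per clip when every clip pays for the policy network
    and only the retained fraction pays for the downstream model."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    if policy_cost < 0 or downstream_cost <= 0:
        raise ValueError("costs must be non-negative, downstream_cost positive")
    return policy_cost + keep_fraction * downstream_cost


def relative_saving(
    policy_cost: float, downstream_cost: float, keep_fraction: float
) -> float:
    """Fraction of compute saved versus running the downstream model on every clip.

    Negative values mean the gate costs more than it saves.
    """
    gated = expected_cost_per_clip(policy_cost, downstream_cost, keep_fraction)
    return 1.0 - gated / downstream_cost
```

With a hypothetical policy cost of 0.5 units, a downstream cost of 10 units, and a quarter of clips retained, the gated cost is 0.5 + 0.25 × 10 = 3.0 units per clip, a 70% saving. The gate only pays off while `keep_fraction` stays below `1 - policy_cost / downstream_cost`, so footage that is mostly fire would see little benefit.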
Where Pith is reading between the lines
- The selective clip processing strategy could extend to other long-duration UAV video tasks such as flood monitoring or search-and-rescue operations.
- Joint training of the policy network and detector might further improve the balance between cost savings and detection reliability.
- Real-world UAV flight tests in actual wildfire conditions would be required to validate performance beyond the laboratory datasets used.
Load-bearing premise
The policy network with the station point mechanism accurately discards redundant clips without missing frames that contain emerging or small fire sources.
What would settle it
A test video sequence in which a small or emerging fire source appears in a clip that the policy network discards as redundant, resulting in the fire going undetected by the second stage.
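One way to operationalize such a test, sketched here under the assumption that per-clip ground truth is available, is to measure how many clips pass between fire onset and the first fire clip that Stage 1 actually retains. The function `onset_detection_delay` and its inputs are hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence


def onset_detection_delay(
    has_fire: Sequence[bool], kept: Sequence[bool]
) -> Optional[int]:
    """Clips elapsed between the first fire clip and the first *retained* fire clip.

    Returns None if no fire clip is ever retained, i.e. the fire never
    reaches the second stage. Raises ValueError if the video has no fire.
    """
    if len(has_fire) != len(kept):
        raise ValueError("has_fire and kept must have the same length")
    if not any(has_fire):
        raise ValueError("video contains no fire clip")
    onset = next(i for i, fire in enumerate(has_fire) if fire)
    for i in range(onset, len(has_fire)):
        if has_fire[i] and kept[i]:
            return i - onset
    return None
```

A result of `None` on any realistic ignition sequence would be the counterexample described above; a large but finite delay would indicate that emerging fires are caught late rather than missed.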
Original abstract
Unmanned Aerial Vehicles (UAVs) have become increasingly important in disaster emergency response by facilitating aerial video analysis. Due to the limited computational resources available on UAVs, large models cannot be run efficiently for on-board analysis. To overcome this challenge, we propose a lightweight and efficient two-stage framework for wildfire monitoring and fire source detection on UAV platforms. Specifically, in Stage 1, we utilize a policy network to identify and discard redundant video clips, thereby reducing computational costs. We also introduce a station point mechanism that incorporates future frame information within the sequential policy network to improve prediction accuracy. This mechanism allows Stage 1 to operate in a near-real-time manner. In Stage 2, for frames classified as containing fire, we apply an improved YOLOv8 model to accurately localize the fire source in real-time on selected frames. We evaluate Stage 1 using the FLAME and HMDB51 datasets, and Stage 2 using the Fire & Smoke Detection Dataset. Experimental results show that our method significantly reduces computational costs while maintaining classification accuracy in Stage 1, and achieves high detection accuracy with real-time inference in Stage 2.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-stage framework for efficient UAV-based wildfire video analysis. Stage 1 uses a policy network with a station point mechanism to identify and discard redundant video clips, reducing computational costs while maintaining classification accuracy, evaluated on the FLAME and HMDB51 datasets. Stage 2 applies an improved YOLOv8 model for real-time fire source localization on frames classified as containing fire, evaluated on the Fire & Smoke Detection Dataset. The abstract reports positive results on cost reduction and detection accuracy with real-time inference.
Significance. If the performance claims hold under rigorous testing, particularly the safe discarding of clips without missing emerging or small fire sources, the framework could provide a practical advance for on-board UAV wildfire monitoring by enabling efficient analysis on resource-constrained platforms while preserving detection utility.
major comments (2)
- [Abstract and Experimental Results] The headline claim of significantly reducing computational costs while maintaining classification accuracy in Stage 1 depends on the policy network (with station point mechanism) having a low false-negative rate on clips containing small or emerging fire sources. However, the evaluation uses HMDB51, a generic action recognition dataset whose negative examples do not simulate subtle distant or smoke-obscured ignitions, and no per-class false-negative rates, ablation isolating the station-point contribution on onset frames, or test sets with gradual fire ignition sequences are reported.
- [Stage 1 Method and Evaluation] The manuscript provides no details on baselines, error bars, exact metrics (e.g., precision/recall for fire vs. non-fire clips), or ablation studies, which prevents full assessment of whether the reported accuracy is competitive or if the cost savings preserve overall system utility for the target wildfire use case.
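For reference, the clip-level metrics requested in these comments can be computed from binary labels as in the following sketch. It treats fire as the positive class (1) and non-fire as 0; the function name `clip_metrics` and the label encoding are assumptions made for illustration, not the paper's evaluation code.

```python
from typing import Dict, Sequence


def clip_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and false-negative rate for the fire class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty denominators when a class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_negative_rate = fn / (tp + fn) if (tp + fn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_negative_rate": false_negative_rate,
    }
```

For a fire-monitoring gate, the false-negative rate is the number to watch: overall accuracy can stay high on footage dominated by non-fire clips even when a meaningful share of fire clips is discarded.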
minor comments (2)
- [Abstract] The abstract refers to an 'improved YOLOv8' without specifying the modifications (e.g., architectural changes, loss functions, or training data augmentations).
- [Method Description] Notation and implementation details for the station point mechanism and policy network training hyperparameters are not fully elaborated, which could hinder reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our two-stage UAV wildfire analysis framework. We address the major comments below and have revised the manuscript to improve the evaluation and clarity of Stage 1 results.
Point-by-point responses
- Referee: [Abstract and Experimental Results] The headline claim of significantly reducing computational costs while maintaining classification accuracy in Stage 1 depends on the policy network (with station point mechanism) having a low false-negative rate on clips containing small or emerging fire sources. However, the evaluation uses HMDB51, a generic action recognition dataset whose negative examples do not simulate subtle distant or smoke-obscured ignitions, and no per-class false-negative rates, ablation isolating the station-point contribution on onset frames, or test sets with gradual fire ignition sequences are reported.
  Authors: We thank the referee for this important observation. FLAME provides wildfire-specific clips while HMDB51 is included to demonstrate generalization of the policy network beyond fire data. We acknowledge that HMDB51 negatives do not explicitly model subtle or smoke-obscured ignitions. The station point mechanism incorporates future-frame context precisely to improve detection of emerging events in sequential clips. In the revised manuscript we will add per-class false-negative rates, an ablation isolating the station-point contribution on onset frames, and a discussion of limitations regarding gradual ignition sequences, along with suggestions for future specialized test sets.
  Revision: partial
- Referee: [Stage 1 Method and Evaluation] The manuscript provides no details on baselines, error bars, exact metrics (e.g., precision/recall for fire vs. non-fire clips), or ablation studies, which prevents full assessment of whether the reported accuracy is competitive or if the cost savings preserve overall system utility for the target wildfire use case.
  Authors: We agree that these details are necessary for rigorous assessment. The revised manuscript will include comparisons against relevant baselines for the policy network, error bars computed over multiple runs, exact precision and recall for fire versus non-fire clip classification, and ablation studies on the station point mechanism and its contribution to end-to-end system utility for resource-constrained UAV wildfire monitoring.
  Revision: yes
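The promised error bars over multiple runs could be reported as a mean and sample standard deviation per metric. The sketch below uses only the Python standard library; `mean_and_std` is a hypothetical helper, and its inputs would be one value of a metric (for example clip-level accuracy) per independently seeded training run.

```python
import statistics
from typing import Sequence, Tuple


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric across runs."""
    if len(values) < 2:
        # A single run gives no information about run-to-run variance.
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(values), statistics.stdev(values)
```

Reporting both numbers for the gated and ungated systems would show whether any accuracy gap between them is larger than the run-to-run variation.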
Circularity Check
Derivation is self-contained with independent dataset evaluations
Full rationale
The paper's two-stage framework (policy network with station-point mechanism in Stage 1 for discarding redundant clips, followed by improved YOLOv8 in Stage 2) is evaluated on independent public datasets: FLAME and HMDB51 for Stage 1 classification accuracy, and Fire & Smoke Detection Dataset for Stage 2 detection. No equations or central claims reduce by construction to fitted parameters presented as predictions, self-definitional loops, or load-bearing self-citations. Efficiency and accuracy results are reported as empirical outcomes against external benchmarks rather than internal redefinitions, making the derivation self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- Policy network training hyperparameters
- YOLOv8 improvement parameters
axioms (1)
- domain assumption: The station point mechanism incorporates future frame information to improve sequential policy prediction accuracy.