pith.

arxiv: 2604.03267 · v1 · submitted 2026-03-19 · 💻 cs.CV · cs.AI

A reconfigurable smart camera implementation for jet flames characterization based on an optimized segmentation model

Pith reviewed 2026-05-15 07:51 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords jet flame · UNet · FPGA · segmentation · smart camera · real-time · fire detection · edge computing

The pith

Optimized UNet on SoC FPGA delivers 30 FPS jet flame segmentation after 125x parameter reduction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes a reconfigurable smart camera built on an Ultra96 SoC FPGA board to characterize jet flames in real time for industrial fire safety. The authors use the Vitis framework to optimize a UNet segmentation model, shrinking it from 7.5 million parameters to 59,095, a roughly 125-fold reduction. The smaller model alone cuts processing latency by 2.9x; adding multi-threading and batch normalization raises the latency improvement to 7.5x and reaches 30 frames per second on the device. The authors report that the Dice score, which measures segmentation quality, does not degrade relative to the full-precision model. The full pipeline runs on the edge hardware, avoiding the delays of off-device processing.
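As a quick consistency check on the headline numbers, the sketch below divides the two parameter counts and converts the 30 FPS figure into a per-frame time budget. Because "7.5 million" is itself rounded, the quotient comes out near 127, in line with the rounded 125x figure. The constant and function names are illustrative only.

```python
# Headline figures quoted in the summary (constant names are illustrative).
FULL_PARAMS = 7_500_000     # full-precision UNet, "7.5 million" (rounded)
OPTIMIZED_PARAMS = 59_095   # after Vitis optimization
TARGET_FPS = 30             # reported on-device frame rate


def reduction_factor(before: int, after: int) -> float:
    """How many times fewer parameters the optimized model has."""
    return before / after


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame to sustain `fps` frames per second."""
    return 1000.0 / fps


print(f"parameter reduction: {reduction_factor(FULL_PARAMS, OPTIMIZED_PARAMS):.1f}x")
print(f"per-frame budget at {TARGET_FPS} FPS: {frame_budget_ms(TARGET_FPS):.1f} ms")
```

This prints a reduction of 126.9x and a budget of 33.3 ms per frame.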

Core claim

By mapping an optimized UNet model onto the reconfigurable logic of the Ultra96 SoC FPGA, the system achieves real-time 30 FPS performance for jet flame segmentation. The optimization reduces the model to 59,095 parameters from 7.5 million and improves latency by 7.5x through Vitis-driven pruning, multi-threading, and batch normalization, all while preserving the original Dice Score accuracy on the evaluated jet flame imagery.

What carries the argument

The Vitis-optimized UNet segmentation model deployed on the Ultra96 SoC FPGA's reconfigurable fabric for parallel execution of the fire segmentation pipeline.
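Figure 3 contrasts a full-precision model with the Vitis-optimized one, and DPU-style accelerators typically execute low-precision integer arithmetic rather than 32-bit floating point. As a hedged illustration of what that conversion involves, the following sketch quantizes a list of weights to signed 8-bit integers using a power-of-two step size. The bit width, the power-of-two constraint, and all function names are assumptions for illustration; this is neither the paper's procedure nor the Vitis AI quantizer API.

```python
import math
from typing import List, Tuple


def pow2_step(max_abs: float, bits: int = 8) -> float:
    """Smallest power-of-two step that keeps max_abs within a signed `bits`-bit range."""
    qmax = 2 ** (bits - 1) - 1          # 127 for INT8
    if max_abs == 0:
        return 1.0                      # all-zero tensor: any step works
    frac_bits = math.floor(math.log2(qmax / max_abs))
    return 2.0 ** (-frac_bits)


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric fixed-point quantization with a power-of-two step (values must be non-empty)."""
    qmax = 2 ** (bits - 1) - 1
    step = pow2_step(max(abs(v) for v in values), bits)
    # Round to the nearest step, then clamp to the signed integer range.
    q = [max(-qmax - 1, min(qmax, round(v / step))) for v in values]
    return q, step


def dequantize(q: List[int], step: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [x * step for x in q]


weights = [0.9, -0.5, 0.031, 0.0]
q, step = quantize(weights)
print(q, step)
```

For these weights the step is 2^-7 and the integers are [115, -64, 4, 0]; each reconstructed value lies within half a step of the original.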

If this is right

  • Industrial fire safety systems can now perform segmentation and characterization locally at video rates.
  • The reduced model size allows deployment on other resource-limited edge devices.
  • Replicable setup enables extension to additional fire types and safety scenarios.
  • Lower latency supports quicker automated responses in hazardous environments.
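Characterization, as the figure captions describe it, happens after segmentation: the mask returns to the PS for feature extraction (Figure 7), yielding flame measurements (Figure 11) such as the lift-off distance studied by Cuoci et al. (2021). The sketch below derives a few geometric descriptors from a binary mask. It assumes a horizontal jet issuing from a known nozzle column and a fixed millimetres-per-pixel scale; the descriptor definitions, including taking lift-off as the gap between the nozzle column and the first flame pixel, are hypothetical simplifications rather than the paper's definitions.

```python
from typing import Dict, List, Optional

Mask = List[List[int]]  # rows of 0/1 pixels, 1 = flame


def jet_features(mask: Mask, nozzle_col: int = 0,
                 mm_per_px: float = 1.0) -> Dict[str, Optional[float]]:
    """Hypothetical geometric descriptors of a horizontal jet flame.

    Assumes the jet issues from `nozzle_col` and propagates toward
    increasing column index.
    """
    # Column index of every flame pixel in the mask.
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not cols:
        return {"area_mm2": 0.0, "length_mm": None, "lift_off_mm": None}
    first, last = min(cols), max(cols)
    return {
        "area_mm2": len(cols) * mm_per_px ** 2,
        # Visible flame extent along the jet axis.
        "length_mm": (last - first + 1) * mm_per_px,
        # Gap between the nozzle and the flame base (simplified lift-off).
        "lift_off_mm": (first - nozzle_col) * mm_per_px,
    }


mask = [
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print(jet_features(mask, nozzle_col=0, mm_per_px=2.0))
```

Here the flame covers 6 pixels (24 mm²), spans columns 2 to 5 (8 mm), and starts 4 mm from the nozzle.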

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same optimization pipeline could speed up other computer vision tasks on FPGAs beyond flame detection.
  • Real-time edge processing might reduce the need for cloud connectivity in safety-critical monitoring.
  • Future versions could incorporate additional sensors for multi-modal flame analysis.

Load-bearing premise

That the accuracy preservation after optimization generalizes to new jet flame images outside the specific experimental dataset.

What would settle it

Running the deployed model on a diverse collection of jet flame videos from different industrial settings and verifying whether the Dice Score stays at the reported level.
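Such a check reduces to computing the Dice coefficient, 2|P ∩ T| / (|P| + |T|), between each predicted mask P and its ground-truth mask T, then comparing the average with the reported level. The following is a minimal pure-Python sketch; averaging per frame and the small smoothing term `eps` (which makes two empty masks score 1.0) are assumptions, since the paper may aggregate differently.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels, 1 = flame


def dice_score(pred: Mask, truth: Mask, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|); eps keeps two empty masks at 1.0."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    return (2 * inter + eps) / (total + eps)


def mean_dice(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average per-frame Dice over (prediction, ground truth) pairs."""
    scores = [dice_score(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(round(dice_score(pred, truth), 4))
```

The two masks share 2 flame pixels out of 3 + 3, giving a Dice score of 4/6 ≈ 0.6667.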

Figures

Figures reproduced from arXiv: 2604.03267 by Adriana Palacios, Alba Àgueda, Carmina Perez Guerrero, Eduardo Garduño, Elsa Pastor, Gerardo Rodriguez-Hernandez, Gerardo Valente Vazquez-Garcia, Gilberto Ochoa-Ruiz, Miguel Gonzalez-Mendoza, Vahid Foroughi.

Figure 1. Different types of flame patterns in industrial settings: pool, ball and jet fires. Modified from Palacios (2011).
Figure 2. Visual example of jet fire characterization based on segmentation of three temperature zones. … a fixed emissive power to determine the thermal radiation load on a target. Research done so far to characterize the main geometrical features of jet flames includes the work of Cuoci et al. (2021), which presents a systematic study of the factors that affect the lift-off distance, which is the length between th…
Figure 3. Overall flow followed in this work: a full-precision model is trained using an IR dataset of jet flames. The model is then optimized using Vitis, and the resulting model is ported to an FPGA-based SoC architecture for evaluation (adapted from Garduño et al. (2024)). 1.1. Objectives. The main objective of this work is to demonstrate a novel approach for implementing real-time image processing and analysis …
Figure 4. Edge computing vs cloud computing. Once trained, AI models based either on Convolutional Neural Networks (CNNs) or transformers can be deployed for inference in tasks such as image recognition, natural language processing, and speech recognition. These models can leverage Graphics Processing Units (GPUs) to achieve higher on-device performance. On-device AI refers to the deployme…
Figure 5. Comparison in terms of NRE (non-recurring engineering cost), flexibility, processing performance, power consumption and programmability of different technological choices for implementing smart cameras: a) ASICs; b) processors (i.e., SoC) and Digital Signal Processors (DSPs); c) GPUs; and d) FPGAs (figure adapted from Real and Berry (2010)).
Figure 6. Base hardware architecture used for this project. Images are fed through USB 3.0, and an MPSoC device (Ultra96v2) processes the video stream: the integrated Application Processing Unit (the PS, processing system) runs the Operating System (OS), while the Programmable Logic section of the SoC implements a neural network accelerator (custom logic).
Figure 7. Proposed solution model for implementing a smart camera for jet fire risk assessment. The Ultra96 board is connected to an IR camera via a USB peripheral within the SoC. The input image is processed by the PS block (an ARM processor) and fed to the PL section, where our binarized UNet model resides. After processing, the result is fed back to the PS to perform feature extraction. These features can be …
Figure 8. Software stack of Vitis AI.
Figure 9. Process used to port a neural network to an embedded system, extending the Vitis AI flow.
Figure 10. Flow chart for single- and multi-threading inference approaches. … different hardware accelerators, which can offload compute-intensive tasks from the PS. In this work, we used the PL to create a single instance of the DPU, an Intellectual Property (IP) core developed by Xilinx that speeds up CNN inference. From an architectural point of view, the DPU fits in the category of a single computation e…
Figure 11. Four examples of jet flame measurements using the proposed approach. … our intent in this work was to show how any segmentation model could be integrated into an operational smart camera. Modifying the base model would involve replacing the current UNet architecture with architectures specifically designed for computational efficiency without sacrificing segmentation quality. We plan to investigate these ar…
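Figure 10 contrasts single- and multi-threaded inference. The following sketch shows the general idea with Python's `concurrent.futures.ThreadPoolExecutor`: several frames are in flight at once, and `Executor.map` returns results in input order, so downstream feature extraction still sees frames in sequence. `run_inference` is a hypothetical stand-in that sleeps to model a blocking accelerator call; it does not use the Vitis AI runtime, and real gains on a single DPU instance depend on how much pre- and post-processing on the ARM cores can overlap with accelerator execution.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_inference(frame_id: int) -> int:
    """Hypothetical stand-in for one blocking accelerator call.

    time.sleep releases the GIL, so threads overlap here the way
    blocking calls into an accelerator would.
    """
    time.sleep(0.02)
    return frame_id  # a real pipeline would return a segmentation mask


def process_stream(frames: List[int], workers: int) -> List[int]:
    """Run inference over `frames`, returning results in input order."""
    if workers == 1:
        return [run_inference(f) for f in frames]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as its inputs.
        return list(pool.map(run_inference, frames))


def throughput_fps(frames: List[int], workers: int) -> float:
    """Frames processed per second of wall-clock time."""
    start = time.perf_counter()
    process_stream(frames, workers)
    return len(frames) / (time.perf_counter() - start)


frames = list(range(40))
print(f"1 thread : {throughput_fps(frames, 1):.0f} FPS")
print(f"4 threads: {throughput_fps(frames, 4):.0f} FPS")
```

Ordered results matter here because the features extracted downstream are tied to specific frames of the video stream.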
Original abstract

In this work we present a novel framework for fire safety management in industrial settings through the implementation of a smart camera platform for jet flame characterization. The approach seeks to address the lack of real-time solutions for early fire segmentation and characterization in industrial settings. As a case study, we demonstrate how an SoC FPGA running optimized Artificial Intelligence (AI) models can be leveraged to implement a full edge processing pipeline for jet flame analysis. In this paper we extend previous work on computer-vision jet fire segmentation by creating a novel experimental set-up and system implementation for addressing this issue, which can be replicated in other fire safety applications. The proposed platform is designed to carry out image processing tasks in real time and on device, reducing video processing overheads and thus the overall latency. This is achieved by optimizing a UNet segmentation model to make it amenable to an SoC FPGA implementation; the optimized model can then be efficiently mapped onto the SoC reconfigurable logic for massively parallel execution. For our experiments, we have chosen the Ultra96 platform, as it also provides the means for implementing full-fledged intelligent systems using the SoC peripherals, as well as other Operating System (OS) capabilities (e.g., multi-threading) for systems management. To optimize the model we used the Vitis (Xilinx) framework, which enabled us to reduce the full-precision model from 7.5 million parameters to 59,095 parameters (125x fewer), which translated into a 2.9x reduction in processing latency. Further optimization (multi-threading and batch normalization) led to a 7.5x improvement in latency, yielding a performance of 30 Frames Per Second (FPS) without sacrificing accuracy in terms of the evaluated metrics (Dice Score).
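The abstract reports two latency figures. Reading the 7.5x as cumulative (as the summary at the top does) and assuming frame rate scales inversely with per-frame time, the sketch below splits the gain between stages and back-projects frame rates: the later optimizations contribute about 2.6x, the original model would have run at about 4 FPS, and the parameter-reduced model at about 11.6 FPS. Both assumptions are interpretive; with multi-threading in particular, throughput and single-frame latency can diverge.

```python
REPORTED_FPS = 30.0
PARAM_STAGE_SPEEDUP = 2.9   # from the parameter reduction alone
TOTAL_SPEEDUP = 7.5         # read here as cumulative over all optimizations


def later_stage_speedup(total: float, first_stage: float) -> float:
    """Speedups compose multiplicatively: total = first_stage * later."""
    return total / first_stage


def implied_fps(final_fps: float, total: float, achieved: float) -> float:
    """Frame rate after a stage that achieved `achieved` of the `total` speedup,
    assuming frame rate scales inversely with per-frame time."""
    baseline_fps = final_fps / total
    return baseline_fps * achieved


print(f"later optimizations: {later_stage_speedup(TOTAL_SPEEDUP, PARAM_STAGE_SPEEDUP):.2f}x")
print(f"original model:      {implied_fps(REPORTED_FPS, TOTAL_SPEEDUP, 1.0):.1f} FPS")
print(f"after param. cut:    {implied_fps(REPORTED_FPS, TOTAL_SPEEDUP, PARAM_STAGE_SPEEDUP):.1f} FPS")
```

If the 7.5x were instead meant as an additional gain on top of the 2.9x, the same functions apply with `TOTAL_SPEEDUP = 2.9 * 7.5`.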

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents a reconfigurable smart camera platform using an optimized UNet model on the Ultra96 SoC FPGA for real-time jet flame segmentation and characterization. It claims that Vitis-based optimization reduces the model from 7.5 million to 59,095 parameters (125x reduction), and with additional multi-threading and batch normalization yields a 7.5x latency improvement to achieve 30 FPS while preserving Dice Score accuracy, extending prior computer-vision work on jet fires with a new experimental setup.

Significance. If the accuracy preservation claim holds, the work provides a concrete demonstration of deploying a heavily pruned segmentation model on reconfigurable edge hardware for industrial fire safety, with measurable gains in parameter count and latency on the target Ultra96 platform. The use of Vitis for full-model optimization and the integration of OS-level multi-threading are practical strengths that could inform similar embedded CV deployments.

major comments (3)
  1. [Abstract] Abstract: the assertion that the optimized model achieves 30 FPS 'without sacrificing accuracy in terms of the evaluated metrics (Dice Score)' lacks any supporting numerical evidence; no baseline Dice Score for the original 7.5M-parameter UNet, no post-optimization Dice Score, and no ablation on the parameter-reduction steps are reported.
  2. [Experiments] Experiments section: the manuscript supplies no information on dataset size, composition (e.g., number of images, variation in flame intensity/background/scale), train/test split, or validation protocol, making it impossible to evaluate whether accuracy is preserved on real-world jet flame imagery beyond the specific test set.
  3. [Results] Results: while latency (2.9x then 7.5x) and parameter counts are directly measured on the Ultra96 board, the absence of error bars, multiple-run statistics, or cross-condition testing leaves the central claim of unchanged segmentation performance weakly supported.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'we extend previous work on computer-vision jet fire segmentation' requires an explicit citation to the referenced prior publication.
  2. [Implementation] Implementation: clarify the exact sequence and contribution of batch normalization after Vitis pruning and how it interacts with the multi-threading to produce the final 7.5x latency gain.
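One common way batch normalization interacts with deployment is that, at inference time, a BatchNorm layer can be folded into the preceding convolution, removing a separate per-channel operation. Whether this matches the authors' batch-normalization step is an assumption. The sketch below folds BN into a single conv output channel, modelled as a dot product plus bias, and checks that the folded layer reproduces conv followed by BN.

```python
import math
from typing import List, Tuple


def fold_batchnorm(weights: List[float], bias: float, gamma: float, beta: float,
                   mean: float, var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Fold inference-mode BatchNorm into one preceding conv output channel.

    With y = w.x + b and BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta,
    substitution gives w' = s * w and b' = s * (b - mean) + beta,
    where s = gamma / sqrt(var + eps).
    """
    s = gamma / math.sqrt(var + eps)
    return [s * w for w in weights], s * (bias - mean) + beta


def channel_output(weights: List[float], bias: float, x: List[float]) -> float:
    """One output value of a conv channel: dot product over the receptive field plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def batchnorm(y: float, gamma: float, beta: float, mean: float, var: float,
              eps: float = 1e-5) -> float:
    """Inference-mode BatchNorm with fixed running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


w, b, x = [0.5, -1.2, 2.0], 0.1, [1.0, 2.0, -0.5]
bn = dict(gamma=1.5, beta=-0.3, mean=0.2, var=0.8)
reference = batchnorm(channel_output(w, b, x), **bn)
fw, fb = fold_batchnorm(w, b, **bn)
print(reference, channel_output(fw, fb, x))  # equal up to floating-point rounding
```

After folding, each output channel costs exactly what the bare convolution did, which is why this step is usually done before mapping a network to an accelerator.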

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major comment below and will make revisions to strengthen the manuscript's claims on accuracy preservation and experimental details.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion that the optimized model achieves 30 FPS 'without sacrificing accuracy in terms of the evaluated metrics (Dice Score)' lacks any supporting numerical evidence; no baseline Dice Score for the original 7.5M-parameter UNet, no post-optimization Dice Score, and no ablation on the parameter-reduction steps are reported.

    Authors: We thank the referee for this observation. The manuscript's results section demonstrates that the Dice Score is preserved, but we agree that the abstract lacks explicit numerical support and ablation details. We will revise the abstract to include the baseline and post-optimization Dice Scores, and add an ablation study in the Experiments section detailing the parameter reduction steps and their effect on accuracy metrics. revision: yes

  2. Referee: [Experiments] Experiments section: the manuscript supplies no information on dataset size, composition (e.g., number of images, variation in flame intensity/background/scale), train/test split, or validation protocol, making it impossible to evaluate whether accuracy is preserved on real-world jet flame imagery beyond the specific test set.

    Authors: We agree that these details were insufficiently described. We will expand the Experiments section with full details on the dataset size, composition including variations in flame intensity, background, and scale, as well as the train/test split and validation protocol used. revision: yes

  3. Referee: [Results] Results: while latency (2.9x then 7.5x) and parameter counts are directly measured on the Ultra96 board, the absence of error bars, multiple-run statistics, or cross-condition testing leaves the central claim of unchanged segmentation performance weakly supported.

    Authors: We acknowledge the need for stronger statistical support. We will add error bars based on multiple runs, report statistics from repeated measurements, and include cross-condition testing results in the revised Results section to bolster the claim of unchanged segmentation performance. revision: yes
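The promised error bars could come from a harness like the following minimal sketch, which times a callable repeatedly after a few warm-up calls and reports the mean and sample standard deviation of latency in milliseconds, plus the implied FPS. The callable is a placeholder; on the board it would wrap one inference call, which is not shown because the paper's runtime code is not given here.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], runs: int = 30, warmup: int = 3) -> Dict[str, float]:
    """Time `fn` repeatedly; report mean and sample stdev of latency (ms) and implied FPS."""
    if runs < 2:
        raise ValueError("need at least two runs for a standard deviation")
    for _ in range(warmup):
        fn()  # discard warm-up calls (caches, lazy initialization)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples)
    return {
        "mean_ms": mean_ms,
        "stdev_ms": statistics.stdev(samples),
        "fps": 1000.0 / mean_ms,
    }


# Placeholder workload; on the board this would wrap one inference call.
stats = benchmark(lambda: time.sleep(0.005), runs=10)
print(f"{stats['mean_ms']:.2f} ± {stats['stdev_ms']:.2f} ms ({stats['fps']:.0f} FPS)")
```

Reporting mean ± standard deviation per configuration (baseline, parameter-reduced, multi-threaded) would let readers judge whether the 2.9x and 7.5x gaps exceed run-to-run noise.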

Circularity Check

0 steps flagged

No significant circularity; metrics are direct hardware measurements

Full rationale

The paper describes an engineering implementation: a UNet model is optimized via the Vitis framework (parameter count reduced from 7.5M to 59k), then mapped to Ultra96 SoC FPGA with multi-threading and batch normalization, yielding measured 30 FPS and 7.5x latency improvement. These performance figures are obtained by direct timing on the target board after optimization steps, not by any equation that re-derives them from fitted parameters or prior self-citations. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text. The central claim rests on external benchmarks (hardware execution and Dice Score evaluation) rather than internal redefinition. This is the common case of a self-contained empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that standard UNet can be aggressively pruned via Vitis without accuracy loss on jet-flame data; no new physical entities or free parameters fitted to the target result are introduced.

axioms (1)
  • domain assumption: UNet architecture remains effective for flame segmentation after extreme parameter reduction via Vitis optimization
    Invoked when mapping the model to FPGA logic and claiming preserved Dice Score.
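Why can a UNet lose so many parameters? A 2-D convolution with a k x k kernel has k²·c_in·c_out weights, so scaling every channel width by a factor α scales the parameter count by roughly α². By that rule, a ~125x reduction would correspond to narrowing widths about √125 ≈ 11x, if width scaling alone were responsible; the paper does not say how its reduction was obtained. The sketch below counts parameters in a hypothetical UNet-style encoder (two 3x3 convolutions per level, widths doubling with depth) to show the quadratic effect; it is not the authors' architecture.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameters in a k x k 2-D convolution: k*k*c_in*c_out weights (+ c_out biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def double_conv_params(c_in: int, c_out: int) -> int:
    """Two stacked 3x3 convolutions, as in one UNet level."""
    return conv_params(3, c_in, c_out) + conv_params(3, c_out, c_out)


def encoder_params(base: int, depth: int = 4, in_ch: int = 1) -> int:
    """Hypothetical UNet-style encoder: channel width doubles at each level."""
    total, c_prev = 0, in_ch
    for level in range(depth + 1):
        c = base * 2 ** level
        total += double_conv_params(c_prev, c)
        c_prev = c
    return total


wide, narrow = encoder_params(32), encoder_params(4)
print(wide, narrow, f"{wide / narrow:.1f}x")
```

Shrinking the base width from 32 to 4 (8x narrower) cuts this encoder from 4,711,648 to 73,868 parameters, about 63.8x, just under 8² because the single-channel input layer and biases do not scale quadratically.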

pith-pipeline@v0.9.0 · 5678 in / 1266 out tokens · 40096 ms · 2026-05-15T07:51:26.339053+00:00 · methodology



Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

  1. [1]

    Journal of Computing in Civil Engineering 39, 04025068

    Visual fire safety inspection framework using computer vision algorithms. Journal of Computing in Civil Engineering 39, 04025068. doi:10.1061/JCCEE5.CPENG-6492. Beheshti, N., Johnsson, L., 2020. Squeeze u-net: A memory and energy efficient image segmentation network, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition wor...

  2. [2]

    Computer 39, 68–75

    Distributed embedded smart cameras for surveillance applications. Computer 39, 68–75. Buslaev, A., Parinov, A., Khvedchenya, E., Iglovikov, V.I., Kalinin, A.A.,

  3. [3]

    ArXiv e-prints arXiv:1809.06839

    Albumentations: fast and flexible image augmentations. ArXiv e-prints arXiv:1809.06839. Colella, F., Ibarreta, A., Hart, R.J., Morrison, T., Watson, H.A., Yen, M., 2020. Jet fire consequence analysis. OTC Offshore Technology Conference doi:10.4043/30802-MS. Cuoci, A., Avedisian, C.T., Brunson, J.D., Guo, S., Dalili, A., Wang, Y., Mehl, M., Frassoldati, A....