FireSenseNet: A Dual-Branch CNN with Cross-Attentive Feature Interaction for Next-Day Wildfire Spread Prediction
Pith reviewed 2026-05-10 17:07 UTC · model grok-4.3
The pith
FireSenseNet separates static terrain from dynamic weather in dual CNN branches linked by attention to forecast next-day wildfire spread.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FireSenseNet is a dual-branch convolutional network in which one branch processes static fuel and terrain maps while the other processes time-varying meteorological fields; a Cross-Attentive Feature Interaction Module then uses learnable gates to exchange information between the branches at several encoder resolutions. On the Google Next-Day Wildfire Spread benchmark this architecture records an F1 score of 0.4176 and AUC-PR of 0.3435, exceeding the scores of all compared models including a SegFormer variant that contains 3.8 times more parameters. Channel-wise importance analysis shows the previous-day fire mask as the dominant input while wind speed contributes little at the dataset's coarse one-day temporal resolution.
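The channel-wise importance analysis is only summarized above; one standard way to produce such a ranking (an assumption here, not necessarily the paper's exact procedure) is permutation importance: shuffle one input channel at a time and record the drop in the metric. A toy numpy sketch, with a stand-in predictor that, like the reported finding, leans on the prior fire mask and ignores wind:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x):
    # Stand-in predictor: heavily weights channel 0 (prior fire mask),
    # lightly weights channel 1 (fuel), ignores channel 2 (wind).
    return (0.9 * x[:, 0] + 0.1 * x[:, 1] > 0.5).astype(int)

def f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy data: N samples, 3 channels, flattened pixels; labels come from the
# same rule, so the unpermuted baseline F1 is 1.0.
x = rng.random((2000, 3))
y = toy_model(x)

base = f1(y, toy_model(x))
importance = {}
for c, name in enumerate(["prev_fire_mask", "fuel", "wind"]):
    xp = x.copy()
    xp[:, c] = rng.permutation(xp[:, c])  # break this channel's association
    importance[name] = base - f1(y, toy_model(xp))

print(importance)  # fire-mask drop dominates; wind drop is zero (unused)
```

In this toy setup the wind channel's importance is exactly zero because the predictor never reads it; on real data, a near-zero permutation drop is the analogous signature of a noise channel.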
What carries the argument
Cross-Attentive Feature Interaction Module (CAFIM), which applies learnable attention gates to fuse static fuel/terrain features with dynamic meteorological features at multiple encoder scales.
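The page names CAFIM but does not reproduce its equations. A minimal numpy sketch of one plausible reading, in which sigmoid gates computed from both branches modulate a cross-branch feature exchange at a single encoder scale (the gating form, weight shapes, and residual-style fusion are illustrative assumptions, not the paper's specification):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cafim_fuse(static_f, dynamic_f, w_s, w_d):
    """Gated cross-exchange at one encoder scale.

    static_f, dynamic_f: (C, H, W) feature maps from the two branches.
    w_s, w_d: learnable 1x1-conv-style weights (C, 2C) producing
    per-pixel, per-channel gates from the concatenated features.
    """
    C, H, W = static_f.shape
    both = np.concatenate([static_f, dynamic_f], axis=0)  # (2C, H, W)
    flat = both.reshape(2 * C, -1)                        # (2C, H*W)
    g_s = sigmoid(w_s @ flat).reshape(C, H, W)            # gate into static branch
    g_d = sigmoid(w_d @ flat).reshape(C, H, W)            # gate into dynamic branch
    # Each branch keeps its own features plus a gated injection from the other.
    out_s = static_f + g_s * dynamic_f
    out_d = dynamic_f + g_d * static_f
    return out_s, out_d

C, H, W = 8, 16, 16
static_f = rng.standard_normal((C, H, W))
dynamic_f = rng.standard_normal((C, H, W))
w_s = 0.1 * rng.standard_normal((C, 2 * C))
w_d = 0.1 * rng.standard_normal((C, 2 * C))
out_s, out_d = cafim_fuse(static_f, dynamic_f, w_s, w_d)
print(out_s.shape, out_d.shape)  # (8, 16, 16) (8, 16, 16)
```

Because the gates live in (0, 1), each branch's update is bounded by the magnitude of the other branch's features, which is one way a module like this can learn where weather should or should not modulate fuel features.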
If this is right
- The dual-branch design with explicit cross-attention yields a 7.1 percent relative F1 improvement over simple concatenation of inputs.
- Previous-day fire perimeter supplies the majority of predictive signal while wind speed acts as noise under coarse temporal sampling.
- Monte Carlo Dropout produces per-pixel uncertainty maps that can accompany the spread forecast.
- Common evaluation practices that ignore spatial autocorrelation inflate F1 scores by more than 44 percent on this task.
- The architecture uses fewer parameters than the strongest competing transformer while still achieving higher accuracy.
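The Monte Carlo Dropout point above can be sketched concretely: keep dropout active at inference, run several stochastic forward passes, and report the per-pixel mean as the forecast and the per-pixel standard deviation as the uncertainty map. A toy numpy version with a single linear layer standing in for the trained network (the dropout rate and pass count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_dropout_predict(x, w, p=0.5, T=50):
    """Monte Carlo Dropout: T stochastic forward passes with dropout kept ON.

    x: (H, W, C) input patch; w: (C,) toy per-channel weights standing in
    for a trained network. Returns the per-pixel mean probability and the
    per-pixel standard deviation (the uncertainty map).
    """
    probs = []
    for _ in range(T):
        # Inverted dropout applied to the weights of the toy layer.
        mask = (rng.random(w.shape) >= p) / (1.0 - p)
        logits = x @ (w * mask)                      # (H, W)
        probs.append(1.0 / (1.0 + np.exp(-logits)))  # sigmoid
    probs = np.stack(probs)                          # (T, H, W)
    return probs.mean(axis=0), probs.std(axis=0)

H, W, C = 32, 32, 4
x = rng.standard_normal((H, W, C))
w = rng.standard_normal(C)
mean_map, unc_map = mc_dropout_predict(x, w)
print(mean_map.shape, unc_map.shape)  # (32, 32) (32, 32)
```

The uncertainty map has the same spatial footprint as the forecast, so the two can be shipped together, as the bullet above suggests.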
Where Pith is reading between the lines
- The same separation of static and dynamic inputs could improve forecasts for other geospatial phenomena such as flood extent or crop yield where terrain and weather interact.
- Uncertainty maps from Monte Carlo Dropout could be used to prioritize ground-truth collection in high-uncertainty regions for active learning loops.
- At higher temporal resolution the wind channel might become informative, suggesting the current noise finding is resolution-dependent rather than fundamental.
- Retraining on regional subsets could reveal whether the dominance of the fire mask persists outside the training geography.
Load-bearing premise
The Google Next-Day Wildfire Spread benchmark and its evaluation protocol reflect real-world next-day prediction difficulty without the shortcuts that artificially boost reported scores.
What would settle it
Retraining and testing the same architecture on a wildfire dataset that supplies hourly meteorological observations and finer spatial grids, then measuring whether wind speed regains predictive value and whether the reported F1 inflation disappears.
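The mechanics behind the inflation claim are not spelled out here, but one of the named shortcuts, biased positive/negative sampling, is easy to illustrate: evaluating the same fixed classifier on a positives-enriched subsample removes most of the negatives that generate false positives, raising precision and hence F1. A toy numpy sketch (all rates are invented for illustration; the resulting inflation will not match the paper's 44% figure):

```python
import numpy as np

rng = np.random.default_rng(3)

def f1(y, p):
    tp = np.sum((y == 1) & (p == 1))
    fp = np.sum((y == 0) & (p == 1))
    fn = np.sum((y == 1) & (p == 0))
    return 2 * tp / (2 * tp + fp + fn)

# Toy pixel population: rare fire (2% positive) and an imperfect detector
# with fixed per-pixel recall 0.6 and false-positive rate 0.02.
n = 200_000
y = (rng.random(n) < 0.02).astype(int)
p = np.where(y == 1, rng.random(n) < 0.6, rng.random(n) < 0.02).astype(int)

full = f1(y, p)  # honest evaluation over all pixels

# Shortcut: evaluate on a positives-enriched subsample (50/50 class balance).
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=pos.size, replace=False)
idx = np.concatenate([pos, neg])
biased = f1(y[idx], p[idx])

print(round(full, 3), round(biased, 3), f"inflation {100 * (biased / full - 1):.0f}%")
```

The classifier is identical in both evaluations; only the sampling changes, which is why protocol details matter for the head-to-head comparisons discussed above.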
Original abstract
Accurate prediction of next-day wildfire spread is critical for disaster response and resource allocation. Existing deep learning approaches typically concatenate heterogeneous geospatial inputs into a single tensor, ignoring the fundamental physical distinction between static fuel/terrain properties and dynamic meteorological conditions. We propose FireSenseNet, a dual-branch convolutional neural network equipped with a novel Cross-Attentive Feature Interaction Module (CAFIM) that explicitly models the spatially varying interaction between fuel and weather modalities through learnable attention gates at multiple encoder scales. Through a systematic comparison of seven architectures -- spanning pure CNNs, Vision Transformers, and hybrid designs -- on the Google Next-Day Wildfire Spread benchmark, we demonstrate that FireSenseNet achieves an F1 of 0.4176 and AUC-PR of 0.3435, outperforming all alternatives including a SegFormer with 3.8* more parameters (F1 = 0.3502). Ablation studies confirm that CAFIM provides a 7.1% relative F1 gain over naive concatenation, and channel-wise feature importance analysis reveals that the previous-day fire mask dominates prediction while wind speed acts as noise at the dataset's coarse temporal resolution. We further incorporate Monte Carlo Dropout for pixel-level uncertainty quantification and present a critical analysis showing that common evaluation shortcuts inflate reported F1 scores by over 44%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FireSenseNet, a dual-branch CNN equipped with a novel Cross-Attentive Feature Interaction Module (CAFIM) that models spatially varying interactions between static fuel/terrain and dynamic meteorological inputs for next-day wildfire spread prediction. On the Google Next-Day Wildfire Spread benchmark, it reports an F1 of 0.4176 and AUC-PR of 0.3435, outperforming seven other architectures including a 3.8× larger SegFormer (F1=0.3502). Ablation studies show a 7.1% relative F1 gain from CAFIM, channel-wise importance analysis finds the prior-day fire mask dominant and wind speed noisy, Monte Carlo Dropout is used for uncertainty, and a critical analysis claims common evaluation shortcuts inflate F1 by over 44%.
Significance. If the reported metrics and comparisons were computed under the non-shortcut protocol the authors themselves identify, the work would be significant for geospatial deep learning in disaster modeling: it provides a concrete demonstration that explicit cross-modal attention improves over naive concatenation, offers a systematic head-to-head comparison across CNNs, ViTs and hybrids, and supplies actionable feature-importance insights at the dataset's coarse temporal resolution. The addition of pixel-level uncertainty quantification is also a constructive contribution.
major comments (2)
- Abstract and Experimental Evaluation: The manuscript states clear performance numbers (F1=0.4176, AUC-PR=0.3435) and an ablation gain of 7.1% from CAFIM, yet provides no details on data splits, training protocol, positive/negative sampling, or the exact calculation behind the 44% F1 inflation claim. This is load-bearing for the central outperformance claim, because the paper itself flags that biased sampling, failure to mask non-burnable areas, or persistence-only labeling can inflate F1 by >44%; without explicit confirmation that the published numbers avoid these shortcuts, the attribution of gains to the dual-branch CAFIM design cannot be verified.
- Results section (comparison table): The head-to-head claim that FireSenseNet outperforms a SegFormer with 3.8× more parameters (F1 0.4176 vs 0.3502) and six other models rests on the benchmark evaluation protocol. If the protocol used is the shortcut version the authors criticize, the relative ranking and the conclusion that CAFIM provides a meaningful modeling advance are not supported by the evidence presented.
minor comments (1)
- Abstract: The notation '3.8*' for the parameter ratio should be written as '3.8×' for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments emphasizing the need for experimental transparency. We have revised the manuscript to address both major points by expanding the relevant sections with the requested details and explicit protocol confirmations.
Point-by-point responses
- Referee [Abstract and Experimental Evaluation]: The manuscript states clear performance numbers (F1=0.4176, AUC-PR=0.3435) and an ablation gain of 7.1% from CAFIM, yet provides no details on data splits, training protocol, positive/negative sampling, or the exact calculation behind the 44% F1 inflation claim. This is load-bearing for the central outperformance claim, because the paper itself flags that biased sampling, failure to mask non-burnable areas, or persistence-only labeling can inflate F1 by >44%; without explicit confirmation that the published numbers avoid these shortcuts, the attribution of gains to the dual-branch CAFIM design cannot be verified.
Authors: We agree that these details are necessary to substantiate our claims. In the revised manuscript, we have added a new subsection under Experimental Setup that specifies: the temporal train/validation/test splits on the Google benchmark (with no future leakage), the full training protocol (optimizer, learning rate, epochs, batch size, and weighted loss for imbalance), the positive/negative sampling approach, and the exact procedure for the 44% inflation analysis (re-running baselines under biased sampling, unmasked non-burnable pixels, and persistence labeling). We explicitly confirm that all reported metrics, including the F1 of 0.4176, AUC-PR, and the 7.1% CAFIM ablation gain, were obtained under the non-shortcut protocol with non-burnable areas masked. revision: yes
- Referee [Results section (comparison table)]: The head-to-head claim that FireSenseNet outperforms a SegFormer with 3.8× more parameters (F1 0.4176 vs 0.3502) and six other models rests on the benchmark evaluation protocol. If the protocol used is the shortcut version the authors criticize, the relative ranking and the conclusion that CAFIM provides a meaningful modeling advance are not supported by the evidence presented.
Authors: This is a fair point. The revised Results section and comparison table now include an explicit statement (with a footnote) that every model—including the 3.8× larger SegFormer and the other six architectures—was evaluated under the identical non-shortcut protocol we advocate: temporal splits without leakage, masking of non-burnable areas, and no biased or persistence-only sampling. We have also added implementation details for the SegFormer baseline to ensure fairness. Under this protocol the reported outperformance and the CAFIM ablation gain remain valid. revision: yes
Circularity Check
No circularity in empirical performance claims or model design
Full rationale
The paper's central claims rest on direct empirical comparisons of FireSenseNet against seven other architectures on the fixed Google Next-Day Wildfire Spread benchmark, plus ablation studies isolating the CAFIM module's contribution. These results are obtained by training and evaluating models on public data splits; they do not reduce, via the paper's own equations or definitions, to quantities that are fitted or defined only in terms of the target metrics. The critical analysis of evaluation shortcuts is presented as an independent contribution that distinguishes shortcut-inflated scores from the protocol used for the reported numbers, and no load-bearing self-citation underpins the architecture or results. No self-definitional loops, fitted-input predictions, or ansatz smuggling appear in the provided text.
Axiom & Free-Parameter Ledger
free parameters (1)
- learnable attention gates in CAFIM
axioms (1)
- Domain assumption: Static fuel/terrain properties and dynamic meteorological conditions are fundamentally distinct and benefit from separate processing branches.
invented entities (1)
- Cross-Attentive Feature Interaction Module (CAFIM) (no independent evidence)
Reference graph
Works this paper leans on
- [1] DeVries, T. and Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552.
- [2] Di Giuseppe, F., McNorton, J., Lombardi, A., and Wetterhall, F. (2025). Global data-driven prediction of fire activity. Nature Communications, 16(1):2918.
- [3] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
- [4] Duff, T. J. and Penman, T. D. (2021). Determining the likelihood of asset destruction during wildfires: modelling house destruction with fire simulator outputs and local-scale landscape properties. Safety Science, 139:105196.
- [5] Finney, M. A. (1998). FARSITE: Fire Area Simulator-model development and evaluation. U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station.
- [6] Gerard, S., Zhao, Y., and Sullivan, J. (2023). WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction. Advances in Neural Information Processing Systems, 36:74515--74529.
- [7] Hodges, J. L. and Lattimer, B. Y. (2019). Wildland fire spread modeling using convolutional neural networks. Fire Technology, 55(6):2115--2142.
- [8] Huot, F., Hu, R. L., Goyal, N., Sankar, T., Ihme, M., and Chen, Y.-F. (2022). Next day wildfire spread: A machine learning dataset to predict wildfire spreading from remote-sensing data. IEEE Transactions on Geoscience and Remote Sensing, 60:1--13.
- [9] Iglesias, V., Balch, J. K., and Travis, W. R. (2022). US fires became larger, more frequent, and more widespread in the 2000s. Science Advances, 8(11):eabc0020.
- [10] Kantarcioglu, O., Kocaman, S., and Schindler, K. (2023). Artificial neural networks for assessing forest fire susceptibility in Türkiye. Ecological Informatics, 75:102034.
- [11] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012--10022.
- [12] Loshchilov, I. and Hutter, F. (2017). Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
- [13]
- [14] National Interagency Coordination Center (2024). Wildland fire summary and statistics annual report 2024. Technical report, National Interagency Coordination Center.
- [15] Shadrin, D., Illarionova, S., Gubanov, F., Evteeva, K., Mironenko, M., Levchunets, I., Belousov, R., and Burnaev, E. (2024). Wildfire spreading prediction using multimodal data and deep neural network approach. Scientific Reports, 14(1):2606.
- [16] Singh, H., Ang, L.-M., Paudyal, D., Acuna, M., Srivastava, P. K., and Srivastava, S. K. (2025). A comprehensive review of empirical and dynamic wildfire simulators and machine learning techniques used for the prediction of wildfire in Australia. Technology, Knowledge and Learning, 30(2):935--968.
- [17] Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., and Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077--12090.