
arxiv: 2604.02474 · v2 · submitted 2026-04-02 · 💻 cs.LG · stat.ML

Time-Warping Recurrent Neural Networks for Transfer Learning

Pith reviewed 2026-05-13 21:06 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords LSTM · time-warping · transfer learning · recurrent neural networks · dynamical systems · time-lag models · fuel moisture · wildfire

The pith

LSTMs can approximate time-lag models to any accuracy and retain that accuracy after time-warping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LSTMs can approximate time-lag models, a class of linear first-order differential equations, to any desired level of accuracy. It further shows that rescaling time in these models through time-warping maintains the approximation accuracy without loss. This forms the basis for a transfer learning technique in RNNs that adapts models from one time scale to others by modifying only a few parameters. When tested on predicting fuel moisture content in wildfire modeling, the time-warped models achieve accuracy comparable to other transfer methods but with far fewer changes to the network. Readers might care because this reduces the data and computation needed to model physical processes that run at different speeds in different conditions.

Core claim

An LSTM can approximate any time-lag model to arbitrary accuracy. Time-warping the LSTM, which rescales the time variable in the model, preserves this approximation property exactly. The resulting transfer learning method is applied to fuel moisture prediction, where it performs similarly to standard approaches while changing only a small fraction of the parameters.
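As a worked illustration of what rescaling time does to a time-lag model, the sketch below uses the standard first-order lag form. The symbols m (state), E (equilibrium), T (characteristic time scale), and α (warp factor) are notation assumed here for illustration; the paper's own notation may differ.

```latex
\begin{aligned}
\frac{dm}{dt} &= \frac{E(t) - m(t)}{T}, \\
\tilde m(t) := m(\alpha t)
\;&\Longrightarrow\;
\frac{d\tilde m}{dt}
= \alpha\,\frac{E(\alpha t) - \tilde m(t)}{T}
= \frac{\tilde E(t) - \tilde m(t)}{T/\alpha},
\qquad \tilde E(t) := E(\alpha t).
\end{aligned}
```

Under this notation, warping time by α turns a time-lag model with time scale T into another time-lag model with time scale T/α, so a single scalar describes the change.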

What carries the argument

The time-warping operation on LSTM networks trained to approximate time-lag models.
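A minimal sketch of the idea, assuming a single linear recurrent unit rather than a full LSTM: for a constant equilibrium, the update m[k+1] = a·m[k] + (1 − a)·E with a = exp(−Δt/T) reproduces the time-lag solution at the sample times, and raising a to the power α reproduces the warped model. The function names and numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_lag_unit(m0, equilibrium, decay, n_steps):
    """Iterate m[k+1] = decay * m[k] + (1 - decay) * equilibrium."""
    m = np.empty(n_steps + 1)
    m[0] = m0
    for k in range(n_steps):
        m[k + 1] = decay * m[k] + (1.0 - decay) * equilibrium
    return m

def analytic_lag(m0, equilibrium, T, t):
    """Closed-form solution of dm/dt = (E - m) / T with constant E."""
    return equilibrium + (m0 - equilibrium) * np.exp(-t / T)

# Hypothetical values: a 10-hour time scale warped to a 1-hour time scale.
T, dt, alpha = 10.0, 1.0, 10.0
decay = np.exp(-dt / T)
warped_decay = decay ** alpha  # equals exp(-dt / (T / alpha))

t = np.arange(25) * dt
base = simulate_lag_unit(30.0, 10.0, decay, 24)
warped = simulate_lag_unit(30.0, 10.0, warped_decay, 24)

base_err = np.max(np.abs(base - analytic_lag(30.0, 10.0, T, t)))
warped_err = np.max(np.abs(warped - analytic_lag(30.0, 10.0, T / alpha, t)))
```

Both errors are at floating-point level here because the linear unit is exact for this model; an LSTM only approximates it, which is why the paper needs a proof that the approximation survives warping.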

If this is right

  • LSTMs trained on data from fuels with 10-hour time scales can predict for 1-hour, 100-hour, or 1000-hour scales after time-warping.
  • The method requires modifying only a small fraction of parameters compared to other transfer learning techniques.
  • Prediction accuracy for dynamical systems remains high across different environmental time scales.
  • Time-warping enables efficient reuse of pretrained RNNs for physical processes with varying speeds.
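One way to picture how a single pretrained network could serve several time scales is to treat the forget gate as a per-step retention factor and rescale it. The sketch below assumes the gate value is set by its bias alone and that warping by α maps retention f to f^α; both are simplifying assumptions, and `warp_forget_bias` and the warp factors are hypothetical, not the paper's procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

def warp_forget_bias(b_f, alpha):
    """Return the bias whose gate value equals sigmoid(b_f) ** alpha."""
    return logit(sigmoid(b_f) ** alpha)

# Hypothetical pretrained forget-gate biases for a 64-unit LSTM layer.
b_f = np.full(64, 2.0)

# Assumed warp factors relative to the 10-hour source scale (alpha = 10 / T_target).
warp_factors = {"FM1": 10.0, "FM100": 0.1, "FM1000": 0.01}
warped_biases = {name: warp_forget_bias(b_f, a) for name, a in warp_factors.items()}
```

A warp factor above 1 lowers the bias so the cell forgets faster, and a factor below 1 raises it so the cell retains its state longer.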

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the approximation holds only for linear systems, extensions to nonlinear dynamics would require additional proofs.
  • Applying this to other sensor data with time-scale variations, such as weather or biological signals, could broaden its utility.
  • Future work might combine time-warping with other transfer methods to handle both time-scale and structural changes.

Load-bearing premise

Time-warping the LSTM does not introduce growing errors that depend on the size of the time rescaling for time-lag models.

What would settle it

Measuring the approximation error of a time-warped LSTM on a time-lag model and checking if it stays within the original bound independent of the warp factor.
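The check described above can be sketched as a small harness. This version uses a stand-in approximator, a linear lag unit whose decay is off by a small relative error, instead of a trained LSTM; a real test would substitute the warped LSTM's predictions. All names and parameter values are hypothetical.

```python
import numpy as np

def max_error_after_warp(alpha, T=10.0, dt=1.0, rel_decay_error=0.01,
                         m0=30.0, equilibrium=10.0, warped_horizon=40.0):
    """Max abs error of a warped, slightly wrong lag unit vs. the exact warped model."""
    exact_decay = np.exp(-dt / T)
    # Stand-in for approximation error in the pretrained model.
    approx_decay = exact_decay * (1.0 + rel_decay_error)
    # Compare over the same span of warped time for every alpha.
    n_steps = int(round(warped_horizon / alpha))
    k = np.arange(n_steps + 1)
    # Closed forms of m[k] - equilibrium for the two warped recurrences.
    approx = (m0 - equilibrium) * approx_decay ** (alpha * k)
    exact = (m0 - equilibrium) * exact_decay ** (alpha * k)
    return float(np.max(np.abs(approx - exact)))

errors = {alpha: max_error_after_warp(alpha) for alpha in (0.5, 1.0, 2.0, 4.0)}
```

In this toy setting the error depends only on warped time, so the maxima nearly coincide across warp factors. Whether the same holds for a trained LSTM is exactly what the measurement would need to show.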

Figures

Figures reproduced from arXiv: 2604.02474 by Jonathon Hirschi.

Figure 2.1
Figure 2.1: Two complementary views of an RNN. (a) Left: a single RNN cell, following the formulation in [11, p. 539]. (b) Right: the RNN unrolled in time.
2.3.2 Long Short-Term Memory. The Long Short-Term Memory (LSTM) augments the simple RNN by maintaining the hidden state and an additional long-term cell state. The combination of these two states will be referred to as the recurrent state. The LSTM was originally …
Figure 2.2
Figure 2.2: A single LSTM unit. This is an original graphic, developed following the formulation in [11, p. 569]. The computations are described in Equations (2.8).
\[
\begin{aligned}
f_t &= \sigma(W_{xf} X_t + W_{hf} h_{t-1} + b_f) \\
i_t &= \sigma(W_{xi} X_t + W_{hi} h_{t-1} + b_i) \\
g_t &= \tanh(W_{xg} X_t + W_{hg} h_{t-1} + b_g) \\
o_t &= \sigma(W_{xo} X_t + W_{ho} h_{t-1} + b_o) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes g_t \\
h_t &= o_t \otimes \tanh(c_t)
\end{aligned}
\tag{2.8}
\]
Again, the products are standard matrix multiplications. The symbol “⊗” is t…
Figure 3.1
Figure 3.1: Deterministic trajectories from Newton’s Law of Cooling for varying cooling constants. Varying values of k change the characteristic time scale. The initial temperature and the ambient temperature of the environment are kept constant in each case.
Using a simple RNN with a single recurrent unit, the system can be initialized for a starting value of k. Then, the time-warping can be made exact using the Eq…
Figure 3.2
Figure 3.2: RNN model architecture, pre-trained on FM10 from RAWS. An RNN with one recurrent layer with LSTM cells and three subsequent dense layers. The RNN was developed and evaluated using a spatiotemporal cross-validation procedure. The training set, validation set, and test set were ordered in time so the RNN predictions are an extrapolation in time, with the test set in the future relative to the training set.…
Figure 4.1
Figure 4.1: Observed vs Predicted FM10 for Zero-Shot Transfer RNN. The plotted RNN model is the set of weights that corresponds to the median RMSE on the test set out of 100 replications. Note the different x-axis ranges in the plots. (a) Top: All FM10 observations (n=1,232). The zero-shot RNN predictions substantially underestimate the FM10 for wet fuels. (b) Bottom: FM10 observations filtered to less than or equal…
Figure 4.2
Figure 4.2: One week of FM10 sensor measurements and corresponding HRRR weather at BAWC2, southwest of Denver. The wetting and drying equilibria are derived from the forecasted air temperature and RH. The sensor appears to have a maximal response to rain, with the FM10 reaching a maximum value of around 27%. A large-scale analysis was conducted in a previous publication to estimate the forecast accuracy for the RNN.…
Figure 4.3
Figure 4.3: One week of FM10 field measurements in Oklahoma, the corresponding ground-based weather observations, and the RNN zero-shot predictions. The RNN predictions are from the model replication with the median RMSE in the test set. The RNN predictions reach a maximum of about 27%, similar to the RAWS training data.
was -0.19 ± 0.33 [12]. The confidence bounds were defined as one standard deviation over 500 rep…
Figure 4.4
Figure 4.4: Distribution of the mean LSTM forget-gate (b_f) and input-gate (b_i) bias parameters across 100 training replications. In each replication, the mean bias across the 64 units in the LSTM layer is computed, producing one value per run. The histograms therefore summarize the distribution of these mean biases over the 100 replications.
discrepancies in the distribution tails. The k-sample Anderson–Darling tes…
Figure 4.5
Figure 4.5: FM1 RMSE comparison for different transfer learning methods. Bars represent the mean RMSE when predicting FM1 on the test set over 100 replications, and brackets show ± 1 standard deviation of the RMSE.
Figure 4.6
Figure 4.6: FM1 RMSE comparison for different transfer learning methods with FM1 filtered to ≤ 30%. Bars represent the mean RMSE when predicting FM1 on the test set over 100 replications, and brackets show ± 1 standard deviation of the RMSE. For the FM1 values less than or equal to 30%, the RNN-based methods have substantially lower RMSE values. However, all of the R² values decrease except for the Time-Warping me…
Figure 4.7
Figure 4.7: Observed vs Predicted FM1 for Time-Warped RNN. The plotted RNN model is the set of weights that corresponds to the median RMSE on the test set out of 100 replications. (a) Top: All FM1 observations in the test set (n=271). The RNN predictions substantially underestimate the FM1 for wet fuels. (b) Bottom: FM1 observations filtered to less than or equal to 30% (n=247). The RNN predictions are much more acc…
Figure 4.8
Figure 4.8: FM100 RMSE comparison for different transfer learning methods. Bars represent the mean RMSE when predicting FM100 on the test set over 100 replications, and brackets show ± 1 standard deviation of the RMSE.
that can be expected from an individual model prediction. The scatterplot shows that the RNN, as in the FM1 case, tends to overestimate FM100 for drier fuels and underestimate FM100 for wetter fuels. …
Figure 4.9
Figure 4.9: Observed vs Predicted FM100 for Time-Warped RNN. The plot includes all FM100 observations in the test set (n=206). The plotted RNN model is the set of weights that corresponds to the median RMSE on the test set out of 100 replications.
Figure 4.10
Figure 4.10: FM1000 RMSE comparison for different transfer learning methods. Bars represent the mean RMSE when predicting FM1000 on the test set over 100 replications, and brackets show ± 1 standard deviation of the RMSE. The general prediction accuracy was similar for FM1000 and FM100, and the accuracy was better than for FM1. While the estimated R² values are worse than for the FM100 case, the RMSE values are cons…
Figure 4.11
Figure 4.11: Observed vs Predicted FM1000 for Time-Warped RNN. The plot includes all FM1000 observations in the test set (n=209). The plotted RNN model is the set of weights that corresponds to the median RMSE on the test set out of 100 replications.
Preprint v1 — 2026/04/06
Figure 4.12
Figure 4.12: Example Time Series of Predictions for FM1, FM100, and FM1000 from the Time-Warped RNN, with no fine-tuning. The time period plotted is 72 hours during the first week of the test period in August 1997. The predictions are from the models that had the median RMSE across the 100 replications.
the forget gate bias time-warping parameter. The values for the input gate time-warping parameters are more varia…
Figure 4.13
Figure 4.13: Autocorrelation functions for FM1 (top), FM100 (middle), and FM1000 (bottom) from predictions on the test set. The plots represent the ACF for the single models that had the median RMSE on the test set. The ACF for FM1 decays relatively quickly and has a visible diurnal periodicity, while FM100 and FM1000 show much slower decay consistent with stronger persistence in the system.
Figure 4.14
Figure 4.14: Partial autocorrelation functions for FM1 (top), FM100 (middle), and FM1000 (bottom) from predictions on the test set. The plots represent the ACF for the single models that had the median RMSE on the test set. FM100 and FM1000 show a dominant lag-1 effect consistent with strong persistence, whereas FM1 shows additional short-lag structure and weaker persistence beyond the first lag.
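The LSTM update quoted in the Figure 2.2 caption, Equations (2.8), can be sketched in NumPy as follows. The sketch reads ⊗ as the element-wise product, which is the standard LSTM convention and an assumption here because the caption is truncated at that point. The dictionary layout of the weights and the dimensions in the example are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM update following Equations (2.8).

    W_x, W_h, and b are dicts keyed by gate name: 'f', 'i', 'g', 'o'.
    """
    pre = {k: W_x[k] @ x_t + W_h[k] @ h_prev + b[k] for k in "figo"}
    f_t = sigmoid(pre["f"])          # forget gate
    i_t = sigmoid(pre["i"])          # input gate
    g_t = np.tanh(pre["g"])          # candidate cell update
    o_t = sigmoid(pre["o"])          # output gate
    c_t = f_t * c_prev + i_t * g_t   # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Hypothetical sizes; all-zero parameters make every gate 0.5 and g_t = 0.
n_in, n_units = 3, 4
W_x = {k: np.zeros((n_units, n_in)) for k in "figo"}
W_h = {k: np.zeros((n_units, n_units)) for k in "figo"}
b = {k: np.zeros(n_units) for k in "figo"}
h_t, c_t = lstm_step(np.ones(n_in), np.zeros(n_units), np.ones(n_units), W_x, W_h, b)
```

With these zero parameters the cell state is halved at each step, which makes the role of the forget gate as a retention factor easy to see.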
Original abstract

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We use time-warping as rescaling the time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems with any desired accuracy, and the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated in an applied problem on predicting fuel moisture content (FMC), an important concept in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where there are large quantities of data available for training. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. The Time-Warping method produces predictions with an accuracy level comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proves that LSTMs can approximate solutions to a class of linear first-order time-lag ODEs to arbitrary accuracy and that a time-warping rescaling operator preserves the approximation guarantee with an error bound independent of the warp factor. It then introduces time-warping as a transfer-learning technique for RNNs and evaluates it on fuel moisture content (FMC) prediction, pretraining on 10-hour time-scale fuels and transferring to 1-hour, 100-hour, and 1000-hour scales while modifying only a small fraction of parameters, achieving accuracy comparable to standard transfer methods.

Significance. If the theoretical claims hold, the work supplies a principled, parameter-efficient route to transfer RNN models across time scales in dynamical systems. The explicit construction of the warping operator together with the warp-independent error bound is a clear strength, offering a mathematically grounded alternative to full retraining. The FMC application demonstrates relevance to wildfire modeling where data availability differs by time scale.

major comments (1)
  1. FMC evaluation section: the claim that the time-warping method 'produces predictions with an accuracy level comparable to the established methods' is stated without numerical metrics, error bars, baseline values, or data-split details, so the support for the practical advantage of modifying only a small fraction of parameters cannot be assessed from the reported results.
minor comments (1)
  1. Abstract: the phrase 'several known methods of transfer learning' is used without naming the baselines; specifying them would improve clarity for readers.
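To make the "small fraction of parameters" claim concrete, a rough count for one LSTM layer is sketched below. It assumes that time-warping touches at most the forget-gate and input-gate biases, which the Figure 4.4 caption hints at but the page does not confirm. The 64 units come from that caption; the input size of 8 is a hypothetical placeholder, and the dense layers are ignored.

```python
def lstm_param_counts(n_inputs, n_units):
    """Return (total LSTM-layer parameters, forget + input gate bias entries)."""
    # Each of the four gates has input weights, recurrent weights, and a bias.
    per_gate = n_inputs * n_units + n_units * n_units + n_units
    total = 4 * per_gate
    gate_biases = 2 * n_units
    return total, gate_biases

total, warped = lstm_param_counts(n_inputs=8, n_units=64)  # n_inputs is assumed
fraction = warped / total
```

Under these assumptions the two bias vectors hold 128 of 18,688 layer parameters, under 1%. The paper's actual counts would have to come from its own tables.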

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation for minor revision. We address the single major comment below.

Point-by-point responses
  1. Referee: FMC evaluation section: the claim that the time-warping method 'produces predictions with an accuracy level comparable to the established methods' is stated without numerical metrics, error bars, baseline values, or data-split details, so the support for the practical advantage of modifying only a small fraction of parameters cannot be assessed from the reported results.

    Authors: We agree that the current presentation of the FMC results is insufficiently quantitative. In the revised manuscript we will expand the evaluation section to report explicit performance metrics (e.g., RMSE and MAE) for the time-warping method and all baselines across the three target time scales, include error bars computed over multiple random seeds or cross-validation folds, state the precise data-split protocol, and tabulate the number of trainable parameters modified by each transfer method. These additions will make the claimed comparability and parameter-efficiency directly verifiable.

    revision: yes
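The metrics named in the response can be computed as in the sketch below, which also summarizes a metric across replications as mean ± one standard deviation, the convention the figure captions describe. The helper names and the tiny input arrays are hypothetical and carry no results from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def summarize(metric_values):
    """Mean and sample standard deviation of a metric over replications."""
    values = np.asarray(metric_values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))

# Made-up numbers, only to show the call pattern.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
mean_metric, std_metric = summarize([1.0, 2.0, 3.0])
```

Reporting both the mean and the spread per method and per target time scale is what would let a reader judge whether "comparable accuracy" holds.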

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper's central contribution is a mathematical proof establishing that LSTMs can approximate linear first-order time-lag ODEs to arbitrary accuracy, with an explicit construction showing that a time-warping rescaling operator preserves this uniform approximation without introducing warp-dependent error growth. This is followed by an empirical transfer-learning evaluation on fuel moisture content data, where the pretrained model is adapted by modifying only a small fraction of parameters. No load-bearing step reduces a reported prediction or accuracy metric directly to a fitted constant defined by the target data, nor does any self-citation chain serve as the sole justification for the uniqueness or preservation result. The derivation remains self-contained against external benchmarks and does not rely on renaming known results or smuggling ansatzes via prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unshown proof that LSTMs approximate linear time-lag ODEs to arbitrary accuracy and that this approximation is invariant under time rescaling; no free parameters or invented entities are declared in the abstract.

axioms (1)
  • domain assumption: LSTMs can approximate solutions of linear first-order time-lag differential equations to arbitrary accuracy.
    Invoked in the abstract as the foundation for both the proof and the warping invariance claim.

pith-pipeline@v0.9.0 · 5516 in / 1323 out tokens · 34026 ms · 2026-05-13T21:06:02.833534+00:00 · methodology


Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

  1. [1]

    H. E. Anderson. Aids to determining fuel models for estimating fire behavior. USDA Forest Service General Technical Report INT-122, 1982. http://www.fs.fed.us/rm/pubs_int/int_gtr122.html

  2. [2]

    P. L. Andrews. BehavePlus fire modeling system, version 5.0: Variables. 2009. doi: 10.2737/rmrs-gtr-213. URL http://dx.doi.org/10.2737/RMRS-GTR-213

  3. [3]

    G. E. P. Box, G. M. Jenkins, G. C. Reinsel, G. M. Ljung, and P. (Firm). Time series analysis: forecasting and control. John Wiley & Sons, Inc, Hoboken, New Jersey, fifth edition, 2016. ISBN 111867491X; 9781118674918

  4. [4]

    Campbell Scientific, Inc. CS506 Fuel Moisture Sensor Instruction Manual, 2015. https://s.campbellsci.com/documents/us/manuals/cs506.pdf, Accessed: July 3, 2024

  5. [5]

    J. D. Carlson, L. S. Bradshaw, R. M. Nelson Jr., R. R. Bensch, and R. Jabrzemski. Application of the Nelson model to four timelag fuel classes using Oklahoma field observations: Model evaluation and comparison with National Fire Danger Rating System algorithms. International Journal of Wildland Fire, 16:204–216, 2007. doi: 10.1071/WF06073

  6. [6]

    R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f...

  7. [7]

    M. A. Finney. FARSITE: Fire Area Simulator-model development and evaluation

  8. [8]

    doi: 10.2737/rmrs-rp-4. URL http://dx.doi.org/10.2737/RMRS-RP-4

  9. [9]

    M. A. Fosberg and J. E. Deeming. Derivation of the 1- and 10-hour timelag fuel moisture calculations for fire-danger rating. U.S. Forest Service Research Note RM-207, 1971. http://hdl.handle.net/2027/umn.31951d02995763p

  10. [10]

    FTS, Inc. Fuel stick sensor (fs-3) technical specifications. https://ftsinc.com/wp-content/uploads/2016/12/Fuel-Stick-Sensor-Technical-Specifications.pdf, 2016. Accessed: 2025-10-15

  11. [11]

    I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. The MIT Press, Cambridge, Massachusetts, 2016. ISBN 9780262337434; 0262337436

  12. [12]

    A. Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Sebastopol, CA, second edition, 2019. ISBN 978-1-492-03264-

  13. [13]

    https://research-ebsco-com.aurarialibrary.idm.oclc.org/linkprocessor/plink?id=6093bf50-e842-3a3a-a38b-be8566dfa5e1.

  14. [15]

    S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9:1735–80, 12 1997. doi: 10.1162/neco.1997.9.8.1735

  15. [16]

    H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller. Transfer learning for time series classification. In 2018 IEEE International Conference on Big Data (Big Data), pages 1367–1376, 2018. doi: 10.1109/BigData.2018.8621990

  16. [17]

    W. M. Jolly, P. H. Freeborn, L. S. Bradshaw, J. Wallace, and S. Brittain. Modernizing the US National Fire Danger Rating System (version 4): Simplified fuel models and improved live and dead fuel moisture calculations. Environmental Modelling & Software, 181:106181, Oct. 2024. ISSN 1364-8152. doi: 10.1016/j.envsoft.2024.106181. URL http://dx.doi.org/10.10...

  17. [18]

    Keras Team. Keras API: LSTM layer. https://keras.io/api/layers/recurrent_layers/lstm/, 2025. Model training used return_sequences=True. Accessed: 2025-10-31

  18. [19]

    Keras Team. Keras API: Loss functions. https://keras.io/api/losses/, 2025. "sum_over_batch_size" means the loss instance will return the average of the per-sample losses in the batch. Accessed: 2025-10-31

  19. [20]

    Keras Team. Keras API: Layer activation functions. https://keras.io/api/layers/activations/, 2026. Accessed: 2026-01-16

  20. [21]

    K. Ma, D. Feng, K. Lawson, W.-P. Tsai, C. Liang, X. Huang, A. Sharma, and C. Shen. Transferring hydrologic data across continents – leveraging data-rich regions to improve hydrologic prediction in data-sparse regions. Water Resources Research, 57(5):e2020WR028600, 2021. doi: https://doi.org/10.1029/2020WR028600. URL https://agupubs.onlinelibrary.wiley.co...

  21. [22]

    J. Mandel, S. Amram, J. D. Beezley, G. Kelman, A. K. Kochanski, V. Y. Kondratenko, B. H. Lynn, B. Regev, and M. Vejmelka. Recent advances and applications of WRF-SFIRE. Natural Hazards and Earth System Science, 14(10):2829–2845, 2014. doi: 10.5194/nhess-14-2829-2014

  22. [23]

    J. Mandel, M. Vejmelka, A. K. Kochanski, A. Farguell, J. D. Haley, D. V. Mallia, and K. Hilburn. An interactive data-driven HPC system for forecasting weather, wildland fire, and smoke. In 2019 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC), Supercomputing 2019, Denver, CO, USA, pages 35–44. IEEE, 2019. doi: 10.1109/UrgentHPC49580.2019.00010. Preprint...

  23. [24]

    J. Mandel, J. Hirschi, A. K. Kochanski, A. Farguell, J. Haley, D. V. Mallia, B. Shaddy, A. A. Oberai, and K. A. Hilburn. Building a fuel moisture model for the coupled fire-atmosphere model WRF-SFIRE from data: From Kalman filters to recurrent neural networks. In SNA’23 Seminar on Numerical Analysis, pages 52–55. Institute of Geonics of the Czech Academy o...

  24. [25]

    L. Miller, L. Zhu, M. Yebra, C. Rüdiger, and G. I. Webb. Projecting live fuel moisture content via deep learning. International Journal of Wildland Fire, 32(5):709–727, 03 2023. ISSN 1049-8001. doi: 10.1071/WF22188. URL https://doi.org/10.1071/WF22188

  25. [26]

    NCEI. Dead fuel moisture. https://www.ncei.noaa.gov/access/monitoring/dyk/deadfuelmoisture, 6 2024. Accessed: May 31, 2024

  26. [27]

    R. M. Nelson Jr. Prediction of diurnal change in 10-h fuel stick moisture content. Canadian Journal of Forest Research, 30(7):1071–1087, 2000. doi: 10.1139/x00-032

  27. [28]

    NIFC. Remote automatic weather stations (RAWS). https://www.nifc.gov/about-us/what-is-nifc/remote-automatic-weather-stations, 2024. Accessed: December 26, 2024

  28. [29]

    Oklahoma Mesonet. Mesonet station information: Slap. https://www.mesonet.org/about/station-information?stid=slap, 2026. Accessed: 2026-02-19; Mesonet is a network of automated weather stations operated by the Oklahoma Climatological Survey, a project of the University of Oklahoma and Oklahoma State University

  29. [30]

    S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010. doi: 10.1109/TKDE.2009.191

  30. [31]

    S. C. Quax, M. D’Asaro, and M. A. J. van Gerven. Adaptive time scales in recurrent neural networks. Scientific Reports, 10:11360, 2020. doi: 10.1038/s41598-020-68169-x. URL https://doi.org/10.1038/s41598-020-68169-x

  31. [32]

    F. Restuccia. Conduction, 2019

  32. [33]

    R. C. Rothermel. A mathematical model for predicting fire spread in wildland fires. USDA Forest Service Research Paper INT-115, 1972. https://www.fs.fed.us/rm/pubs_int/int_rp115.pdf, accessed March 2018

  33. [34]

    R. C. Rothermel, R. A. Wilson, G. A. Morris, and S. S. Sackett. Modeling moisture content of fine dead wildland fuels: Input to the BEHAVE fire prediction system

  34. [35]

    doi: 10.2737/int-rp-359. URL http://dx.doi.org/10.2737/INT-RP-359

  35. [36]

    Y. Rubanova, R. T. Q. Chen, and D. K. Duvenaud. Latent ordinary differential equations for irregularly-sampled time series. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/file/42a6845a557bef704ad8ac9...

  36. [37]

    F. W. Scholz and M. A. Stephens. K-sample Anderson-Darling tests. Journal of the American Statistical Association, 82(399):918–924, 1987

  37. [38]

    Synoptic Data Public API. Mesonet station networks and providers. https://demos.synopticdata.com/providers/index.html, 2025. Accessed: October 15, 2025

  38. [39]

    C. Tallec and Y. Ollivier. Can recurrent neural networks warp time? In International Conference on Learning Representations 2018, Vancouver, Canada, Apr. 2018. URL https://inria.hal.science/hal-01812064

  39. [40]

    U.S. Department of Agriculture and U.S. Department of the Interior. Fire environment mapping system (FEMS), 2024. URL https://www.wildfire.gov/application/fems. Accessed: 2026-03-07

  40. [41]

    D. W. Van der Kamp, R. D. Moore, and I. G. McKendry. A model for simulating the moisture content of standardized fuel sticks of various sizes. Agricultural and Forest Meteorology, 236:123–134, 2017. doi: 10.1016/j.agrformet.2017.01.013

  41. [42]

    C. E. Van Wagner. Development and structure of the Canadian forest fire weather index system. Canadian Forestry Service Forestry Technical Report 35, 1987

  42. [43]

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neuri...

  43. [44]

    M. Vejmelka, A. K. Kochanski, and J. Mandel. Data assimilation of fuel moisture in WRF-SFIRE. In D. D. Wade and R. L. Fox, editors, Proceedings of 4th Fire Behavior and Fuels Conference, 18-22 February 2013, Raleigh, NC and 1-4 July 2013, St. Petersburg, Russia, pages 122–137. International Association of Wildland Fire, Missoula, MT, 2014. Compiled by M. ...

  44. [45]

    M. Vejmelka, A. Kochanski, and J. Mandel. Data assimilation of dead fuel moisture observations from remote automated weather stations. International Journal of Wildland Fire, 25:558–568, 04 2016. doi: 10.1071/WF14085

  45. [46]

    N. R. Viney. A review of fine fuel moisture modelling. International Journal of Wildland Fire, 1(4):215–234, 1991. doi: 10.1071/WF9910215.

  46. [47]

    D. R. Weise and B. M. Corcoran. Diurnal fuel moisture content for selected Hawaiian fuels. USDA Forest Service, Forest Service Research Data Archive, 2025. Submittal FY25-089_PSW-018, 8/13/2025

  47. [48]

    J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper_files/paper/2014/file/532a2f85b697710...

  48. [49]

    S. M. Zahn and C. Henson. A synthesis of fuel moisture collection methods and equipment: a desk guide. Technical Report 1151 1806P, USDA Forest Service, San Dimas Technology and Development Center, San Dimas, CA, 2011

  49. [50]

    A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola. Dive into Deep Learning. Cambridge University Press, 2023. URL https://d2l.ai. Internet Textbook: https://d2l.ai/index.html, (Accessed Oct. 31, 2025)

  50. [51]

    L. Zhu, G. Webb, M. Yebra, G. Scortechini, L. Miller, and F. Petitjean. Live fuel moisture content estimation from MODIS: A deep learning approach. ISPRS Journal of Photogrammetry and Remote Sensing, 179:81–91, 08 2021. doi: 10.1016/j.isprsjprs.2021.07.010.