Forecasting Extreme Day and Night Heat in Paris: A Proof of Concept
Pith reviewed 2026-05-18 23:08 UTC · model grok-4.3
The pith
Quantile machine learning forecasts extreme Paris day and night temperatures two weeks ahead with valid uncertainty regions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Quantile machine learning models trained on lagged weather data can produce promising forecasts of the 90th percentile for both diurnal and nocturnal temperatures on holdout data, while adaptive conformal prediction supplies uncertainty regions that carry provably valid finite-sample coverage under exchangeability, all tied to a novel framework for using the results in decision-making.
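Quantile forecasts like these are usually fit or scored with the pinball (check) loss at level τ. For the 90th percentile, τ = 0.9, so each degree of under-forecast costs nine times as much as each degree of over-forecast. The Python sketch below is minimal and assumes plain lists of observed and forecast temperatures. The name `pinball_loss` is illustrative and is not taken from the paper.

```python
# Hypothetical helper for illustration; not the paper's code.
def pinball_loss(y_true, y_pred, tau=0.9):
    """Mean pinball (check) loss at quantile level tau.

    An under-forecast (y > q) costs tau per unit; an over-forecast costs
    (1 - tau) per unit. Its expected value is minimized by the tau-quantile.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Two days scored against a Q(.90) forecast of 30 °C
score = pinball_loss([32.0, 28.0], [30.0, 30.0])
```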
What carries the argument
Adaptive conformal prediction regions, which use past nonconformity scores to adjust intervals and deliver valid coverage guarantees in finite samples whenever observations are exchangeable.
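The finite-sample guarantee rests on a rank argument. Suppose n calibration scores are exchangeable with the score of a new day. Then the ⌈(n+1)(1−α)⌉-th smallest calibration score bounds the new score with probability at least 1−α. The Python sketch below shows plain split conformal prediction. It assumes absolute-residual scores and uses hypothetical helper names. The paper's adaptive variant is not specified here and may differ; for example, a one-sided region around an upper quantile would use signed scores y − q̂ instead.

```python
import math


# Hypothetical helpers for illustration; not the paper's implementation.
def conformal_threshold(scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score.

    If the n calibration scores and the new score are exchangeable, the new
    score falls at or below this threshold with probability >= 1 - alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def conformal_interval(point_forecast, residuals, alpha=0.1):
    """Symmetric interval built from absolute calibration residuals |y - yhat|."""
    scores = [abs(r) for r in residuals]
    t = conformal_threshold(scores, alpha)
    return point_forecast - t, point_forecast + t
```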
If this is right
- Forecasting accuracy in the holdout data is promising for both diurnal and nocturnal temperatures.
- Sound measures of uncertainty are produced alongside the point forecasts.
- A novel decision-making framework is coupled to the forecasts and uncertainty measures.
- Benefits for policy and practice follow from the two-week advance warnings of extreme conditions.
Where Pith is reading between the lines
- The same lagged-predictor and conformal approach could be tested on temperature data from other cities to check transportability.
- If exchangeability weakens under strong climate trends, the coverage properties might need recalibration on rolling windows.
- Combining these forecasts with public health alerts could allow earlier activation of heat-mitigation measures.
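One way to sketch the rolling-window recalibration mentioned above is to recompute the conformal threshold each day from only the most recent `window` scores, so that older calibration data drop out as conditions drift. This Python sketch is illustrative. The window length and helper name are assumptions. Restricting calibration to a window trades the exchangeability-based guarantee for responsiveness; it does not restore that guarantee.

```python
import math
from collections import deque


# Hypothetical helper; window length and alpha are illustrative assumptions.
def rolling_thresholds(scores, window=30, alpha=0.1):
    """For each day t, the split-conformal threshold computed from the
    previous `window` nonconformity scores (None until the window is full)."""
    recent = deque(maxlen=window)
    thresholds = []
    for s in scores:
        if len(recent) < window:
            thresholds.append(None)
        else:
            n = len(recent)
            k = math.ceil((n + 1) * (1 - alpha))
            thresholds.append(math.inf if k > n else sorted(recent)[k - 1])
        recent.append(s)  # today's score is only known after the forecast is made
    return thresholds
```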
Load-bearing premise
The temperature observations must satisfy the exchangeability condition so that adaptive conformal prediction can deliver provably valid finite-sample coverage.
What would settle it
If the actual proportion of holdout temperatures falling inside the computed prediction regions falls substantially below the nominal level across repeated applications, the coverage guarantee would not hold in this setting.
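"Substantially below the nominal level" can be made concrete. Count the holdout days that fall inside their regions, then ask how surprising that count would be if each day were covered with probability 1 − α. The Python sketch below is minimal and assumes lists of holdout temperatures and interval bounds. Its exact binomial tail treats days as independent. Serially dependent temperatures are not independent, so the resulting p-value is only indicative.

```python
import math


# Hypothetical helpers for illustration.
def empirical_coverage(y, lower, upper):
    """Fraction and count of observations falling inside [lower, upper]."""
    hits = sum(lo <= v <= hi for v, lo, hi in zip(y, lower, upper))
    return hits / len(y), hits


def binomial_lower_tail(hits, n, p):
    """P(X <= hits) for X ~ Binomial(n, p); small values flag undercoverage.

    Assumes independent days, which daily temperatures violate.
    """
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits + 1))
```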
Original abstract
As a form of "small AI", quantile machine learning is used to forecast diurnal and nocturnal Q(.90) air temperatures for Paris, France from late spring through the summer months of 2021. The data are provided by the Paris-Montsouris weather station. Rather than trying to directly anticipate the onset and cessation of reported heat waves, Q(.90) values are estimated. The 90th percentile is chosen so that exceedances represent relatively rare and extreme conditions. Predictors include eight routinely available indicators of weather conditions, lagged by 14 days. Using holdout data, the temperature forecasts are produced two weeks in advance. Adaptive conformal prediction regions are computed that, under exchangeability, provide provably valid finite-sample coverage of forecasting uncertainty. For both diurnal and nocturnal temperatures, forecasting accuracy in the holdout data is promising, and sound measures of uncertainty are coupled with a novel decision-making framework. Benefits for policy and practice follow.
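Lagging every predictor by 14 days is what makes a two-week-ahead forecast possible, because the row used to predict day t contains only weather observed on day t − 14. The Python sketch below is minimal. It assumes a gap-free daily series stored as a list of dicts. The field names are hypothetical, because the abstract does not list the eight indicators.

```python
# Hypothetical helper; field names used with it are placeholders, not the
# paper's eight indicators.
def lagged_rows(records, predictors, target, lag=14):
    """Pair each day's target with predictor values observed `lag` days earlier.

    records: list of dicts, one per consecutive calendar day (no gaps).
    Returns (X, y) where X[i] holds predictors from day i and y[i] is the
    target on day i + lag, so no information later than day t - lag is used.
    """
    X, y = [], []
    for t in range(lag, len(records)):
        X.append([records[t - lag][p] for p in predictors])
        y.append(records[t][target])
    return X, y
```

Missing days would silently misalign the pairs, so a real pipeline would check for calendar gaps first.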
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a quantile machine learning approach to forecast the 90th percentile of diurnal and nocturnal air temperatures in Paris from late spring through summer 2021, using eight weather indicators lagged by 14 days as predictors. It evaluates performance on holdout data from the Paris-Montsouris station and applies adaptive conformal prediction to generate uncertainty regions claimed to have provably valid finite-sample coverage under the exchangeability assumption. The authors report promising forecasting accuracy and introduce a novel decision-making framework for extreme heat applications.
Significance. If the results hold, the work could offer a practical, statistically grounded method for anticipating extreme heat events with quantified uncertainty, supporting public health and urban policy decisions. Coupling quantile forecasting with conformal prediction to obtain valid uncertainty regions is a constructive basis for decision-making, though the single-year holdout and the unverified exchangeability assumption limit generalizability.
major comments (2)
- [Abstract] The central claim of 'sound measures of uncertainty' rests on adaptive conformal prediction delivering 'provably valid finite-sample coverage' under exchangeability. Daily temperature observations are serially dependent and subject to seasonal progression and heat-wave effects, which typically violate exchangeability. The manuscript provides no diagnostic (e.g., a permutation test on conformity scores or a rolling coverage check) and no explicit statement that the adaptive method relaxes the assumption. If the assumption fails, the coverage guarantee does not hold and the uncertainty regions lose their advertised validity.
- [Methods] The manuscript provides no model specification, algorithm choice, hyperparameter values, or training details for the quantile machine learning procedure. Without them, there is no way to judge the robustness or reproducibility of the reported holdout accuracy, and the 'promising forecasting accuracy' claim rests on that accuracy.
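The permutation diagnostic suggested in the first major comment could work as follows. Compute the lag-1 autocorrelation of the time-ordered conformity scores, then compare it with its distribution under random reorderings, since under exchangeability every ordering is equally likely. The Python sketch below rests on those assumptions. The statistic, the number of permutations, and the seed are illustrative choices and are not taken from the paper.

```python
import random


# Hypothetical diagnostic helpers for illustration.
def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return num / denom


def permutation_pvalue(scores, n_perm=2000, seed=0):
    """One-sided p-value for positive serial dependence in the scores.

    Under exchangeability every ordering is equally likely, so the observed
    autocorrelation should look like a draw from the permuted ones.
    """
    rng = random.Random(seed)
    observed = lag1_autocorr(scores)
    shuffled = list(scores)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lag1_autocorr(shuffled) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A small p-value for the holdout scores would indicate that the exchangeability premise is doubtful for this series.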
minor comments (2)
- [Abstract] The abstract refers to 'eight routinely available indicators' without listing them; enumerating the predictors would improve clarity and allow readers to assess relevance.
- Consider reporting the specific conformal prediction implementation (e.g., which adaptive variant) and any code or data availability statement to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have helped us identify areas where the manuscript can be clarified and strengthened. We address each major comment point by point below, indicating the revisions we will make.
Referee: [Abstract] The central claim of 'sound measures of uncertainty' rests on adaptive conformal prediction delivering 'provably valid finite-sample coverage' under exchangeability. Daily temperature observations are serially dependent and subject to seasonal progression and heat-wave effects, which typically violate exchangeability. No diagnostic (e.g., permutation test on conformity scores or rolling coverage check) or explicit statement that the adaptive method relaxes the assumption is provided; if the assumption fails, the coverage guarantee does not hold and the uncertainty regions lose their advertised validity.
Authors: We agree that serial dependence, seasonal progression, and heat-wave effects in daily temperature data can violate the exchangeability assumption underlying conformal prediction guarantees, including adaptive variants. While the adaptive conformal prediction procedure adapts conformity scores over time to improve robustness under mild non-stationarity, it does not eliminate the need for exchangeability for the finite-sample coverage proof. To address this, we will revise the manuscript to include: (i) an explicit statement in the Methods section clarifying the exchangeability assumption and its potential limitations in this time-series context; and (ii) a new diagnostic subsection reporting empirical coverage rates computed on rolling windows of the holdout data, along with a brief discussion of any observed deviations. These additions will provide a more balanced presentation of the uncertainty quantification. revision: yes
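Adaptive conformal inference (Gibbs and Candès, 2021) is one widely used adaptive scheme. It adjusts the working miscoverage level online: α_{t+1} = α_t + γ(α − err_t), where err_t = 1 if day t fell outside its region. The report does not say whether the paper uses this variant, so the Python sketch below is illustrative only. The step size `gamma` and the helper name are assumptions. This scheme's guarantee is a bound on long-run average coverage that does not require exchangeability. It is not the finite-sample, exchangeability-based guarantee discussed above.

```python
import math


# Hypothetical ACI-style sketch; not necessarily the paper's adaptive variant.
def aci_levels(scores, calib, alpha=0.1, gamma=0.005):
    """Online adaptive conformal thresholds over a stream of daily scores.

    scores: nonconformity scores for successive forecast days.
    calib: initial calibration scores; each new score is appended after use.
    Returns the per-day thresholds and the miss indicators err_t.
    """
    pool = list(calib)
    alpha_t = alpha
    thresholds, errs = [], []
    for s in scores:
        n = len(pool)
        level = min(max(1 - alpha_t, 0.0), 1.0)  # clip working coverage to [0, 1]
        k = math.ceil((n + 1) * level)
        if k > n:
            t = math.inf  # region covers everything
        elif k < 1:
            t = -math.inf  # empty region
        else:
            t = sorted(pool)[k - 1]
        err = 1 if s > t else 0
        thresholds.append(t)
        errs.append(err)
        alpha_t += gamma * (alpha - err)  # a miss lowers alpha_t, widening later regions
        pool.append(s)
    return thresholds, errs
```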
Referee: [Methods] The manuscript provides no model specification, algorithm choice, hyper-parameter values, or training details for the quantile machine learning procedure. Without these, the reported holdout accuracy cannot be assessed for robustness or reproducibility, which is load-bearing for the 'promising forecasting accuracy' claim.
Authors: We fully agree that the absence of these details limits reproducibility and the ability to evaluate the robustness of the reported holdout accuracy. In the revised manuscript, we will expand the Methods section to specify the quantile machine learning algorithm (quantile regression forests), all hyperparameter values and tuning procedures (including any cross-validation approach), data preprocessing steps, training protocol, and the software implementation used. This will allow readers to replicate the forecasting procedure and assess the stability of the accuracy results. revision: yes
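In quantile regression forests (Meinshausen, 2006), each tree routes a new day to a leaf, and the training days that share that leaf receive weight. The forecast Q(.90) is then read off the weighted empirical distribution of the training responses. The Python sketch below shows only that final step and assumes the forest weights have already been computed. The helper name and the example weights are hypothetical.

```python
# Hypothetical helper showing the final QRF step; the weights would come from
# a fitted forest, which is not shown here.
def weighted_quantile(values, weights, tau=0.9):
    """Smallest y whose weighted empirical CDF F(y) reaches tau.

    values: training responses (e.g. observed temperatures).
    weights: non-negative forest weights for one new day, with positive sum.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    pairs = sorted(zip(values, weights))
    cumulative = 0.0
    for v, w in pairs:
        cumulative += w / total
        if cumulative >= tau - 1e-12:  # small tolerance for floating-point sums
            return v
    return pairs[-1][0]
```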
Circularity Check
No circularity: forecasts and coverage rest on independent holdout evaluation and an external guarantee
full rationale
The paper trains quantile machine learning models on lagged weather predictors to produce Q(.90) forecasts for holdout summer 2021 data, then attaches adaptive conformal prediction regions whose finite-sample coverage guarantee is invoked conditionally on exchangeability. No equation or step equates a claimed forecast or coverage probability to a quantity defined by the same fitted parameters; the conformal result is presented as an external statistical property rather than derived from the model's own structure or a self-citation chain. The argument is therefore not circular: its claims are evaluated against external benchmarks rather than against quantities it defines itself.
Axiom & Free-Parameter Ledger
free parameters (2)
- 14-day lag
- Q(0.90) level
axioms (1)
- domain assumption: the sequence of weather observations satisfies exchangeability.