Forecasting Extreme Day and Night Heat in Paris: A Proof of Concept

Richard Berk

arxiv: 2508.12886 · v5 · submitted 2025-08-18 · 📊 stat.AP

Pith reviewed 2026-05-18 23:08 UTC · model grok-4.3

keywords: quantile machine learning · adaptive conformal prediction · extreme heat · temperature forecasting · Paris weather · holdout evaluation · decision framework

The pith

Quantile machine learning forecasts extreme Paris day and night temperatures two weeks ahead with valid uncertainty regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses quantile machine learning to estimate the 90th percentile of daytime and nighttime air temperatures in Paris for summer 2021, drawing on eight lagged weather indicators from the local station. Forecasts are generated on holdout data two weeks in advance, then wrapped in adaptive conformal prediction regions that guarantee finite-sample coverage as long as the data satisfy exchangeability. The approach pairs these forecasts and uncertainty measures with a decision-making framework aimed at extreme heat events. A sympathetic reader would care because reliable advance notice of rare high-temperature conditions could inform public health responses and resource allocation.
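The 14-day lag described above can be sketched as a simple alignment of two daily series. This is a minimal illustration, not the author's code: the function name `make_lagged_pairs`, the single toy predictor, and the toy temperatures are all hypothetical (the paper uses eight station indicators, which this page does not enumerate).

```python
from typing import List, Sequence, Tuple

LAG_DAYS = 14  # two-week lag, as described in the paper's abstract


def make_lagged_pairs(
    predictors: Sequence[Sequence[float]],
    temperatures: Sequence[float],
    lag: int = LAG_DAYS,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the predictors observed on day t - lag with the temperature on day t."""
    if len(predictors) != len(temperatures):
        raise ValueError("predictors and temperatures must have the same length")
    if not 0 <= lag < len(temperatures):
        raise ValueError("lag must be non-negative and shorter than the series")
    # Drop the last `lag` predictor rows and the first `lag` temperatures
    # so that row i of the features lines up with the temperature `lag` days later.
    features = [list(row) for row in predictors[: len(predictors) - lag]]
    targets = list(temperatures[lag:])
    return features, targets


# Toy data (hypothetical): 20 days, one predictor per day.
days = list(range(20))
toy_predictors = [[float(d)] for d in days]
toy_temperatures = [20.0 + 0.5 * d for d in days]

X, y = make_lagged_pairs(toy_predictors, toy_temperatures)
print(len(X), len(y))  # 6 6
print(X[0], y[0])      # [0.0] 27.0
```

Because every feature row is at least 14 days older than its target, a model fitted on such pairs can only use information that would have been available two weeks before the forecast date.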

Core claim

Quantile machine learning models trained on lagged weather data can produce promising forecasts of the 90th percentile for both diurnal and nocturnal temperatures on holdout data, while adaptive conformal prediction supplies uncertainty regions that carry provably valid finite-sample coverage under exchangeability, all tied to a novel framework for using the results in decision-making.
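Quantile learners are typically fitted by minimizing the pinball (quantile) loss, which penalizes under-prediction more heavily than over-prediction when the target quantile is high. The sketch below shows that loss at the 0.90 level and checks, on toy data, that the best constant forecast is the empirical 90th percentile. It assumes nothing about the paper's actual learner; the function names and the temperatures are hypothetical.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float = 0.90) -> float:
    """Average pinball (quantile) loss at level tau."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) costs tau per degree;
        # over-prediction costs (1 - tau) per degree.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_forecast(y_true: Sequence[float], tau: float = 0.90) -> float:
    """Observed value that minimizes the average pinball loss when used as a constant forecast."""
    candidates = sorted(set(y_true))
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), tau))


# Toy temperatures (hypothetical): 20.0, 21.0, ..., 30.0 degrees.
toy_temps = [float(t) for t in range(20, 31)]
best = best_constant_forecast(toy_temps)
print(best)  # 29.0, the 10th of 11 ordered values
```

With tau = 0.90, missing low by one degree costs nine times as much as missing high by one degree, which is what pushes the fitted value toward the upper tail.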

What carries the argument

Adaptive conformal prediction regions, which use past nonconformity scores to adjust intervals and deliver valid coverage guarantees in finite samples whenever observations are exchangeable.
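The general idea of calibrating a region from past nonconformity scores can be illustrated with plain split conformal prediction. This is a deliberately simpler stand-in: the page does not say which adaptive variant the paper uses, and the sketch below produces constant-width intervals from absolute residuals. All names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float = 0.10) -> float:
    """Finite-sample-corrected quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the region is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(
    cal_y: Sequence[float],
    cal_pred: Sequence[float],
    new_pred: float,
    alpha: float = 0.10,
) -> Tuple[float, float]:
    """Interval around new_pred whose half-width is the calibrated score quantile."""
    scores = [abs(y - p) for y, p in zip(cal_y, cal_pred)]
    q = conformal_quantile(scores, alpha)
    return new_pred - q, new_pred + q


# Toy calibration set (hypothetical): ten days, constant forecast of 25.0 degrees.
cal_pred = [25.0] * 10
cal_y = [25.0 + 0.5 * i for i in range(-5, 5)]  # 22.5, 23.0, ..., 27.0

interval = split_conformal_interval(cal_y, cal_pred, new_pred=30.0)
print(interval)  # (27.5, 32.5)
```

With ten calibration scores and alpha = 0.10 the corrected rank is 10, so the half-width is the largest observed residual (2.5 degrees here). Adaptive variants change how the scores or the miscoverage level are set, not this basic calibration step.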

If this is right

Forecasting accuracy in the holdout data is promising for both diurnal and nocturnal temperatures.
Sound measures of uncertainty are produced alongside the point forecasts.
A novel decision-making framework is coupled to the forecasts and uncertainty measures.
Benefits for policy and practice follow from the two-week advance warnings of extreme conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lagged-predictor and conformal approach could be tested on temperature data from other cities to check transportability.
If exchangeability weakens under strong climate trends, the coverage properties might need recalibration on rolling windows.
Combining these forecasts with public health alerts could allow earlier activation of heat-mitigation measures.

Load-bearing premise

The temperature observations must satisfy the exchangeability condition so that adaptive conformal prediction can deliver provably valid finite-sample coverage.
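For orientation, the standard split conformal guarantee can be written as follows. The notation is an assumption introduced here, not taken from the paper: $s_1, \dots, s_n$ are calibration nonconformity scores, $s_{n+1}$ is the score of the new observation, $s_{(k)}$ is the $k$-th smallest calibration score, and $\alpha$ is the target miscoverage rate.

```latex
\hat{q} = s_{(\lceil (n+1)(1-\alpha) \rceil)},
\qquad
\mathbb{P}\bigl( s_{n+1} \le \hat{q} \bigr) \ge 1 - \alpha
\quad \text{if } s_1, \dots, s_n, s_{n+1} \text{ are exchangeable.}
```

When $\lceil (n+1)(1-\alpha) \rceil > n$, $\hat{q}$ is taken to be $+\infty$. The inequality holds for any sample size, but only because exchangeability makes every rank of $s_{n+1}$ among the $n+1$ scores equally likely; serial dependence breaks that argument.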

What would settle it

If the actual proportion of holdout temperatures falling inside the computed prediction regions falls substantially below the nominal level across repeated applications, the coverage guarantee would not hold in this setting.
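The check described above reduces to counting how often the realized temperature lands inside its prediction region. A minimal sketch, with hypothetical names and toy numbers rather than the paper's data:

```python
from typing import Sequence


def empirical_coverage(
    y_true: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Fraction of observations that fall inside their [lower, upper] region."""
    if not (len(y_true) == len(lower) == len(upper)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


# Toy holdout set (hypothetical): ten days with a fixed region of [23, 31] degrees.
toy_y = [24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0]
toy_lower = [23.0] * 10
toy_upper = [31.0] * 10

nominal = 0.90
coverage = empirical_coverage(toy_y, toy_lower, toy_upper)
print(coverage)            # 0.8
print(coverage < nominal)  # True: the two hottest days fall outside the region
```

A single shortfall on a small holdout set is weak evidence on its own; the criterion above refers to a substantial shortfall across repeated applications.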

Original abstract

As a form of "small A", quantile machine learning is used to forecast diurnal and nocturnal $Q(.90)$ air temperatures for Paris, France from late spring through the summer months of 2021. The data are provided by the Paris-Montsouris weather station. Rather than trying to directly anticipate the onset and cessation of reported heat waves, Q(.90) values are estimated. The 90th percentile is chosen so that exceedances represent relatively rare and extreme conditions. Predictors include eight routinely available indicators of weather conditions, lagged by 14 days. Using holdout data, the temperature forecasts are produced two weeks in advance. Adaptive conformal prediction regions are computed that, under exchangeability, provide provably valid finite-sample coverage of forecasting uncertainty. For both diurnal and nocturnal temperatures, forecasting accuracy in the holdout data is promising, and sound measures of uncertainty are coupled with a novel decision-making framework. Benefits for policy and practice follow.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies quantile ML and adaptive conformal prediction to 14-day-ahead extreme heat forecasts for one Paris station, with a policy decision layer, but the coverage guarantee depends on an untested exchangeability assumption in autocorrelated temperature data.

This paper applies quantile machine learning to forecast the 90th percentile of day and night temperatures in Paris two weeks ahead, using eight lagged weather predictors from the Paris-Montsouris station over the 2021 summer. It adds adaptive conformal prediction intervals for uncertainty and connects the outputs to a decision framework aimed at public-health or infrastructure use. The abstract reports promising accuracy on holdout data.

Referee Report

2 major / 2 minor

Summary. The paper proposes a quantile machine learning approach to forecast the 90th percentile of diurnal and nocturnal air temperatures in Paris during the summer of 2021, using eight weather indicators lagged by 14 days as predictors. It evaluates performance on holdout data from the Paris-Montsouris station and applies adaptive conformal prediction to generate uncertainty regions claimed to have provably valid finite-sample coverage under the exchangeability assumption. The author reports promising forecasting accuracy and introduces a novel decision-making framework for extreme heat applications.

Significance. If the results hold, the work could offer a practical, statistically grounded method for anticipating extreme heat events with quantified uncertainty, supporting public health and urban policy decisions. The coupling of quantile forecasting with conformal prediction for valid uncertainty is a constructive element for decision-making frameworks, though the single-year holdout and unverified assumptions limit generalizability.

major comments (2)

[Abstract] The central claim of 'sound measures of uncertainty' rests on adaptive conformal prediction delivering 'provably valid finite-sample coverage' under exchangeability. Daily temperature observations are serially dependent and subject to seasonal progression and heat-wave effects, which typically violate exchangeability. No diagnostic (e.g., permutation test on conformity scores or rolling coverage check) or explicit statement that the adaptive method relaxes the assumption is provided; if the assumption fails, the coverage guarantee does not hold and the uncertainty regions lose their advertised validity.
[Methods] The manuscript provides no model specification, algorithm choice, hyper-parameter values, or training details for the quantile machine learning procedure. Without these, the reported holdout accuracy cannot be assessed for robustness or reproducibility, which is load-bearing for the 'promising forecasting accuracy' claim.
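One diagnostic of the kind the first major comment mentions is a permutation test on the conformity scores. The sketch below uses lag-1 autocorrelation as the test statistic, which is one choice among many and detects only serial dependence or trend; it is not a test reported in the paper. Function names, the seed, and the toy scores are hypothetical.

```python
import random
from typing import List, Sequence


def lag1_autocorrelation(values: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation; returns 0.0 for a constant series."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0.0:
        return 0.0
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


def permutation_pvalue(scores: Sequence[float], n_permutations: int = 999, seed: int = 0) -> float:
    """One-sided p-value: how often a shuffled series is at least as autocorrelated."""
    rng = random.Random(seed)
    observed = lag1_autocorrelation(scores)
    shuffled: List[float] = list(scores)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # under exchangeability every ordering is equally likely
        if lag1_autocorrelation(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


# Toy scores (hypothetical): a steadily growing residual, as under a seasonal trend.
trending_scores = [0.1 * i for i in range(30)]

observed_r1 = lag1_autocorrelation(trending_scores)
p_value = permutation_pvalue(trending_scores)
print(observed_r1 > 0.8, p_value < 0.05)  # True True
```

A small p-value suggests the ordering of the scores carries information, which is evidence against exchangeability. A large p-value does not establish exchangeability, since other forms of dependence can leave lag-1 autocorrelation untouched.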

minor comments (2)

[Abstract] The abstract refers to 'eight routinely available indicators' without listing them; enumerating the predictors would improve clarity and allow readers to assess relevance.
Consider reporting the specific conformal prediction implementation (e.g., which adaptive variant) and any code or data availability statement to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas where the manuscript can be clarified and strengthened. We address each major comment point by point below, indicating the revisions we will make.

Referee: [Abstract] The central claim of 'sound measures of uncertainty' rests on adaptive conformal prediction delivering 'provably valid finite-sample coverage' under exchangeability. Daily temperature observations are serially dependent and subject to seasonal progression and heat-wave effects, which typically violate exchangeability. No diagnostic (e.g., permutation test on conformity scores or rolling coverage check) or explicit statement that the adaptive method relaxes the assumption is provided; if the assumption fails, the coverage guarantee does not hold and the uncertainty regions lose their advertised validity.

Authors: We agree that serial dependence, seasonal progression, and heat-wave effects in daily temperature data can violate the exchangeability assumption underlying conformal prediction guarantees, including adaptive variants. While the adaptive conformal prediction procedure adapts conformity scores over time to improve robustness under mild non-stationarity, it does not eliminate the need for exchangeability for the finite-sample coverage proof. To address this, we will revise the manuscript to include: (i) an explicit statement in the Methods section clarifying the exchangeability assumption and its potential limitations in this time-series context; and (ii) a new diagnostic subsection reporting empirical coverage rates computed on rolling windows of the holdout data, along with a brief discussion of any observed deviations. These additions will provide a more balanced presentation of the uncertainty quantification. revision: yes

Referee: [Methods] The manuscript provides no model specification, algorithm choice, hyper-parameter values, or training details for the quantile machine learning procedure. Without these, the reported holdout accuracy cannot be assessed for robustness or reproducibility, which is load-bearing for the 'promising forecasting accuracy' claim.

Authors: We fully agree that the absence of these details limits reproducibility and the ability to evaluate the robustness of the reported holdout accuracy. In the revised manuscript, we will expand the Methods section to specify the quantile machine learning algorithm (quantile regression forests), all hyperparameter values and tuning procedures (including any cross-validation approach), data preprocessing steps, training protocol, and the software implementation used. This will allow readers to replicate the forecasting procedure and assess the stability of the accuracy results. revision: yes
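The rolling-window coverage diagnostic promised in the first response could look roughly like the following. This is a sketch under stated assumptions: the window length, the function name, and the toy series are hypothetical, and nothing here reproduces the manuscript's data or results.

```python
from typing import List, Sequence


def rolling_coverage(
    y_true: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    window: int = 30,
) -> List[float]:
    """Coverage rate in each consecutive window of `window` days."""
    covered = [1 if lo <= y <= hi else 0 for y, lo, hi in zip(y_true, lower, upper)]
    if window <= 0 or window > len(covered):
        raise ValueError("window must be between 1 and the series length")
    running = sum(covered[:window])
    rates = [running / window]
    for i in range(window, len(covered)):
        # Slide the window one day: add the newest day, drop the oldest.
        running += covered[i] - covered[i - window]
        rates.append(running / window)
    return rates


# Toy holdout series (hypothetical): two hot days late in the period escape the region.
toy_y = [25.0, 26.0, 25.0, 26.0, 27.0, 33.0, 28.0, 34.0]
toy_lower = [20.0] * 8
toy_upper = [30.0] * 8

rates = rolling_coverage(toy_y, toy_lower, toy_upper, window=4)
print(rates)  # [1.0, 1.0, 0.75, 0.75, 0.5]
```

Coverage that is near nominal on average but drifts downward in later windows, as in this toy series, is the pattern one would expect if exchangeability fails during heat-wave periods.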

Circularity Check

0 steps flagged

No circularity: forecasts and coverage rest on independent holdout evaluation and external guarantee

The paper trains quantile machine learning models on lagged weather predictors to produce Q(.90) forecasts for holdout summer 2021 data, then attaches adaptive conformal prediction regions whose finite-sample coverage guarantee is invoked conditionally on exchangeability. No equation or step equates a claimed forecast or coverage probability to a quantity defined by the same fitted parameters; the conformal result is presented as an external statistical property rather than derived from the model's own structure or self-citation chain. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the exchangeability assumption for conformal coverage and on the choice of 14-day lag and Q(0.90) level; no new entities are postulated and the machine-learning model parameters are fitted rather than derived.

free parameters (2)

14-day lag: chosen by the author to produce two-week-ahead forecasts; not derived from data or theory.
Q(0.90) level: selected so that exceedances represent rare extremes; a modeling choice rather than an estimated parameter.

axioms (1)

Domain assumption: the sequence of weather observations satisfies exchangeability.
Invoked to guarantee finite-sample coverage of the adaptive conformal prediction regions.
