Modeling and forecasting subnational age distribution of death counts

Cristian F. Jiménez-Varón; Han Lin Shang

arxiv: 2503.16744 · v3 · submitted 2025-03-20 · 📊 stat.ME · stat.AP

Han Lin Shang, Cristian F. Jiménez-Varón

Pith reviewed 2026-05-22 22:40 UTC · model grok-4.3

classification 📊 stat.ME stat.AP

keywords mortality forecasting · age distribution of deaths · cumulative distribution function · subnational demography · life tables · forecast accuracy · Japanese mortality data

The pith

A cumulative distribution function transformation improves forecasts of subnational age distributions of death counts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing mortality forecasting methods focus on age-specific rates in an unconstrained space and overlook the fact that death counts form distributions with non-negative values and a fixed total. This paper applies a cumulative distribution function transformation to address the constrained nonlinear space of these distributions at the subnational level. When paired with forecasting methods, the transformation produces more accurate predictions than approaches that ignore the distributional structure, as shown on Japanese life-table data. The results matter for estimating regional survival probabilities and life expectancy, and they provide a starting point for actuaries pricing annuities by age and maturity.

Core claim

The age distribution of death counts resembles probability density functions and therefore occupies a constrained nonlinear space. Applying a cumulative distribution function transformation, which is scale-free and preserves monotonicity, allows standard forecasting methods to generate more accurate forecasts of subnational death distributions than methods that treat the data as unconstrained.

What carries the argument

The cumulative distribution function transformation applied to age distributions of death counts, which converts them into a form amenable to forecasting while remaining scale-free and monotonicity-preserving.
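A minimal sketch of the forward transformation may make the two named properties concrete. It assumes the CDF is formed by a cumulative sum of death counts divided by their total. The function name `cdf_transform` and the five-group counts are hypothetical illustrations, not taken from the paper, and any further step the authors apply to the CDF before forecasting is not shown.

```python
import numpy as np

def cdf_transform(deaths):
    """Map non-negative death counts by age to an empirical CDF over age.

    Assumption: the CDF is cumulative deaths divided by total deaths.
    """
    deaths = np.asarray(deaths, dtype=float)
    if np.any(deaths < 0):
        raise ValueError("death counts must be non-negative")
    total = deaths.sum()
    if total <= 0:
        raise ValueError("total deaths must be positive")
    return np.cumsum(deaths) / total

# Hypothetical death counts for five age groups (illustrative only).
deaths = np.array([500.0, 1500.0, 8000.0, 40000.0, 50000.0])
cdf = cdf_transform(deaths)

# Scale-free: multiplying every count by a constant leaves the CDF unchanged.
assert np.allclose(cdf, cdf_transform(3.0 * deaths))

# Monotone: non-negative counts give a non-decreasing CDF that ends at 1.
assert np.all(np.diff(cdf) >= 0)
assert np.isclose(cdf[-1], 1.0)
```

Because the total cancels in the division, two regions with the same age pattern of deaths but different sizes map to the same curve, which is the sense in which the transformed object is scale-free.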

If this is right

More accurate forecasts of life-table death counts at subnational scales
Improved estimation of regional age-specific survival probabilities
Better subnational life expectancy calculations
A practical basis for actuaries to explore annuity pricing across ages and maturities

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The transformation approach could be tested on subnational data from countries with varying data quality to check robustness
It might be combined with joint models that forecast both population size and death distributions simultaneously
The method could help isolate the effect of data noise in lower-quality subnational regions

Load-bearing premise

The age distribution of death counts can be treated as living in a constrained nonlinear space, and the CDF transformation stays scale-free and monotonicity-preserving in a way that keeps the forecasts meaningful once they are transformed back to death counts.

What would settle it

A direct accuracy comparison on the Japanese subnational life-table data in which forecasts that skip the CDF transformation match or exceed the accuracy of those that use it.

Original abstract

Existing mortality forecasting methods focus on age-specific mortality rates, which lie in an unconstrained space and overlook the distributional nature of life-table death counts. Few studies have developed and compared forecasting methods that model the shape and dynamics of the age distribution of deaths, especially at the subnational level, where data quality varies greatly. This paper presents several forecasting methods to model and forecast the subnational age distribution of death counts. The age distribution of death counts has many similarities to probability density functions, which are non-negative and have a constrained integral, and thus live in a constrained nonlinear space. To address the nonlinear nature of objects, we implement a cumulative distribution function transformation that is scale-free and has additional monotonicity. Using subnational Japanese life-table death counts from the Japanese Mortality Database (2025), we evaluate the forecast accuracy of the transformation and forecasting methods. The improved forecast accuracy of life-table death counts implemented here will be of great interest to demographers in estimating regional age-specific survival probabilities and life expectancy, and to actuaries as a foundation for exploring potential applications in determining annuity prices for various ages and maturities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper extends CDF transformations to subnational death count distributions and compares forecasting methods on Japanese data, but the abstract provides no error metrics and no details on how varying totals are handled.

The paper takes the CDF transformation that has been used for density forecasting and applies it to the age distribution of death counts at the subnational level. It then runs several time-series methods on the transformed data and compares them on Japanese regional life tables. That is the core contribution: moving a constrained object into a space where standard tools work and then inverting back, with an explicit subnational focus where data quality is uneven. The authors correctly identify that death counts are non-negative and sum to a fixed total, so the density analogy is reasonable, and the Japanese Mortality Database is a credible source for the exercise. The practical angle for regional life expectancy and annuity calculations is also clear. The approach is a direct extension rather than a new theoretical framework, but it fills a gap that national-level work has left open.

The main weakness is that the abstract contains no forecast error numbers, no comparison tables, and no discussion of how the method treats the fact that subnational death totals change from year to year. The stress-test point about whether the scale-free and monotonic properties survive the inversion when totals vary is therefore still open. If the full paper does not show that the back-transformed forecasts beat direct age-specific methods by a usable margin, or if discrete count artifacts appear, the claimed gains will be hard to judge.

A demographer or actuary already working on regional mortality tables would be the natural reader and could test the code on their own data. The paper is coherent, cites the relevant forecasting literature, and does not contain internal contradictions. I would bring it to a reading group that covers applied demographic statistics. I would not cite it myself until the quantitative results are visible. It deserves peer review because the application is useful and the method is straightforward to evaluate once the numbers are on the table.
Recommendation: send it to referees.

Referee Report

2 major / 2 minor

Summary. The manuscript develops and compares forecasting methods for the subnational age distribution of death counts by applying a cumulative distribution function (CDF) transformation to handle the constrained nonlinear space analogous to probability densities. Using subnational Japanese life-table death counts from the Japanese Mortality Database, it evaluates forecast accuracy and claims improvements over methods that ignore the distributional nature, with applications to regional survival probabilities, life expectancy, and actuarial pricing.

Significance. If the central claim holds after addressing the transformation details, the work would advance compositional forecasting in demography by explicitly modeling the shape of death distributions rather than unconstrained rates. The subnational focus with variable data quality and use of an external database are strengths that could support reproducible applications in regional mortality analysis.

major comments (2)

[Abstract] Abstract: the central claim that the CDF transformation is scale-free and monotonicity-preserving in a way that preserves forecast relevance upon inversion is load-bearing, yet the abstract (and by extension the methods) provides no explicit description of how totals are handled or the exact inversion procedure; this leaves open whether back-transformed forecasts remain superior when subnational totals vary and data quality is heterogeneous.
[Evaluation] Evaluation section: the reported accuracy gains must be accompanied by a full accounting of all candidate methods and pre-specified selection criteria; without this, the comparison risks post-hoc selection that could inflate apparent improvements over direct age-specific approaches.

minor comments (2)

[Abstract] Abstract: the citation 'Japanese Mortality Database (2025)' appears to reference a future or misdated source; clarify the exact data vintage and access details.
Notation: ensure consistent use of symbols for the transformed CDF and its inverse across equations and figures to avoid ambiguity in the back-transformation step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below and have revised the manuscript to improve clarity and transparency where needed.

Referee: [Abstract] Abstract: the central claim that the CDF transformation is scale-free and monotonicity-preserving in a way that preserves forecast relevance upon inversion is load-bearing, yet the abstract (and by extension the methods) provides no explicit description of how totals are handled or the exact inversion procedure; this leaves open whether back-transformed forecasts remain superior when subnational totals vary and data quality is heterogeneous.

Authors: We agree that the abstract would benefit from additional detail on these points to strengthen the central claim. The CDF transformation, as implemented in Section 3, normalizes the cumulative death counts by the region-year total, rendering it scale-free by construction. Inversion proceeds by forecasting the CDF, recovering proportions via first differences, and rescaling by an independently forecasted total (obtained via univariate time-series modeling of the raw totals). We have revised the abstract to briefly describe the inversion and handling of totals, and we have added a clarifying paragraph in the methods section with a worked numerical example. This revision also notes applicability under heterogeneous data quality, as the normalization is performed separately per subnational unit.
revision: yes

Referee: [Evaluation] Evaluation section: the reported accuracy gains must be accompanied by a full accounting of all candidate methods and pre-specified selection criteria; without this, the comparison risks post-hoc selection that could inflate apparent improvements over direct age-specific approaches.

Authors: We acknowledge the importance of full transparency to avoid any perception of post-hoc selection. The evaluation in Section 4 considered a pre-specified suite of methods drawn from the compositional data and mortality forecasting literature (including direct age-specific ARIMA/ETS models, log-ratio transformations, and functional approaches), with model selection and accuracy assessment based on fixed out-of-sample MAE criteria established prior to analysis. To address the referee's concern directly, we have added an explicit subsection and supplementary table in the revised manuscript that enumerates every candidate method considered, the a priori inclusion criteria, and the exact selection protocol used.
revision: yes
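The inversion outlined in the first response (forecast the CDF, take first differences, rescale by a separately forecast total) and the MAE criterion named in the second can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' code: the names `invert_cdf` and `mean_absolute_error`, the clipping and monotone repair of the forecast CDF, and all numbers are hypothetical.

```python
import numpy as np

def invert_cdf(cdf_forecast, total_forecast):
    """Recover death counts from a forecast CDF and a forecast total.

    Assumption: a raw forecast may leave [0, 1] or dip slightly, so it is
    clipped and made non-decreasing before differencing. The rebuttal does
    not say whether or how the authors handle this.
    """
    cdf = np.asarray(cdf_forecast, dtype=float)
    cdf = np.clip(cdf, 0.0, 1.0)          # keep values inside [0, 1]
    cdf = np.maximum.accumulate(cdf)      # enforce a non-decreasing curve
    if cdf[-1] <= 0:
        raise ValueError("forecast CDF must reach a positive value")
    cdf = cdf / cdf[-1]                   # last value becomes exactly 1
    proportions = np.diff(cdf, prepend=0.0)  # first differences
    return total_forecast * proportions

def mean_absolute_error(actual, forecast):
    """Mean absolute error across ages."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))

# Hypothetical forecast CDF with a small dip and an overshoot above 1.
cdf_forecast = [0.01, 0.05, 0.04, 0.60, 1.02]
total_forecast = 1000.0                   # hypothetical forecast of the total

counts = invert_cdf(cdf_forecast, total_forecast)

# Hypothetical held-out counts used only to show the error calculation.
actual = [12.0, 38.0, 5.0, 545.0, 400.0]
mae = mean_absolute_error(actual, counts)
```

In this sketch the back-transformed counts are non-negative and sum to the forecast total by construction, so any remaining error comes from the shape forecast and the total forecast rather than from violated constraints.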

Circularity Check

0 steps flagged

No significant circularity detected

The paper applies a standard CDF transformation to age distributions of death counts (treated as analogous to densities) and evaluates multiple forecasting methods on external data from the Japanese Mortality Database. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. The central claim rests on empirical forecast accuracy comparisons rather than any derivation that is equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on treating death count distributions as analogous to PDFs (standard domain assumption) and the suitability of CDF transformation for forecasting (ad_hoc_to_paper). No free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption: Age distribution of death counts lives in a constrained nonlinear space similar to probability density functions (non-negative, fixed integral).
Invoked in the abstract to justify the CDF transformation approach.
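
One way to write this assumption down, using notation that is assumed here rather than taken from the paper ($d_x$ for deaths at age $x = 0, \dots, \omega$ in a given region and year, $D$ for their total, $F$ for the resulting CDF):

```latex
\[
  d_x \ge 0, \qquad
  \sum_{x=0}^{\omega} d_x = D, \qquad
  F(x) = \frac{1}{D} \sum_{u=0}^{x} d_u, \qquad
  0 \le F(0) \le F(1) \le \dots \le F(\omega) = 1 .
\]
\[
  \text{For any } c > 0: \quad
  \frac{\sum_{u=0}^{x} c\, d_u}{\sum_{u=0}^{\omega} c\, d_u}
  = \frac{1}{D} \sum_{u=0}^{x} d_u = F(x).
\]
```

The first line states the non-negativity and fixed-total constraints and the monotone CDF they imply; the second shows why rescaling all counts by a common factor leaves $F$ unchanged.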
