SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov; Hafize Gonca Cömert

arXiv: 2605.19014 · v1 · pith:PKBG4ENCnew · submitted 2026-05-18 · 💻 cs.LG · econ.EM · stat.ML

Pith reviewed 2026-05-20 12:15 UTC · model grok-4.3

classification 💻 cs.LG · econ.EM · stat.ML

keywords: earnings forecasting, transformer architecture, conformal prediction, probabilistic forecasting, panel data, lifetime earnings, microsimulation, sequence modeling

The pith

A decoder-only transformer for irregular earnings sequences reduces long-horizon forecast errors by nearly a third compared to canonical parametric processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SAGA, a decoder-only transformer that handles irregular sequences of individual earnings data and generates probabilistic forecasts at horizons of one to thirty years. It claims the model learns long-range nonlinear patterns that standard parametric approaches, limited to first and second moments, cannot represent. If the claim holds, the resulting lifetime earnings distributions would be more accurate inputs for the microsimulation models that ministries of finance and central banks use to evaluate policy. The architecture is combined with a conformal calibration step that supplies individual-level prediction intervals with finite-sample marginal coverage guarantees, which hold under exchangeability of calibration and test data.

Core claim

SAGA is a decoder-only transformer for irregular tabular panel sequences paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on longitudinal earnings data, it produces annual labor earnings forecasts at one- to thirty-year horizons that reduce continuous ranked probability score by 31.9 percent at the ten-year horizon and mean absolute error by 37.7 percent at the twenty-year horizon relative to canonical parametric processes while achieving nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup.
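
Both coverage figures in this claim are gaps between empirical and nominal coverage, expressed in percentage points: one over the whole test set, one for the subgroup farthest from nominal. A minimal sketch of that bookkeeping follows, assuming interval bounds, outcomes, and one demographic label per person are already available. The function name `coverage_gaps` and the toy inputs are hypothetical, not the paper's code.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def coverage_gaps(
    y: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    groups: Sequence[Hashable],
    nominal: float = 0.9,
) -> Tuple[float, Hashable, float]:
    """Return (marginal gap, worst subgroup, worst subgroup gap).

    Gaps are absolute differences between empirical and nominal coverage,
    expressed in percentage points.
    """
    hits = [lo <= yi <= hi for yi, lo, hi in zip(y, lower, upper)]
    marginal = sum(hits) / len(hits)

    # Empirical coverage within each demographic subgroup.
    by_group = defaultdict(list)
    for hit, g in zip(hits, groups):
        by_group[g].append(hit)
    group_cov = {g: sum(v) / len(v) for g, v in by_group.items()}

    # Worst case = the subgroup whose coverage is farthest from nominal.
    worst = max(group_cov, key=lambda g: abs(group_cov[g] - nominal))
    return 100 * abs(marginal - nominal), worst, 100 * abs(group_cov[worst] - nominal)


if __name__ == "__main__":
    # Toy data: 4 people, two subgroups, nominal coverage 75 percent.
    print(coverage_gaps([1, 2, 3, 4], [0, 0, 0, 5], [2, 2, 2, 6], ["a", "a", "b", "b"], nominal=0.75))
    # prints (25.0, 'b', 75.0)
```

The absolute-gap convention treats over- and under-coverage alike; a one-sided version that counts only under-coverage is an equally reasonable choice when the concern is intervals that are too narrow.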

What carries the argument

decoder-only transformer architecture for irregular tabular panel sequences combined with a split conformal calibration wrapper that supplies finite-sample marginal coverage
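
For readers less familiar with the wrapper, split conformal calibration can be sketched in a few lines: score a held-out calibration set, take the ⌈(n+1)(1−α)⌉-th smallest score, and widen every test forecast by that amount. Under exchangeability of calibration and test points, this yields marginal coverage of at least 1 − α in finite samples. The absolute-residual score, the symmetric interval, and the toy Gaussian data are assumptions for illustration; the paper's own conformity score (for example a quantile-based one in the style of conformalized quantile regression [8]) may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """ceil((n + 1)(1 - alpha))-th smallest score; +inf if n is too small."""
    n = len(scores)
    # The tiny epsilon guards against floating-point round-up of an exact integer.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_intervals(
    cal_pred: Sequence[float],
    cal_y: Sequence[float],
    test_pred: Sequence[float],
    alpha: float = 0.1,
) -> List[Tuple[float, float]]:
    """Symmetric intervals test_pred +/- q built from absolute calibration residuals."""
    scores = [abs(y - p) for y, p in zip(cal_y, cal_pred)]
    q = conformal_quantile(scores, alpha)
    return [(p - q, p + q) for p in test_pred]


def simulate_coverage(n_cal: int, n_test: int, alpha: float, seed: int = 0) -> float:
    """Toy check on exchangeable data: y = x + N(0, 1) noise, point forecast = x."""
    rng = random.Random(seed)

    def draw(n):
        xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
        return xs, [x + rng.gauss(0.0, 1.0) for x in xs]

    cal_x, cal_y = draw(n_cal)
    test_x, test_y = draw(n_test)
    intervals = split_conformal_intervals(cal_x, cal_y, test_x, alpha)
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, test_y))
    return hits / n_test


if __name__ == "__main__":
    print(simulate_coverage(n_cal=2000, n_test=20000, alpha=0.1))  # close to 0.90
```

The guarantee is marginal, averaged over calibration and test draws; it says nothing about coverage for a particular person or subgroup, and it rests on the exchangeability that the referee report questions for serially correlated panels.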

Load-bearing premise

Earnings trajectories contain long-range nonlinear structure that transformers can learn from panel sequences but that first- and second-moment parametric processes cannot capture.

What would settle it

A replication on held-out years from the same register or on a comparable register from another country that shows no reduction in continuous ranked probability score or mean absolute error, or that shows conformal intervals missing nominal coverage by more than a few percentage points, would falsify the performance claims.

Figures

Figures reproduced from arXiv: 2605.19014 by Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Hafize Gonca Cömert.

**Figure 2.** Distribution of present-discounted lifetime earnings (2022 SEK, ...

**Figure 3.** Average attention-head pattern across the test set when forecasting year ...

Original abstract

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only the first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on the longitudinal Swedish LISA register covering 1990 to 2022, comprising 2,143,817 individuals and 61,284,903 person-years, the model forecasts annual labor earnings at horizons of one to thirty years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Against the canonical Guvenen, Karahan, Ozkan, and Song (GKOS) parametric process and tabular and recurrent baselines, SAGA reduces the continuous ranked probability score (CRPS) by 31.9 percent at the ten-year horizon and mean absolute error (MAE) by 37.7 percent at the twenty-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327, against a partially observed truth of 0.341 and a GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.
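
The abstract's last two steps, Monte Carlo aggregation into present-discounted lifetime earnings and a lifetime Gini coefficient, can be illustrated with a short sketch. It assumes a hypothetical sampler `draw_paths(rng)` that returns one joint draw of annual earnings paths for every individual (a stand-in for forecast draws from a model such as SAGA), a constant discount rate, and the standard sorted-rank Gini formula. The toy log-normal sampler, the 2 percent rate, and averaging the Gini over draws are illustrative choices, not the paper's settings.

```python
import math
import random
from statistics import mean
from typing import Callable, List, Sequence


def present_value(path: Sequence[float], rate: float) -> float:
    """Discount annual earnings back to the first forecast year."""
    return sum(e / (1.0 + rate) ** t for t, e in enumerate(path))


def gini(values: Sequence[float]) -> float:
    """G = 2 * sum_i i * x_(i) / (n * sum x) - (n + 1) / n, with ranks i = 1..n."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def mc_lifetime_gini(
    draw_paths: Callable[[random.Random], List[List[float]]],
    rate: float,
    n_draws: int,
    seed: int = 0,
) -> float:
    """Average the lifetime-earnings Gini over n_draws joint forecast draws."""
    rng = random.Random(seed)
    ginis = []
    for _ in range(n_draws):
        lifetime = [present_value(path, rate) for path in draw_paths(rng)]
        ginis.append(gini(lifetime))
    return mean(ginis)


def toy_draw_paths(rng: random.Random, n_people: int = 500, years: int = 30) -> List[List[float]]:
    """Hypothetical sampler: persistent person effect plus transitory log shocks."""
    paths = []
    for _ in range(n_people):
        person = rng.gauss(0.0, 0.5)
        paths.append([math.exp(12.0 + person + rng.gauss(0.0, 0.3)) for _ in range(years)])
    return paths


if __name__ == "__main__":
    print(mc_lifetime_gini(toy_draw_paths, rate=0.02, n_draws=20))
```

Keeping whole joint draws together, rather than mixing quantiles across individuals, is what lets the Gini reflect the cross-sectional dispersion implied by each simulated population.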

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

SAGA gets clear empirical lifts on long-horizon earnings forecasts over parametric baselines but the conformal coverage numbers rest on an exchangeability assumption that looks shaky for serially correlated panel data.

The main thing to know is that this paper reports solid gains in probabilistic forecasting of individual earnings on the Swedish LISA register: CRPS falls by about 32 percent at the 10-year horizon and MAE by about 38 percent at the 20-year horizon, relative to the Guvenen-Karahan-Ozkan-Song (GKOS) process and the tabular and recurrent baselines. It also claims conformal intervals that land close to nominal coverage both overall and on the worst demographic slice, and it uses the forecast draws to reconstruct lifetime earnings distributions and a Gini coefficient that sits nearer the observed value than the parametric alternative does. The authors release weights, calibration tables, and a synthetic dataset, which is useful for anyone who wants to probe the results outside the protected environment. That combination of scale, concrete metrics, and replication materials is the practical strength here.

What is actually new is the decoder-only transformer built for irregular tabular panel sequences, wrapped with an adaptive temporal conformal step. The abstract positions this as a direct response to the moment-only limitation of standard parametric earnings processes, and the application to multi-horizon microsimulation inputs is not in the cited prior work. The architecture choice and the conformal adaptation for this specific data structure are the fresh pieces.

The soft spot is the coverage guarantee. Split conformal prediction delivers finite-sample marginal coverage only under exchangeability between calibration and test points. Earnings trajectories carry serial correlation, macroeconomic shocks, and irregular observation grids even after temporal adaptation. At horizons of 10 to 30 years, that dependence can produce systematic miscoverage that the reported 0.4 and 2.4 percentage-point figures do not automatically rule out. The paper would be stronger with explicit checks on how the adaptation preserves the guarantee, or with hold-out diagnostics that separate out the dependence effects.

The performance numbers themselves are straightforward empirical comparisons and do not appear circular. This work is aimed at people who maintain or use microsimulation models for earnings and inequality at central banks or finance ministries, and at applied ML researchers who handle longitudinal administrative data. A reader who needs better tail behavior or longer-horizon accuracy on real register data will get concrete numbers to evaluate. It is worth sending for peer review: the empirical results are substantial enough to justify referee time, even if the conformal section draws questions about dependence.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces SAGA, a decoder-only transformer architecture for irregular tabular panel sequences of earnings trajectories, paired with an adaptive temporal split conformal prediction wrapper. Trained on the Swedish LISA register (2.1M individuals, 61M person-years), it forecasts annual labor earnings at 1- to 30-year horizons, aggregates them to lifetime distributions via Monte Carlo, and claims a 31.9% CRPS reduction at the 10-year horizon and a 37.7% MAE reduction at the 20-year horizon versus the Guvenen-Karahan-Ozkan-Song parametric process and tabular/recurrent baselines. Conformal intervals are reported to achieve nominal coverage within 0.4 pp marginally and within 2.4 pp on the worst-case subgroup, and the reconstructed lifetime Gini is 0.327 versus 0.341 observed and 0.378 from the parametric baseline. Model weights, calibration tables, and a synthetic dataset are released.

Significance. If the performance gains and coverage properties hold under scrutiny, the work could meaningfully improve microsimulation models used by finance ministries and central banks by capturing long-range nonlinear structure missed by first- and second-moment parametric processes. The explicit release of model weights, calibration tables, and a synthetic equivalent dataset is a clear strength that supports external replication and verification outside protected environments.

major comments (2)

§4 (Conformal Calibration and Coverage Guarantees): The central claim, finite-sample marginal coverage for the adaptive temporal conformal wrapper (empirically within 0.4 pp of nominal), rests on standard split conformal theory. However, earnings trajectories exhibit serial correlation, cohort and macroeconomic shocks, and irregular person-year observation grids. These features violate the exchangeability between calibration and test points that the finite-sample guarantee requires, particularly at long horizons (10 to 30 years). This directly threatens the reported coverage numbers and the downstream Gini reconstruction that relies on the calibrated intervals; a robustness check or a dependence-adjusted conformal method is needed.
Results (performance comparison paragraph): The 31.9% CRPS reduction at the 10-year horizon and the 37.7% MAE reduction at the 20-year horizon versus the GKOS parametric process are load-bearing for the superiority claim. Without an ablation isolating the decoder-only transformer's contribution from the conformal wrapper, or explicit confirmation that the baselines were re-estimated on the identical irregular panel structure and loss, it remains unclear whether the gains arise specifically from capturing long-range nonlinear dependencies.
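
One dependence-adjusted option in the spirit of the §4 comment, swapped in here for illustration and not necessarily what the paper's adaptive temporal wrapper does, is adaptive conformal inference (ACI) from Gibbs and Candès (2021). ACI updates the working miscoverage level after each period, α_{t+1} = α_t + γ(α − err_t), where err_t = 1 if period t's outcome fell outside its interval. Because α_t stays within [−γ, 1 + γ], the long-run miss rate satisfies |(1/T)Σ err_t − α| ≤ (max(α_1, 1 − α_1) + γ)/(γT) for any score sequence, exchangeable or not. The rolling window, the step size γ, and the synthetic AR(1) scores with a mid-sample regime shift are assumptions chosen for this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple


def quantile_at(scores: Sequence[float], alpha_t: float) -> float:
    """Conformal quantile at miscoverage alpha_t, with ACI's edge cases.

    alpha_t <= 0 gives +inf (interval = whole line, always covers);
    alpha_t >= 1 gives -inf (empty interval, counted as a miss).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha_t) - 1e-9)
    if k > n:
        return math.inf
    if k < 1:
        return -math.inf
    return sorted(scores)[k - 1]


def aci_coverage(
    scores: Sequence[float],
    alpha: float = 0.1,
    gamma: float = 0.01,
    warmup: int = 50,
    window: int = 500,
) -> Tuple[float, float]:
    """Run ACI over a stream of nonnegative scores; return (coverage, final alpha_t)."""
    history: List[float] = list(scores[:warmup])
    alpha_t = alpha
    hits = 0
    stream = scores[warmup:]
    for s in stream:
        q = quantile_at(history[-window:], alpha_t)  # rolling calibration window
        err = 0 if s <= q else 1
        hits += 1 - err
        alpha_t += gamma * (alpha - err)  # ACI update
        history.append(s)
    return hits / len(stream), alpha_t


def toy_scores(n: int, seed: int = 0, phi: float = 0.9) -> List[float]:
    """Serially correlated |AR(1)| residuals whose scale triples halfway through."""
    rng = random.Random(seed)
    e, out = 0.0, []
    for t in range(n):
        e = phi * e + rng.gauss(0.0, 1.0)
        scale = 1.0 if t < n // 2 else 3.0
        out.append(abs(scale * e))
    return out


if __name__ == "__main__":
    cov, a_final = aci_coverage(toy_scores(4050, seed=1))
    print(round(cov, 3), round(a_final, 3))
```

The bound controls the time-averaged miss rate, not any single period, so stretches just after a shock can still under-cover; per-period diagnostics remain useful alongside it.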

minor comments (2)

§2.1 (Data and Sequence Representation): The description of how irregular person-year grids are tokenized and padded for the decoder-only transformer would benefit from a concrete example sequence for one individual.
Figure 4 (Coverage plots): Adding the worst-case demographic subgroup curve would directly illustrate the 2.4 pp deviation cited in the abstract.
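
For the §2.1 comment, a concrete but entirely hypothetical encoding might look like this: each observed person-year becomes one token carrying the calendar year, the gap in years since the previous observation, and log earnings; sequences are right-padded to a fixed length and paired with a causal mask, so a decoder-only model attends only to earlier, non-padded positions. The field names, the gap feature, and the right-padding convention are illustrative assumptions, not the paper's documented scheme.

```python
from typing import Dict, List, Sequence, Tuple

Token = Dict[str, float]


def encode_person(
    records: Sequence[Tuple[int, float]], max_len: int
) -> Tuple[List[Token], List[int]]:
    """Turn (year, log_earnings) records into fixed-length tokens plus a padding mask.

    Years may be missing; the gap feature tells the model how far apart
    consecutive observations are. Only the most recent max_len records are kept.
    """
    kept = sorted(records)[-max_len:]
    tokens: List[Token] = []
    prev_year = None
    for year, value in kept:
        gap = 0 if prev_year is None else year - prev_year
        tokens.append({"year": year, "gap": gap, "log_earnings": value})
        prev_year = year
    n_pad = max_len - len(tokens)
    key_mask = [1] * len(tokens) + [0] * n_pad
    tokens += [{"year": 0, "gap": 0, "log_earnings": 0.0} for _ in range(n_pad)]
    return tokens, key_mask


def causal_mask(key_mask: Sequence[int]) -> List[List[int]]:
    """Position i may attend to position j only if j <= i and j is not padding."""
    n = len(key_mask)
    return [[int(j <= i and key_mask[j] == 1) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # One hypothetical individual observed in 1995, 1996 and 1999 (1997-98 missing).
    toks, mask = encode_person([(1999, 10.6), (1995, 10.1), (1996, 10.3)], max_len=5)
    print([t["gap"] for t in toks])  # [0, 1, 3, 0, 0]
    print(mask)                      # [1, 1, 1, 0, 0]
```

Padded query rows still attend to real positions here; their outputs are simply excluded from the loss. Inserting explicit "missing year" tokens instead of a gap feature is an equally plausible design.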

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify key aspects of our work on SAGA. We address each major comment below and indicate planned revisions to the manuscript.

Referee: §4 (Conformal Calibration and Coverage Guarantees): The central claim, finite-sample marginal coverage for the adaptive temporal conformal wrapper (empirically within 0.4 pp of nominal), rests on standard split conformal theory. However, earnings trajectories exhibit serial correlation, cohort and macroeconomic shocks, and irregular person-year observation grids. These features violate the exchangeability between calibration and test points that the finite-sample guarantee requires, particularly at long horizons (10 to 30 years). This directly threatens the reported coverage numbers and the downstream Gini reconstruction that relies on the calibrated intervals; a robustness check or a dependence-adjusted conformal method is needed.

Authors: We acknowledge that the standard split conformal prediction framework relies on exchangeability, which may be only approximately satisfied in our setting due to serial correlation in earnings trajectories, cohort effects, and macroeconomic shocks. Our adaptive temporal conformal method incorporates time-aware calibration to mitigate some temporal dependencies, and the reported coverage is also supported by empirical validation on held-out data. In the revised manuscript, we will expand the discussion in §4 to explicitly address potential violations of exchangeability, include additional robustness checks using time-blocked calibration sets, and report coverage under these conditions. We will also note this as a limitation for long-horizon applications. revision: yes
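
A time-blocked check of the kind promised here could look like the following sketch: calibration scores come only from an earlier block of calendar years, and coverage is then reported year by year for strictly later years, so drift over time becomes visible instead of being averaged away. The absolute-residual score and the row layout `(year, y_true, y_pred)` are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def conformal_quantile(scores, alpha: float) -> float:
    """ceil((n + 1)(1 - alpha))-th smallest score; +inf if n is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    return math.inf if k > n else sorted(scores)[k - 1]


def time_blocked_coverage(
    rows: Iterable[Tuple[int, float, float]],
    cal_years: Iterable[int],
    alpha: float = 0.1,
) -> Tuple[float, Dict[int, float]]:
    """Calibrate on cal_years only; report coverage separately for each later year.

    rows holds (year, y_true, y_pred) triples; the score is |y_true - y_pred|.
    """
    rows = list(rows)
    cal_years = set(cal_years)
    last_cal = max(cal_years)
    q = conformal_quantile([abs(y - p) for yr, y, p in rows if yr in cal_years], alpha)

    hits = defaultdict(list)
    for yr, y, p in rows:
        if yr > last_cal:  # strictly later years only: no look-ahead into calibration
            hits[yr].append(abs(y - p) <= q)
    return q, {yr: sum(h) / len(h) for yr, h in sorted(hits.items())}
```

A sequence of later years whose coverage drifts steadily below 1 − α would be direct evidence that the exchangeability approximation weakens as the gap between calibration and forecast years grows.
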
Referee: Results (performance comparison paragraph): The 31.9% CRPS reduction at the 10-year horizon and the 37.7% MAE reduction at the 20-year horizon versus the GKOS parametric process are load-bearing for the superiority claim. Without an ablation isolating the decoder-only transformer's contribution from the conformal wrapper, or explicit confirmation that the baselines were re-estimated on the identical irregular panel structure and loss, it remains unclear whether the gains arise specifically from capturing long-range nonlinear dependencies.

Authors: We clarify that the CRPS and MAE metrics reflect the quality of the probabilistic forecasts generated directly by the decoder-only transformer component of SAGA; the conformal wrapper is applied only post hoc for constructing prediction intervals and does not influence these scoring rules. All baselines, including the Guvenen-Karahan-Ozkan-Song parametric process as well as tabular and recurrent models, were re-estimated on the identical irregular panel structure from the LISA register using the same data splits, preprocessing, and evaluation losses. To further isolate the contribution of the transformer architecture, we will add an ablation study in the revised results section comparing the decoder-only model against LSTM-based recurrent baselines, both with and without the conformal wrapper. revision: yes
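
Since the authors state that CRPS and MAE are computed on SAGA's forecast draws before any conformal step, it may help to see how a sample-based CRPS is commonly computed. The sketch uses the energy form CRPS(F, y) = E|X − y| − ½E|X − X′| discussed by Gneiting and Raftery [34], evaluated on the empirical distribution of m forecast draws (the plug-in estimator, not the unbiased "fair" variant), plus the relative-reduction arithmetic behind figures such as 31.9 percent. Whether the paper uses exactly this estimator is an assumption.

```python
from typing import Sequence


def crps_from_samples(samples: Sequence[float], y: float) -> float:
    """Plug-in CRPS of the empirical forecast distribution (lower is better).

    Uses CRPS = E|X - y| - 0.5 * E|X - X'| and the sorted-sample identity
    sum_{i,j} |x_i - x_j| = 2 * sum_k (2k - m - 1) * x_(k), k = 1..m.
    """
    xs = sorted(samples)
    m = len(xs)
    term1 = sum(abs(x - y) for x in xs) / m
    spread = sum((2 * k - m - 1) * x for k, x in enumerate(xs, start=1))
    return term1 - spread / (m * m)


def pct_reduction(baseline: float, model: float) -> float:
    """Relative improvement of model over baseline, in percent."""
    return 100.0 * (baseline - model) / baseline


if __name__ == "__main__":
    print(crps_from_samples([0.0, 1.0], 0.0))  # 0.25
```

With a single point forecast (all draws equal) the CRPS reduces to absolute error, which is one reason CRPS and MAE figures are often reported side by side.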

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical comparisons and standard external theory

The paper reports direct empirical reductions in CRPS and MAE versus the GKOS parametric process and recurrent baselines on the LISA register, which are independent measurements rather than quantities defined in terms of the model's own fitted parameters. The conformal coverage guarantees are invoked from split conformal theory, a pre-existing result that does not depend on the transformer architecture or the specific earnings data. No derivation step equates a prediction to its own inputs by construction, renames a fitted quantity, or relies on a load-bearing self-citation whose content is unverified. The central performance and coverage numbers remain falsifiable against external benchmarks and do not reduce to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review based solely on the abstract; no explicit free parameters, axioms, or invented entities are detailed beyond the high-level motivation and method description.

axioms (1)

domain assumption: Earnings trajectories contain long-range nonlinear structure not captured by first- and second-moment parametric processes.
This is the explicit motivation given for moving beyond the Guvenen-Karahan-Ozkan-Song model.

invented entities (1)

SAGA decoder-only transformer architecture (no independent evidence)
purpose: To model irregular tabular panel sequences for multi-horizon probabilistic forecasting.
Core new component introduced in the paper.

pith-pipeline@v0.9.0 · 5809 in / 1448 out tokens · 55838 ms · 2026-05-20T12:15:26.280713+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "SAGA is a decoder-only transformer for irregular tabular panel sequences... paired with a split conformal calibration wrapper"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Paper passage: "Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 2 internal anchors

[1] F. Guvenen, F. Karahan, S. Ozkan, and J. Song, “What do data on millions of US workers reveal about lifecycle earnings dynamics?” Econometrica, vol. 89, no. 5, pp. 2303–2339, Sept. 2021.

[2] M. Browning, M. Ejrnaes, and J. Alvarez, “Modelling income processes with lots of heterogeneity,” Rev. Econ. Stud., vol. 77, no. 4, pp. 1353–1381, Oct. 2010.

[3] F. Karahan and S. Ozkan, “On the persistence of income shocks over the life cycle,” Rev. Econ. Dyn., vol. 16, no. 3, pp. 452–476, July 2013.

[4] F. Guvenen, “An empirical investigation of labor income processes,” Rev. Econ. Dyn., vol. 12, no. 1, pp. 58–79, Jan. 2009.

[5] E. Halvorsen, J. Hubmer, S. Salgado, and S. Solenkova, “Earnings dynamics and its intergenerational transmission: Evidence from Norway,” Discussion Paper, Statistics Norway Research Department, 2024.

[6] K. A. McGonagle, R. F. Schoeni, N. Sastry, and V. A. Freedman, “The Panel Study of Income Dynamics: Overview, recent innovations, and potential for life course research,” Longitudinal Life Course Stud., vol. 3, no. 2, pp. 268–284, 2012.

[7] G. Savcisens et al., “Using sequences of life events to predict human lives,” Nature Comput. Sci., vol. 4, no. 1, pp. 43–56, Jan. 2024.

[8] Y. Romano, E. Patterson, and E. Candes, “Conformalized quantile regression,” in Adv. Neural Inf. Process. Syst. 32, 2019, pp. 3543–3553.

[9] A. Vaswani et al., “Attention is all you need,” in Adv. Neural Inf. Process. Syst. 30, 2017, pp. 5998–6008.

[10] X. Huang, A. Khetan, M. Cvitkovic, and Z. Karnin, “TabTransformer: Tabular data modeling using contextual embeddings,” arXiv:2012.06678, 2020.

[11] Y. Gorishniy, I. Rubachev, and A. Babenko, “On embeddings for numerical features in tabular deep learning,” in Adv. Neural Inf. Process. Syst. 35, 2022, pp. 24991–25004.

[12] N. Hollmann, S. Muller, K. Eggensperger, and F. Hutter, “Accurate predictions on small data with a tabular foundation model,” Nature, vol. 637, no. 8045, pp. 319–326, Jan. 2025.

[13] Q. Wen et al., “Transformers in time series: A survey,” in Proc. IJCAI, 2023, pp. 6778–6786.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XX, NO. X, 2026 14

[14] H. Zhou et al., “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proc. AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 11106–11115.

[15] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” in Adv. Neural Inf. Process. Syst. 34, 2021, pp. 22419–22430.

[16] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in Proc. Int. Conf. Learn. Representations (ICLR), 2023.

[17] L. A. Lillard and R. J. Willis, “Dynamic aspects of earning mobility,” Econometrica, vol. 46, no. 5, pp. 985–1012, Sept. 1978.

[18] T. E. MaCurdy, “The use of time series processes to model the error structure of earnings in a longitudinal data analysis,” J. Econometrics, vol. 18, no. 1, pp. 83–114, Jan. 1982.

[19] C. Meghir and L. Pistaferri, “Earnings, consumption and life cycle choices,” in Handbook of Labor Economics, vol. 4B, O. Ashenfelter and D. Card, Eds. Amsterdam: Elsevier, 2011, pp. 773–854.

[20] K. Stankeviciute, A. Alaa, and M. van der Schaar, “Conformal time series forecasting,” in Adv. Neural Inf. Process. Syst. 34, 2021, pp. 6216–6228.

[21] C. Xu and Y. Xie, “Conformal prediction interval for dynamic time-series,” in Proc. Int. Conf. Mach. Learn. (ICML), 2021, pp. 11559–11569.

[22] A. Bhatnagar, J. Schwarting, and A. Brunner, “Adaptive conformal prediction for autoregressive forecasting,” J. Mach. Learn. Res., vol. 25, no. 87, pp. 1–42, 2024.

[23] F. Bourguignon and A. Spadaro, “Microsimulation as a tool for evaluating redistribution policies,” J. Econ. Inequality, vol. 4, no. 1, pp. 77–106, Apr. 2006.

[24] H. Sutherland and F. Figari, “EUROMOD: The European Union tax-benefit microsimulation model,” Int. J. Microsimul., vol. 6, no. 1, pp. 4–26, 2013.

[25] L. Flood, “FASIT: The Swedish micro simulation model for the household sector,” Working Paper, Univ. of Gothenburg, 2024.

[26] L. Wheaton, “TRIM3 user’s guide,” Working Paper, Urban Institute, Washington, DC, 2008.

[27] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv:1606.08415, 2016.

[28] R. Xiong et al., “On layer normalization in the transformer architecture,” in Proc. Int. Conf. Mach. Learn. (ICML), 2020, pp. 10524–10533.

[29] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Representations (ICLR), 2019.

[30] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, “Deep networks with stochastic depth,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 646–661.

[31] M. Arellano and S. Bond, “Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations,” Rev. Econ. Stud., vol. 58, no. 2, pp. 277–297, Apr. 1991.

[32] G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” in Adv. Neural Inf. Process. Syst. 30, 2017, pp. 3146–3154.

[33] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.

[34] T. Gneiting and A. E. Raftery, “Strictly proper scoring rules, prediction, and estimation,” J. Amer. Statist. Assoc., vol. 102, no. 477, pp. 359–378, Mar. 2007.

[35] W. Newey and K. West, “A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix,” Econometrica, vol. 55, no. 3, pp. 703–708, May 1987.

[36] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 3319–3328.

[37] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in Proc. IEEE Symp. Secur. Privacy (SP), 2017, pp. 3–18.

[38] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator,” Annals of Mathematical Statistics, vol. 27, no. 3, pp. 642–669, 1956.

[39] P. Massart, “The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality,” Annals of Probability, vol. 18, no. 3, pp. 1269–1283, 1990.

[40] Y. Gorishniy, I. Rubachev, V. Khrulkov, and A. Babenko, “Revisiting deep learning models for tabular data,” in Adv. Neural Inf. Process. Syst. 34, 2021, pp. 18932–18943.

[41] G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, and T. Goldstein, “SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training,” arXiv:2106.01342, June 2021.

[42] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World. New York, NY, USA: Springer, 2005.

[43] J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, “Distribution-free predictive inference for regression,” J. Amer. Statist. Assoc., vol. 113, no. 523, pp. 1094–1111, July 2018.

[44] A. N. Angelopoulos and S. Bates, “Conformal prediction: A gentle introduction,” Found. Trends Mach. Learn., vol. 16, no. 4, pp. 494–591, 2023.

[45] F. X. Diebold and R. S. Mariano, “Comparing predictive accuracy,” J. Bus. Econ. Statist., vol. 13, no. 3, pp. 253–263, July 1995.

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov received the M.Sc. degree in statistics and machine learning from Linköping University, Linköping, Sweden, in 2026. He is currently pursuing the B.Sc. degree in milita...

[1] [1]

What do data on millions of US workers reveal about lifecycle earnings dynamics?

F. Guvenen, F. Karahan, S. Ozkan, and J. Song, “What do data on millions of US workers reveal about lifecycle earnings dynamics?”Econometrica, vol. 89, no. 5, pp. 2303–2339, Sept. 2021

work page 2021

[2] [2]

Modelling income processes with lots of heterogeneity,

M. Browning, M. Ejrnaes, and J. Alvarez, “Modelling income processes with lots of heterogeneity,”Rev. Econ. Stud., vol. 77, no. 4, pp. 1353– 1381, Oct. 2010

work page 2010

[3] [3]

On the persistence of income shocks over the life cycle,

F. Karahan and S. Ozkan, “On the persistence of income shocks over the life cycle,”Rev. Econ. Dyn., vol. 16, no. 3, pp. 452–476, July 2013

work page 2013

[4] [4]

An empirical investigation of labor income processes,

F. Guvenen, “An empirical investigation of labor income processes,”Rev. Econ. Dyn., vol. 12, no. 1, pp. 58–79, Jan. 2009

work page 2009

[5] [5]

Earnings dynamics and its intergenerational transmission: Evidence from Norway,

E. Halvorsen, J. Hubmer, S. Salgado, and S. Solenkova, “Earnings dynamics and its intergenerational transmission: Evidence from Norway,” Discussion Paper, Statistics Norway Research Department, 2024

work page 2024

[6] [6]

The Panel Study of Income Dynamics: Overview, recent innovations, and potential for life course research,

K. A. McGonagle, R. F. Schoeni, N. Sastry, and V . A. Freedman, “The Panel Study of Income Dynamics: Overview, recent innovations, and potential for life course research,”Longitudinal Life Course Stud., vol. 3, no. 2, pp. 268–284, 2012

work page 2012

[7] [7]

Using sequences of life events to predict human lives,

G. Savcisenset al., “Using sequences of life events to predict human lives,”Nature Comput. Sci., vol. 4, no. 1, pp. 43–56, Jan. 2024

work page 2024

[8] [8]

Conformalized quantile regression,

Y . Romano, E. Patterson, and E. Candes, “Conformalized quantile regression,” inAdv. Neural Inf. Process. Syst. 32, 2019, pp. 3543–3553

work page 2019

[9] [9]

Attention is all you need,

A. Vaswaniet al., “Attention is all you need,” inAdv. Neural Inf. Process. Syst. 30, 2017, pp. 5998–6008

work page 2017

[10] [10]

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

X. Huang, A. Khetan, M. Cvitkovic, and Z. Karnin, “TabTransformer: Tabular data modeling using contextual embeddings,”arXiv:2012.06678, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2012

[11] [11]

On embeddings for numerical features in tabular deep learning,

Y . Gorishniy, I. Rubachev, and A. Babenko, “On embeddings for numerical features in tabular deep learning,” inAdv. Neural Inf. Process. Syst. 35, 2022, pp. 24991–25004

work page 2022

[12] [12]

Accurate predictions on small data with a tabular foundation model,

N. Hollmann, S. Muller, K. Eggensperger, and F. Hutter, “Accurate predictions on small data with a tabular foundation model,”Nature, vol. 637, no. 8045, pp. 319–326, Jan. 2025

work page 2025

[13] Q. Wen et al., “Transformers in time series: A survey,” in Proc. IJCAI, 2023, pp. 6778–6786.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XX, NO. X, 2026

[14] H. Zhou et al., “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proc. AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 11106–11115.

[15] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” in Adv. Neural Inf. Process. Syst. 34, 2021, pp. 22419–22430.

[16] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in Proc. Int. Conf. Learn. Representations (ICLR), 2023.

[17] L. A. Lillard and R. J. Willis, “Dynamic aspects of earning mobility,” Econometrica, vol. 46, no. 5, pp. 985–1012, Sept. 1978.

[18] T. E. MaCurdy, “The use of time series processes to model the error structure of earnings in a longitudinal data analysis,” J. Econometrics, vol. 18, no. 1, pp. 83–114, Jan. 1982.

[19] C. Meghir and L. Pistaferri, “Earnings, consumption and life cycle choices,” in Handbook of Labor Economics, vol. 4B, O. Ashenfelter and D. Card, Eds. Amsterdam: Elsevier, 2011, pp. 773–854.

[20] K. Stankeviciute, A. Alaa, and M. van der Schaar, “Conformal time series forecasting,” in Adv. Neural Inf. Process. Syst. 34, 2021, pp. 6216–6228.

[21] C. Xu and Y. Xie, “Conformal prediction interval for dynamic time-series,” in Proc. Int. Conf. Mach. Learn. (ICML), 2021, pp. 11559–11569.

[22] A. Bhatnagar, J. Schwarting, and A. Brunner, “Adaptive conformal prediction for autoregressive forecasting,” J. Mach. Learn. Res., vol. 25, no. 87, pp. 1–42, 2024.

[23] F. Bourguignon and A. Spadaro, “Microsimulation as a tool for evaluating redistribution policies,” J. Econ. Inequality, vol. 4, no. 1, pp. 77–106, Apr. 2006.

[24] H. Sutherland and F. Figari, “EUROMOD: The European Union tax-benefit microsimulation model,” Int. J. Microsimul., vol. 6, no. 1, pp. 4–26, 2013.

[25] L. Flood, “FASIT: The Swedish micro simulation model for the household sector,” Working Paper, Univ. of Gothenburg, 2024.

[26] L. Wheaton, “TRIM3 user’s guide,” Working Paper, Urban Institute, Washington, DC, 2008.

[27] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv:1606.08415, 2016.

[28] R. Xiong et al., “On layer normalization in the transformer architecture,” in Proc. Int. Conf. Mach. Learn. (ICML), 2020, pp. 10524–10533.

[29] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Representations (ICLR), 2019.

[30] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, “Deep networks with stochastic depth,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 646–661.

[31] M. Arellano and S. Bond, “Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations,” Rev. Econ. Stud., vol. 58, no. 2, pp. 277–297, Apr. 1991.

[32] G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” in Adv. Neural Inf. Process. Syst. 30, 2017, pp. 3146–3154.

[33] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.

[34] T. Gneiting and A. E. Raftery, “Strictly proper scoring rules, prediction, and estimation,” J. Amer. Statist. Assoc., vol. 102, no. 477, pp. 359–378, Mar. 2007.

[35] W. Newey and K. West, “A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix,” Econometrica, vol. 55, no. 3, pp. 703–708, May 1987.

[36] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 3319–3328.

[37] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in Proc. IEEE Symp. Secur. Privacy (SP), 2017, pp. 3–18.

[38] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator,” Ann. Math. Statist., vol. 27, no. 3, pp. 642–669, 1956.

[39] P. Massart, “The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality,” Ann. Probab., vol. 18, no. 3, pp. 1269–1283, 1990.

[40] Y. Gorishniy, I. Rubachev, V. Khrulkov, and A. Babenko, “Revisiting deep learning models for tabular data,” in Adv. Neural Inf. Process. Syst. 34, 2021, pp. 18932–18943.

[41] G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, and T. Goldstein, “SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training,” arXiv:2106.01342, June 2021.

[42] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World. New York, NY, USA: Springer, 2005.

[43] J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, “Distribution-free predictive inference for regression,” J. Amer. Statist. Assoc., vol. 113, no. 523, pp. 1094–1111, July 2018.

[44] A. N. Angelopoulos and S. Bates, “Conformal prediction: A gentle introduction,” Found. Trends Mach. Learn., vol. 16, no. 4, pp. 494–591, 2023.

[45] F. X. Diebold and R. S. Mariano, “Comparing predictive accuracy,” J. Bus. Econ. Statist., vol. 13, no. 3, pp. 253–263, July 1995.

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov received the M.Sc. degree in statistics and machine learning from Linköping University, Linköping, Sweden, in 2026. He is currently pursuing the B.Sc. degree in milita...