arxiv: 2311.11487 · v2 · submitted 2023-11-20 · 📊 stat.ME · stat.AP

Modeling Insurance Claims using Bayesian Nonparametric Regression

Pith reviewed 2026-05-24 06:21 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords Bayesian nonparametric regression · insurance claims prediction · Dirichlet process · Pitman-Yor process · mixture of regressions · Poisson regression · normal regression · claims frequency and severity

The pith

Bayesian nonparametric regression models predict insurance claims more accurately by allowing each data point its own parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional parametric regression models assume the same functional form between covariates and claims for every observation, which often fails when claims frequency and severity are multimodal, skewed, and heavy-tailed. This paper develops Bayesian nonparametric alternatives that place Dirichlet process or Pitman-Yor process priors on the mixing distribution of regression coefficients, yielding a mixture of Poisson regressions for frequency and a mixture of normal regressions for log severity. Each observation thereby draws its own parameter values from the mixing measure, producing greater flexibility than a single shared regression function. The models are fitted with MCMC and applied to French motor insurance data to show gains in predictive accuracy. A reader would care because more accurate forecasts of individual claims support better premium setting by actuaries.
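
As a reading aid, the hierarchy described above can be written out as follows. This is a sketch rather than the paper's exact specification: the log link for the Poisson mean, the choice to let the variance \sigma_i^2 vary with the observation, and the symbols y_i, x_i, \theta_i and G_0 are assumptions introduced here, while z_i, \beta_i, \alpha and d follow the notation visible in the figure captions and excerpts.

```latex
\begin{align*}
y_i \mid \beta_i &\sim \mathrm{Poisson}\!\left(\exp(x_i^\top \beta_i)\right)
  && \text{(claims frequency)} \\
\log z_i \mid \beta_i, \sigma_i^2 &\sim \mathcal{N}\!\left(x_i^\top \beta_i,\ \sigma_i^2\right)
  && \text{(log claims severity)} \\
\theta_i \mid G &\overset{\text{iid}}{\sim} G, \qquad i = 1, \dots, n
  && \text{(}\theta_i = \beta_i \text{ or } (\beta_i, \sigma_i^2)\text{)} \\
G &\sim \mathrm{DP}(\alpha, G_0) \quad \text{or} \quad G \sim \mathrm{PY}(d, \alpha, G_0)
  && \text{(prior on the mixing measure)}
\end{align*}
```

Because G is almost surely discrete under either prior, several observations share the same \theta_i, which is what turns per-observation parameters into a mixture with a data-driven number of components.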

Core claim

The paper establishes that Bayesian nonparametric regression models based on Dirichlet process and Pitman-Yor process mixtures outperform traditional parametric regression for predicting insurance claims frequency and severity by accommodating individual-level variation in the regression parameters.
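
To make "individual-level variation in the regression parameters" concrete, the sketch below draws one scalar coefficient per observation from a Pitman-Yor urn scheme (d = 0 reduces it to the Dirichlet process). It is an illustration only, not the authors' sampler: the function name `sample_py_coefficients`, the standard normal base measure, and the restriction to alpha > 0 are assumptions made here.

```python
import random


def sample_py_coefficients(n, alpha, d=0.0, rng=None):
    """Draw beta_1..beta_n from a Pitman-Yor urn scheme (d = 0 gives the DP).

    Hypothetical helper for illustration; the base measure G0 = N(0, 1)
    is an assumption, not the paper's choice.
    """
    if alpha <= 0 or not (0.0 <= d < 1.0):
        raise ValueError("this sketch assumes alpha > 0 and 0 <= d < 1")
    rng = rng or random.Random()
    atoms, counts, labels = [], [], []
    for _ in range(n):
        # Existing cluster k has weight (n_k - d); a new cluster has
        # weight (alpha + d * K), where K is the current cluster count.
        weights = [c - d for c in counts] + [alpha + d * len(counts)]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(atoms):
            atoms.append(rng.gauss(0.0, 1.0))  # fresh draw from the assumed G0
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    betas = [atoms[k] for k in labels]
    return betas, labels, counts


if __name__ == "__main__":
    betas, labels, counts = sample_py_coefficients(20, alpha=1.0, d=0.3,
                                                   rng=random.Random(1))
    print("cluster sizes:", counts)
```

Every observation receives its own coefficient, but ties are common, so observations fall into clusters that share a regression function.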

What carries the argument

Mixture of regressions with Dirichlet process or Pitman-Yor process priors over the distribution of regression coefficients
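
One standard way to represent such a prior is the stick-breaking construction, sketched below with a finite truncation. The code is a minimal illustration under stated assumptions: the function name `stick_breaking_weights` and the truncation level are hypothetical, and nothing here is taken from the authors' R code.

```python
import random


def stick_breaking_weights(alpha, d=0.0, n_sticks=50, rng=None):
    """Truncated stick-breaking weights for PY(d, alpha); d = 0 gives DP(alpha).

    Returns the first n_sticks weights and the mass left on the stick.
    """
    if not (0.0 <= d < 1.0) or alpha <= -d:
        raise ValueError("requires 0 <= d < 1 and alpha > -d")
    rng = rng or random.Random()
    weights = []
    remaining = 1.0
    for k in range(1, n_sticks + 1):
        # V_k ~ Beta(1 - d, alpha + k * d); w_k = V_k * prod_{j<k} (1 - V_j)
        v = rng.betavariate(1.0 - d, alpha + k * d)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


if __name__ == "__main__":
    w, rest = stick_breaking_weights(alpha=1.0, d=0.25, n_sticks=10,
                                     rng=random.Random(0))
    print([round(x, 3) for x in w], "left over:", round(rest, 3))
```

A positive discount d slows the decay of the weights, which is the usual reason for preferring the Pitman-Yor process when many small clusters are expected.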

If this is right

  • The models capture individual-level relationships between covariates and claims more effectively than a single shared functional form.
  • They accommodate multimodality, skewness, and heavy tails in both frequency and severity distributions.
  • Actuaries obtain improved forecasts for setting premiums based on observed risk factors.
  • The same construction applies to both count-valued frequency and continuous severity responses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mixture-regression approach with DP or PY priors could be tested on other heterogeneous prediction problems outside insurance, such as medical cost forecasting.
  • Simulation studies with known multimodal data-generating processes would help quantify how much of the reported gain comes from the nonparametric flexibility versus MCMC tuning.
  • Direct comparisons against other flexible methods such as quantile regression or tree ensembles on the same claims data would clarify the relative strengths of the BNP construction.

Load-bearing premise

The claims data exhibits the multimodality, skewness, and heavy tails that the mixture-of-regressions construction is intended to capture, and MCMC sampling from the DP/PY posterior converges reliably enough to support the accuracy claim.
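
A first check of the data half of this premise could be as simple as computing sample skewness and excess kurtosis of the claims. The sketch below uses plain moment estimators; the function name is hypothetical and the inputs in the usage lines are made-up numbers, not the French motor data.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis.

    Positive skewness suggests a long right tail; positive excess
    kurtosis suggests tails heavier than a normal distribution.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        raise ValueError("values are constant")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    toy_claims = [1, 1, 1, 1, 10]  # invented values with one large claim
    print(skewness_and_excess_kurtosis(toy_claims))
```

Neither statistic detects multimodality, so a histogram or kernel density estimate would still be needed for that part of the premise.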

What would settle it

A direct comparison on the French motor insurance data in which a standard parametric regression achieves equal or higher out-of-sample predictive accuracy than the DP or PY mixture models would falsify the improved-accuracy claim.
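
One way such a comparison could be scored is the mean log predictive density on held-out counts, computed for a single Poisson regression and for a mixture of Poisson components. The sketch below is an assumption-laden illustration: the function names are hypothetical, the predictive means would come from fitted models in practice, and the numbers in the usage lines are invented toy values rather than results from the paper.

```python
import math


def poisson_logpmf(y, mu):
    """Log of the Poisson(mu) probability mass at count y (mu > 0)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def mean_log_score_single(ys, mus):
    """Mean log predictive density when observation i has one mean mus[i]."""
    return sum(poisson_logpmf(y, mu) for y, mu in zip(ys, mus)) / len(ys)


def mean_log_score_mixture(ys, weights, component_mus):
    """Mean log predictive density under a Poisson mixture.

    component_mus[i][k] is the mean of component k for observation i;
    weights[k] is the mixing weight of component k.
    """
    total = 0.0
    for y, mus in zip(ys, component_mus):
        terms = [math.log(w) + poisson_logpmf(y, mu)
                 for w, mu in zip(weights, mus)]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total / len(ys)


if __name__ == "__main__":
    ys = [0, 0, 0, 6, 6, 6]  # invented bimodal counts
    single = mean_log_score_single(ys, [3.0] * 6)
    mixture = mean_log_score_mixture(ys, [0.5, 0.5], [[0.1, 6.0]] * 6)
    print(single, mixture)
```

Higher values are better. On the toy counts the mixture scores higher because the data were constructed to be bimodal; whether the same holds on real held-out claims is exactly what the comparison would have to show.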

Figures

Figures reproduced from arXiv: 2311.11487 by Kaushik Ghosh, Mostafa Shams Esfand Abadi.

Figure 1
Figure 1. Trace plot of α in the DPMM model for the French motor claims frequency dataset.
Figure 2
Figure 2. Trace plots of α and d in the PYMM model for the French motor claims frequency dataset: (a) trace plot of the precision parameter α; (b) trace plot of the discount parameter d, each plotted against the iteration index. Adjacent text: We used a chi-square goodness-of-fit test to compare the predictive distribution of the number of claims obtained from the testing data with the observed distribution in the testing data.
Figure 3
Figure 3. Histogram of the number of distinct components for the French motor claims frequency dataset. Adjacent text: We also compared our BNP regression models to the classical non-Bayesian parametric Poisson regression.
Figure 4
Figure 4. Predictive distribution versus histogram of testing data for a class of policyholders with car age 2 and driver age 62 for the French motor claims frequency dataset. Adjacent text: We computed the n × n dissimilarity matrix based on samples from the posterior distribution of β1, ..., βn using the approach described in Section 2.3.
Figure 5
Figure 5. Heatmap of the dissimilarity matrix showing clustering performance for the French motor claims frequency dataset. Adjacent text (Section 5, Modeling Claims Severity): The second step in developing any insurance pricing model is predicting the claims severity, i.e. the amount of each claim. Let us assume that there are n independent policyholders each with a set of k covariates.
Figure 6
Figure 6. Trace plot of α in the DPMM model for the French motor claims severity dataset.
Figure 7
Figure 7. Trace plots of α and d in the PYMM model for the French motor claims severity dataset: (a) trace plot of the precision parameter α; (b) trace plot of the discount parameter d, each plotted against the iteration index. Adjacent text: We calculated the predictive density of the log(claims amount) for our BNP regression models using Equations (12) and (13). Also, we compared our BNP regression models to the classical non-Bayesian parametric normal regression (multiple linear regression model).
Figure 8
Figure 8. Histogram of the number of distinct components for the French motor claims severity dataset.
Figure 9
Figure 9. Posterior predictive density estimate versus histogram of the testing data for a class of policyholders with car age 9 and driver age 38 for the French motor claims severity dataset.
Figure 10
Figure 10. Heatmap of the dissimilarity matrix showing clustering performance for the French motor claims severity dataset.
Figure 11
Figure 11. Scatter plot of log(claims amount) versus car age for the French motor claims severity dataset, with points labeled by cluster (1, 2, 3): (a) DPMM; (b) PYMM.
Figure 12
Figure 12. Scatter plot of log(claims amount) versus driver age for the French motor claims severity dataset. Adjacent text: We have coded DPMM for claims frequency, DPMM for log(claims severity), PYMM for claims frequency, and PYMM for log(claims severity) in R, and the codes are available upon request.
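
The dissimilarity heatmaps in Figures 5 and 10 are built from posterior samples of the per-observation coefficients. The paper's own construction (its Section 2.3) is not reproduced on this page, so the sketch below substitutes a common choice and should be read as an assumption: dissimilarity is taken as one minus the fraction of posterior draws in which two observations share a cluster. The function name and the label-based input are hypothetical; labels could be derived from ties among the sampled coefficients.

```python
def dissimilarity_matrix(label_draws):
    """One minus the posterior co-clustering frequency.

    label_draws[s][i] is the cluster label of observation i in posterior
    draw s. This is an assumed definition, not necessarily the paper's.
    """
    n_draws = len(label_draws)
    n_obs = len(label_draws[0])
    together = [[0] * n_obs for _ in range(n_obs)]
    for labels in label_draws:
        if len(labels) != n_obs:
            raise ValueError("every draw must label the same observations")
        for i in range(n_obs):
            for j in range(n_obs):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[1.0 - together[i][j] / n_draws for j in range(n_obs)]
            for i in range(n_obs)]


if __name__ == "__main__":
    draws = [[0, 0, 1], [0, 1, 1]]  # two invented posterior draws
    for row in dissimilarity_matrix(draws):
        print(row)
```

Using co-clustering frequencies rather than raw labels avoids the label-switching problem, since cluster names carry no meaning across MCMC iterations.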
Original abstract

The prediction of future insurance claims based on observed risk factors, or covariates, helps the actuary set insurance premiums. Typically, actuaries use parametric regression models to predict claims based on the covariate information. Such models assume the same functional form tying the response to the covariates for each data point. These models are not flexible enough and can fail to accurately capture, at the individual level, the relationship between the covariates and the claims frequency and severity, which are often multimodal, highly skewed, and heavy-tailed. In this article, we explore the use of Bayesian nonparametric (BNP) regression models to predict claims frequency and severity based on covariates. In particular, we model claims frequency as a mixture of Poisson regressions, and the logarithm of claims severity as a mixture of normal regressions. We use the Dirichlet process (DP) and the Pitman-Yor process (PY) as priors for the mixing distribution over the regression parameters. Unlike parametric regression, such models allow each data point to have its individual parameters, making them highly flexible and resulting in improved prediction accuracy. We describe model fitting using MCMC and illustrate the models' applicability using French motor insurance claims data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper proposes Bayesian nonparametric regression models for insurance claims prediction, using Dirichlet process and Pitman-Yor process priors on mixtures of Poisson regressions for claim frequency and normal regressions for log claim severity. It argues that, unlike parametric models assuming a common functional form, these allow each observation its own regression parameters, yielding greater flexibility for multimodal, skewed, and heavy-tailed data and thus improved prediction accuracy; the models are fit via MCMC and illustrated on French motor insurance claims data.

Significance. If the accuracy improvement were demonstrated with quantitative out-of-sample metrics and baselines, the work would supply a flexible BNP alternative to standard GLM-based actuarial models for complex claims distributions, with potential value for premium setting.

major comments (3)
  1. [Abstract] The central claim that the BNP construction results in 'improved prediction accuracy' comes with neither quantitative metrics (e.g., RMSE, log predictive density), nor baseline comparisons to parametric Poisson/normal regression, nor any description of how prediction error was measured or cross-validated, so the data-to-claim link cannot be evaluated.
  2. [Application section] No evidence, diagnostics, or summary statistics are supplied showing that the French motor claims data actually exhibit the multimodality, skewness, and heavy tails that the mixture-of-regressions construction is intended to capture; without this, the claimed advantage over parametric regression does not follow.
  3. [Model fitting / MCMC section] No convergence diagnostics, effective sample sizes, or mixing assessments are reported for the MCMC sampler on the DP/PY posterior; these are required to establish that the posterior predictive distributions used for the accuracy claim are reliable at the observed sample size.
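
Among the diagnostics that comment 3 asks for, one widely used option is the Gelman-Rubin potential scale reduction factor, computed from several independent chains for a scalar quantity such as α or d. The sketch below is a minimal version under stated assumptions: it presumes at least two chains of equal length, and the function name `gelman_rubin` is hypothetical. Nothing on this page indicates how many chains the authors ran.

```python
import math
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a scalar parameter.

    chains is a list of equal-length lists of posterior draws, one list
    per independently initialised chain. Values near 1 are consistent
    with convergence; values well above 1 indicate disagreement.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0.0:
        raise ValueError("within-chain variance is zero")
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    agreeing = [[1, 2, 3, 4], [1, 2, 3, 4]]        # invented draws
    disagreeing = [[0, 1, 0, 1], [10, 11, 10, 11]]  # invented draws
    print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```

For DP and PY mixtures the diagnostic is best applied to label-invariant summaries (α, d, the number of occupied clusters, or predictive quantities), because component-specific parameters are not comparable across chains.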

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment point by point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the BNP construction results in 'improved prediction accuracy' comes with neither quantitative metrics (e.g., RMSE, log predictive density), nor baseline comparisons to parametric Poisson/normal regression, nor any description of how prediction error was measured or cross-validated, so the data-to-claim link cannot be evaluated.

    Authors: We agree that the abstract's claim of improved prediction accuracy requires quantitative support. In the revised manuscript we will add out-of-sample metrics (RMSE, log predictive density), explicit baseline comparisons to parametric Poisson and normal regressions, and a description of the cross-validation procedure used to measure prediction error. revision: yes

  2. Referee: [Application section] No evidence, diagnostics, or summary statistics are supplied showing that the French motor claims data actually exhibit the multimodality, skewness, and heavy tails that the mixture-of-regressions construction is intended to capture; without this, the claimed advantage over parametric regression does not follow.

    Authors: We acknowledge that demonstrating the relevant data features is necessary to motivate the BNP approach. The revised version will include summary statistics, histograms, and other diagnostics that illustrate the multimodality, skewness, and heavy-tailed behavior present in the French motor insurance claims data. revision: yes

  3. Referee: [Model fitting / MCMC section] No convergence diagnostics, effective sample sizes, or mixing assessments are reported for the MCMC sampler on the DP/PY posterior; these are required to establish that the posterior predictive distributions used for the accuracy claim are reliable at the observed sample size.

    Authors: We agree that MCMC diagnostics are required to substantiate the reliability of the posterior predictive results. The revised manuscript will report convergence diagnostics, effective sample sizes, and mixing assessments for the DP/PY MCMC sampler. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper's modeling approach uses standard Dirichlet process and Pitman-Yor process priors on regression parameters for mixture-of-Poisson and mixture-of-normal regressions. The claim of improved prediction accuracy follows from the nonparametric flexibility allowing per-observation parameters, but this is presented as an empirical property to be checked against data (French motor claims) rather than a quantity defined by construction from the fitted values themselves. No equations reduce a 'prediction' to a fitted input by definition, no self-citation chains justify uniqueness or ansatzes, and the central construction does not rename a known result or smuggle assumptions via prior work. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard modeling assumptions of Bayesian nonparametrics (exchangeability under the DP/PY prior, posterior sampling via MCMC) plus the domain assumption that claims data are well-described by per-observation Poisson and log-normal mixtures. No free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption: Claims frequency and severity follow mixtures of Poisson and normal regressions whose mixing measure has a DP or PY prior.
    Invoked when the authors state they model frequency as a mixture of Poisson regressions and log-severity as a mixture of normal regressions.
  • domain assumption: MCMC produces reliable posterior samples for the regression parameters.
    Implicit in the statement that models are fitted using MCMC.

pith-pipeline@v0.9.0 · 5728 in / 1307 out tokens · 44127 ms · 2026-05-24T06:21:48.557053+00:00 · methodology

