Varying risk exposure in auto insurance: a weighted Tweedie framework for experience rating and cancellation penalties

Jean-Philippe Boucher; Julien Trufin; Raïssa Coulibaly

arxiv: 2604.02400 · v1 · submitted 2026-04-02 · 📊 stat.AP

Pith reviewed 2026-05-13 20:38 UTC · model grok-4.3

keywords: auto insurance · Tweedie models · policy cancellation · experience rating · ratemaking · weighting functions · earned premium · cancellation penalties

The pith

Weighted Tweedie models let auto insurers apply exposure-based penalties for mid-term cancellations that offset higher losses from cancellers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a family of Tweedie ratemaking models that incorporate mid-term policy cancellations through weighting functions tied to the level of exposure. Using data from a Canadian auto insurer, it shows that policyholders who cancel early exhibit different claims experience than those who keep coverage until maturity. Flexible weighting structures adjust the earned premium and allow a penalty function that can impose a surcharge, helping recoup losses while following standard actuarial constraints like monotonicity and non-negativity. The authors compare several weighting options using deviance measures, concentration curves, and Murphy diagrams to assess performance. A sympathetic reader would care because mid-term cancellations are common in practice and directly affect both the fairness of premiums and an insurer's ability to cover claims.

Core claim

Using an automobile insurance dataset from a Canadian insurer, the authors build on the classical Tweedie framework by introducing flexible weighting functions and a premium penalty structure that depend on the level of exposure. This allows for a more realistic representation of the earned premium when coverage is interrupted before the end of the policy period. The approach provides both a strategic and competitive advantage by allowing the insurer to indirectly compensate for large losses through a cancellation surcharge, while preserving actuarial coherence and statistical consistency.

What carries the argument

Exposure-dependent weighting functions within the Tweedie compound Poisson-gamma framework, paired with monotonic non-negative penalty structures that adjust the mean response based on the fraction of the policy period actually covered.
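
For readers who want the objective in concrete form, the following is a minimal sketch of a weighted mean Tweedie deviance for a power parameter 1 < p < 2. It is not the authors' code: the function names, the default p = 1.5, the toy portfolio, and the choice to pass the exposure fraction directly as the weight are illustrative assumptions. The paper's flexible weights ω(t) would take the place of that last choice.

```python
def tweedie_unit_deviance(y, mu, p=1.5):
    """Unit deviance of the Tweedie family for 1 < p < 2 (compound Poisson-gamma)."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0.0 or mu <= 0.0:
        raise ValueError("need y >= 0 and mu > 0")
    # The y**(2 - p) term vanishes for a claim-free record (y == 0).
    term_y = y ** (2.0 - p) / ((1.0 - p) * (2.0 - p)) if y > 0.0 else 0.0
    term_cross = y * mu ** (1.0 - p) / (1.0 - p)
    term_mu = mu ** (2.0 - p) / (2.0 - p)
    return 2.0 * (term_y - term_cross + term_mu)


def weighted_mean_deviance(losses, means, weights, p=1.5):
    """Weighted average of unit deviances; the weights are supplied by the caller."""
    total_weight = sum(weights)
    total = sum(
        w * tweedie_unit_deviance(y, m, p)
        for y, m, w in zip(losses, means, weights)
    )
    return total / total_weight


# Hypothetical toy portfolio: observed losses, predicted means, exposure fractions.
losses = [0.0, 0.0, 1200.0, 300.0]
means = [150.0, 400.0, 500.0, 250.0]
exposures = [0.25, 1.0, 1.0, 0.5]  # assumption: weight = exposure fraction
print(weighted_mean_deviance(losses, means, exposures))
```

Swapping `exposures` for a different weight vector is the only change needed to compare weighting structures on the same predictions, which is the kind of comparison the deviance criterion supports.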

If this is right

Insurers can apply a monotonic penalty to charge extra for early cancellations while keeping premiums non-negative and actuarially coherent.
Experience rating becomes more accurate because the model directly reflects varying risk exposure levels during the policy term.
Multiple model-selection tools, including deviance, Lorenz-based curves, and Bregman dominance diagrams, can be used to choose among weighting structures.
The framework supports indirect loss recovery through surcharges without requiring post-hoc adjustments to the base premium.
Statistical consistency is maintained so that the models remain suitable for ongoing ratemaking and reserving.
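
One simple way to see what monotonicity and non-negativity do to a fitted penalty is to repair an unconstrained estimate after the fact. The sketch below uses the pool-adjacent-violators algorithm followed by clipping at zero. This is a swapped-in illustration, not the estimation procedure of the paper. It also assumes that the penalty should not increase with exposure, so that an earlier cancellation is penalized at least as much as a later one; the raw values are hypothetical.

```python
def pava_nondecreasing(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators (equal weights)."""
    blocks = []  # each block is [mean, number_of_points]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_b, n_b = blocks.pop()
            mean_a, n_a = blocks.pop()
            n = n_a + n_b
            blocks.append([(mean_a * n_a + mean_b * n_b) / n, n])
    fitted = []
    for mean, n in blocks:
        fitted.extend([mean] * n)
    return fitted


def monotone_nonnegative_penalty(raw_penalty):
    """Non-increasing, non-negative repair of a raw penalty on an increasing exposure grid."""
    # A non-increasing fit is a non-decreasing fit on the reversed sequence.
    fitted = pava_nondecreasing(raw_penalty[::-1])[::-1]
    # Clipping at zero keeps the ordering intact.
    return [max(0.0, v) for v in fitted]


# Hypothetical unconstrained penalty estimates at exposures 0.2, 0.4, 0.6, 0.8, 1.0.
raw = [0.30, 0.35, 0.10, -0.05, 0.02]
print(monotone_nonnegative_penalty(raw))
```

On this toy input the first two values are pooled and the last two are set to zero, so the repaired curve falls with exposure and never goes negative.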

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar weighting structures could be tested in other insurance lines such as homeowners or health where early termination is also common.
If policyholders learn about the surcharges, their cancellation behavior might shift, potentially changing the observed risk differences over time.
Regulators could examine whether the penalty functions produce fair outcomes across different demographic groups in the data.
Combining the approach with telematics data might allow even finer exposure weighting that further reduces residual bias.

Load-bearing premise

The observed difference in claims experience between policyholders who cancel mid-term and those who do not remains stable enough to be captured by exposure-dependent weighting functions without introducing selection bias.

What would settle it

The claim would be undermined if, on a new hold-out dataset from the same insurer, the weighted Tweedie models showed no improvement in deviance or area-between-curves scores over standard Tweedie models, or if the fitted penalty functions failed to produce a net positive offset against observed losses from cancellers.

Figures

Figures reproduced from arXiv: 2604.02400 by Jean-Philippe Boucher, Julien Trufin, Raïssa Coulibaly.

**Figure 1.** All truncation possibilities for a one-year policy period.

**Figure 2.** Distribution of the risk exposure for the …

**Figure 3.** Claim frequency by policy year and contract type.

**Figure 4.** Average cost per claim by contract type. Finally, the two figures below display the average loss cost and the average annualized loss cost as functions of risk exposure (with all policyholders in the X X group grouped at an exposure level of 1). For these plots, exposure is discretized into intervals of width 0.05, and policyholders are aggregated within each interval. The size of each point reflects the nu…

**Figure 5.** Average loss cost by risk exposure. (Section 2.3, Available Covariates) The dataset includes multiple attributes for each contract and vehicle. To investigate how segmentation affects pricing, we focus on five primary covariates, labeled X1 through X5 for confidentiality reasons. These variables represent common risk factors, such as policyholder characteristics, vehicle type, and usage patterns, in line with conventi…

**Figure 6.** Average annualized loss cost by BMS level and contract type.

**Figure 7.** Proportion of X O contracts by BMS level and number of contracts. (Section 3, Characterizing the Mean Response to Risk Exposure; Section 3.1, Ratemaking Strategies) When incorporating the risk exposure into premium calculations, two main approaches may be considered: 1. Traditional Approach: For contracts that terminate before the end of the policy year, the conventional actuarial assumption is that the expected claim amount is…

**Figure 8.** Weight function ω(t) (left) and spline-based estimation of γ(t) (right) for the flexible approaches. (Section 5.1.2, Mean Parameter Adequacy Test) We verify our assumption regarding the mean structure in the proposed flexible approaches by comparing the average loss cost and the average estimated premiums in both the training and test datasets, as illustrated in …

**Figure 9.** Observed average loss costs and estimated average premiums (left: training set, right: test set).

**Figure 10.** Figure 10 is in line with the conclusions drawn from the deviance comparison. To construct this figure, we use the empirical counterparts of the concentration and Lorenz curves, computed on the training set; see Denuit et al. (2019). For $0 \le \theta \le 1$, these are defined by $\widehat{CC}(\theta) = \frac{\sum_{i:\, \hat{\mu}_i < \hat{F}^{-1}(\theta)} y_i}{\sum_{i=1}^{n} y_i}$ and $\widehat{LC}(\theta) = \frac{\sum_{i:\, \hat{\mu}_i < \hat{F}^{-1}(\theta)} \hat{\mu}_i}{\sum_{i=1}^{n} \hat{\mu}_i}$, where $y_i$ and $\hat{\mu}_i$ denote the observed values of Y and of the cand…

**Figure 11.** Mean exposure and proportion of full-year contracts.

**Figure 12.** Optimal penalty function as a function of exposure.

**Figure 13.** Murphy diagram of the optimal penalty. We compare the performance of the model using γcon(t), obtained by imposing both ratemaking constraints (C1) and (C2), with the Exposure-weight approach, represented by the blue (original) line in …

**Figure 14.** Figure 14 illustrates several values of a and their corresponding effects on the penalty function. As can be seen, setting a = 100% recovers the optimal penalty function previously discussed, whereas a = 0% corresponds to a null penalty. This figure also highlights that the penalty increases as a increases. Hence, insurers could use the parameter a to adopt a more or less stringent penalty for mid-term cancellation…

**Figure 15.** Evolution of the estimated coefficients as a function of the adjustment parameter.

**Figure 16.** Cumulative proportion of premiums and costs by exposure level.

**Figure 17.** Annual premium ratio between the flexible and the traditional models.

**Figure 18.** Comparison between observed average costs and predicted values across BMS levels.

**Figure 19.** Estimated smooth functions γcon(d) by BMS group. To identify the most appropriate grouping, we proceeded iteratively by dividing policyholders into two subsets according to their BMS level. Given the ordinal nature of the BMS level, where lower levels (e.g., 95) correspond to safer drivers and higher levels (e.g., 104) to riskier ones, we searched for the optimal cut point that maximizes the difference betwe…

**Figure 20.** Difference between smooth functions for BMS groups.

**Figure 21.** Descriptive statistics of all 5 covariates from the database.

**Figure 22.** Figure 22 illustrates the evolution of the weight function ωi across the successive iterations of the estimation algorithm described in Section 4. The figure provides a graphical representation of how the spline-based approximation of γ(di) stabilizes as the iterative procedure updates both the mean and weight components. Each curve corresponds to a given iteration k, starting from the initial Exposure Weighted Mod…
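
The exposure plots described in the text accompanying Figure 4 rest on a simple aggregation: exposure is cut into intervals of width 0.05, full-year contracts are grouped at an exposure level of 1, and losses are averaged within each interval. A minimal sketch of that aggregation follows. The function name, the dictionary layout, the toy records, and the definition of the annualized loss cost as total loss divided by total exposure in the bin are assumptions for illustration; the paper may define the annualized quantity differently.

```python
def loss_cost_by_exposure_bin(exposures, losses, width=0.05):
    """Aggregate losses by exposure interval; full-year contracts get their own bin."""
    n_bins = round(1.0 / width)
    sums = {}
    for d, y in zip(exposures, losses):
        if not 0.0 < d <= 1.0:
            raise ValueError("exposure must lie in (0, 1]")
        if d >= 1.0:
            k = n_bins  # full-year contracts, grouped at exposure 1
        else:
            # Small epsilon guards against floating-point error at bin edges.
            k = min(int(d / width + 1e-9), n_bins - 1)
        count, total_loss, total_exposure = sums.get(k, (0, 0.0, 0.0))
        sums[k] = (count + 1, total_loss + y, total_exposure + d)

    table = {}
    for k, (count, total_loss, total_exposure) in sorted(sums.items()):
        table[k] = {
            "n": count,                                 # drives the point size in a plot
            "avg_loss": total_loss / count,             # average loss cost
            "annualized": total_loss / total_exposure,  # assumed annualization
        }
    return table


# Hypothetical records: (exposure fraction, observed loss).
exposures = [0.12, 0.13, 0.52, 1.0]
losses = [0.0, 250.0, 0.0, 600.0]
for bin_index, row in loss_cost_by_exposure_bin(exposures, losses).items():
    print(bin_index, row)
```

Changing `width` reproduces the same summary at a coarser or finer exposure resolution.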

Original abstract

This paper proposes a new family of Tweedie-based ratemaking models that explicitly account for mid-term policy cancellations. Using an automobile insurance dataset from a Canadian insurer, we document a marked difference in claims experience between policyholders who maintain their coverage until maturity and those who cancel their policies mid-term. Building on the classical Tweedie framework, we introduce flexible weighting functions and a premium penalty structure that depend on the level of exposure, allowing for a more realistic representation of the earned premium when coverage is interrupted before the end of the policy period. We compare several weighting structures within the Tweedie framework and examine their theoretical properties, as well as their empirical performance using deviance-based model comparison criteria, an area-between-curves criterion derived from concentration and Lorenz curves, and Murphy diagrams grounded in Bregman dominance. To operationalize the proposed models, monotonicity and non-negativity constraints are imposed on the penalty function, ensuring consistency with actuarial principles. Finally, using real-world data, we show that this approach provides both a strategic and competitive advantage: it allows the insurer to indirectly compensate for large losses through a cancellation surcharge, while preserving actuarial coherence and statistical consistency.
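
The area-between-curves criterion mentioned in the abstract compares an empirical concentration curve with an empirical Lorenz curve, both ordered by the predicted premium. The sketch below evaluates the two curves at θ = k/n after sorting records by prediction and integrates the absolute gap with the trapezoidal rule. This is a common discretization that can differ by one record from a strict-inequality definition and that assumes no ties among predictions. The function names, the toy data, and the use of the absolute gap are assumptions; the exact criterion is the one defined in the paper and in Denuit et al. (2019).

```python
def concentration_and_lorenz(observed, predicted):
    """Empirical concentration and Lorenz curves on the grid theta = k / n."""
    n = len(observed)
    order = sorted(range(n), key=lambda i: predicted[i])  # smallest prediction first
    total_observed = sum(observed)
    total_predicted = sum(predicted)
    thetas, cc, lc = [0.0], [0.0], [0.0]
    cum_observed = cum_predicted = 0.0
    for k, i in enumerate(order, start=1):
        cum_observed += observed[i]
        cum_predicted += predicted[i]
        thetas.append(k / n)
        cc.append(cum_observed / total_observed)    # share of observed losses
        lc.append(cum_predicted / total_predicted)  # share of predicted premiums
    return thetas, cc, lc


def area_between_curves(thetas, cc, lc):
    """Trapezoidal integral of |CC - LC| (approximate where the curves cross)."""
    area = 0.0
    for j in range(1, len(thetas)):
        width = thetas[j] - thetas[j - 1]
        gap_left = abs(cc[j - 1] - lc[j - 1])
        gap_right = abs(cc[j] - lc[j])
        area += 0.5 * width * (gap_left + gap_right)
    return area


# Hypothetical losses and predictions for five policies.
observed = [0.0, 0.0, 100.0, 0.0, 400.0]
predicted = [10.0, 20.0, 30.0, 40.0, 50.0]
thetas, cc, lc = concentration_and_lorenz(observed, predicted)
print(area_between_curves(thetas, cc, lc))
```

A smaller area means that the share of losses carried by the lowest-premium policies tracks their share of premiums more closely.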

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds exposure-weighted penalties to Tweedie ratemaking for mid-term cancellations and shows decent fit on Canadian auto data, but the claimed strategic edge hinges on an untested stability assumption.

The main thing here is a practical tweak to Tweedie models: they add flexible weighting functions and exposure-dependent penalty structures so that earned premium adjusts when policies cancel early. Using one Canadian insurer's data, they document higher claims among cancellers and fit several versions under monotonicity and non-negativity constraints. The comparisons use deviance, Lorenz-based area-between-curves, and Murphy diagrams, which is a reasonable set of checks for this kind of work. That part is solid and directly addresses a real operational issue in ratemaking.

The extension itself is new enough within the cited Tweedie literature, and the constraints keep the penalty function actuarially coherent. The empirical results look usable for pricing teams that already work with Tweedie.

The soft spot is the stability assumption. The headline claim that the surcharge compensates for large losses rests on the observed claims gap between cancellers and non-cancellers being stable and fully captured by the exposure weights. If cancellation correlates with unobserved risk factors, the weights risk picking up selection rather than pure exposure effects, and the abstract plus stress-test note give no clear evidence they tested this directly. The metrics they report do not isolate that confound.

This is for actuaries and pricing modelers who handle lapse behavior in auto lines. It is narrow but grounded, so it deserves a serious referee who can ask for explicit checks on selection bias and perhaps out-of-sample validation on the penalty structure. I would send it to review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes a new family of Tweedie-based ratemaking models for auto insurance that incorporate mid-term policy cancellations via exposure-dependent weighting functions and a premium penalty structure. Using a Canadian insurer dataset, it documents differences in claims experience between cancellers and non-cancellers, compares alternative weighting structures using deviance criteria, Lorenz-based area-between-curves, and Murphy diagrams, imposes monotonicity and non-negativity constraints on the penalty function, and claims this yields a strategic advantage by allowing indirect compensation for large losses through cancellation surcharges while preserving actuarial coherence.

Significance. If the weighting functions isolate exposure effects without confounding selection bias, the framework offers a practical extension of classical Tweedie models for handling interrupted coverage, potentially improving loss compensation and competitiveness for insurers. The use of real-world data with multiple external comparison criteria (deviance, Bregman dominance via Murphy diagrams) strengthens the empirical grounding, though the central advantage claim hinges on the stability of observed claims differences.

major comments (2)

[Modeling Framework and Empirical Performance] The central claim that the weighted Tweedie approach compensates large losses via cancellation surcharges rests on the assumption that claims differences between cancellers and non-cancellers are stable and fully captured by exposure-dependent weights without selection bias (e.g., from unobserved driver behavior). The imposed monotonicity/non-negativity constraints and fit metrics do not directly test this; if mid-term cancellation correlates with endogenous risk factors, the fitted penalties may absorb rather than remove bias, undermining the strategic advantage result.
[Empirical Results] The paper estimates parameters for the weighting and penalty functions from data rather than deriving them parameter-free; this is appropriate but requires explicit sensitivity analysis to confirm that the documented claims difference remains stable across exposure levels, as any post-hoc adjustments or data exclusions could affect the cross-group comparison.

minor comments (2)

[Data and Methods] Clarify the exact measurement of exposure (e.g., time fraction or mileage) and report sample sizes and exclusion criteria for the canceller vs. non-canceller groups to support replication.
[Notation] Ensure consistent notation for the weighting functions w(·) and penalty p(·) when moving from theoretical properties to the fitted models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment below, indicating where revisions will strengthen the paper while maintaining the integrity of our empirical findings and modeling approach.

Referee: [Modeling Framework and Empirical Performance] The central claim that the weighted Tweedie approach compensates large losses via cancellation surcharges rests on the assumption that claims differences between cancellers and non-cancellers are stable and fully captured by exposure-dependent weights without selection bias (e.g., from unobserved driver behavior). The imposed monotonicity/non-negativity constraints and fit metrics do not directly test this; if mid-term cancellation correlates with endogenous risk factors, the fitted penalties may absorb rather than remove bias, undermining the strategic advantage result.

Authors: We agree that unobserved selection effects represent a substantive limitation for interpreting the strategic advantage. The manuscript documents raw differences in claims experience between cancellers and non-cancellers on the Canadian data and uses exposure-dependent weights to adjust earned premium for interrupted coverage. The monotonicity and non-negativity constraints ensure actuarial coherence of the penalty function but do not, by themselves, rule out confounding. In the revision we will add an explicit limitations subsection discussing the possibility of endogenous risk factors and will report stratified results by exposure deciles to show the stability of the observed claims differential. These additions will clarify the scope of the advantage claim without overstating identification. revision: partial
Referee: [Empirical Results] The paper estimates parameters for the weighting and penalty functions from data rather than deriving them parameter-free; this is appropriate but requires explicit sensitivity analysis to confirm that the documented claims difference remains stable across exposure levels, as any post-hoc adjustments or data exclusions could affect the cross-group comparison.

Authors: We accept the need for explicit sensitivity checks. Although the weighting and penalty parameters are estimated from the data to reflect observed patterns, we will add a dedicated sensitivity analysis subsection. This will include re-estimation after trimming extreme exposure values, varying the number of exposure bins, and re-fitting the models on subsamples defined by policy duration. The results will be summarized with the same deviance, area-between-curves, and Murphy-diagram metrics to demonstrate that the claims differential and model rankings remain qualitatively stable. revision: yes

Circularity Check

0 steps flagged

No significant circularity; modeling steps are data-driven and externally validated

The paper documents an empirical claims difference between cancellers and non-cancellers from real Canadian auto data, then introduces exposure-dependent weighting functions and penalty structures whose parameters are estimated via standard GLM fitting. Model selection relies on deviance, Lorenz-derived area-between-curves, and Murphy diagrams under Bregman dominance—none of which reduce to the target result by construction. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation; the central advantage claim rests on out-of-sample empirical performance rather than tautological re-expression of inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The claim rests on the classical Tweedie compound Poisson-gamma properties for claim modeling plus new ad-hoc weighting and penalty functions whose forms are chosen and fitted to data; no new physical entities are postulated.

free parameters (2)

weighting function parameters
Flexible parameters in the exposure-dependent weighting functions are introduced and estimated from data to adjust the Tweedie mean and variance.
penalty function parameters
Parameters controlling the shape and scale of the monotonic cancellation penalty are fitted while enforcing non-negativity and monotonicity.

axioms (2)

domain assumption: Tweedie distribution adequately models the mixture of zero and positive insurance claims
Invoked as the base distribution for ratemaking throughout the paper.
domain assumption: Penalty function must be monotonic and non-negative to preserve actuarial fairness
Explicitly imposed as a modeling constraint to ensure consistency with insurance principles.
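
The first axiom is easier to judge with the compound Poisson-gamma construction in view: a Poisson number of claims, each with a gamma-distributed cost, summed per policy. The sketch below simulates that construction with the standard library only. The frequency, shape, and scale values are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen for brevity rather than speed.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small claim frequencies."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_policy_loss(frequency, shape, scale, rng):
    """Total loss of one policy: Poisson claim count, gamma-distributed claim sizes."""
    n_claims = sample_poisson(frequency, rng)
    return math.fsum(rng.gammavariate(shape, scale) for _ in range(n_claims))


rng = random.Random(2026)
# Hypothetical parameters: 0.1 expected claims, mean claim size 2.0 * 1500 = 3000.
portfolio = [sample_policy_loss(0.1, 2.0, 1500.0, rng) for _ in range(10000)]
zero_share = sum(1 for loss in portfolio if loss == 0.0) / len(portfolio)
print("share of claim-free policies:", zero_share)
print("theoretical value exp(-0.1):", math.exp(-0.1))
```

The point mass at zero equals the Poisson probability of no claims, while every positive loss is continuous, which is the mixture the axiom refers to.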

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

and Coulibaly, R

Boucher, J.-P. and Coulibaly, R. (2024). Bonus-malus scale premiums for Tweedie’s compound Poisson models. Annals of Actuarial Science, pages 1–25. Boucher, J.-P. and Coulibaly, R. (2026). Comparison of offset and ratio weighted regressions in Tweedie models with application to mid-term cancellations. European Actuarial Journal. Casella, G. and Berger, R. (...

work page 2024
[2]

Gneiting, T

Cambridge University Press. Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378. Jørgensen, B. (1997). The theory of dispersion models. CRC Press. Lemaire, J. (2012). Bonus-malus systems in automobile insurance, volume

work page 2007
[3]

Nelder, J

Springer science. Nelder, J. A. and Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society Series A: Statistics in Society, 135(3):370–384. [p. 28, A PREPRINT - APRIL 6, 2026] Pechon, F., Denuit, M., and Trufin, J. (2019). Multivariate modelling of multiple guarantees in motor insurance of a household. European Actuarial Journa...

work page 1972
[4]

The table reports the proportion of observations in each category across the training dataset

for simplicity of interpretation. The table reports the proportion of observations in each category across the training dataset. Figure 21: Descriptive statistics of all 5 covariates from the database. Appendix II: Weight iterations. Figure 22 illustrates the evolution of the weight function ωi across the successive iterations of ...

work page 2026
