Multi-Dimensional Composite Endpoint Analysis via the Choquet Integral: Block Recurrent Encoding and Comparative Advantage Mapping

Ibrahim Halil Tanboga

arxiv: 2604.08101 · v1 · submitted 2026-04-09 · 📊 stat.ME · stat.AP

Pith reviewed 2026-05-10 18:04 UTC · model grok-4.3

keywords composite endpoints · Choquet integral · recurrent events · permutation test · cardiovascular outcomes · fuzzy measures · Shapley decomposition · simulation study

The pith

A Choquet integral method for composite endpoints outperforms Cox time-to-first-event in 15 of 17 simulation scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a new way to analyze composite endpoints in cardiovascular trials that combines information from six outcome dimensions using the Choquet integral. Standard methods lose information: Cox time-to-first-event (TTFE) ignores data after the first event, the win ratio discards tied pairs, and negative binomial regression treats death as noninformative censoring. The approach encodes recurrent events as two scalar summaries, the area under the cumulative count curve and the last event time, then aggregates all six dimensions through a non-additive fuzzy measure that includes pairwise interactions. Inference is by permutation test, which keeps type I error at the nominal level under the sharp null. Across 17 non-null simulation scenarios the method outperforms Cox TTFE in 15, WLW (Wei-Lin-Weissfeld) in 14, and the win ratio in 10. It also attributes the overall effect to individual components via Shapley decomposition.
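The recurrent-event encoding can be illustrated with a minimal sketch. It relies on one fact about step functions: the cumulative count jumps by 1 at each event, so its area over the follow-up window is the sum of the time remaining after each event. The function name `block_recurrent_summary`, the use of a fixed `follow_up` horizon, and returning `None` when no event occurs are assumptions made for illustration; the paper's own conventions (for example, how censoring or death truncates the window) are not stated on this page.

```python
from typing import Optional, Sequence, Tuple


def block_recurrent_summary(
    event_times: Sequence[float], follow_up: float
) -> Tuple[float, Optional[float]]:
    """Return (auc_burden, last_event_time) for one patient.

    Hypothetical helper, not the paper's code. The cumulative count N(t)
    is a step function, so its area over [0, follow_up] equals the sum of
    (follow_up - t_k) over the observed event times t_k.
    """
    observed = sorted(t for t in event_times if 0.0 <= t <= follow_up)
    auc_burden = float(sum(follow_up - t for t in observed))
    # Assumption: no events is reported as None rather than a sentinel time.
    last_event_time = observed[-1] if observed else None
    return auc_burden, last_event_time


# Two events at t=1 and t=3 over 5 time units: area is (5-1) + (5-3).
burden, last = block_recurrent_summary([1.0, 3.0], follow_up=5.0)
```

Early events contribute more area than late ones, which is why the AUC summary carries timing information that a raw event count would lose.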

Core claim

CWOT-CE encodes K=6 outcome dimensions—survival, event-free time, AUC recurrent burden, last event time, biomarker, and alive status—and aggregates them via the Choquet integral with a non-additive fuzzy measure including pairwise interaction terms. Inference uses permutation testing with exact finite-sample type I error control. In 5,000-replication null simulations it achieves 4.8% type I error, and across 17 alternative scenarios it outperforms Cox time-to-first-event in 15 (mean gain 28.8 percentage points), WLW in 14 (27.2 pp), and Win Ratio in 10.

What carries the argument

The Choquet integral with block recurrent encoding that summarizes recurrent event processes by AUC burden and last event time, aggregated under a fuzzy measure with pairwise terms.
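As a rough sketch of the aggregation step, the code below evaluates a Choquet integral for a 2-additive fuzzy measure, the form quoted from the paper further down this page. It uses the standard Möbius representation: with Shapley weights phi_i and pairwise interaction indices I_ij, the singleton masses are m_i = phi_i - 0.5 * sum_j I_ij and the integral is sum_i m_i x_i + sum_{i<j} I_ij min(x_i, x_j). The function name, the equal weights of 1/6, the 0-based dimension ordering, and the assumption that every dimension is already rescaled to [0, 1] with higher meaning better are all illustrative. Only the Survival–Alive interaction of -0.05 comes from the quoted passage; the paper's default weights are elided there and are not reconstructed here.

```python
from typing import Dict, List, Sequence, Tuple


def choquet_2additive(
    x: Sequence[float],
    shapley: Sequence[float],
    interactions: Dict[Tuple[int, int], float],
) -> float:
    """Choquet integral of x under a 2-additive measure (illustrative).

    shapley: importance weights phi_i, assumed to sum to 1.
    interactions: {(i, j): I_ij} with i < j; negative values encode
    redundancy, positive values encode complementarity.
    """
    # Moebius singleton masses: m_i = phi_i - 0.5 * sum_j I_ij.
    masses: List[float] = list(shapley)
    for (i, j), value in interactions.items():
        masses[i] -= 0.5 * value
        masses[j] -= 0.5 * value
    # Linear part plus one min-term per interacting pair.
    score = sum(m * xi for m, xi in zip(masses, x))
    for (i, j), value in interactions.items():
        score += value * min(x[i], x[j])
    return score


# Hypothetical setup: equal weights; index 0 = survival, index 5 = alive status.
phi = [1.0 / 6.0] * 6
pairs = {(0, 5): -0.05}
patient = [0.9, 0.7, 0.8, 0.6, 0.5, 1.0]
score = choquet_2additive(patient, phi, pairs)
```

With no interaction terms the function reduces to a weighted mean, and a constant input vector is returned unchanged, which gives two quick sanity checks on any candidate measure. A valid measure also needs every singleton mass and pairwise combination to keep the measure monotone; that check is omitted from this sketch.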

If this is right

Maintains nominal type I error rate of approximately 5% under the sharp null via permutation test.
Demonstrates superior power particularly in high-correlation and mortality-driven effect settings.
Shapley value decomposition correctly identifies effect-bearing outcome components.
Provides dual confidence intervals obtained by inversion of the permutation distribution.
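The permutation inference mentioned in the list above can be sketched as follows. This is a generic two-sample Monte Carlo permutation test on per-patient composite scores, not the paper's implementation: the difference in mean score as the test statistic, the two-sided form, and the names `permutation_p_value`, `n_perm`, and `seed` are assumptions. The scores must be computed with a measure fixed before looking at treatment labels; otherwise relabeling does not reproduce the null distribution.

```python
import random
from typing import Sequence


def permutation_p_value(
    scores: Sequence[float],
    treated: Sequence[bool],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo permutation p-value (illustrative sketch)."""

    def mean_diff(labels: Sequence[bool]) -> float:
        arm_t = [s for s, z in zip(scores, labels) if z]
        arm_c = [s for s, z in zip(scores, labels) if not z]
        return sum(arm_t) / len(arm_t) - sum(arm_c) / len(arm_c)

    rng = random.Random(seed)
    observed = abs(mean_diff(treated))
    labels = list(treated)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels keeps the arm sizes fixed, as under randomization.
        rng.shuffle(labels)
        if abs(mean_diff(labels)) >= observed - 1e-12:
            hits += 1
    # Counting the observed labeling itself keeps the p-value valid
    # when only a random subset of permutations is drawn.
    return (hits + 1) / (n_perm + 1)


# Toy data: ten treated patients with clearly higher scores than ten controls.
toy_scores = [float(v) for v in range(20)]
toy_labels = [v >= 10 for v in range(20)]
p_value = permutation_p_value(toy_scores, toy_labels)
```

The smallest attainable p-value is 1 / (n_perm + 1), so `n_perm` should be chosen with the intended significance level in mind.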

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Trials could capture richer recurrent-event information without power loss if the fuzzy measure is calibrated to clinical priorities.
The method's interpretability via component attribution may help regulators and clinicians understand what drives overall treatment effects.
Extensions to other disease areas with multi-outcome endpoints could follow similar encoding and aggregation steps.

Load-bearing premise

The non-additive fuzzy measure can be chosen to reflect genuine clinical trade-offs among the six dimensions without introducing specification bias that affects validity.

What would settle it

A new set of 5,000 simulations under the null hypothesis using the identical fuzzy measure and encoding, where the observed rejection rate at nominal 5% level deviates substantially from 5%.
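One way to make "deviates substantially" concrete is the Monte Carlo standard error of an estimated rejection rate. The worked figures below use the standard binomial formula with the reported 4.8% rate and 5,000 replications; the 95% band around the nominal 5% level is offered as an illustrative yardstick and is not a threshold stated in the paper.

```latex
\[
\mathrm{MCSE}(\hat{p})
  = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n_{\mathrm{sim}}}}
  = \sqrt{\frac{0.048 \times 0.952}{5000}}
  \approx 0.0030,
\qquad
0.05 \pm 1.96 \sqrt{\frac{0.05 \times 0.95}{5000}}
  \approx [\,0.044,\ 0.056\,].
\]
```

The first value agrees with the reported MCSE of 0.3%. Under this yardstick, an observed rejection rate outside roughly 4.4% to 5.6% would be hard to attribute to simulation noise alone.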

Figures

Figures reproduced from arXiv: 2604.08101 by Ibrahim Halil Tanboga.

**Figure 5.** Method selectivity under non-sharp nulls.

Original abstract

Background: Composite endpoints in cardiovascular trials combine heterogeneous outcomes (mortality, nonfatal events, hospitalizations, and biomarkers), yet conventional analytical methods sacrifice information by targeting a single dimension. Cox time-to-first-event ignores post-first-event data; Win Ratio discards tied pairs; negative binomial regression treats death as noninformative censoring.

Methods: We propose CWOT-CE: a Choquet integral-based composite endpoint analysis that encodes K = 6 outcome dimensions (survival, event-free time, AUC recurrent burden, last event time, biomarker, and alive status) and aggregates them through a non-additive fuzzy measure with pairwise interaction terms. The recurrent event process is represented as two complementary scalar summaries: the area under the cumulative count curve (AUC burden) and the last event time. Inference is via permutation test with exact finite-sample Type I error control and dual confidence interval by inversion. We conducted a simulation study comparing CWOT-CE against Cox TTFE, Win Ratio (WRrec), and WLW across 20 clinically motivated scenarios (1,000 to 5,000 replications).

Results: Under the sharp null (5,000 replications), all methods maintained nominal Type I error (CWOT-CE: 4.8%, MCSE 0.3%). Across 17 non-null scenarios, CWOT-CE outperformed Cox TTFE in 15 (mean +28.8 pp), WLW in 14 (mean +27.2 pp), and Win Ratio in 10, with 5 ties and only 2 narrow losses (mean +5.6 pp). CWOT-CE showed particular advantages in high-correlation settings (+35.4 pp vs. WR), mortality-driven effects (+10.7 pp), and balanced multi-component effects (+10.1 pp). Shapley decomposition correctly identified effect-bearing components across all calibration scenarios.

Conclusions: CWOT-CE with block recurrent encoding is broadly effective across clinically relevant scenarios while offering unique interpretive advantages through component attribution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The Choquet integral approach with recurrent event encoding is a distinct technical step for composite endpoints, but its reported gains rest on whether the fuzzy measure is locked in before seeing data.

The paper introduces CWOT-CE, which encodes six dimensions including recurrent events as AUC burden plus last event time, then aggregates them with a non-additive fuzzy measure that includes pairwise interactions before applying Shapley decomposition for attribution. Inference uses a permutation test. Simulations across 20 scenarios keep Type I error near nominal under the sharp null and show outperformance versus Cox TTFE in 15 of 17 non-null cases, WLW in 14, and Win Ratio in 10, with larger edges in high-correlation and mortality-driven settings.

The encoding and attribution pieces are not standard in the cited methods, so that combination is new within the subfield. The permutation test and consistent simulation results are clear strengths; they give a non-parametric route that avoids some of the information loss in time-to-first-event or win-ratio approaches. The method also keeps the recurrent process in two scalar summaries rather than discarding post-first-event data.

The main soft spot is the fuzzy measure itself. The abstract and stress-test note both leave open how the weights and interaction terms are chosen or validated. If they are set from external clinical input and fixed before any data inspection, the permutation test retains its exact finite-sample control. If any calibration or emphasis on observed correlations occurs, exchangeability under the null breaks and the power advantages become harder to interpret. That detail is load-bearing.

Readers working on cardiovascular trial analysis or multi-outcome methods will find the encoding and comparison setup useful. The work is coherent on its own terms and shows honest engagement with the limitations of existing tools. It deserves peer review so the measure construction can be clarified and the simulations can be checked for robustness under different pre-specified measures.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CWOT-CE, a Choquet-integral-based method for composite endpoint analysis that encodes six outcome dimensions (survival, event-free time, AUC recurrent burden, last event time, biomarker, alive status) via a non-additive fuzzy measure with pairwise interaction terms, represents recurrent events through block encoding, and performs inference via permutation tests with claimed exact finite-sample Type I error control. Simulations across 20 scenarios (including 17 non-null) report that CWOT-CE outperforms Cox TTFE in 15 cases (mean +28.8 pp), WLW in 14 (+27.2 pp), and Win Ratio in 10, while maintaining 4.8% Type I error under the sharp null and providing Shapley-value attribution of component effects.

Significance. If the fuzzy measure can be pre-specified on clinical grounds without data-dependent calibration, the approach would offer a flexible, non-additive aggregation framework that retains more information than time-to-first-event or win-ratio methods while preserving exact permutation inference and enabling component-wise interpretability. The reported simulation results under the sharp null and the consistent outperformance in high-correlation and mortality-driven regimes would then constitute a substantive methodological contribution for multi-dimensional cardiovascular endpoints.

major comments (2)

[Methods] Methods section (fuzzy measure definition and Choquet integral aggregation): The manuscript does not state whether the fuzzy measure weights and pairwise interaction terms for the six dimensions are fixed a priori on the basis of clinical considerations or calibrated to the observed data or simulation realizations. Because the permutation test statistic is a function of this measure, any data-dependent choice violates exchangeability under the sharp null and voids the exact finite-sample Type I error guarantee (reported as 4.8 % with MCSE 0.3 %). The headline superiority claims (+28.8 pp vs Cox TTFE, +27.2 pp vs WLW) therefore rest on an unverified assumption that the measure is pre-specified; explicit confirmation and, if applicable, a sensitivity analysis under fixed measures are required.
[Simulation study] Simulation study section (scenario calibration): The description of the 20 clinically motivated scenarios does not specify the exact fuzzy measure (including interaction coefficients) used in each replication. If the measure was tuned to emphasize high-correlation or mortality-driven regimes after inspecting the data-generating process, the reported outperformance counts (15/17 vs Cox TTFE, 14/17 vs WLW) and the Shapley decomposition results become conditional on post-hoc weighting rather than intrinsic properties of the estimator.

minor comments (2)

[Abstract] Abstract: The methods paragraph should indicate, even briefly, that the fuzzy measure is intended to be pre-specified on clinical grounds, so that readers can immediately assess the practical requirements of the procedure.
[Methods] Notation: The six scalar summaries (survival, event-free time, AUC burden, last event time, biomarker, alive status) are introduced without an explicit mapping to the arguments of the Choquet integral; a short table or equation block would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments, which have helped us strengthen the clarity and transparency of the manuscript. We address each major comment below and have revised the paper accordingly to confirm pre-specification of the fuzzy measure and to provide explicit details on its use in simulations.

Referee: [Methods] Methods section (fuzzy measure definition and Choquet integral aggregation): The manuscript does not state whether the fuzzy measure weights and pairwise interaction terms for the six dimensions are fixed a priori on the basis of clinical considerations or calibrated to the observed data or simulation realizations. Because the permutation test statistic is a function of this measure, any data-dependent choice violates exchangeability under the sharp null and voids the exact finite-sample Type I error guarantee (reported as 4.8 % with MCSE 0.3 %). The headline superiority claims (+28.8 pp vs Cox TTFE, +27.2 pp vs WLW) therefore rest on an unverified assumption that the measure is pre-specified; explicit confirmation and, if applicable, a sensitivity analysis under fixed measures are required.

Authors: We confirm that the fuzzy measure (including all weights and pairwise interaction terms) is pre-specified a priori on clinical grounds, drawing on expert input from cardiologists and prior literature on composite cardiovascular endpoints, and is fixed independently of any observed data or simulation realizations. This preserves exchangeability under the sharp null and the exact finite-sample Type I error control of the permutation test. We have added a new subsection in the Methods section that explicitly states this pre-specification, provides the clinical rationale and chosen parameter values, and includes a sensitivity analysis across alternative pre-specified measures to demonstrate robustness of the reported performance advantages. revision: yes
Referee: [Simulation study] Simulation study section (scenario calibration): The description of the 20 clinically motivated scenarios does not specify the exact fuzzy measure (including interaction coefficients) used in each replication. If the measure was tuned to emphasize high-correlation or mortality-driven regimes after inspecting the data-generating process, the reported outperformance counts (15/17 vs Cox TTFE, 14/17 vs WLW) and the Shapley decomposition results become conditional on post-hoc weighting rather than intrinsic properties of the estimator.

Authors: The fuzzy measures used in each of the 20 scenarios were determined a priori based on the clinical motivations and expected correlation structures of those scenarios, without any post-hoc tuning or inspection of the data-generating processes. In the revised manuscript we now provide an explicit table (new Table 2) listing the precise weights and pairwise interaction coefficients applied in every scenario, along with the clinical justification for each choice. This ensures full transparency and confirms that the reported outperformance and Shapley-value results reflect the intrinsic properties of the pre-specified CWOT-CE estimator. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper defines CWOT-CE via a Choquet integral over a non-additive fuzzy measure with pairwise terms, encodes recurrent events via AUC burden and last-event time, and performs inference via a permutation test asserted to have exact finite-sample Type I error under a sharp null. Simulation comparisons across 20 scenarios are presented as empirical performance evaluation rather than a derivation that reduces to fitted inputs or self-citations. No load-bearing step equates a claimed prediction or uniqueness result to its own construction by the paper's equations; the fuzzy measure is treated as a fixed modeling choice whose clinical specification is external to the statistical procedure. The reported Type I error rate (4.8%) and outperformance counts are therefore not forced by construction within the given text.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim depends on the clinical relevance of the chosen six dimensions and the ability of a pairwise-interaction fuzzy measure to represent outcome trade-offs without circular dependence on the simulation data.

free parameters (1)

fuzzy measure weights and interaction terms
The non-additive measure that aggregates the six dimensions requires specification of 6 singleton weights plus 15 pairwise terms; these are not stated to be fixed a priori.

axioms (2)

domain assumption: The six chosen dimensions (survival, event-free time, AUC recurrent burden, last event time, biomarker, alive status) together capture the clinically relevant information in a cardiovascular composite endpoint.
Invoked when defining the input to the Choquet integral; no external validation of completeness is provided.
standard math: A permutation test on the composite score yields exact finite-sample Type I error control even when the fuzzy measure contains interaction terms.
Relies on exchangeability under the sharp null; standard for permutation tests but assumes the score is computed identically on permuted data.

pith-pipeline@v0.9.0 · 5665 in / 1702 out tokens · 54285 ms · 2026-05-10T18:04:14.959194+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Choquet integral with a validated 2-additive fuzzy measure µ ... Default K=6 weights ... Default interactions: Survival–Alive redundancy (I1,6=−0.05) ...

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery / initial Peano object · unclear
Paper passage: permutation test with exact finite-sample Type I error control

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1] Andersen, P.K. and Gill, R.D. (1982). Cox’s regression model for counting processes. Ann. Stat., 10:1100–1120.
[2] Boulesteix, A.L., Lauer, S., and Eugster, M.J. (2013). A plea for neutral comparison studies. PLoS ONE, 8:e61562.
[3] Claggett, B.L., Pocock, S.J., Wei, L.J., et al. (2018). Comparison of time-to-first event and recurrent-event methods. Circulation, 138:570–577.
[4] Grabisch, M. (1997). k-order additive discrete fuzzy measures. Fuzzy Sets Syst., 92:167–189.
[5] Lawless, J.F. and Nadeau, C. (1995). Some simple robust methods for recurrent events. Technometrics, 37:158–168.
[6] Mao, L., Kim, K., and Miao, X. (2022). The Win Ratio with recurrent event and death outcomes. Stat. Med., 41:1871–1890.
[7] Morris, T.P., White, I.R., and Crowther, M.J. (2019). Using simulation studies to evaluate statistical methods. Stat. Med., 38:2074–2102.
Orué, A., Dinart, D., Billot, L., Bellera, C., and Rondeau, V. (2025). A comparative overview of Win Ratio and Joint Frailty models. arXiv:2512.13629.
[8] Ozga, A.K. and Rauch, G. (2022). Weighted composite endpoints with recurrent events. BMC Med. Res. Method., 22:57.
[9] Pawel, S., Kook, L., and Reeve, K. (2024). Pitfalls and potentials in simulation studies. Stat. Med., 43:2025–2042.
[10] Pocock, S.J., Ariti, C.A., Collier, T.J., and Wang, D. (2012). The Win Ratio: a new approach to composite endpoints. Eur. Heart J., 33:176–182.
[11] Rauch, G., Jahn-Eimermacher, A., Brannath, W., and Kieser, M. (2014). Opportunities and challenges of combined effect measures. Stat. Med., 33:1104–1120.
[12] Rondeau, V., Mathoulin-Pélissier, S., Jacqmin-Gadda, H., Brouste, V., and Soubeyran, P. (2007). Joint frailty models for recurring events and death. Biostatistics, 8:708–721.
[13] Wang, T. (2023). Novel statistical methods for composite endpoints. PhD dissertation, University of Wisconsin–Madison.
[14] Wei, L.J., Lin, D.Y., and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data. JASA, 84:1065–1073.
