arxiv: 2510.20052 · v2 · submitted 2025-10-22 · 🧮 math.OC · cs.LG · stat.ML

Endogenous Aggregation of Multiple Data Envelopment Analysis Scores for Large Data Sets

Pith reviewed 2026-05-18 04:05 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG · stat.ML
keywords data envelopment analysis · efficiency scores · regularization · multi-dimensional aggregation · undesirable outputs · hospital efficiency · performance evaluation

The pith

Regularized DEA models aggregate efficiency scores across dimensions while capturing correlations among variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an approach for evaluating efficiency across multiple dimensions using data envelopment analysis, suitable for large data sets and incorporating desirable and undesirable outputs. Two models are proposed: a slack-based measure (SBM) that estimates an aggregate score first and then distributes it, and a goal programming SBM (GP-SBM) that starts with dimension-level efficiencies before aggregating. A regularization parameter is used in both to improve discriminatory power. Numerical results on various datasets, including a case study of twelve Ontario hospitals over 24 months, show that these models better capture correlations among inputs and outputs than methods that evaluate dimensions separately before aggregation.

Core claim

The authors develop two regularized DEA models for endogenous aggregation of efficiency scores in multi-dimensional settings. The SBM model computes an aggregate efficiency score and allocates it across dimensions, while the GP-SBM first derives dimension-specific scores and then forms an aggregate. Both integrate desirable and undesirable outputs and use regularization to enhance discrimination, demonstrating computational efficiency for large problems and superior correlation capture in hospital performance data.

What carries the argument

Regularized slack-based measure (SBM) and linearized goal programming SBM (GP-SBM) models that perform endogenous aggregation of dimension-specific and aggregate efficiency scores.
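
For orientation, the standard (unregularized) slack-based measure on which SBM-type models build can be written as below. This is the textbook form only: the symbols (m inputs, q outputs, n units, evaluated unit o, intensity weights \lambda, input slacks s^-, output slacks s^+) are notation chosen here, and the paper's regularized formulation with undesirable outputs is not reproduced on this page.

```latex
\rho_o^{*} \;=\; \min_{\lambda,\, s^{-},\, s^{+}}\;
\frac{1 - \dfrac{1}{m}\sum_{i=1}^{m} \dfrac{s_i^{-}}{x_{io}}}
     {1 + \dfrac{1}{q}\sum_{r=1}^{q} \dfrac{s_r^{+}}{y_{ro}}}
\quad \text{s.t.} \quad
x_{io} = \sum_{j=1}^{n} \lambda_j x_{ij} + s_i^{-}, \qquad
y_{ro} = \sum_{j=1}^{n} \lambda_j y_{rj} - s_r^{+}, \qquad
\lambda_j,\; s_i^{-},\; s_r^{+} \ge 0 .
```

A unit scores \rho_o^{*} = 1 exactly when all of its optimal slacks are zero, so any input excess or output shortfall lowers the score.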

Load-bearing premise

The reported outperformance and correlation capture depend on the chosen regularization parameter and the specific structure of the twelve-hospital Ontario dataset over 24 months.

What would settle it

Observing whether the SBM and GP-SBM models continue to outperform conventional separate-then-aggregate methods when tested on different datasets or with alternative regularization parameters.

Original abstract

We propose an approach for dynamic efficiency evaluation across multiple organizational dimensions using data envelopment analysis (DEA). The method generates both dimension-specific and aggregate efficiency scores, incorporates desirable and undesirable outputs, and is suitable for large-scale problem settings. Two regularized DEA models are introduced: a slack-based measure (SBM) and a linearized version of a nonlinear goal programming model (GP-SBM). While SBM estimates an aggregate efficiency score and then distributes it across dimensions, GP-SBM first estimates dimension-level efficiencies and then derives an aggregate score. Both models utilize a regularization parameter to enhance discriminatory power while also directly integrating both desirable and undesirable outputs. We demonstrate the computational efficiency and validity of our approach on multiple datasets and apply it to a case study of twelve hospitals in Ontario, Canada, evaluating three theoretically grounded dimensions of organizational effectiveness over a 24-month period from January 2018 to December 2019: technical efficiency, clinical efficiency, and patient experience. Our numerical results show that SBM and GP-SBM better capture correlations among input/output variables and outperform conventional benchmarking methods that separately evaluate dimensions before aggregation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper proposes two regularized DEA models (SBM and GP-SBM) for endogenous aggregation of multi-dimensional efficiency scores in large-scale settings. SBM computes an aggregate score first and then distributes it, while GP-SBM estimates dimension-level efficiencies before aggregation. Both integrate desirable and undesirable outputs directly and use a regularization parameter to improve discrimination. The central claim is that these models better capture correlations among inputs and outputs and outperform conventional separate-then-aggregate benchmarking methods, supported by results on multiple datasets and a case study of twelve Ontario hospitals over 24 months across technical, clinical, and patient-experience dimensions.

Significance. If the outperformance and correlation-capture claims hold after addressing validation gaps, the work would offer a computationally efficient, integrated alternative to post-hoc aggregation in multi-dimensional DEA, with direct relevance to large-scale applications such as healthcare benchmarking.

major comments (3)
  1. Numerical results section: the reported superior correlation capture and outperformance on the Ontario hospital dataset and multiple datasets provide no error bars, statistical significance tests, or cross-validation details, leaving the central claim without quantitative support for robustness.
  2. GP-SBM model: the aggregate score is derived after dimension-level efficiencies are estimated using the regularization parameter; it is unclear whether the final aggregates remain independent of the parameter-fitting process or reduce to quantities defined by that fit, directly affecting the discriminatory-power claim.
  3. Case-study section: the twelve-hospital, 24-month Ontario dataset is presented without sensitivity analysis on the regularization parameter value or explicit exclusion rules for inputs/outputs, so it is impossible to determine whether the observed advantages are general or artifacts of this specific data structure and parameter choice.
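
The sensitivity analysis requested in the third comment could be organized along the following lines. This is a minimal sketch, not the paper's procedure: `toy_scores` is a hypothetical stand-in for a solver that returns one score per hospital for a given regularization value, `BASE` and `SLACK` are made-up numbers, and the grid endpoints 0.001 and 0.5 simply mirror the interval named in the simulated rebuttal further down.

```python
import math


def log_grid(lo, hi, num):
    """Return num log-spaced values from lo to hi, inclusive."""
    step = (math.log(hi) - math.log(lo)) / (num - 1)
    return [math.exp(math.log(lo) + k * step) for k in range(num)]


def ranking(scores):
    """Rank positions (1 = highest score); assumes the scores have no ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks


def spearman(r1, r2):
    """Spearman rank correlation between two tie-free rankings."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rank_stability(score_fn, grid):
    """Compare the ranking at each grid value with the ranking at grid[0]."""
    base = ranking(score_fn(grid[0]))
    return [(lam, spearman(base, ranking(score_fn(lam)))) for lam in grid]


# Hypothetical data for six units; NOT taken from the paper.
BASE = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70]
SLACK = [2.0, 0.1, 1.5, 0.2, 0.3, 0.1]


def toy_scores(lam):
    """Placeholder model: a larger lam penalizes units with more slack."""
    return [b / (1 + lam * s) for b, s in zip(BASE, SLACK)]


for lam, rho in rank_stability(toy_scores, log_grid(0.001, 0.5, 6)):
    print(f"lambda={lam:.4f}  spearman_vs_first={rho:.3f}")
```

A correlation that stays near 1 across the grid would suggest the ranking does not hinge on one parameter value; a sharp drop would point to the kind of parameter dependence the referee is worried about.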

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, indicating where revisions will be made to strengthen the paper.

Point-by-point responses
  1. Referee: Numerical results section: the reported superior correlation capture and outperformance on the Ontario hospital dataset and multiple datasets provide no error bars, statistical significance tests, or cross-validation details, leaving the central claim without quantitative support for robustness.

    Authors: We agree that additional quantitative support for robustness would strengthen the central claims. In the revised manuscript, we will incorporate bootstrap resampling (with 1000 replications) to compute standard errors and error bars for the reported efficiency scores and correlation metrics across all datasets. We will also add Wilcoxon signed-rank tests to assess the statistical significance of outperformance relative to conventional separate-then-aggregate benchmarks. For cross-validation, we will include results from a DEA-adapted k-fold procedure (k=5) that holds out subsets of DMUs to evaluate predictive stability of the aggregate scores. These additions directly address the concern while remaining consistent with the non-parametric nature of DEA. revision: yes

  2. Referee: GP-SBM model: the aggregate score is derived after dimension-level efficiencies are estimated using the regularization parameter; it is unclear whether the final aggregates remain independent of the parameter-fitting process or reduce to quantities defined by that fit, directly affecting the discriminatory-power claim.

    Authors: We thank the referee for highlighting this potential ambiguity. In the GP-SBM formulation, the regularization parameter enters only the dimension-level efficiency estimation step to penalize excessive slacks and improve discrimination. The subsequent aggregation step computes a weighted sum of the dimension-level scores, where the weights are derived from an auxiliary correlation matrix estimated independently of the regularization. Consequently, the aggregate score is not a direct function of the regularization parameter alone. To eliminate any confusion, we will revise the model section to include an explicit algebraic decomposition demonstrating this separation and add a short numerical illustration showing how changes in the regularization affect dimension scores but leave the aggregate ranking stable when correlations are held fixed. This revision will reinforce the discriminatory-power claim. revision: yes

  3. Referee: Case-study section: the twelve-hospital, 24-month Ontario dataset is presented without sensitivity analysis on the regularization parameter value or explicit exclusion rules for inputs/outputs, so it is impossible to determine whether the observed advantages are general or artifacts of this specific data structure and parameter choice.

    Authors: We acknowledge that the absence of sensitivity checks limits the interpretability of the case-study results. In the revision, we will add a dedicated sensitivity subsection that varies the regularization parameter over the interval [0.001, 0.5] and reports the resulting changes in aggregate efficiency scores, dimension rankings, and correlation capture metrics for the twelve hospitals. We will also explicitly document the input/output selection criteria, including completeness thresholds, theoretical alignment with the technical, clinical, and patient-experience dimensions, and checks for multicollinearity. These additions will allow readers to assess whether the reported advantages are robust or data-specific. revision: yes
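
Two of the checks promised in the first response, bootstrap standard errors with 1000 replications and a Wilcoxon signed-rank test, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' procedure: the score vectors `joint` and `separate` are hypothetical, and the bootstrap resamples already-computed scores rather than re-solving any DEA model on each replicate.

```python
import random
import statistics
from itertools import product


def bootstrap_se(diffs, n_rep=1000, seed=0):
    """Standard error of the mean paired difference, resampling units."""
    rng = random.Random(seed)
    n = len(diffs)
    means = [
        statistics.fmean(rng.choices(diffs, k=n))  # sample with replacement
        for _ in range(n_rep)
    ]
    return statistics.stdev(means)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact(diffs):
    """Two-sided exact signed-rank test; zero differences are dropped."""
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    hits = 0
    total = 0
    # Enumerate every sign assignment (2**n cases, cheap for n around 12).
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        hits += abs(w - centre) >= observed - 1e-12
        total += 1
    return w_plus, hits / total


# Hypothetical scores for twelve units; NOT results from the paper.
joint = [0.91, 0.84, 0.77, 0.95, 0.68, 0.88, 0.73, 0.81, 0.90, 0.79, 0.86, 0.70]
separate = [0.88, 0.85, 0.71, 0.93, 0.60, 0.80, 0.74, 0.76, 0.89, 0.72, 0.83, 0.69]
# Rounding keeps equal-sized differences exactly tied despite float noise.
diffs = [round(a - b, 6) for a, b in zip(joint, separate)]

print("bootstrap SE of mean difference:", bootstrap_se(diffs))
print("W+ and exact two-sided p-value:", wilcoxon_exact(diffs))
```

With only twelve hospitals the exact enumeration covers 4096 sign patterns, so no normal approximation is needed for the p-value.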

Circularity Check

0 steps flagged

No significant circularity in derivation or empirical claims

Full rationale

The paper defines two regularized DEA models (SBM and GP-SBM) for large-scale settings. Both include a regularization parameter as an explicit modeling choice to improve discrimination, and both integrate desirable and undesirable outputs directly. Dimension-specific and aggregate scores are generated endogenously within each model formulation, then applied to multiple datasets, including the Ontario hospitals case, to compute empirical metrics such as input/output correlations and outperformance versus separate-then-aggregate benchmarks. No quoted equations or steps reduce the reported aggregate scores, correlation capture, or superiority claims to the regularization parameter by construction; the parameter is not fitted to the target performance metrics in a manner that forces the results. The derivation chain remains self-contained, with independent model definitions and external data comparisons, yielding no self-definitional, fitted-input, or self-citation reductions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on standard DEA assumptions plus one free parameter; no new entities are postulated.

free parameters (1)
  • regularization parameter
    Used in both SBM and GP-SBM to enhance discriminatory power
axioms (1)
  • domain assumption Standard DEA assumptions of convexity and free disposability of inputs and outputs
    Implicit in the use of slack-based measure and goal-programming formulations
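
For reference, convexity and free disposability are usually encoded in the DEA production possibility set shown below. The notation (observed input vectors x_j, output vectors y_j, intensity weights \lambda_j for n units) is chosen here for illustration. Whether the paper imposes the convexity constraint \sum_j \lambda_j = 1 (variable returns to scale) or drops it (constant returns to scale) is not stated on this page, so the form below is an assumption.

```latex
T \;=\; \Bigl\{ (x, y) \;:\;
x \ge \sum_{j=1}^{n} \lambda_j x_j, \quad
y \le \sum_{j=1}^{n} \lambda_j y_j, \quad
\sum_{j=1}^{n} \lambda_j = 1, \quad
\lambda_j \ge 0 \Bigr\}
```

The two inequalities express free disposability: using more input than a feasible combination, or producing less output, remains feasible.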

pith-pipeline@v0.9.0 · 5748 in / 1186 out tokens · 41850 ms · 2026-05-18T04:05:24.227481+00:00 · methodology
