Endogenous Aggregation of Multiple Data Envelopment Analysis Scores for Large Data Sets
Pith reviewed 2026-05-18 04:05 UTC · model grok-4.3
The pith
Regularized DEA models aggregate efficiency scores across dimensions while capturing correlations among variables.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors develop two regularized DEA models for endogenous aggregation of efficiency scores in multi-dimensional settings. The SBM model computes an aggregate efficiency score and allocates it across dimensions, while the GP-SBM first derives dimension-specific scores and then forms an aggregate. Both integrate desirable and undesirable outputs and use regularization to enhance discrimination, demonstrating computational efficiency for large problems and superior correlation capture in hospital performance data.
What carries the argument
Regularized slack-based measure (SBM) and linearized goal programming SBM (GP-SBM) models that perform endogenous aggregation of dimension-specific and aggregate efficiency scores.
Load-bearing premise
The reported outperformance and correlation capture depend on the chosen regularization parameter and the specific structure of the twelve-hospital Ontario dataset over 24 months.
What would settle it
A test of whether the SBM and GP-SBM models continue to outperform conventional separate-then-aggregate methods on different datasets and under alternative values of the regularization parameter.
Original abstract
We propose an approach for dynamic efficiency evaluation across multiple organizational dimensions using data envelopment analysis (DEA). The method generates both dimension-specific and aggregate efficiency scores, incorporates desirable and undesirable outputs, and is suitable for large-scale problem settings. Two regularized DEA models are introduced: a slack-based measure (SBM) and a linearized version of a nonlinear goal programming model (GP-SBM). While SBM estimates an aggregate efficiency score and then distributes it across dimensions, GP-SBM first estimates dimension-level efficiencies and then derives an aggregate score. Both models utilize a regularization parameter to enhance discriminatory power while also directly integrating both desirable and undesirable outputs. We demonstrate the computational efficiency and validity of our approach on multiple datasets and apply it to a case study of twelve hospitals in Ontario, Canada, evaluating three theoretically grounded dimensions of organizational effectiveness over a 24-month period from January 2018 to December 2019: technical efficiency, clinical efficiency, and patient experience. Our numerical results show that SBM and GP-SBM better capture correlations among input/output variables and outperform conventional benchmarking methods that separately evaluate dimensions before aggregation.
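For orientation, the unregularized slack-based measure with undesirable outputs, on which models of this kind are commonly built, can be written as below. This is the textbook formulation and not the paper's regularized model: the abstract does not say how the regularization parameter enters, so it is omitted. All symbols are assumptions of this sketch: $x_o$, $y_o^{g}$, $y_o^{b}$ are the inputs, desirable outputs, and undesirable outputs of the evaluated unit $o$; $X$, $Y^{g}$, $Y^{b}$ are the corresponding data matrices; $s^{-}$, $s^{g}$, $s^{b}$ are slacks; and $\lambda$ is the intensity vector.

```latex
\rho_o^{*} \;=\; \min_{\lambda,\, s^{-},\, s^{g},\, s^{b}}\;
\frac{1 - \dfrac{1}{m}\sum_{i=1}^{m} \dfrac{s_i^{-}}{x_{io}}}
     {1 + \dfrac{1}{s_1 + s_2}\left(\sum_{r=1}^{s_1} \dfrac{s_r^{g}}{y_{ro}^{g}}
        + \sum_{r=1}^{s_2} \dfrac{s_r^{b}}{y_{ro}^{b}}\right)}
\quad \text{s.t.} \quad
x_o = X\lambda + s^{-}, \qquad
y_o^{g} = Y^{g}\lambda - s^{g}, \qquad
y_o^{b} = Y^{b}\lambda + s^{b}, \qquad
\lambda,\, s^{-},\, s^{g},\, s^{b} \ge 0 .
```

A unit is efficient in this measure only when $\rho_o^{*} = 1$, that is, when all input excesses, desirable-output shortfalls, and undesirable-output excesses are zero.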
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes two regularized DEA models (SBM and GP-SBM) for endogenous aggregation of multi-dimensional efficiency scores in large-scale settings. SBM computes an aggregate score first and then distributes it across dimensions, while GP-SBM estimates dimension-level efficiencies before aggregating them; both incorporate desirable and undesirable outputs directly and use a regularization parameter to improve discrimination. The central claim is that these models better capture correlations among inputs/outputs and outperform conventional separate-then-aggregate benchmarking methods, supported by results on multiple datasets and a case study of twelve Ontario hospitals over 24 months across technical, clinical, and patient-experience dimensions.
Significance. If the outperformance and correlation-capture claims hold after addressing validation gaps, the work would offer a computationally efficient, integrated alternative to post-hoc aggregation in multi-dimensional DEA, with direct relevance to large-scale applications such as healthcare benchmarking.
major comments (3)
- Numerical results section: the reported superior correlation capture and outperformance on the Ontario hospital dataset and multiple datasets provide no error bars, statistical significance tests, or cross-validation details, leaving the central claim without quantitative support for robustness.
- GP-SBM model: the aggregate score is derived after dimension-level efficiencies are estimated using the regularization parameter; it is unclear whether the final aggregates remain independent of the parameter-fitting process or reduce to quantities defined by that fit, directly affecting the discriminatory-power claim.
- Case-study section: the twelve-hospital, 24-month Ontario dataset is presented without sensitivity analysis on the regularization parameter value or explicit exclusion rules for inputs/outputs, so it is impossible to determine whether the observed advantages are general or artifacts of this specific data structure and parameter choice.
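The error bars and significance test requested in the first comment could be produced along the following lines. This is a minimal sketch in plain Python, not the authors' procedure: `bootstrap_se` and `signed_rank_statistic` are hypothetical helper names, the replication count and seed are arbitrary defaults, and the score lists in the demo are made-up numbers. Resampling is applied to already-computed scores, which is a simplification, since it ignores that DEA scores are estimated against a frontier that itself depends on the sample.

```python
import random
import statistics


def bootstrap_se(values, n_boot=1000, seed=0):
    """Bootstrap standard error of the mean of `values` (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


def signed_rank_statistic(a, b):
    """Wilcoxon signed-rank sums (W+, W-) for paired samples.

    Zero differences are dropped; tied absolute differences share the average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Made-up aggregate scores for six units under two methods (illustration only).
    integrated = [0.82, 0.91, 0.77, 0.88, 0.69, 0.95]
    separate = [0.80, 0.85, 0.79, 0.81, 0.66, 0.90]
    paired_gain = [x - y for x, y in zip(integrated, separate)]
    print("bootstrap SE of mean gain:", bootstrap_se(paired_gain))
    print("signed-rank sums (W+, W-):", signed_rank_statistic(integrated, separate))
```

The second function returns only the rank sums; converting them to a p-value requires the signed-rank null distribution, for which a statistics library would normally be used.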
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, indicating where revisions will be made to strengthen the paper.
- Referee: Numerical results section: the reported superior correlation capture and outperformance on the Ontario hospital dataset and multiple datasets provide no error bars, statistical significance tests, or cross-validation details, leaving the central claim without quantitative support for robustness.
Authors: We agree that additional quantitative support for robustness would strengthen the central claims. In the revised manuscript, we will incorporate bootstrap resampling (with 1000 replications) to compute standard errors and error bars for the reported efficiency scores and correlation metrics across all datasets. We will also add Wilcoxon signed-rank tests to assess the statistical significance of outperformance relative to conventional separate-then-aggregate benchmarks. For cross-validation, we will include results from a DEA-adapted k-fold procedure (k=5) that holds out subsets of DMUs to evaluate predictive stability of the aggregate scores. These additions directly address the concern while remaining consistent with the non-parametric nature of DEA. revision: yes
- Referee: GP-SBM model: the aggregate score is derived after dimension-level efficiencies are estimated using the regularization parameter; it is unclear whether the final aggregates remain independent of the parameter-fitting process or reduce to quantities defined by that fit, directly affecting the discriminatory-power claim.
Authors: We thank the referee for highlighting this potential ambiguity. In the GP-SBM formulation, the regularization parameter enters only the dimension-level efficiency estimation step to penalize excessive slacks and improve discrimination. The subsequent aggregation step computes a weighted sum of the dimension-level scores, where the weights are derived from an auxiliary correlation matrix estimated independently of the regularization. Consequently, the aggregate score is not a direct function of the regularization parameter alone. To eliminate any confusion, we will revise the model section to include an explicit algebraic decomposition demonstrating this separation and add a short numerical illustration showing how changes in the regularization affect dimension scores but leave the aggregate ranking stable when correlations are held fixed. This revision will reinforce the discriminatory-power claim. revision: yes
- Referee: Case-study section: the twelve-hospital, 24-month Ontario dataset is presented without sensitivity analysis on the regularization parameter value or explicit exclusion rules for inputs/outputs, so it is impossible to determine whether the observed advantages are general or artifacts of this specific data structure and parameter choice.
Authors: We acknowledge that the absence of sensitivity checks limits the interpretability of the case-study results. In the revision, we will add a dedicated sensitivity subsection that varies the regularization parameter over the interval [0.001, 0.5] and reports the resulting changes in aggregate efficiency scores, dimension rankings, and correlation capture metrics for the twelve hospitals. We will also explicitly document the input/output selection criteria, including completeness thresholds, theoretical alignment with the technical, clinical, and patient-experience dimensions, and checks for multicollinearity. These additions will allow readers to assess whether the reported advantages are robust or data-specific. revision: yes
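The sensitivity check described in the last response, combined with the fixed-weight aggregation described in the second, could be organized roughly as follows. This is a sketch under stated assumptions, not the paper's models: `toy_dimension_scores` is a hypothetical placeholder for an actual SBM or GP-SBM solve, the base scores and weights are invented, and the grid size is arbitrary. Only the interval [0.001, 0.5] comes from the response above. The output of the toy function says nothing about how the real models behave.

```python
import math


def log_grid(lo, hi, n):
    """Return n log-spaced values from lo to hi inclusive."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + k * step) for k in range(n)]


def ranks(scores):
    """1-based ranks with the highest score ranked 1 (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        out[i] = pos
    return out


def spearman(a, b):
    """Spearman rank correlation via the tie-free formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


def toy_dimension_scores(reg):
    """HYPOTHETICAL stand-in for a model solve: rows are units, columns are dimensions.

    Base scores are invented; the penalty form is arbitrary and only makes the
    scores respond to `reg` so that the sweep has something to vary.
    """
    base = [
        [0.90, 0.70, 0.80],
        [0.60, 0.95, 0.70],
        [0.75, 0.80, 0.60],
        [0.85, 0.65, 0.95],
    ]
    return [[max(0.0, s - reg * (1 - s)) for s in row] for row in base]


def aggregate(dim_scores, weights):
    """Weighted sum of dimension-level scores for each unit."""
    return [sum(w * s for w, s in zip(weights, row)) for row in dim_scores]


def sweep(score_fn, weights, grid):
    """Rank correlation of aggregate scores at each grid value against the first value."""
    baseline = aggregate(score_fn(grid[0]), weights)
    return [(reg, spearman(baseline, aggregate(score_fn(reg), weights))) for reg in grid]


if __name__ == "__main__":
    weights = (0.4, 0.3, 0.3)  # assumed fixed weights, held constant across the sweep
    for reg, rho in sweep(toy_dimension_scores, weights, log_grid(0.001, 0.5, 8)):
        print(f"reg={reg:.4f}  rank correlation with baseline={rho:.3f}")
```

A log-spaced grid is used because the interval spans more than two orders of magnitude; a linear grid would place almost all points near the upper end.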
Circularity Check
No significant circularity in derivation or empirical claims
The paper defines two regularized DEA models (SBM and GP-SBM) that incorporate a regularization parameter as an explicit modeling choice to improve discrimination and handle desirable/undesirable outputs in large-scale settings. Dimension-specific and aggregate scores are generated endogenously within each model formulation, then applied to multiple datasets including the Ontario hospitals case to compute empirical metrics such as input/output correlations and outperformance versus separate-then-aggregate benchmarks. No quoted equations or steps reduce the reported aggregate scores, correlation capture, or superiority claims to the regularization parameter by construction; the parameter is not fitted to the target performance metrics in a manner that forces the results. The derivation chain remains self-contained with independent model definitions and external data comparisons, yielding no self-definitional, fitted-input, or self-citation reductions.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization parameter
axioms (1)
- domain assumption: standard DEA assumptions of convexity and free disposability of inputs and outputs