Sparse Unit-Sum Regression
Pith reviewed 2026-05-24 23:42 UTC · model grok-4.3
The pith
A mix of ℓ0 and ℓ1 penalties produces sparser solutions than ℓ1 alone for linear regression with unit-sum weights.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We consider sparsity in linear regression under the restriction that the regression weights sum to one. We propose an approach that combines ℓ0- and ℓ1-regularization. We compute its solution by adapting a recent methodological innovation made by Bertsimas et al. (2016) for ℓ0-regularization in standard linear regression. In a simulation experiment we compare our approach to ℓ0-regularization and ℓ1-regularization and find that it performs favorably in terms of predictive performance and sparsity. In an application to index tracking we show that our approach can obtain substantially sparser portfolios compared to ℓ1-regularization while maintaining a similar tracking performance.
What carries the argument
Combined ℓ0-ℓ1 regularization under the added linear constraint that weights sum to one, solved by adapting the mixed-integer quadratic programming method of Bertsimas et al. (2016).
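As a rough sketch of what such a program could look like, the block below writes one plausible mixed-integer formulation. Everything in it is an assumption made for illustration and is not taken from the paper: the response $y$, design matrix $X$, ℓ1 weight $\lambda$, cardinality bound $k$, big-M constant $M$, and binary indicators $z_j$ are hypothetical notation, and the paper may well use a penalty form, a different bounding device, or a different scaling of the loss.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \quad
  & \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{s.t.} \quad
  & \mathbf{1}^\top \beta = 1, \\
  & -M z_j \le \beta_j \le M z_j, \qquad j = 1,\dots,p, \\
  & \sum_{j=1}^{p} z_j \le k .
\end{aligned}
```

Under the unit-sum constraint, $\lVert \beta \rVert_1 \ge \lvert \mathbf{1}^\top \beta \rvert = 1$, with equality exactly when no weight is negative. In a formulation of this shape the ℓ1 term therefore acts only on solutions that contain negative weights, while the indicators $z_j$ control how many weights are nonzero.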
If this is right
- The method yields regression models with fewer active coefficients while retaining predictive accuracy under the unit-sum requirement.
- In portfolio construction it selects fewer assets than ℓ1 regularization for comparable index tracking.
- It provides a practical compromise between the exact sparsity of pure ℓ0 and the easier optimization of pure ℓ1 when the sum constraint is present.
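To make the ℓ0 side of that compromise concrete, here is a minimal Python sketch of cardinality-constrained least squares with unit-sum weights, solved by exhaustive enumeration of supports. It is not the paper's mixed-integer method and it omits the ℓ1 term entirely; the function names `unit_sum_least_squares` and `best_subset_unit_sum` are hypothetical, and enumeration is only feasible for a handful of variables.

```python
from itertools import combinations

import numpy as np


def unit_sum_least_squares(X, y):
    """Minimize ||y - X b||^2 subject to sum(b) == 1 via the KKT linear system."""
    p = X.shape[1]
    kkt = np.zeros((p + 1, p + 1))
    kkt[:p, :p] = 2.0 * X.T @ X   # Hessian of the squared loss
    kkt[:p, p] = 1.0              # gradient of the constraint sum(b) = 1
    kkt[p, :p] = 1.0              # the constraint row itself
    rhs = np.concatenate([2.0 * X.T @ y, [1.0]])
    # lstsq rather than solve, so a singular KKT matrix does not raise.
    solution = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return solution[:p]           # drop the Lagrange multiplier


def best_subset_unit_sum(X, y, k):
    """Try every support with at most k variables and keep the lowest RSS."""
    p = X.shape[1]
    best_rss, best_support, best_beta = np.inf, None, None
    for size in range(1, k + 1):
        for support in combinations(range(p), size):
            cols = list(support)
            b = unit_sum_least_squares(X[:, cols], y)
            rss = float(np.sum((y - X[:, cols] @ b) ** 2))
            if rss < best_rss:
                beta = np.zeros(p)
                beta[cols] = b
                best_rss, best_support, best_beta = rss, support, beta
    return best_rss, best_support, best_beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 6))
    beta_true = np.array([0.6, 0.0, 0.4, 0.0, 0.0, 0.0])
    y = X @ beta_true + 0.01 * rng.normal(size=50)
    rss, support, beta = best_subset_unit_sum(X, y, k=2)
    print(support, beta.round(3), beta.sum())
```

The number of supports grows combinatorially with the number of variables, which is the reason a mixed-integer solver is used instead of enumeration at realistic problem sizes.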
Where Pith is reading between the lines
- The same adaptation could be tested on other linear equality constraints that appear in compositional or probability-simplex regression.
- Advances in mixed-integer solvers would directly widen the range of dimensions where unit-sum sparse regression becomes routine.
- If the performance edge holds on new data sets, practitioners facing sum-to-one constraints may prefer the mixed penalty over ℓ1 alone.
Load-bearing premise
The adaptation of the Bertsimas et al. mixed-integer solver remains effective and computationally tractable after the sum-to-one constraint is imposed.
What would settle it
Re-running the index-tracking application and finding that the combined method either selects as many assets as l1 regularization or produces materially worse tracking error.
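Such a check comes down to two numbers per fitted portfolio: how many assets it holds and how closely it follows the index. The helper below is a minimal sketch of that comparison. It assumes tracking error is measured as the root-mean-square difference between index and portfolio returns, which may differ from the definition used in the paper; the function name `tracking_summary` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def tracking_summary(asset_returns, index_returns, weights, tol=1e-8):
    """Summarize a portfolio: active asset count, RMS tracking error, weight sum.

    asset_returns: array of shape (periods, assets)
    index_returns: array of shape (periods,)
    weights:       array of shape (assets,), expected to sum to one
    """
    asset_returns = np.asarray(asset_returns, dtype=float)
    index_returns = np.asarray(index_returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Per-period gap between the index and the weighted portfolio.
    gap = index_returns - asset_returns @ weights
    return {
        "n_assets": int(np.sum(np.abs(weights) > tol)),
        "tracking_error": float(np.sqrt(np.mean(gap ** 2))),
        "weight_sum": float(weights.sum()),
    }
```

Computing this summary on held-out periods for both the combined method and the ℓ1 baseline gives the asset counts and tracking errors that the comparison turns on.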
Original abstract
This paper considers sparsity in linear regression under the restriction that the regression weights sum to one. We propose an approach that combines $\ell_0$- and $\ell_1$-regularization. We compute its solution by adapting a recent methodological innovation made by Bertsimas et al. (2016) for $\ell_0$-regularization in standard linear regression. In a simulation experiment we compare our approach to $\ell_0$-regularization and $\ell_1$-regularization and find that it performs favorably in terms of predictive performance and sparsity. In an application to index tracking we show that our approach can obtain substantially sparser portfolios compared to $\ell_1$-regularization while maintaining a similar tracking performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes combining ℓ0- and ℓ1-regularization for linear regression subject to the unit-sum constraint on coefficients, solved via an adaptation of the Bertsimas et al. (2016) mixed-integer program. It reports that the method performs favorably versus pure ℓ0- and ℓ1-regularization in simulations on predictive performance and sparsity, and that in an index-tracking application it yields substantially sparser portfolios than ℓ1-regularization while preserving similar tracking error.
Significance. If the computational claims hold, the approach would supply a usable tool for sparse regression under linear equality constraints, with direct relevance to portfolio construction. The explicit adaptation of an existing MIP solver is a methodological strength when accompanied by evidence that the added equality does not materially degrade tractability.
major comments (3)
- [Abstract / Application] Abstract and application section: the central claim that the method obtains substantially sparser portfolios while maintaining similar tracking performance is presented without quantitative values for sparsity levels, tracking-error differences, number of assets, or MIP solve statistics (optimality gaps, run times, or node counts).
- [Method / Application] Method and application sections: the adaptation of Bertsimas et al. (2016) inserts the linear equality 1^T β = 1 directly into the MIP, yet no analysis or numerical evidence is supplied on how this coupling affects the tightness of the continuous relaxation or the number of branch-and-bound nodes explored at the problem sizes used in the index-tracking example.
- [Simulation experiment] Simulation experiment: the claim of favorable performance versus ℓ0- and ℓ1-regularization lacks any description of experimental design, number of replications, error bars, data-generation process, or exact quantitative comparisons, so the support for the performance claim cannot be evaluated.
minor comments (2)
- [Method] Notation for the combined penalty and the precise form of the MIP objective after adding the sum-to-one constraint should be written explicitly rather than described only in prose.
- [Abstract] The abstract would benefit from one or two concrete numerical results (e.g., average sparsity or tracking-error values) to substantiate the qualitative claims.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that the manuscript requires additional quantitative details and descriptions to support its claims, and we will revise accordingly.
Point-by-point responses
- Referee: [Abstract / Application] Abstract and application section: the central claim that the method obtains substantially sparser portfolios while maintaining similar tracking performance is presented without quantitative values for sparsity levels, tracking-error differences, number of assets, or MIP solve statistics (optimality gaps, run times, or node counts).
  Authors: We agree that specific quantitative values are needed to substantiate the claims. In the revised manuscript we will report the exact sparsity levels achieved, tracking-error values and differences, number of assets, and MIP solve statistics (run times, optimality gaps, node counts) for the index-tracking example.
  Revision: yes
- Referee: [Method / Application] Method and application sections: the adaptation of Bertsimas et al. (2016) inserts the linear equality 1^T β = 1 directly into the MIP, yet no analysis or numerical evidence is supplied on how this coupling affects the tightness of the continuous relaxation or the number of branch-and-bound nodes explored at the problem sizes used in the index-tracking example.
  Authors: We acknowledge the absence of analysis on the effect of the unit-sum constraint on the MIP relaxation and branching. In revision we will add numerical evidence from the index-tracking instances, including reported solve times, gaps, and (where feasible) comparisons of relaxation bounds or node counts with and without the equality constraint.
  Revision: yes
- Referee: [Simulation experiment] Simulation experiment: the claim of favorable performance versus ℓ0- and ℓ1-regularization lacks any description of experimental design, number of replications, error bars, data-generation process, or exact quantitative comparisons, so the support for the performance claim cannot be evaluated.
  Authors: We agree that the simulation section is insufficiently documented. The revised manuscript will include a complete description of the data-generation process, number of replications, performance metrics with error bars or standard errors, and tables of exact quantitative comparisons against the ℓ0 and ℓ1 baselines.
  Revision: yes
Circularity Check
No circularity: adaptation of external Bertsimas MIP with empirical validation
Full rationale
The paper explicitly frames its core contribution as an adaptation of the Bertsimas et al. (2016) mixed-integer programming approach for ℓ0-regularization, extended by the linear equality constraint 1^T β = 1. Performance claims rest on simulation experiments comparing predictive performance and sparsity, plus an index-tracking application showing sparser portfolios with comparable tracking error. These are external empirical benchmarks, not quantities defined by the model's own fitted parameters or self-citations. No self-definitional steps, fitted-input predictions, or load-bearing self-citation chains appear. The derivation is therefore self-contained against external references and data.