The Virtue of Sparsity in Complexity
Pith reviewed 2026-05-10 06:23 UTC · model grok-4.3
The pith
Expanding candidate feature spaces lets asset pricing recover sparser priced risks and improve portfolio performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Distinguishing capacity sparsity (the size of the candidate feature space) from factor sparsity (the parsimonious set of priced risks) shows that the former enables the latter. When prior empirical designs are revisited and extended to higher complexity, nonlinear feature expansions combined with basis pursuit produce portfolios whose out-of-sample returns dominate those from ridgeless methods beyond a critical complexity level. The gains arise because the enlarged space allows reliable identification of the sparse priced-risk structure.
What carries the argument
The distinction between capacity sparsity (dimensionality of the candidate feature space) and factor sparsity (parsimonious structure of priced risks), with basis pursuit serving as the recovery mechanism that selects the sparse priced factors from the expanded space.
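One way to write the two notions and the two estimators down, as a sketch only. The notation is an assumption, not the paper's: S is a T x P matrix of candidate features with P > T and full row rank, y is the T-vector being fitted, and b is the coefficient vector. The paper's actual objective may differ from this generic regression form.

```latex
% Assumed notation (not taken from the paper):
%   S : T x P matrix of candidate features, P > T, full row rank
%   y : T-vector of targets,  b : P-vector of coefficients
\begin{align*}
\text{capacity (size of the candidate space):} \quad & P, \\
\text{factor sparsity (number of priced features):} \quad & \|b\|_0 = \#\{\, j : b_j \neq 0 \,\} \ll P, \\
\text{basis pursuit:} \quad & \hat b^{\mathrm{BP}} = \arg\min_{b} \|b\|_1 \quad \text{s.t. } S b = y, \\
\text{ridgeless:} \quad & \hat b^{\mathrm{RL}} = S^{+} y = \arg\min_{b} \|b\|_2 \quad \text{s.t. } S b = y.
\end{align*}
```

Both estimators interpolate the same data; they differ only in the norm being minimized. The minimum-l1 solution tends to put weight on few columns of S, while the minimum-l2 solution typically spreads weight across all P columns, which is why raising P can coexist with a small support under basis pursuit.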
If this is right
- Out-of-sample portfolio performance improves once the feature space exceeds a critical size.
- The improvement stems from identifying fewer but more accurately priced risks rather than retaining additional factors.
- Nonlinear expansions become useful precisely because they enlarge the pool from which a sparse solution can be selected.
- Prior empirical designs that stopped at lower complexity may have underestimated the value of sparsity-promoting methods.
Where Pith is reading between the lines
- The same logic could be tested on non-portfolio tasks such as volatility forecasting or credit-risk modeling.
- If the complementarity holds, regularization choices in high-dimensional finance should prioritize methods that explicitly promote factor sparsity inside large candidate spaces.
- Repeating the exercise on different asset classes or with alternative nonlinear bases would clarify how general the threshold effect is.
Load-bearing premise
That basis pursuit applied to nonlinearly expanded features will recover the true sparse structure of priced risks without introducing selection artifacts or invalidating out-of-sample comparisons.
What would settle it
A replication in which, past the stated complexity threshold, the basis-pursuit portfolios on nonlinear features show no out-of-sample advantage over ridgeless benchmarks or recover a factor set that is not demonstrably sparser.
Original abstract
Sparsity or complexity? In modern high-dimensional asset pricing, these are often viewed as competing principles: richer feature spaces appear to favor complexity, while economic intuition has long favored parsimony. We show that this tension is misplaced. We distinguish capacity sparsity (the dimensionality of the candidate feature space) from factor sparsity (the parsimonious structure of priced risks) and argue that the two are complements: expanding capacity enables the discovery of factor sparsity. Revisiting the benchmark empirical design of Didisheim et al. (2025) and pushing it to higher complexity regimes, we show that nonlinear feature expansions combined with basis pursuit yield portfolios whose out-of-sample performance dominates ridgeless benchmarks beyond a critical complexity threshold. The evidence shows that the gains from complexity arise not from retaining more factors, but from enlarging the space from which a sparse structure of priced risks can be identified. The virtue of complexity in asset pricing operates through factor sparsity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper distinguishes capacity sparsity (the dimensionality of the candidate feature space) from factor sparsity (the parsimonious structure of priced risks) and argues they are complements rather than substitutes. Extending the empirical design of Didisheim et al. (2025) to higher-complexity regimes, it claims that nonlinear feature expansions combined with basis pursuit produce portfolios whose out-of-sample performance dominates ridgeless benchmarks beyond a critical complexity threshold; the gains are attributed to enlarging the space from which a sparse priced-risk structure can be identified rather than to retaining more factors.
Significance. If the central empirical claim is robust, the paper offers a useful reconciliation of the apparent tension between complexity and parsimony in high-dimensional asset pricing. The capacity-versus-factor sparsity distinction is conceptually clean and could inform future work on feature engineering and sparse recovery in portfolio construction. The extension of an existing benchmark design to nonlinear regimes is a natural next step, though its value hinges on the strength of the supporting evidence.
major comments (3)
- The abstract and introduction report out-of-sample dominance of the basis-pursuit portfolios but supply no details on the exact sample periods, cross-validation scheme, multiple-testing corrections, or the precise basis-pursuit implementation (e.g., regularization path, stopping rule, or handling of the expanded nonlinear design matrix). Without these, the central empirical claim cannot be evaluated and the reported superiority remains unverifiable.
- The performance comparison to ridgeless benchmarks is defined relative to a prior benchmark whose parameters and design choices are not shown to be independent of the current fitting procedure in the expanded feature space. This raises a circularity concern: the apparent prediction of superiority may reduce to re-fitting within the same enlarged space rather than a genuine out-of-sample test of the capacity-factor complementarity.
- The claim that basis pursuit reliably recovers a stable sparse structure of priced risks from the nonlinear expansion assumes that the design matrix satisfies recovery conditions (e.g., the irrepresentable condition for lasso-type estimators). High-dimensional polynomial and interaction terms are likely to produce highly correlated columns, violating these conditions in finite samples and exposing the OOS gains to post-selection bias or sample-specific artifacts rather than genuine factor-sparsity discovery.
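The collinearity concern in the last comment can be illustrated with a small sketch. It uses synthetic data, a single made-up characteristic, and the largest absolute pairwise column correlation as a rough proxy; it does not check the irrepresentable condition itself, and the function names are hypothetical rather than taken from the paper.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def polynomial_columns(x, max_degree):
    """Columns x, x^2, ..., x^max_degree of a one-variable polynomial expansion."""
    return [[v ** d for v in x] for d in range(1, max_degree + 1)]


def max_abs_correlation(columns):
    """Largest absolute pairwise correlation between distinct columns."""
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            best = max(best, abs(pearson(columns[i], columns[j])))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic characteristic drawn uniformly on (0, 1); not real asset data.
x = [rng.uniform(0.0, 1.0) for _ in range(500)]
columns = polynomial_columns(x, max_degree=4)
coherence = max_abs_correlation(columns)
print(f"max |corr| across polynomial columns: {coherence:.3f}")
```

Raw monomials of a positive variable move almost in lockstep, so the printed value sits close to 1. Whether the paper's expansion behaves this way depends on its basis and on any centering or orthogonalization, which the abstract does not describe.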
minor comments (2)
- The distinction between 'capacity sparsity' and 'factor sparsity' is introduced without a formal definition or notation; adding a brief mathematical statement (e.g., in terms of support size of the coefficient vector versus dimension of the feature map) would improve clarity.
- The manuscript cites Didisheim et al. (2025) as the benchmark but does not reproduce or tabulate the exact baseline results for direct comparison; including a side-by-side table of key metrics would strengthen the extension claim.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed comments on our manuscript. We address each of the major comments below, indicating the revisions we plan to make to strengthen the paper.
- Referee: The abstract and introduction report out-of-sample dominance of the basis-pursuit portfolios but supply no details on the exact sample periods, cross-validation scheme, multiple-testing corrections, or the precise basis-pursuit implementation (e.g., regularization path, stopping rule, or handling of the expanded nonlinear design matrix). Without these, the central empirical claim cannot be evaluated and the reported superiority remains unverifiable.
  Authors: We agree that these implementation details are essential for evaluating and replicating the results. In the revised version of the manuscript, we will add a new subsection in the empirical section that explicitly describes the sample periods used, the cross-validation procedure for hyperparameter tuning, any multiple-testing adjustments applied, and the precise implementation of basis pursuit, including the regularization path, stopping criteria, and how the expanded nonlinear design matrix is handled.
  Revision: yes
- Referee: The performance comparison to ridgeless benchmarks is defined relative to a prior benchmark whose parameters and design choices are not shown to be independent of the current fitting procedure in the expanded feature space. This raises a circularity concern: the apparent prediction of superiority may reduce to re-fitting within the same enlarged space rather than a genuine out-of-sample test of the capacity-factor complementarity.
  Authors: We appreciate this concern regarding potential circularity. However, the ridgeless benchmarks follow the exact specification from Didisheim et al. (2025) and are applied to the original feature space without access to the nonlinear expansions. Our basis-pursuit approach operates on the expanded space, but the out-of-sample evaluation uses the same held-out periods for both. To address any ambiguity, we will revise the manuscript to include a clearer description of the benchmark implementation and explicitly state that the ridgeless method does not utilize the expanded features.
  Revision: partial
- Referee: The claim that basis pursuit reliably recovers a stable sparse structure of priced risks from the nonlinear expansion assumes that the design matrix satisfies recovery conditions (e.g., the irrepresentable condition for lasso-type estimators). High-dimensional polynomial and interaction terms are likely to produce highly correlated columns, violating these conditions in finite samples and exposing the OOS gains to post-selection bias or sample-specific artifacts rather than genuine factor-sparsity discovery.
  Authors: This point highlights an important theoretical caveat. While we acknowledge that the irrepresentable condition is unlikely to hold exactly in the presence of highly correlated polynomial and interaction terms, our empirical evidence demonstrates consistent out-of-sample improvements that are not explained by overfitting or post-selection bias, as evidenced by the performance gains emerging only beyond a complexity threshold. We will add a discussion section addressing this limitation, including references to related literature on sparse recovery in correlated designs and additional robustness checks such as varying the polynomial degree and examining the stability of selected features across subsamples.
  Revision: partial
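The subsample-stability check mentioned in the last response could be scored along the following lines. This is a minimal sketch, assuming stability is measured as the average Jaccard overlap between the sets of selected features; the rebuttal does not say which measure the authors intend, and the index sets below are placeholders, not results.

```python
def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two sets of selected feature indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def mean_pairwise_stability(supports):
    """Average Jaccard overlap over all pairs of subsample supports."""
    pairs = [(i, j) for i in range(len(supports)) for j in range(i + 1, len(supports))]
    if not pairs:
        raise ValueError("need at least two supports")
    return sum(jaccard(supports[i], supports[j]) for i, j in pairs) / len(pairs)


# Hypothetical supports: indices of nonzero coefficients from three subsamples.
# Placeholders for illustration only, not output from the paper's estimator.
supports = [{3, 17, 42}, {3, 17, 58}, {3, 42, 58}]
stability = mean_pairwise_stability(supports)
print(f"mean pairwise Jaccard stability: {stability:.2f}")
```

A value near 1 would mean the same features are selected regardless of subsample; a value near 0 would suggest the recovered "sparse structure" is sample-specific, which is the referee's worry.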
Circularity Check
No significant circularity; empirical extension is self-contained
The paper extends the benchmark empirical design of Didisheim et al. (2025) by incorporating nonlinear feature expansions and applying basis pursuit, then reports out-of-sample portfolio performance that exceeds ridgeless benchmarks past a complexity threshold. This is presented as an empirical demonstration that capacity expansion enables discovery of factor sparsity, without any derivation, equation, or parameter fit that reduces by construction to the inputs. The abstract and description contain no self-definitional loops, fitted quantities renamed as predictions, or load-bearing self-citations whose content is unverified. The central claim rests on out-of-sample validation against an external benchmark design, which qualifies as independent evidence under the evaluation rules. No steps meet the criteria for quoting a specific reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Out-of-sample portfolio performance is the appropriate criterion for judging asset-pricing models.
- domain assumption: Basis pursuit applied to nonlinear feature expansions recovers the economically relevant sparse risk structure.
invented entities (2)
- capacity sparsity: no independent evidence
- factor sparsity: no independent evidence