arXiv: 2604.17166 · v1 · submitted 2026-04-18 · q-fin.GN · cs.LG · econ.EM · q-fin.PM · q-fin.PR

Recognition: unknown

The Virtue of Sparsity in Complexity

Nima Afsharhajari, Jonathan Yu-Meng Li

Pith reviewed 2026-05-10 06:23 UTC · model grok-4.3

classification: q-fin.GN, cs.LG, econ.EM, q-fin.PM, q-fin.PR

keywords: asset pricing, sparsity, complexity, basis pursuit, nonlinear features, priced risks, high-dimensional finance, portfolio performance

The pith

Expanding candidate feature spaces lets asset pricing recover sparser priced risks and improve portfolio performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that complexity and sparsity are not opposites in asset pricing but complements. Larger sets of possible features give models more room to identify a small number of genuine priced risks. When nonlinear expansions are paired with basis pursuit, the resulting portfolios outperform ridgeless benchmarks once a complexity threshold is crossed. The performance gain comes from better recovery of a sparse risk structure rather than from keeping more factors. This reframes the role of high-dimensional methods as tools for uncovering parsimony.

Core claim

Distinguishing capacity sparsity (the size of the candidate feature space) from factor sparsity (the parsimonious set of priced risks) shows that the former enables the latter. Revisiting and extending prior empirical designs, nonlinear feature expansions combined with basis pursuit produce portfolios whose out-of-sample returns dominate those from ridgeless methods beyond a critical complexity level. The gains arise because the enlarged space allows reliable identification of the sparse priced-risk structure.

What carries the argument

The distinction between capacity sparsity (dimensionality of the candidate feature space) and factor sparsity (parsimonious structure of priced risks), with basis pursuit serving as the recovery mechanism that selects the sparse priced factors from the expanded space.
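A minimal sketch of the contrast between the two estimators, run on synthetic data rather than the paper's sample. It assumes the textbook definitions: ridgeless as the minimum l2-norm interpolator and basis pursuit as the minimum l1-norm interpolator, solved here as a linear program. The function names, the Gaussian matrix standing in for an expanded feature matrix, and the three-factor ground truth are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog


def ridgeless(X, y):
    """Minimum-l2-norm interpolator (ridge with the penalty sent to zero)."""
    return np.linalg.pinv(X) @ y


def basis_pursuit(X, y):
    """Minimum-l1-norm interpolator: min ||w||_1 subject to X w = y.

    Written as a linear program by splitting w = u - v with u, v >= 0,
    so that ||w||_1 = sum(u) + sum(v) at the optimum.
    """
    p = X.shape[1]
    res = linprog(
        c=np.ones(2 * p),
        A_eq=np.hstack([X, -X]),
        b_eq=y,
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


# Synthetic stand-in: 40 observations, 200 candidate features, 3 true factors.
rng = np.random.default_rng(0)
n, p, k = 40, 200, 3
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:k] = [1.5, -2.0, 1.0]
y = X @ w_true  # noise-free target, an assumption made for clarity

w_l2 = ridgeless(X, y)
w_l1 = basis_pursuit(X, y)

# Count coefficients that are not numerically zero.
nnz_l2 = int(np.sum(np.abs(w_l2) > 1e-6))
nnz_l1 = int(np.sum(np.abs(w_l1) > 1e-6))
print("nonzeros, ridgeless:", nnz_l2, "| basis pursuit:", nnz_l1)
```

Both solutions fit the data exactly; they differ only in which norm they minimize, which is why the l1 solution concentrates on few coefficients while the l2 solution spreads weight across all of them.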

If this is right

Out-of-sample portfolio performance improves once the feature space exceeds a critical size.
The improvement stems from identifying fewer but more accurately priced risks rather than retaining additional factors.
Nonlinear expansions become useful precisely because they enlarge the pool from which a sparse solution can be selected.
Prior empirical designs that stopped at lower complexity may have underestimated the value of sparsity-promoting methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same logic could be tested on non-portfolio tasks such as volatility forecasting or credit-risk modeling.
If the complementarity holds, regularization choices in high-dimensional finance should prioritize methods that explicitly promote factor sparsity inside large candidate spaces.
Repeating the exercise on different asset classes or with alternative nonlinear bases would clarify how general the threshold effect is.

Load-bearing premise

That basis pursuit applied to nonlinearly expanded features will recover the true sparse structure of priced risks without introducing selection artifacts or invalidating out-of-sample comparisons.

What would settle it

A replication in which, past the stated complexity threshold, the basis-pursuit portfolios on nonlinear features show no out-of-sample advantage over ridgeless benchmarks or recover a factor set that is not demonstrably sparser.
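The two failure conditions above can be written as a small check. This is only a sketch: the helper names are hypothetical, the Sharpe ratio is one possible performance metric (the paper also reports certainty-equivalent returns), monthly returns are assumed, and a real replication would need a significance test rather than a raw comparison.

```python
import numpy as np


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio from monthly excess returns (sample std, ddof=1)."""
    r = np.asarray(monthly_returns, dtype=float)
    return np.sqrt(12.0) * r.mean() / r.std(ddof=1)


def claim_survives(sparse_oos, dense_oos, nnz_sparse, nnz_dense):
    """True only if the sparse portfolio both performs better out of sample
    and uses strictly fewer nonzero coefficients than the dense one."""
    better = annualized_sharpe(sparse_oos) > annualized_sharpe(dense_oos)
    sparser = nnz_sparse < nnz_dense
    return bool(better and sparser)


# Made-up numbers purely to exercise the helpers; not results from the paper.
sparse_oos = [0.02, 0.01, 0.03, 0.00]
dense_oos = [0.01, -0.01, 0.02, 0.00]
print(claim_survives(sparse_oos, dense_oos, nnz_sparse=12, nnz_dense=500))
```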

Figures

Figures reproduced from arXiv: 2604.17166 by Jonathan Yu-Meng Li, Nima Afsharhajari.

**Figure 2.** Virtue-of-complexity curves for the sparse (basis pursuit) and dense (ridgeless) ...

**Figure 3.** Distributional performance of the sparse (basis pursuit) and dense (ridgeless) SDFs ...

**Figure 4.** Monthly realized SDF portfolio returns for the sparse SDF (green) and the dense ...

**Figure 5.** Certainty-equivalent (CE) returns for CRRA investors at three levels of risk aver...

**Figure 6.** Number of nonzero coefficients selected by the sparse SDF as a function of com...

**Figure 7.** Low-complexity counterpart of Figure ...

**Figure 8.** Low-complexity counterpart of Figure ...

**Figure 9.** Low-complexity counterpart of Figure ...

Original abstract

Sparsity or complexity? In modern high-dimensional asset pricing, these are often viewed as competing principles: richer feature spaces appear to favor complexity, while economic intuition has long favored parsimony. We show that this tension is misplaced. We distinguish capacity sparsity (the dimensionality of the candidate feature space) from factor sparsity (the parsimonious structure of priced risks) and argue that the two are complements: expanding capacity enables the discovery of factor sparsity. Revisiting the benchmark empirical design of Didisheim et al. (2025) and pushing it to higher complexity regimes, we show that nonlinear feature expansions combined with basis pursuit yield portfolios whose out-of-sample performance dominates ridgeless benchmarks beyond a critical complexity threshold. The evidence shows that the gains from complexity arise not from retaining more factors, but from enlarging the space from which a sparse structure of priced risks can be identified. The virtue of complexity in asset pricing operates through factor sparsity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper splits capacity sparsity from factor sparsity to claim that richer nonlinear features help recover sparser priced risks, but the abstract leaves too many implementation details out to judge whether the out-of-sample gains are real.

The main takeaway is that the authors separate the size of the candidate feature space (capacity sparsity) from the number of priced factors actually selected (factor sparsity) and argue these two are complements rather than trade-offs. They extend the Didisheim et al. benchmark into nonlinear feature expansions, apply basis pursuit, and report that out-of-sample portfolio performance beats ridgeless estimators once complexity crosses a threshold. The claimed mechanism is that the larger space simply makes it easier to identify a stable sparse set of priced risks rather than retaining more factors overall. That framing is new enough on the cited benchmark to be worth noting, and the abstract presents the logic in plain terms without overclaiming.

Referee Report

3 major / 2 minor

Summary. The paper distinguishes capacity sparsity (the dimensionality of the candidate feature space) from factor sparsity (the parsimonious structure of priced risks) and argues they are complements rather than substitutes. Extending the empirical design of Didisheim et al. (2025) to higher-complexity regimes, it claims that nonlinear feature expansions combined with basis pursuit produce portfolios whose out-of-sample performance dominates ridgeless benchmarks beyond a critical complexity threshold; the gains are attributed to enlarging the space from which a sparse priced-risk structure can be identified rather than to retaining more factors.

Significance. If the central empirical claim is robust, the paper offers a useful reconciliation of the apparent tension between complexity and parsimony in high-dimensional asset pricing. The capacity-versus-factor sparsity distinction is conceptually clean and could inform future work on feature engineering and sparse recovery in portfolio construction. The extension of an existing benchmark design to nonlinear regimes is a natural next step, though its value hinges on the strength of the supporting evidence.

major comments (3)

The abstract and introduction report out-of-sample dominance of the basis-pursuit portfolios but supply no details on the exact sample periods, cross-validation scheme, multiple-testing corrections, or the precise basis-pursuit implementation (e.g., regularization path, stopping rule, or handling of the expanded nonlinear design matrix). Without these, the central empirical claim cannot be evaluated and the reported superiority remains unverifiable.
The performance comparison to ridgeless benchmarks is defined relative to a prior benchmark whose parameters and design choices are not shown to be independent of the current fitting procedure in the expanded feature space. This raises a circularity concern: the apparent prediction of superiority may reduce to re-fitting within the same enlarged space rather than a genuine out-of-sample test of the capacity-factor complementarity.
The claim that basis pursuit reliably recovers a stable sparse structure of priced risks from the nonlinear expansion assumes that the design matrix satisfies recovery conditions (e.g., the irrepresentable condition for lasso-type estimators). High-dimensional polynomial and interaction terms are likely to produce highly correlated columns, violating these conditions in finite samples and exposing the OOS gains to post-selection bias or sample-specific artifacts rather than genuine factor-sparsity discovery.
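The correlated-columns concern in the third comment can be illustrated with a simple diagnostic. The sketch below computes mutual coherence (the largest absolute correlation between two distinct columns) before and after a degree-2 polynomial expansion. It is a rough proxy only: coherence is related to, but not the same as, the irrepresentable condition, and the uniform synthetic features, the function names, and the choice of degree are assumptions rather than the paper's design.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_expand(Z, degree=2):
    """All monomials of the columns of Z with total degree 1..degree."""
    d = Z.shape[1]
    cols = []
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(Z[:, list(idx)], axis=1))
    return np.column_stack(cols)


def mutual_coherence(X):
    """Largest absolute correlation between two distinct columns of X."""
    Xc = X - X.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc, axis=0)
    G = np.abs(Xn.T @ Xn)
    np.fill_diagonal(G, 0.0)
    return float(G.max())


# Synthetic features on [0, 1], loosely mimicking rank-transformed inputs.
rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, size=(500, 5))
X_poly = polynomial_expand(Z, degree=2)

mu_raw = mutual_coherence(Z)
mu_poly = mutual_coherence(X_poly)
print(f"coherence raw: {mu_raw:.3f} | degree-2 expansion: {mu_poly:.3f}")
```

On data like this, a feature and its own square are almost collinear, so the expanded matrix has coherence close to one even though the raw features are independent.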

minor comments (2)

The distinction between 'capacity sparsity' and 'factor sparsity' is introduced without a formal definition or notation; adding a brief mathematical statement (e.g., in terms of support size of the coefficient vector versus dimension of the feature map) would improve clarity.
The manuscript cites Didisheim et al. (2025) as the benchmark but does not reproduce or tabulate the exact baseline results for direct comparison; including a side-by-side table of key metrics would strengthen the extension claim.
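One possible formalization of the kind the first minor comment asks for is sketched below. The notation is an assumption introduced here for illustration ($\phi$ for the feature map, $P$ for its dimension, $\Phi$ for the stacked feature matrix, $y$ for the fitting target, and exact interpolation as the constraint); the paper may define these objects differently.

```latex
\begin{align*}
\phi &: \mathbb{R}^d \to \mathbb{R}^P,
  & \Phi &= [\phi(x_1), \dots, \phi(x_T)]^\top \in \mathbb{R}^{T \times P}, \\
\hat w_{\mathrm{BP}} &= \arg\min_{w \in \mathbb{R}^P} \|w\|_1
  \ \text{ s.t. } \ \Phi w = y,
  & \hat w_{\mathrm{RL}} &= \arg\min_{w \in \mathbb{R}^P} \|w\|_2
  \ \text{ s.t. } \ \Phi w = y, \\
\text{capacity} &= P,
  & \text{factor sparsity} &: \ s = \|\hat w\|_0
  = \bigl|\{\, j : \hat w_j \neq 0 \,\}\bigr| \ll P .
\end{align*}
```

In this notation the complementarity claim reads: as $P$ grows past a threshold, the support size $s$ of $\hat w_{\mathrm{BP}}$ stays small while its out-of-sample performance overtakes that of $\hat w_{\mathrm{RL}}$.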

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments on our manuscript. We address each of the major comments below, indicating the revisions we plan to make to strengthen the paper.

Referee: The abstract and introduction report out-of-sample dominance of the basis-pursuit portfolios but supply no details on the exact sample periods, cross-validation scheme, multiple-testing corrections, or the precise basis-pursuit implementation (e.g., regularization path, stopping rule, or handling of the expanded nonlinear design matrix). Without these, the central empirical claim cannot be evaluated and the reported superiority remains unverifiable.

Authors: We agree that these implementation details are essential for evaluating and replicating the results. In the revised version of the manuscript, we will add a new subsection in the empirical section that explicitly describes the sample periods used, the cross-validation procedure for hyperparameter tuning, any multiple-testing adjustments applied, and the precise implementation of basis pursuit, including the regularization path, stopping criteria, and how the expanded nonlinear design matrix is handled.
revision: yes

Referee: The performance comparison to ridgeless benchmarks is defined relative to a prior benchmark whose parameters and design choices are not shown to be independent of the current fitting procedure in the expanded feature space. This raises a circularity concern: the apparent prediction of superiority may reduce to re-fitting within the same enlarged space rather than a genuine out-of-sample test of the capacity-factor complementarity.

Authors: We appreciate this concern regarding potential circularity. However, the ridgeless benchmarks follow the exact specification from Didisheim et al. (2025) and are applied to the original feature space without access to the nonlinear expansions. Our basis-pursuit approach operates on the expanded space but the out-of-sample evaluation uses the same held-out periods for both. To address any ambiguity, we will revise the manuscript to include a clearer description of the benchmark implementation and explicitly state that the ridgeless method does not utilize the expanded features.
revision: partial

Referee: The claim that basis pursuit reliably recovers a stable sparse structure of priced risks from the nonlinear expansion assumes that the design matrix satisfies recovery conditions (e.g., the irrepresentable condition for lasso-type estimators). High-dimensional polynomial and interaction terms are likely to produce highly correlated columns, violating these conditions in finite samples and exposing the OOS gains to post-selection bias or sample-specific artifacts rather than genuine factor-sparsity discovery.

Authors: This point highlights an important theoretical caveat. While we acknowledge that the irrepresentable condition is unlikely to hold exactly in the presence of highly correlated polynomial and interaction terms, our empirical evidence demonstrates consistent out-of-sample improvements that are not explained by overfitting or post-selection bias, as evidenced by the performance gains emerging only beyond a complexity threshold. We will add a discussion section addressing this limitation, including references to related literature on sparse recovery in correlated designs and additional robustness checks such as varying the polynomial degree and examining the stability of selected features across subsamples.
revision: partial
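The subsample-stability check mentioned in the last response could look roughly like the sketch below: refit basis pursuit on random row subsamples and record how often each feature is selected. Everything here is an assumption for illustration (the helper names, the 75% subsample fraction, the synthetic noise-free data, and the linear-programming formulation); the rebuttal does not specify a procedure.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(X, y):
    """min ||w||_1 subject to X w = y, via the split w = u - v with u, v >= 0."""
    p = X.shape[1]
    res = linprog(
        c=np.ones(2 * p),
        A_eq=np.hstack([X, -X]),
        b_eq=y,
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


def selection_frequency(X, y, n_draws=20, frac=0.75, tol=1e-6, seed=0):
    """Share of row subsamples in which each feature gets a nonzero weight."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(n_draws):
        rows = rng.choice(n, size=m, replace=False)
        w = basis_pursuit(X[rows], y[rows])
        counts += np.abs(w) > tol
    return counts / n_draws


# Synthetic example: 3 true factors hidden among 120 candidate features.
rng = np.random.default_rng(1)
n, p, k = 40, 120, 3
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:k] = [1.5, -2.0, 1.0]
y = X @ w_true

freq = selection_frequency(X, y)
print("mean frequency, true factors:", freq[:k].mean())
print("mean frequency, other features:", freq[k:].mean())
```

Features that are selected in nearly every subsample are candidates for a stable sparse structure; features that appear only occasionally are more likely to be sample-specific.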

Circularity Check

0 steps flagged

No significant circularity; empirical extension is self-contained

The paper extends the benchmark empirical design of Didisheim et al. (2025) by incorporating nonlinear feature expansions and applying basis pursuit, then reports out-of-sample portfolio performance that exceeds ridgeless benchmarks past a complexity threshold. This is presented as an empirical demonstration that capacity expansion enables discovery of factor sparsity, without any derivation, equation, or parameter fit that reduces by construction to the inputs. The abstract and description contain no self-definitional loops, fitted quantities renamed as predictions, or load-bearing self-citations whose content is unverified. The central claim rests on out-of-sample validation against an external benchmark design, which qualifies as independent evidence under the evaluation rules. No steps meet the criteria for quoting a specific reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The argument rests on the untested extension of a prior empirical design and on the new distinction between capacity and factor sparsity; no explicit free parameters are listed in the abstract.

axioms (2)

domain assumption: Out-of-sample portfolio performance is the appropriate criterion for judging asset-pricing models.
Implicit in the comparison to ridgeless benchmarks.
domain assumption: Basis pursuit applied to nonlinear feature expansions recovers the economically relevant sparse risk structure.
Central to the claim that gains arise from factor sparsity.

invented entities (2)

capacity sparsity (no independent evidence)
purpose: Dimensionality of the candidate feature space
New term introduced to frame the complementarity argument.
factor sparsity (no independent evidence)
purpose: Parsimonious structure of priced risks
Contrasted with capacity sparsity to explain where complexity adds value.
