One Size Fits None: Heuristic Collapse in LLM Investment Advice
Pith reviewed 2026-05-08 06:20 UTC · model grok-4.3
The pith
LLMs reduce investment advice to a client's self-reported risk tolerance while largely ignoring other personal details.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Frontier LLMs exhibit heuristic collapse in investment advice: allocation decisions are largely determined by self-reported risk tolerance, while other relevant client factors contribute minimally. Web search partially attenuates heuristic collapse but does not resolve it.
What carries the argument
Heuristic collapse: the systematic reduction of complex, multi-factor decisions to a small number of dominant inputs, such as risk tolerance.
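The definition can be made measurable. One hypothetical operationalization, not necessarily the one the paper uses, is to normalize a surrogate's absolute feature importances into shares and then summarize how concentrated they are. The sketch below reports two quantities:

- the share held by one dominant factor;
- the inverse Herfindahl index, an "effective number of factors". It equals p when p factors contribute equally and falls toward 1 under collapse.

The factor names, importance values, and 0.8 threshold are illustrative assumptions.

```python
def importance_shares(importances: dict[str, float]) -> dict[str, float]:
    """Normalize absolute feature importances so that they sum to 1."""
    total = sum(abs(v) for v in importances.values())
    if total == 0:
        raise ValueError("all importances are zero")
    return {name: abs(v) / total for name, v in importances.items()}


def effective_number_of_factors(importances: dict[str, float]) -> float:
    """Inverse Herfindahl index of the shares: p for an equal split, near 1 under collapse."""
    shares = importance_shares(importances)
    return 1.0 / sum(s * s for s in shares.values())


def looks_collapsed(importances: dict[str, float],
                    dominant: str = "risk_tolerance",
                    threshold: float = 0.8) -> bool:
    """Flag collapse when the (hypothetical) dominant factor holds >= threshold of importance."""
    return importance_shares(importances)[dominant] >= threshold


# Made-up importances for illustration only; these are not results from the paper.
balanced = {"risk_tolerance": 1.0, "age": 1.0, "horizon_years": 1.0, "income": 1.0}
collapsed = {"risk_tolerance": 0.90, "age": 0.05, "horizon_years": 0.03, "income": 0.02}
print(effective_number_of_factors(balanced))  # 4.0
print(looks_collapsed(collapsed))             # True
```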
Load-bearing premise
The surrogate models accurately recover the true decision process of the black-box LLMs and the tested client factors represent the full set of legally relevant circumstances.
What would settle it
Observing that changes to client factors other than risk tolerance produce large shifts in the LLM's recommended allocations would falsify the claim of dominant reliance on risk tolerance.
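The falsification test above can be run against the model directly, without a surrogate. Hold a baseline client profile fixed, change one factor at a time, and measure how far the recommended allocation moves.

The sketch makes two assumptions:

- allocations arrive as a mapping from asset class to portfolio weight;
- the shift measure is total-variation distance (half the L1 distance).

`advisor` stands in for a hypothetical wrapper that prompts an LLM and parses its allocation. `toy_advisor` is a made-up stand-in that looks only at risk tolerance, so every other factor should show zero shift.

```python
from typing import Callable

Profile = dict[str, object]
Allocation = dict[str, float]  # asset class -> portfolio weight (weights sum to 1)


def allocation_distance(a: Allocation, b: Allocation) -> float:
    """Total-variation distance: 0 for identical allocations, 1 for disjoint ones."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def one_at_a_time_shifts(advisor: Callable[[Profile], Allocation],
                         baseline: Profile,
                         alternatives: dict[str, list]) -> dict[str, float]:
    """Largest allocation shift produced by changing each factor on its own."""
    base_alloc = advisor(baseline)
    shifts = {}
    for factor, values in alternatives.items():
        largest = 0.0
        for value in values:
            perturbed = dict(baseline)
            perturbed[factor] = value
            largest = max(largest, allocation_distance(base_alloc, advisor(perturbed)))
        shifts[factor] = largest
    return shifts


def toy_advisor(profile: Profile) -> Allocation:
    """Hypothetical advisor exhibiting complete collapse onto risk tolerance."""
    equity = {"low": 0.2, "medium": 0.5, "high": 0.8}[profile["risk_tolerance"]]
    return {"stocks": equity, "bonds": 1.0 - equity}


baseline = {"risk_tolerance": "medium", "age": 35, "horizon_years": 30}
alternatives = {"risk_tolerance": ["low", "high"], "age": [25, 70], "horizon_years": [2, 40]}
print(one_at_a_time_shifts(toy_advisor, baseline, alternatives))
```

A real LLM is stochastic, so in practice each profile would be sampled several times. Each factor's shift would then be compared against the run-to-run variation at the baseline profile; otherwise sampling noise can masquerade as sensitivity.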
Original abstract
Large language models are increasingly deployed as advisors in high-stakes domains -- answering medical questions, interpreting legal documents, recommending financial products -- where good advice requires integrating a user's full context rather than responding to salient surface features. We investigate whether frontier LLMs actually do this, or whether they instead exhibit heuristic collapse: a systematic reduction of complex, multi-factor decisions to a small number of dominant inputs. We study the phenomenon in investment advice, where legal standards explicitly require individualized reasoning over a client's full circumstances. Applying interpretable surrogate models to LLM outputs, we find systematic heuristic collapse: investment allocation decisions are largely determined by self-reported risk tolerance, while other relevant factors contribute minimally. We further find that web search partially attenuates heuristic collapse but does not resolve it. These findings suggest that heuristic collapse is not resolved by web search augmentation or model scale alone, and that deploying LLMs as advisors requires auditing input sensitivity, not just output quality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that frontier LLMs exhibit 'heuristic collapse' when giving investment advice: despite legal requirements for individualized reasoning over a client's full circumstances, allocation decisions are largely determined by self-reported risk tolerance while other factors contribute minimally. Interpretable surrogate models fitted to LLM outputs are used to demonstrate this dominance; web search is shown to partially attenuate but not eliminate the collapse. The work concludes that auditing input sensitivity (rather than output quality alone) is required for safe deployment of LLMs as advisors.
Significance. If the surrogate-based attribution is robust, the result would be significant for AI deployment in regulated domains. It supplies concrete evidence that neither scale nor retrieval augmentation suffices to enforce multi-factor integration, directly relevant to fiduciary standards in finance and analogous requirements in medicine or law. The emphasis on input-sensitivity auditing offers a practical, testable criterion beyond standard benchmark accuracy.
major comments (3)
- [Methods] Methods section: the description of the surrogate models provides no details on model family, training procedure, prompt templates used to elicit LLM outputs, the exact set of client factors tested, statistical controls for multicollinearity, or any fidelity metric (e.g., surrogate R² or agreement with held-out LLM decisions). Without these, the central attribution of dominance to risk tolerance cannot be evaluated and may be an artifact of surrogate misspecification.
- [Results] Results section: the claim that 'other relevant factors contribute minimally' is presented without quantitative support such as feature-importance rankings, partial-dependence plots, or ablation experiments that isolate the marginal effect of removing risk tolerance versus other variables. The reported dominance therefore lacks a clear effect-size anchor.
- [§4] §4 (web-search experiments): the statement that web search 'partially attenuates' collapse is not accompanied by a direct comparison of surrogate coefficients or decision boundaries with and without retrieval, making it impossible to quantify the mitigation or to rule out that the remaining collapse is still driven by the same surrogate artifacts.
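For the second major comment, one common effect-size anchor is permutation importance. Shuffle a single factor's column in held-out data and record how much the surrogate's prediction error rises.

This is a swapped-in technique shown for illustration. The authors' response proposes ablations that remove a factor, which typically means refitting. Permutation avoids refitting and works with any fitted predictor. The `surrogate` function, column layout, and data here are hypothetical.

```python
import random
from typing import Callable

Row = list[float]


def mse(y: list[float], y_hat: list[float]) -> float:
    """Mean squared error between observed and predicted allocations."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def permutation_importance(predict: Callable[[list[Row]], list[float]],
                           X: list[Row], y: list[float],
                           n_repeats: int = 20, seed: int = 0) -> list[float]:
    """Mean increase in MSE when each column of X is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    base = mse(y, predict(X))
    increases = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(y, predict(X_perm)) - base
        increases.append(total / n_repeats)
    return increases


def surrogate(X: list[Row]) -> list[float]:
    """Hypothetical fitted surrogate: column 0 = risk tolerance, column 1 = age."""
    return [0.1 + 0.15 * r[0] - 0.001 * r[1] for r in X]


rng = random.Random(1)
X_test = [[float(rng.randint(1, 5)), float(rng.randint(25, 70))] for _ in range(200)]
y_test = surrogate(X_test)  # noise-free stand-in for held-out LLM allocations
print(permutation_importance(surrogate, X_test, y_test))
```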
minor comments (2)
- [Abstract] The term 'heuristic collapse' is introduced in the abstract and title without a concise formal definition or pointer to the precise operationalization used in the surrogate analysis.
- [Figures/Tables] Table or figure captions that display surrogate coefficients should explicitly state the units and the baseline against which 'minimal contribution' is judged.
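One caption convention would answer this comment; it is an illustrative option, not something the paper specifies. Report unit-free standardized coefficients together with the equal-split share that "minimal" is measured against:

```latex
\tilde{\beta}_j = \hat{\beta}_j \,\frac{s_{x_j}}{s_y},
\qquad
w_j = \frac{\lvert \tilde{\beta}_j \rvert}{\sum_{k=1}^{p} \lvert \tilde{\beta}_k \rvert},
\qquad
w^{\mathrm{equal}} = \frac{1}{p}
```

The symbols are:

- $\hat{\beta}_j$: the raw surrogate coefficient for client factor $j$, in allocation units per unit of the factor;
- $s_{x_j}$ and $s_y$: the sample standard deviations of that factor and of the allocation weight;
- $p$: the number of factors.

$\tilde{\beta}_j$ is then the change in allocation, in standard deviations, per one-standard-deviation change in the factor. A factor whose share $w_j$ sits far below $1/p$ contributes less than an equal split would imply.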
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve methodological transparency and the quantitative presentation of results.
Point-by-point responses
- Referee: [Methods] Methods section: the description of the surrogate models provides no details on model family, training procedure, prompt templates used to elicit LLM outputs, the exact set of client factors tested, statistical controls for multicollinearity, or any fidelity metric (e.g., surrogate R² or agreement with held-out LLM decisions). Without these, the central attribution of dominance to risk tolerance cannot be evaluated and may be an artifact of surrogate misspecification.
Authors: We agree that the current Methods description is insufficiently detailed for independent evaluation. In the revised manuscript we will add a dedicated subsection specifying the surrogate model families (logistic regression and gradient-boosted trees), the exact training procedure and hyper-parameters, the full prompt templates used to elicit LLM allocations, the complete list of client factors, multicollinearity diagnostics (VIF thresholds), and fidelity metrics (R² on held-out LLM decisions together with decision-agreement rates). These additions will allow readers to assess whether the observed dominance of risk tolerance is robust to surrogate specification. revision: yes
- Referee: [Results] Results section: the claim that 'other relevant factors contribute minimally' is presented without quantitative support such as feature-importance rankings, partial-dependence plots, or ablation experiments that isolate the marginal effect of removing risk tolerance versus other variables. The reported dominance therefore lacks a clear effect-size anchor.
Authors: We accept that the Results section would be strengthened by explicit quantitative anchors. The revision will include (i) normalized feature-importance rankings across all surrogate models, (ii) partial-dependence plots for each client factor, and (iii) ablation results that report the change in surrogate fidelity and predicted allocations when risk tolerance is removed versus when other factors are removed. These additions will supply concrete effect-size evidence for the minimal contribution claim. revision: yes
- Referee: [§4] §4 (web-search experiments): the statement that web search 'partially attenuates' collapse is not accompanied by a direct comparison of surrogate coefficients or decision boundaries with and without retrieval, making it impossible to quantify the mitigation or to rule out that the remaining collapse is still driven by the same surrogate artifacts.
Authors: We will expand §4 with explicit side-by-side tables and figures comparing surrogate coefficients, feature importances, and decision-boundary visualizations between the retrieval-augmented and baseline conditions. This will quantify the degree of attenuation and allow direct inspection of whether the residual dominance of risk tolerance persists after retrieval. revision: yes
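The third response promises side-by-side coefficients for the two conditions. Attaching an interval to the difference would show how much attenuation retrieval actually buys.

This minimal sketch assumes the same profiles are run with and without web search, and that each run yields a numeric risk-tolerance score plus an equity weight. It proceeds in two steps:

- measure collapse as the variance share explained by risk tolerance alone (a one-factor R²);
- form a paired percentile-bootstrap interval for the drop in that share under retrieval.

The metric is a simplified stand-in for a full multivariate comparison, and the data are synthetic.

```python
import random


def one_factor_r2(pairs: list[tuple[float, float]]) -> float:
    """Share of variance in y explained by a linear fit on x alone (squared correlation)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in x or y")
    return sxy * sxy / (sxx * syy)


def r2_gap(rows: list[tuple[float, float, float]]) -> float:
    """rows = (risk, weight without search, weight with search) for the same profiles."""
    without = [(x, a) for x, a, _ in rows]
    with_retrieval = [(x, b) for x, _, b in rows]
    return one_factor_r2(without) - one_factor_r2(with_retrieval)


def bootstrap_gap(rows, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and paired percentile-bootstrap CI for r2_gap."""
    rng = random.Random(seed)
    gaps = []
    while len(gaps) < n_boot:
        sample = rng.choices(rows, k=len(rows))  # resample whole profiles, keeping pairs
        try:
            gaps.append(r2_gap(sample))
        except ValueError:
            continue  # skip degenerate resamples
    gaps.sort()
    lo = gaps[int(alpha / 2 * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return r2_gap(rows), (lo, hi)


# Synthetic paired outputs; the generating rules are assumptions, not the paper's data.
rng = random.Random(42)
rows = []
for _ in range(300):
    risk, age = rng.randint(1, 5), rng.randint(25, 70)
    no_search = 0.05 + 0.18 * risk + rng.gauss(0, 0.02)
    searched = 0.05 + 0.15 * risk - 0.004 * (age - 47.5) + rng.gauss(0, 0.02)
    rows.append((risk, no_search, searched))

gap, (lo, hi) = bootstrap_gap(rows)
print(f"one-factor R^2 drop under search: {gap:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Resampling whole profiles, rather than the two conditions independently, keeps the pairing that a shared profile set provides. This usually tightens the interval.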
Circularity Check
No circularity: empirical measurement of surrogate attributions on LLM outputs
Full rationale
The paper conducts an empirical analysis by feeding client profiles to LLMs, collecting allocation outputs, and fitting interpretable surrogate models to attribute importance to input factors such as risk tolerance. The central claim follows directly from the observed surrogate coefficients or feature importances on the generated data; it is not obtained by redefining any quantity in terms of itself, renaming a fitted parameter as a prediction, or invoking a self-citation chain whose validity depends on the present result. No equations or derivations appear that reduce the reported dominance of risk tolerance to an input assumption. The study is therefore a measurement whose validity rests on surrogate fidelity and experimental design rather than on any self-referential construction.
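The pipeline described here starts from a set of client profiles. One design choice also addresses the multicollinearity concern in the referee report: a balanced full-factorial grid. Every combination of factor levels appears exactly once. With numeric coding, the factors are then pairwise uncorrelated by construction, so each main-effect coefficient of a linear surrogate is estimated without confounding from the others.

The factor names and levels below are hypothetical, not the paper's.

```python
from itertools import product

# Hypothetical client factors and numerically coded levels (not the paper's design).
LEVELS = {
    "risk_tolerance": [1, 2, 3, 4, 5],
    "age": [25, 45, 65],
    "horizon_years": [2, 10, 30],
    "has_dependents": [0, 1],
}


def full_factorial(levels: dict[str, list]) -> list[dict]:
    """One profile for every combination of factor levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


profiles = full_factorial(LEVELS)
names = list(LEVELS)
max_abs_corr = max(
    abs(correlation([p[a] for p in profiles], [p[b] for p in profiles]))
    for i, a in enumerate(names) for b in names[i + 1:]
)
print(len(profiles))  # 90 = 5 * 3 * 3 * 2
print(max_abs_corr)   # 0 up to floating-point rounding: the factors are orthogonal
```

Grid size grows multiplicatively with the number of levels. Larger factor sets therefore usually call for fractional-factorial or randomized designs. Fractional designs give up the ability to separate some interactions from main effects.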
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: interpretable surrogate models accurately recover the decision logic of frontier LLMs.
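This assumption can be partly tested. Fit the surrogate on some of the LLM's outputs and score it on held-out profiles; this is the fidelity metric the referee requests.

The sketch fits a linear surrogate by ordinary least squares and reports held-out R². The normal equations are solved in pure Python so the block is self-contained. The data are synthetic: the two columns are hypothetical numeric codings of risk tolerance and age, and the "LLM" equity weight comes from an assumed rule plus noise.

High fidelity shows only that a linear summary reproduces the model's input-output behavior on this design. It does not show that the surrogate recovers the model's internal decision process.

```python
import random


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        if abs(pivot) < 1e-12:
            raise ValueError("near-singular design: factors may be collinear")
        A[c] = [v / pivot for v in A[c]]
        for i in range(k):
            if i != c:
                factor = A[i][c]
                A[i] = [vi - factor * vc for vi, vc in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]


def predict(coef: list[float], X: list[list[float]]) -> list[float]:
    """Linear surrogate predictions."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]


def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Fraction of variance in y reproduced by y_hat."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for LLM outputs: equity weight driven mostly by risk (assumed rule).
rng = random.Random(0)
X = [[float(rng.randint(1, 5)), float(rng.randint(25, 70))] for _ in range(400)]  # [risk, age]
y = [0.1 + 0.15 * risk - 0.001 * age + rng.gauss(0, 0.02) for risk, age in X]

X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]
coef = fit_ols(X_train, y_train)
fidelity = r_squared(y_test, predict(coef, X_test))
print("coefficients:", [round(c, 4) for c in coef])
print("held-out R^2:", round(fidelity, 3))
```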
invented entities (1)
- Heuristic collapse (no independent evidence)
Reference graph
Works this paper leans on
- [1] High-Yield Savings Account
- [2] Short-Term Treasury Bonds
- [3] Long-Term Treasury Bonds
- [4] Investment-Grade Corporate Bonds
- [5] High-Yield Corporate Bonds
- [6] Index Funds - S&P 500
- [7] Index Funds - Total Stock Market
- [8] Index Funds - International Stocks
- [9] Index Funds - Small Cap Stocks
- [10] Target-Date Retirement Funds
- [11] Real Estate Investment Trusts (REITs)
- [12] Dividend Growth Stocks
- [13] 529 College Savings Plan
- [14] Health Savings Account (HSA)
- [15] Certificates of Deposit (CDs)
Prompt (addendum for the tool-required condition): You have access to a web_search tool to look up current financial information, market data, interest rates, and investment research. Use it to inform your recommendations with up-to-date information. You MUST use web_search to look up: (1) current interest rates, (2) recent market ...