
arxiv: 2604.23837 · v1 · submitted 2026-04-26 · 💻 cs.CL · cs.LG

One Size Fits None: Heuristic Collapse in LLM Investment Advice

Pith reviewed 2026-05-08 06:20 UTC · model grok-4.3

keywords: heuristic collapse · LLM investment advice · risk tolerance · surrogate models · financial advice · AI advisors · client context · decision simplification

The pith

LLMs reduce investment advice to a client's self-reported risk tolerance while largely ignoring other personal details.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models integrate a full range of client circumstances when providing investment recommendations or instead simplify the task by latching onto a few dominant cues. It applies surrogate models to decode LLM outputs and documents a consistent pattern in which allocation decisions depend primarily on risk tolerance, with minimal influence from factors such as age, income, time horizon, or specific goals. This matters because financial regulations require advisors to consider an individual's complete situation rather than applying one-size-fits-all rules. The authors also test whether web search augmentation reduces the problem and find only partial improvement. The results point to a need for direct checks on how models weigh different inputs before using them in advisory roles.

Core claim

Frontier LLMs exhibit heuristic collapse in investment advice: allocation decisions are largely determined by self-reported risk tolerance, while other relevant client factors contribute minimally. Web search partially attenuates heuristic collapse but does not resolve it.

What carries the argument

Heuristic collapse: the systematic reduction of complex multi-factor decisions to a small number of dominant inputs, in this case self-reported risk tolerance.

Load-bearing premise

The surrogate models accurately recover the true decision process of the black-box LLMs and the tested client factors represent the full set of legally relevant circumstances.

What would settle it

Observing that changes to client factors other than risk tolerance produce large shifts in the LLM's recommended allocations would falsify the claim of dominant reliance on risk tolerance.
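The falsification test described above can be sketched as a one-factor-at-a-time perturbation check. This is a minimal illustration, not the authors' protocol: `toy_advisor`, its weights, the base profile, and the factor levels are all hypothetical stand-ins for a real LLM call and a real client schema.

```python
# Hypothetical stand-in for an LLM advisor: returns an equity share in [0, 1].
# The weights are invented for illustration and are NOT taken from the paper.
def toy_advisor(profile):
    share = 0.15 * profile["risk_tolerance"]      # dominant cue
    share += 0.001 * (65 - profile["age"])        # weak age effect
    share += 0.002 * profile["horizon_years"]     # weak horizon effect
    return max(0.0, min(1.0, share))

# Assumed base client and assumed levels for each factor.
BASE = {"risk_tolerance": 3, "age": 40, "horizon_years": 20}
LEVELS = {
    "risk_tolerance": [1, 2, 3, 4, 5],
    "age": [25, 40, 55, 70],
    "horizon_years": [2, 10, 20, 30],
}

def sensitivity(advisor, base, levels):
    """Range of the recommended equity share when one factor varies alone."""
    spreads = {}
    for factor, values in levels.items():
        shares = [advisor({**base, factor: v}) for v in values]
        spreads[factor] = max(shares) - min(shares)
    return spreads

result = sensitivity(toy_advisor, BASE, LEVELS)
for factor, spread in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{factor:>15}: {spread:.3f}")
```

Against a real model, `toy_advisor` would be replaced by a function that prompts the LLM and parses its allocation. Large spreads for factors other than risk tolerance would count against the collapse claim; spreads near zero would be consistent with it.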

Figures

Figures reproduced from arXiv: 2604.23837 by Andrew W. Lo, Jillian Ross.

Figure 1: Comparison of reverse-engineered surrogate regressors for LLM allocation decisions with …
Figure 2: Citations generated by the LLM with required web search.
Original abstract

Large language models are increasingly deployed as advisors in high-stakes domains -- answering medical questions, interpreting legal documents, recommending financial products -- where good advice requires integrating a user's full context rather than responding to salient surface features. We investigate whether frontier LLMs actually do this, or whether they instead exhibit heuristic collapse: a systematic reduction of complex, multi-factor decisions to a small number of dominant inputs. We study the phenomenon in investment advice, where legal standards explicitly require individualized reasoning over a client's full circumstances. Applying interpretable surrogate models to LLM outputs, we find systematic heuristic collapse: investment allocation decisions are largely determined by self-reported risk tolerance, while other relevant factors contribute minimally. We further find that web search partially attenuates heuristic collapse but does not resolve it. These findings suggest that heuristic collapse is not resolved by web search augmentation or model scale alone, and that deploying LLMs as advisors requires auditing input sensitivity, not just output quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that frontier LLMs exhibit 'heuristic collapse' when giving investment advice: despite legal requirements for individualized reasoning over a client's full circumstances, allocation decisions are largely determined by self-reported risk tolerance while other factors contribute minimally. Interpretable surrogate models fitted to LLM outputs are used to demonstrate this dominance; web search is shown to partially attenuate but not eliminate the collapse. The work concludes that auditing input sensitivity (rather than output quality alone) is required for safe deployment of LLMs as advisors.

Significance. If the surrogate-based attribution is robust, the result would be significant for AI deployment in regulated domains. It supplies concrete evidence that neither scale nor retrieval augmentation suffices to enforce multi-factor integration, directly relevant to fiduciary standards in finance and analogous requirements in medicine or law. The emphasis on input-sensitivity auditing offers a practical, testable criterion beyond standard benchmark accuracy.

major comments (3)
  1. [Methods] Methods section: the description of the surrogate models provides no details on model family, training procedure, prompt templates used to elicit LLM outputs, the exact set of client factors tested, statistical controls for multicollinearity, or any fidelity metric (e.g., surrogate R² or agreement with held-out LLM decisions). Without these, the central attribution of dominance to risk tolerance cannot be evaluated and may be an artifact of surrogate misspecification.
  2. [Results] Results section: the claim that 'other relevant factors contribute minimally' is presented without quantitative support such as feature-importance rankings, partial-dependence plots, or ablation experiments that isolate the marginal effect of removing risk tolerance versus other variables. The reported dominance therefore lacks a clear effect-size anchor.
  3. [§4] §4 (web-search experiments): the statement that web search 'partially attenuates' collapse is not accompanied by a direct comparison of surrogate coefficients or decision boundaries with and without retrieval, making it impossible to quantify the mitigation or to rule out that the remaining collapse is still driven by the same surrogate artifacts.
minor comments (2)
  1. [Abstract] The term 'heuristic collapse' is introduced in the abstract and title without a concise formal definition or pointer to the precise operationalization used in the surrogate analysis.
  2. [Figures/Tables] Table or figure captions that display surrogate coefficients should explicitly state the units and the baseline against which 'minimal contribution' is judged.
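
The two diagnostics requested in the first major comment, a held-out fidelity score and a multicollinearity check, can be illustrated on synthetic data. The sketch below is an assumption-laden toy: the data generator, its weights, the 300/100 split, and the risk-only surrogate are invented for illustration, and the variance inflation factor (VIF) is computed only for the two-predictor case, where it equals 1 / (1 - R²) of one predictor regressed on the other.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observed values."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - my) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
n = 400
risk = [rng.randint(1, 5) for _ in range(n)]
age = [rng.randint(25, 70) for _ in range(n)]
# Horizon is built to be correlated with age, so multicollinearity shows up.
horizon = [76 - a + rng.randint(-5, 5) for a in age]
# Hypothetical "LLM" equity share: invented weights, not results from the paper.
equity = [0.15 * r + 0.002 * h + rng.gauss(0, 0.03)
          for r, h in zip(risk, horizon)]

# Fidelity: fit the surrogate on the first 300 profiles, score on the last 100.
split = 300
slope, intercept = fit_line(risk[:split], equity[:split])
pred = [slope * r + intercept for r in risk[split:]]
fidelity = r_squared(equity[split:], pred)

# VIF of age given horizon (two-predictor case only).
s, i = fit_line(horizon, age)
r2_age = r_squared(age, [s * h + i for h in horizon])
vif_age = 1.0 / (1.0 - r2_age)

print(f"held-out R^2 of risk-only surrogate: {fidelity:.3f}")
print(f"VIF(age | horizon): {vif_age:.1f}")
```

A high held-out R² for a risk-only surrogate would support the dominance claim only if the surrogate family is adequate. A large VIF, as produced here by construction, signals that coefficients on age and horizon cannot be interpreted separately.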

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve methodological transparency and the quantitative presentation of results.

  1. Referee: [Methods] Methods section: the description of the surrogate models provides no details on model family, training procedure, prompt templates used to elicit LLM outputs, the exact set of client factors tested, statistical controls for multicollinearity, or any fidelity metric (e.g., surrogate R² or agreement with held-out LLM decisions). Without these, the central attribution of dominance to risk tolerance cannot be evaluated and may be an artifact of surrogate misspecification.

    Authors: We agree that the current Methods description is insufficiently detailed for independent evaluation. In the revised manuscript we will add a dedicated subsection specifying the surrogate model families (logistic regression and gradient-boosted trees), the exact training procedure and hyper-parameters, the full prompt templates used to elicit LLM allocations, the complete list of client factors, multicollinearity diagnostics (VIF thresholds), and fidelity metrics (R² on held-out LLM decisions together with decision-agreement rates). These additions will allow readers to assess whether the observed dominance of risk tolerance is robust to surrogate specification. revision: yes

  2. Referee: [Results] Results section: the claim that 'other relevant factors contribute minimally' is presented without quantitative support such as feature-importance rankings, partial-dependence plots, or ablation experiments that isolate the marginal effect of removing risk tolerance versus other variables. The reported dominance therefore lacks a clear effect-size anchor.

    Authors: We accept that the Results section would be strengthened by explicit quantitative anchors. The revision will include (i) normalized feature-importance rankings across all surrogate models, (ii) partial-dependence plots for each client factor, and (iii) ablation results that report the change in surrogate fidelity and predicted allocations when risk tolerance is removed versus when other factors are removed. These additions will supply concrete effect-size evidence for the minimal contribution claim. revision: yes

  3. Referee: [§4] §4 (web-search experiments): the statement that web search 'partially attenuates' collapse is not accompanied by a direct comparison of surrogate coefficients or decision boundaries with and without retrieval, making it impossible to quantify the mitigation or to rule out that the remaining collapse is still driven by the same surrogate artifacts.

    Authors: We will expand §4 with explicit side-by-side tables and figures comparing surrogate coefficients, feature importances, and decision-boundary visualizations between the retrieval-augmented and baseline conditions. This will quantify the degree of attenuation and allow direct inspection of whether the residual dominance of risk tolerance persists after retrieval. revision: yes
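
One way to put a number on "partial attenuation" is to compare the share of total surrogate importance that falls on risk tolerance in each condition. The sketch below assumes standardized surrogate coefficients are already available; the coefficient values are placeholders invented for illustration and are not results from the paper.

```python
def importance_shares(std_coefs):
    """Normalize absolute standardized coefficients so they sum to 1."""
    total = sum(abs(c) for c in std_coefs.values())
    return {name: abs(c) / total for name, c in std_coefs.items()}

# Placeholder standardized coefficients, invented for illustration only.
baseline = {"risk_tolerance": 0.80, "age": 0.05,
            "income": 0.05, "horizon_years": 0.10}
with_search = {"risk_tolerance": 0.60, "age": 0.10,
               "income": 0.10, "horizon_years": 0.20}

base_share = importance_shares(baseline)
search_share = importance_shares(with_search)

# Attenuation: drop in the risk-tolerance share when retrieval is enabled.
attenuation = base_share["risk_tolerance"] - search_share["risk_tolerance"]

for name in baseline:
    print(f"{name:>15}: {base_share[name]:.2f} -> {search_share[name]:.2f}")
print(f"attenuation of risk-tolerance share: {attenuation:.2f}")
```

A positive attenuation with a risk-tolerance share that still exceeds every other factor would match the "partially attenuates but does not resolve" reading; the comparison is meaningful only if both surrogates reach similar fidelity.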

Circularity Check

0 steps flagged

No circularity: empirical measurement of surrogate attributions on LLM outputs


The paper conducts an empirical analysis by feeding client profiles to LLMs, collecting allocation outputs, and fitting interpretable surrogate models to attribute importance to input factors such as risk tolerance. The central claim follows directly from the observed surrogate coefficients or feature importances on the generated data; it is not obtained by redefining any quantity in terms of itself, renaming a fitted parameter as a prediction, or invoking a self-citation chain whose validity depends on the present result. No equations or derivations appear that reduce the reported dominance of risk tolerance to an input assumption. The study is therefore a measurement whose validity rests on surrogate fidelity and experimental design rather than on any self-referential construction.
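
The measurement pipeline described above (profiles in, allocations out, surrogate fitted afterwards) can be sketched end to end. Everything here is assumed for illustration: `toy_llm_equity_share` replaces the black-box model, the factor levels are invented, and the surrogate is a plain linear fit. Because the profiles form a full factorial grid, the factors are mutually uncorrelated, so each per-factor simple-regression slope equals the corresponding multiple-regression coefficient.

```python
from itertools import product

# Hypothetical stand-in for the black-box LLM; weights are invented.
def toy_llm_equity_share(risk, age, horizon):
    return 0.15 * risk + 0.001 * (65 - age) + 0.002 * horizon

def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def std(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

# Step 1: build client profiles as a full factorial grid (assumed levels).
grid = list(product([1, 2, 3, 4, 5], [25, 40, 55, 70], [2, 10, 20, 30]))

# Step 2: collect the advisor's output for every profile.
outputs = [toy_llm_equity_share(*profile) for profile in grid]

# Step 3: fit the surrogate and report standardized effects per factor.
columns = {
    "risk_tolerance": [p[0] for p in grid],
    "age": [p[1] for p in grid],
    "horizon_years": [p[2] for p in grid],
}
raw = {name: slope(col, outputs) for name, col in columns.items()}
effects = {name: raw[name] * std(col) / std(outputs)
           for name, col in columns.items()}

for name, effect in effects.items():
    print(f"{name:>15}: slope={raw[name]:+.4f}  standardized={effect:+.3f}")
```

With a real LLM the outputs are noisy and possibly nonlinear, so the standardized effects would need to be reported alongside a fidelity metric before being read as evidence about the model's decision process.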

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the untested assumption that surrogate models faithfully extract the LLM's internal decision rules and that the selected client factors exhaust the relevant inputs for investment advice.

axioms (1)
  • [domain assumption] Interpretable surrogate models accurately recover the decision logic of frontier LLMs
    Invoked when applying surrogates to LLM outputs to measure factor importance
invented entities (1)
  • heuristic collapse (no independent evidence)
    purpose: Label for the systematic reduction of multi-factor decisions to a small number of dominant inputs
    New descriptive term introduced to characterize the observed behavior

pith-pipeline@v0.9.0 · 5453 in / 1113 out tokens · 29544 ms · 2026-05-08T06:20:43.959700+00:00 · methodology

