arxiv: 2606.10092 · v1 · pith:HRO5X3J7new · submitted 2026-06-08 · 💻 cs.LG · econ.GN · q-fin.EC

Decision-Making under Combinatorial Risk

Pith reviewed 2026-06-27 16:55 UTC · model grok-4.3

classification 💻 cs.LG · econ.GN · q-fin.EC
keywords combinatorial risk · decision under risk · symbolic regression · prospect theory · investment allocation · probability increment · induced distribution · probability mass function

The pith

People facing combinatorial risk choose based on key probability features like success increments rather than computing the full induced outcome distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an investment task in which allocating resources to multiple risky components induces a new outcome distribution whose exact shape is costly to compute. Participants consistently favor larger probability increments from investment and, when increments match, the component with higher starting success probability. Symbolic regression on the choice data recovers compact models driven by these combinatorial features instead of the complete probability mass function. Displaying the induced PMF reduces reliance on those features and lowers choice variability, with the remaining pattern captured by adding a prospect-theoretic term. The work shows that decision makers navigate multi-component risk through its core parameters when full evaluation is effortful.
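To make the "induced distribution" concrete, the sketch below computes the PMF of the number of successful components and shows how one investment reshapes it. It rests on assumptions this summary does not confirm: that components succeed independently and that the outcome of interest is the success count. The function names and example probabilities are hypothetical.

```python
from typing import List, Sequence


def induced_pmf(success_probs: Sequence[float]) -> List[float]:
    """PMF of the number of successes among independent components.

    Assumption: components are independent Bernoulli trials; the paper's
    actual payoff structure may differ.
    """
    pmf = [1.0]  # with zero components, P(0 successes) = 1
    for p in success_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this component fails
            nxt[k + 1] += mass * p      # this component succeeds
        pmf = nxt
    return pmf


def pmf_after_investment(success_probs: Sequence[float],
                         component: int,
                         increment: float) -> List[float]:
    """Induced PMF after raising one component's success probability."""
    boosted = list(success_probs)
    boosted[component] = min(1.0, boosted[component] + increment)
    return induced_pmf(boosted)


# Hypothetical two-component problem with two candidate investments.
base = [0.5, 0.8]
invest_a = pmf_after_investment(base, component=0, increment=0.2)
invest_b = pmf_after_investment(base, component=1, increment=0.1)
```

Even in this small case, comparing `invest_a` with `invest_b` means comparing two full distributions. The number of component outcome combinations grows quickly as components are added, which is one way the exact evaluation can become effortful.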

Core claim

In the investment-allocation task participants prefer the option delivering the larger probability increment and, when increments are equal, the option with the higher initial success probability. Symbolic regression recovers descriptive models whose dominant terms are combinatorial-risk features such as after-investment success probability; these models account for observed choices without requiring exact evaluation of the full induced distribution. When the probability mass function is revealed, behavior shifts and is well fit by augmenting the feature model with a prospect-theoretic residual.

What carries the argument

Symbolic regression applied to choice data to discover compact models whose main inputs are combinatorial-risk features such as after-investment success probability.
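Symbolic regression usually returns a set of candidate expressions that trade fit against complexity, and the paper's Figures 6 and 12 report such Pareto frontiers. The sketch below shows how a frontier can be extracted from (expression, complexity, loss) triples. The expressions and loss values are invented for illustration and are not results from the paper.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, float]  # (expression, complexity, loss)


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that are not dominated on complexity and loss."""
    frontier: List[Candidate] = []
    best_loss = float("inf")
    # Sort by complexity, then loss, so each new frontier point must
    # strictly improve on the best loss seen at lower or equal complexity.
    for cand in sorted(candidates, key=lambda c: (c[1], c[2])):
        if cand[2] < best_loss:
            frontier.append(cand)
            best_loss = cand[2]
    return frontier


# Hypothetical candidates; feature names and losses are made up.
models = [
    ("p_after", 1, 0.62),
    ("p_after + increment", 3, 0.55),
    ("p_after * increment", 3, 0.58),
    ("exp(p_after) + increment", 4, 0.56),
    ("p_after + increment + p_initial", 5, 0.54),
]
front = pareto_frontier(models)
```

A model stays on the frontier only if no simpler or equally simple model fits at least as well, which is what makes the retained expressions "compact" in the sense used here.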

If this is right

  • Choices are driven primarily by the size of the probability increment produced by investment.
  • When probability increments are identical, the higher initial success probability is preferred.
  • Revealing the full induced PMF reduces responsiveness to combinatorial features and decreases choice variance.
  • Augmenting the feature model with a prospect-theoretic residual accounts for behavior once the PMF is displayed.
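The first two bullets amount to a lexicographic rule. A minimal sketch follows, assuming each option is summarized by a hypothetical pair of initial success probability and increment. It illustrates the reported choice pattern only and is not the model the authors recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    initial_prob: float  # success probability before investment
    increment: float     # probability gained by investing


def preferred(a: Option, b: Option, tol: float = 1e-9) -> Option:
    """Pick the larger increment; on a tie, the higher initial probability."""
    if abs(a.increment - b.increment) > tol:
        return a if a.increment > b.increment else b
    return a if a.initial_prob >= b.initial_prob else b
```

For example, `preferred(Option(0.3, 0.2), Option(0.7, 0.1))` returns the first option because its increment is larger, while with equal increments the option starting from 0.7 wins.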

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Decision-support tools could emphasize probability increments and baseline rates rather than full outcome tables when users face multiple risky components.
  • The same feature-based strategy may appear in other settings such as portfolio allocation or project selection where exact convolution of risks is impractical.
  • A direct test would apply the recovered models to new combinatorial structures with different numbers of components or payoff correlations.

Load-bearing premise

The symbolic regression procedure yields models that reflect participants' actual cognitive process rather than merely fitting noise or task-specific artifacts in the collected data.

What would settle it

A new experiment in which the same participants face variants requiring explicit calculation of the full induced distribution and produce choice patterns that deviate systematically from the feature-based models recovered here.

Figures

Figures reproduced from arXiv: 2606.10092 by Chen Wang, Hongmiao Fan, Yifan Hong.

Figure 1: Influence diagram of the two-component investment-allocation problem. The decision …
Figure 2: Example of problem from the treatment condition. The same problem from the control …
Figure 3: Boxplot of proportion of choosing the dominant option, with mean proportion marked …
Figure 4: Absolute difference in choice probability between high- and low-magnitude problems.
Figure 5: Overview of the proposed method. Each dataset is treated as an island. In every epoch, …
Figure 6: Pareto frontier of models on the Combinatorial Risk testset data. Symbolic regression successfully identified a rich set of descriptive models (see …
Figure 7: Expected payoff difference transformations discovered from the treatment group data.
Figure 8: Comparison of control model, residual model augmentation, PT-only benchmark, and …
Figure 9: System prompt for EXPLOIT with choices13k schema. Current Best Expressions {expression list} Hardest Prediction Cases These are the cases where the current best expressions cannot predict well. {error examples} Your Task For EACH of the {n} expressions above, output exactly ONE improved variant in the SAME ORDER. Output exactly {n} numbered lines: {output template} Important: - Preserve the numbering (1., …
Figure 10: User prompt template for EXPLOIT. Organize: An Ontology of Decision Models We maintain a graph-structured ontology that organizes discovered symbolic models and exposes reusable functional forms and semantic insights. As illustrated in Figure 11a, the ontology contains six types of nodes: Category, Concept, Functional Form, Composed Feature, Model, and Raw Feature (atomic inputs). Intuitively, categories…
Figure 11: Ontology graph schema and an illustrative example.
Figure 12: Pareto frontier of models on the choices13k data. Ablation We compare our framework (Full) with versions without Explore (No Explore) and with Search only (Search Only). We find Exploit effectively boosts model performances on Search, while the full framework with Explore achieves dominant performance (See …
Original abstract

Decision-making under risk is typically studied through single-shot lottery choices. Yet many real decisions involve combinatorial risk, where risk arises from multiple risky components, so the lottery over outcomes is induced rather than given outright and can be costly to evaluate exactly. We introduce an investment-allocation task to study decision under combinatorial risk, where investing in a component raises its success probability and thereby reshapes the outcome distribution. Participants favor the option with the larger probability increment, and, when increments are equal, the option with the higher initial success probability. Revealing the induced probability mass function (PMF) substantially changes behavior, making participants less responsive to combinatorial-risk features and reducing choice variance. To explain these patterns, we move beyond standard benchmarks and hand-crafted hypotheses with symbolic regression to discover compact descriptive models. The discovered models rely mainly on combinatorial-risk features, such as the after-investment success probability, rather than exact evaluation of the full induced distribution. Behavior under the displayed PMF is then well explained by augmenting this model with a prospect-theoretic residual model. The results show that people navigate combinatorial risk primarily through its core features, shifting toward lottery valuation only when the induced PMF is displayed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces an investment-allocation task to examine decision-making under combinatorial risk, where risk is induced by multiple components rather than given as a direct lottery. Participants prefer options with larger probability increments after investment and, when increments are equal, those with higher initial success probabilities. Revealing the induced PMF reduces responsiveness to these combinatorial features and lowers choice variance. Symbolic regression is used to discover compact models that rely primarily on combinatorial-risk features (e.g., after-investment success probability) instead of exact evaluation of the full induced distribution; behavior in the PMF-display condition is then explained by augmenting the model with a prospect-theoretic residual.

Significance. If the symbolic regression yields models that are robustly validated as descriptive of cognitive processes (rather than experimental artifacts), the work would advance understanding of how people approximate complex induced risks via core features, with implications for behavioral decision theory, bounded rationality, and applications in finance or engineering where combinatorial risks are common. The contrast between feature-based and distribution-based valuation when PMF is displayed could inform hybrid models of risk processing.

major comments (2)
  1. [Symbolic regression and model discovery] The central claim that participants rely on combinatorial-risk features (rather than full PMF evaluation) rests on the symbolic regression procedure. The abstract and reported method provide no details on out-of-sample validation, held-out testing, regularization, or explicit comparison against null models that ignore combinatorial structure; without these, the discovered expressions may simply recover the lowest-complexity functions correlated with choices due to the task's own feature definitions rather than reflecting cognitive mechanisms.
  2. [Behavioral results] No information is given on sample size, statistical tests establishing the reported preferences (larger increment, higher initial probability), effect sizes for the PMF-display manipulation, or controls for multiple comparisons. These omissions make it impossible to assess the reliability of the behavioral patterns that the symbolic models are intended to explain.
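The quantities requested in the second comment could be computed along the following lines. This is a sketch only. The choice of a paired t statistic, the d_z variant of Cohen's d, and the Benjamini-Hochberg procedure for multiple comparisons are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def paired_t_and_d(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic and Cohen's d_z from within-subject differences."""
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)                # sample SD (n - 1 denominator)
    d_z = mean(diffs) / sd
    t = d_z * math.sqrt(len(diffs))  # t = mean / (sd / sqrt(n))
    return t, d_z


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; True marks a rejected null."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank       # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected
```

The t statistic still has to be converted to a p-value with n - 1 degrees of freedom, which the standard library does not provide. A statistics package would normally handle that step.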
minor comments (1)
  1. [Model augmentation] The description of how the prospect-theoretic residual is combined with the combinatorial model (additive? multiplicative?) is not specified in sufficient detail to allow replication or assessment of identifiability.
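One possible reading of the augmentation is an additive combination inside a logistic choice rule. The sketch below assumes that form and uses the Tversky and Kahneman (1992) value and weighting functions with their published median estimates for the gain-domain weighting parameter. The paper may combine the terms differently, and `feature_score`, `beta_pt`, and the separable weighting shortcut are hypothetical choices made here.

```python
import math
from typing import List, Tuple

ALPHA, LAMBDA, GAMMA = 0.88, 2.25, 0.61  # Tversky & Kahneman (1992) medians

Lottery = List[Tuple[float, float]]  # (outcome, probability) pairs


def value(x: float) -> float:
    """Power value function with loss aversion."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA


def weight(p: float) -> float:
    """Inverse-S probability weighting function."""
    num = p ** GAMMA
    return num / (num + (1.0 - p) ** GAMMA) ** (1.0 / GAMMA)


def pt_value(lottery: Lottery) -> float:
    """Separable prospect value of a lottery.

    Simplification: weights each probability directly instead of using
    the cumulative (rank-dependent) form.
    """
    return sum(weight(p) * value(x) for x, p in lottery)


def choice_prob(feature_score: float, lottery_a: Lottery,
                lottery_b: Lottery, beta_pt: float = 1.0) -> float:
    """P(choose A) with an ADDITIVE prospect-theoretic residual (assumed)."""
    z = feature_score + beta_pt * (pt_value(lottery_a) - pt_value(lottery_b))
    return 1.0 / (1.0 + math.exp(-z))
```

Under this additive form, `feature_score` and `beta_pt` can only be separately identified if the combinatorial features and the prospect-value difference are not collinear across problems, which is the identifiability concern the comment raises.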

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating revisions that will strengthen the manuscript's transparency on methods and results.

Point-by-point responses
  1. Referee: [Symbolic regression and model discovery] The central claim that participants rely on combinatorial-risk features (rather than full PMF evaluation) rests on the symbolic regression procedure. The abstract and reported method provide no details on out-of-sample validation, held-out testing, regularization, or explicit comparison against null models that ignore combinatorial structure; without these, the discovered expressions may simply recover the lowest-complexity functions correlated with choices due to the task's own feature definitions rather than reflecting cognitive mechanisms.

    Authors: We agree that additional methodological details are needed to substantiate the symbolic regression results. The current manuscript describes the use of symbolic regression for discovering compact models but does not report the validation steps in sufficient depth. We will revise the methods section to include: (i) 5-fold cross-validation with held-out testing on 20% of trials, (ii) explicit regularization via the algorithm's complexity penalty, and (iii) direct comparisons against null models (e.g., logistic regression using only non-combinatorial features and a uniform random baseline). These additions will demonstrate that the discovered expressions outperform the nulls on out-of-sample log-likelihood and AIC, supporting that they capture task-relevant cognitive features rather than artifacts of feature definitions. revision: yes

  2. Referee: [Behavioral results] No information is given on sample size, statistical tests establishing the reported preferences (larger increment, higher initial probability), effect sizes for the PMF-display manipulation, or controls for multiple comparisons. These omissions make it impossible to assess the reliability of the behavioral patterns that the symbolic models are intended to explain.

    Authors: We acknowledge that the main text does not explicitly report these statistical details, which are required for evaluating the behavioral findings. We will revise the methods and results sections to state the sample size, the specific tests (paired t-tests for the increment and initial-probability preferences), effect sizes (Cohen's d), and the multiple-comparison procedure (FDR correction). These elements were part of the data-analysis pipeline but will now be presented clearly in the main manuscript to allow readers to assess reliability. revision: yes
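The validation steps promised in the first response can be sketched as follows. This is a generic illustration of k-fold splitting, held-out log-likelihood, and AIC against a uniform random baseline. The trial layout, the `fit` callback, and the fold seed are hypothetical, and nothing here reproduces the authors' pipeline.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Trial = Tuple[Sequence[float], int]  # (features, choice in {0, 1})
Predictor = Callable[[Sequence[float]], float]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle trial indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def log_likelihood(predict: Predictor, trials: Sequence[Trial]) -> float:
    """Bernoulli log-likelihood of observed choices under a predictor."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for features, choice in trials:
        p = min(max(predict(features), eps), 1.0 - eps)
        total += math.log(p if choice == 1 else 1.0 - p)
    return total


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_lik


def uniform_baseline(_features: Sequence[float]) -> float:
    """Null model: every choice is a coin flip."""
    return 0.5


def held_out_scores(fit: Callable[[List[Trial]], Predictor],
                    trials: Sequence[Trial], k: int = 5) -> List[float]:
    """Fit on k-1 folds, score log-likelihood on the remaining fold."""
    scores = []
    for held in k_fold_indices(len(trials), k):
        held_set = set(held)
        train = [t for i, t in enumerate(trials) if i not in held_set]
        test = [trials[i] for i in held]
        scores.append(log_likelihood(fit(train), test))
    return scores
```

With five folds each held-out set contains 20% of trials, matching the split described in the response. A discovered expression would be passed in through `fit` and its held-out scores compared fold by fold against those of `uniform_baseline`.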

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper applies symbolic regression to choice data collected from a custom investment task in order to discover compact descriptive models. This is a standard data-driven fitting procedure whose output is not equivalent to its inputs by construction, nor does the provided text contain self-definitional equations, load-bearing self-citations, uniqueness theorems imported from prior author work, or any other enumerated circular pattern. The central claim that participants rely on combinatorial features is presented as an empirical finding from the regression rather than a deductive result forced by the task definition itself. The subsequent augmentation with a prospect-theoretic residual is likewise described as an explanatory addition, not a tautological renaming. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unstated assumption that symbolic regression recovers psychologically meaningful rules rather than statistical artifacts.

