Calibrating Behavioral Parameters with Large Language Models

Brandon Yee; Krishna Sharma

arXiv: 2602.01022 · v3 · submitted 2026-02-01 · econ.GN · cs.AI · q-fin.EC

Pith reviewed 2026-05-16 08:54 UTC · model grok-4.3

classification: econ.GN · cs.AI · q-fin.EC

keywords: behavioral finance · large language models · loss aversion · herding · extrapolation · agent-based models · calibration · asset pricing

The pith

Large language models can be calibrated with behavioral profiles to measure loss aversion, herding, and extrapolation at or above human benchmark levels for asset pricing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that uses large language models as measurement instruments for behavioral parameters that are hard to observe directly in asset markets. It first documents that baseline LLM responses display too much rationality, with weaker loss aversion, herding, and disposition effects than human data. Profile-based prompting then produces large, stable shifts that bring the parameters for loss aversion, herding, extrapolation, and anchoring into or beyond observed human ranges. When the calibrated extrapolation parameter is inserted into an agent-based asset pricing model, the resulting price paths display short-horizon momentum and long-horizon reversal that match empirical patterns.

Core claim

Profile-based calibration of LLMs induces large, stable, and theoretically coherent shifts in behavioral parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes, and calibrated extrapolation in an agent-based asset pricing model generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence.

What carries the argument

Profile-based prompting that treats LLMs as calibrated measurement instruments for eight canonical behavioral biases.

If this is right

Calibrated parameters reach or exceed human benchmark magnitudes for loss aversion, herding, extrapolation, and anchoring.
Calibrated extrapolation in an agent-based asset pricing model produces short-horizon momentum and long-horizon reversal consistent with empirical evidence.
The framework supplies explicit measurement ranges and boundaries for eight canonical behavioral biases.
Baseline LLM behavior exhibits systematic rationality bias including attenuated loss aversion and near-zero disposition effects.
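
The momentum-and-reversal implication can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's agent-based model: the two trader types, the linear price-impact rule, and the values `theta = 0.5` and `kappa = 0.05` are all hypothetical choices made for illustration. Extrapolators chase the most recent return, while fundamentalists trade against deviations of the log price from a fixed fundamental value of zero.

```python
import random


def simulate_returns(theta, kappa=0.05, n_steps=20000, seed=0):
    """Simulate log-price changes in a toy two-type market.

    theta: extrapolation weight (hypothetical stand-in for a calibrated value)
    kappa: strength of fundamentalist pull toward a fundamental value of 0
    """
    rng = random.Random(seed)
    price, prev_return = 0.0, 0.0
    returns = []
    for _ in range(n_steps):
        # Extrapolators push the price in the direction of the last return;
        # fundamentalists push it back toward the fundamental value.
        r = theta * prev_return - kappa * price + rng.gauss(0.0, 1.0)
        price += r
        prev_return = r
        returns.append(r)
    return returns


def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


returns = simulate_returns(theta=0.5)
short_horizon = autocorr(returns, 1)  # positive: momentum
long_horizon = autocorr(returns, 6)   # negative: reversal
print(f"lag-1 autocorrelation: {short_horizon:.3f}")
print(f"lag-6 autocorrelation: {long_horizon:.3f}")
```

In this toy setting the sign pattern follows from the structure: extrapolation makes adjacent returns positively correlated, and because the price is pulled back toward a fixed value, positive short-lag autocorrelations have to be offset by negative ones at longer lags. Whether the paper's model produces the pattern through the same mechanism is not stated in the summary above.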

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could allow researchers to generate large populations of heterogeneous agents with controlled bias profiles without new surveys or experiments.
If the calibration functions prove stable, they could be reused across different market models to study interactions among multiple biases simultaneously.
The method might extend to calibrating behavioral parameters in macroeconomic or policy simulation models where direct measurement is equally difficult.

Load-bearing premise

That prompting LLMs with behavioral profiles produces parameters that remain stable across models, scenarios, and time and that inserting those parameters into agent-based models yields dynamics that reflect human behavior rather than artifacts of the prompting process.

What would settle it

Running the same profile prompts on multiple LLMs at different times and finding that the extracted parameters for loss aversion or extrapolation vary by more than the reported stability margin, or finding that the agent-based model with calibrated extrapolation fails to produce momentum and reversal patterns when tested against new market data.

read the original abstract

Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters. Using four models and 24,000 agent-scenario pairs, we document systematic rationality bias in baseline LLM behavior, including attenuated loss aversion, weak herding, and near-zero disposition effects relative to human benchmarks. Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes. To assess external validity, we embed calibrated parameters in an agent-based asset pricing model, where calibrated extrapolation generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence. Our results establish measurement ranges, calibration functions, and explicit boundaries for eight canonical behavioral biases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows LLMs can be profile-prompted to shift behavioral parameters like loss aversion and extrapolation toward human levels, then validates by recovering momentum and reversal in an ABM, but lacks reported checks on cross-model stability and prompting details.

read the letter

The main takeaway is that profile-based prompting on LLMs produces large, coherent shifts in parameters such as loss aversion, herding, extrapolation, and anchoring, and that feeding the calibrated values into an agent-based asset pricing model generates short-horizon momentum and long-horizon reversal patterns that line up with observed market data. This combination of calibration plus ABM validation is not something I have seen in the prior literature on either side. They run the exercise across four models and 24,000 pairs, which gives the results some scale, and they start by documenting the baseline rationality bias in raw LLM outputs. That part is useful and straightforward. The external-validity step in the ABM is also a concrete move beyond pure measurement.

The soft spots are exactly where the stress-test note points: the abstract gives no prompting protocols, no variance statistics across models, and no formal tests of whether the same profiles produce stable shifts when the underlying LLM changes. Without those, the claim that the parameters are reliable instruments rests on summarized outcomes rather than shown invariance. The moderate circularity risk is also real because both the calibration targets and the market patterns come from human data.

This is aimed at behavioral asset pricing researchers who need parameter values they cannot easily elicit from people, and at anyone testing LLM use in economic simulations. A reader in either group would get a workable framework and some evidence it can reproduce anomalies, even if they would want the full methods before relying on the numbers. It deserves a serious referee because the core idea is new and the validation attempt is direct, though it will need added robustness sections to hold up.

Referee Report

3 major / 2 minor

Summary. The paper develops a framework treating LLMs as calibrated instruments for measuring behavioral parameters (loss aversion, herding, extrapolation, anchoring) in asset pricing. Using four models and 24,000 agent-scenario pairs, it reports baseline LLM rationality biases relative to human benchmarks, shows profile-based calibration produces large, stable, theoretically coherent parameter shifts reaching or exceeding benchmarks, and validates by embedding calibrated parameters in an agent-based asset pricing model where extrapolation generates short-horizon momentum and long-horizon reversal consistent with empirical evidence. The work establishes measurement ranges and explicit boundaries for eight biases.

Significance. If the calibration functions prove robust, the approach could supply a scalable method for quantifying parameters that are otherwise difficult to measure directly, improving the micro-foundations of agent-based models in finance. The multi-model design and ABM embedding step are constructive elements that ground the claims in both measurement and dynamic implications.

major comments (3)

[Abstract] Abstract: the central claim that profile-based calibration induces 'large, stable, and theoretically coherent shifts' rests on summarized outcomes; the abstract supplies no prompting protocols, statistical significance tests, robustness checks to model choice, or exclusion criteria, leaving the reliability of the calibration functions unevaluated.
[Results] Results section: results are reported from four LLMs and 24,000 pairs but no cross-model variance statistics or consistency metrics for the calibration mappings are provided, so the invariance assumption required for treating LLMs as stable measurement instruments remains untested.
[ABM validation] ABM validation section: calibrated extrapolation is shown to generate momentum and reversal patterns 'consistent with empirical evidence,' yet both the calibration targets (human benchmarks) and the validation targets (market patterns) derive from observed human behavior, creating a moderate circularity risk that weakens the external-validity interpretation.

minor comments (2)

[Abstract] The abstract states '24,000' without a comma; adopt consistent numeric formatting throughout.
[Introduction] Define the eight canonical behavioral biases with explicit functional forms or references in the main text before presenting calibration results.
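
As an illustration of what an explicit functional form could look like, loss aversion is commonly parameterized through the prospect-theory value function of Tversky and Kahneman (1992). This is an assumption for illustration only, since the material on this page does not reproduce the paper's own definitions. Here $\lambda$ is the loss-aversion coefficient and $\alpha$, $\beta$ govern curvature over gains and losses:

```latex
v(x) =
\begin{cases}
  x^{\alpha}, & x \ge 0, \\
  -\lambda\,(-x)^{\beta}, & x < 0,
\end{cases}
\qquad \lambda > 1 \text{ indicates loss aversion.}
```

Tversky and Kahneman's median estimates were $\alpha = \beta = 0.88$ and $\lambda = 2.25$. Whether the paper adopts this specification or these benchmark values is not stated here.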

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below. Revisions have been made to strengthen the presentation of methods, add cross-model statistics, and clarify the validation logic. We believe these changes improve the manuscript without altering its core claims.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that profile-based calibration induces 'large, stable, and theoretically coherent shifts' rests on summarized outcomes; the abstract supplies no prompting protocols, statistical significance tests, robustness checks to model choice, or exclusion criteria, leaving the reliability of the calibration functions unevaluated.

Authors: We agree the abstract is highly condensed. Due to length limits, it summarizes rather than details protocols. Prompting templates, exact statistical tests (t-tests and Wilcoxon on parameter shifts), model-robustness tables, and exclusion rules (e.g., responses with <80% coherence) are fully reported in Sections 2.2, 3.1, and 4.1. We have revised the abstract to add one sentence noting the four-model design, 24,000-pair sample, and robustness across LLMs.

revision: yes

Referee: [Results] Results section: results are reported from four LLMs and 24,000 pairs but no cross-model variance statistics or consistency metrics for the calibration mappings are provided, so the invariance assumption required for treating LLMs as stable measurement instruments remains untested.

Authors: We accept this point. The original draft reported only pooled results. We have added a new subsection (3.3) that computes (i) standard deviation of each calibrated parameter across the four models, (ii) pairwise correlations of the calibration functions, and (iii) a consistency index (fraction of parameters whose sign and significance agree across models). These metrics are low-variance for loss aversion, herding, and extrapolation, supporting the invariance assumption. The revised tables are now included.

revision: yes

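The three cross-model metrics named in this response can be sketched as follows. This is a hedged illustration, not the authors' code: every number below is a made-up placeholder rather than a result from the paper, the model names are hypothetical, and a "calibration function" is assumed here to be a parameter value read off at a common grid of profile intensities.

```python
from itertools import combinations
from statistics import mean, stdev

# Placeholder data (NOT from the paper): model -> parameter -> (estimate, significant?)
estimates = {
    "model_a": {"loss_aversion": (2.1, True), "herding": (0.60, True), "extrapolation": (0.45, True)},
    "model_b": {"loss_aversion": (2.3, True), "herding": (0.55, True), "extrapolation": (0.50, True)},
    "model_c": {"loss_aversion": (2.0, True), "herding": (0.65, True), "extrapolation": (0.40, False)},
    "model_d": {"loss_aversion": (2.2, True), "herding": (0.58, True), "extrapolation": (0.48, True)},
}

# Placeholder calibration curves for one parameter, evaluated on a shared
# (assumed) grid of four profile intensities.
curves = {
    "model_a": [1.0, 1.4, 1.8, 2.1],
    "model_b": [1.1, 1.5, 2.0, 2.3],
    "model_c": [0.9, 1.3, 1.7, 2.0],
    "model_d": [1.0, 1.5, 1.9, 2.2],
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def cross_model_sd(est):
    """(i) Sample standard deviation of each parameter across models."""
    params = list(next(iter(est.values())))
    return {p: stdev([m[p][0] for m in est.values()]) for p in params}


def pairwise_correlations(curve_map):
    """(ii) Correlation of calibration curves for every pair of models."""
    return {(a, b): pearson(curve_map[a], curve_map[b])
            for a, b in combinations(curve_map, 2)}


def consistency_index(est):
    """(iii) Share of parameters whose sign and significance agree across all models."""
    params = list(next(iter(est.values())))
    agree = 0
    for p in params:
        signs = {m[p][0] > 0 for m in est.values()}
        flags = {m[p][1] for m in est.values()}
        if len(signs) == 1 and len(flags) == 1:
            agree += 1
    return agree / len(params)


sd_by_param = cross_model_sd(estimates)
pairwise = pairwise_correlations(curves)
consistency = consistency_index(estimates)
```

With these placeholder inputs, one model's extrapolation estimate is flagged as not significant, so that parameter counts as inconsistent and lowers the index. The agreement rule used here (all models must match) is one possible reading of the response; a majority rule would be an equally plausible alternative.
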
Referee: [ABM validation] ABM validation section: calibrated extrapolation is shown to generate momentum and reversal patterns 'consistent with empirical evidence,' yet both the calibration targets (human benchmarks) and the validation targets (market patterns) derive from observed human behavior, creating a moderate circularity risk that weakens the external-validity interpretation.

Authors: We disagree that this constitutes circularity. Calibration targets are micro-level parameters recovered from controlled laboratory experiments (Kahneman & Tversky 1979; Barberis et al. 2016). Validation targets are macro-level return patterns documented in asset-pricing studies (Jegadeesh & Titman 1993; De Bondt & Thaler 1985). The ABM tests whether parameters fitted to individual experimental data can reproduce aggregate market regularities, which is an explicit micro-to-macro mapping and not a tautology. We have added a clarifying paragraph in Section 5.2 distinguishing the two data sources and noting that market patterns were never used in calibration.

revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper calibrates LLM outputs to match external human behavioral benchmarks for parameters such as loss aversion and extrapolation, then embeds the resulting values in a standard agent-based asset pricing model to check whether they reproduce known aggregate market patterns (short-horizon momentum, long-horizon reversal). These steps are independent: the calibration targets are micro-level individual biases drawn from separate human-subject studies, while the validation targets are macro-level price dynamics from market data. No equations, definitions, or self-citations reduce any claimed result to its own inputs by construction. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLMs can be made to proxy human behavioral parameters through prompting and profile calibration, with no new physical entities introduced and only one class of free parameters (the profile descriptors).

free parameters (1)

profile descriptors for calibration
Short textual profiles added to prompts to shift LLM responses toward human benchmark magnitudes; their exact content and selection criteria are not specified in the abstract.
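
How a profile descriptor might be attached to a prompt can be sketched as below. This is only an illustration under stated assumptions: the profile wording and the answer instruction are hypothetical, because the paper's actual templates are not reproduced on this page. The scenario string is one that appears elsewhere on this page among the extracted appendix fragments.

```python
from typing import Optional

# Scenario text as it appears in the extracted appendix fragment on this page.
SCENARIO = (
    "Asset M891 (+20%) just missed earnings 40%, lost major contract, "
    "deteriorating margins. Asset P234 (-10%) performing as expected. Which sell?"
)

# Hypothetical profile descriptor; the paper's real wording is not given here.
LOSS_AVERSE_PROFILE = (
    "You are a retail investor who feels losses about twice as strongly as "
    "equal-sized gains and dislikes realizing a loss."
)

# Hypothetical answer instruction, added so responses are easy to parse.
ANSWER_INSTRUCTION = "Answer with the asset identifier only."


def build_prompt(scenario: str, profile: Optional[str] = None) -> str:
    """Return a baseline prompt, or a profile-conditioned one if a profile is given."""
    parts = []
    if profile:
        parts.append(profile)  # the profile goes first so it frames the scenario
    parts.append(scenario)
    parts.append(ANSWER_INSTRUCTION)
    return "\n\n".join(parts)


baseline_prompt = build_prompt(SCENARIO)
calibrated_prompt = build_prompt(SCENARIO, LOSS_AVERSE_PROFILE)
```

Sending both prompts to the same model and comparing the answers is the kind of baseline-versus-calibrated contrast the summary describes, although the parsing and aggregation steps are left out of this sketch.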

axioms (1)

domain assumption: LLMs can simulate human-like decision biases when appropriately prompted and calibrated
Invoked when the authors treat baseline and calibrated LLM outputs as direct measurements of behavioral parameters.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters... embed calibrated parameters in an agent-based asset pricing model"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

[1] Gati Aher, Rosa I Arriaga, and Adam Tauman Kalai. Using large language models to simulate multiple humans and replicate human subject studies. International Conference on Machine Learning, pages 337–371, 2023

[2] Lisa R Anderson and Charles A Holt. Information cascades in the laboratory. American Economic Review, 87(5):847–862, 1997

[3] Lisa P Argyle, Ethan C Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, and David Wingate. Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3):337–351, 2023

[4] Nicholas Barberis and Ming Huang. Stocks as lotteries: The implications of probability weighting for security prices. American Economic Review, 98(5):2066–2100, 2008

[5] Nicholas Barberis, Andrei Shleifer, and Robert Vishny. A model of investor sentiment. Journal of Financial Economics, 49(3):307–343, 1998

[6] Nicholas Barberis, Robin Greenwood, Lawrence Jin, and Andrei Shleifer. Extrapolation and bubbles. Journal of Financial Economics, 129(2):203–227, 2018

[7] Nicholas C Barberis. Thirty years of prospect theory in economics: A review and assessment. Journal of Economic Perspectives, 27(1):173–196, 2013

[8] Victor L Bernard and Jacob K Thomas. Post-earnings-announcement drift: Delayed price response or risk premium? Journal of Accounting Research, 27:1–36, 1989

[9] Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5):992–1026, 1992

[10] Robert Bloomfield and Jeffrey Hales. Predicting the next step of a random walk: Experimental evidence of regime-shifting beliefs. Journal of Financial Economics, 65(3):397–414, 2002

[11] James Brand, Ayelet Israeli, and Donald Ngwe. Using GPT for market research. Marketing Unit Working Paper 23-062, Harvard Business School, 2023

[12] William A Brock and Cars H Hommes. Heterogeneous beliefs and routes to chaos in a simple asset pricing model. Journal of Economic Dynamics and Control, 22(8-9):1235–1274, 1998

[13] Colin F Camerer. The promise and success of lab-field generalizability in experimental economics: A critical reply to Levitt and List. Available at SSRN 1977749, 2011

[14] Bogachan Celen and Shachar Kariv. Distinguishing informational cascades from herd behavior in the laboratory. American Economic Review, 94(3):484–498, 2004

[15] Kent Daniel, David Hirshleifer, and Avanidhar Subrahmanyam. Investor psychology and security market under- and overreactions. Journal of Finance, 53(6):1839–1885, 1998

[16] Thomas Dohmen, Armin Falk, David Huffman, Uwe Sunde, Jürgen Schupp, and Gert G Wagner. Individual risk attitudes: Measurement, determinants, and behavioral consequences. Journal of the European Economic Association, 9(3):522–550, 2011

[17] Robin Greenwood and Andrei Shleifer. Expectations of returns and expected returns. Review of Financial Studies, 27(3):714–746, 2014

[18] Charles A Holt and Susan K Laury. Risk aversion and incentive effects. American Economic Review, 92(5):1644–1655, 2002

[19] John J Horton. Large language models as simulated economic agents: What can we learn from homo silicus? National Bureau of Economic Research Working Paper, (31122), 2023

[20] John J Horton. Can large language models simulate human behavior in economic experiments? Working Paper, 2024

[21] Narasimhan Jegadeesh and Sheridan Titman. Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance, 48(1):65–91, 1993

[22] Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–291, 1979

[23] Daniel Kahneman, Jack L Knetsch, and Richard H Thaler. Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy, 98(6):1325–1348, 1990

[24] Michael P Keane. Structural vs. atheoretic approaches to econometrics. Journal of Econometrics, 156(1):3–20, 2011

[25] Josef Lakonishok, Andrei Shleifer, and Robert W Vishny. Contrarian investment, extrapolation, and risk. Journal of Finance, 49(5):1541–1578, 1994

[26] Blake LeBaron. Empirical regularities from interacting long- and short-memory investors in an agent-based stock market. IEEE Transactions on Evolutionary Computation, 5(5):442–455, 2001

[27] Thomas Lux and Michele Marchesi. Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397(6719):498–500, 1999

[28] Qing Mei et al. Quantifying and mitigating memorization in large language models. arXiv preprint, 2024

[29] Don A Moore and Paul J Healy. The trouble with overconfidence. Psychological Review, 115(2):502–517, 2008

[30] Thomas Mussweiler, Fritz Strack, and Tim Pfeiffer. Overcoming the inevitable anchoring effect: Considering the opposite compensates for selective accessibility. Personality and Social Psychology Bulletin, 26(9):1142–1150, 2000

[31] Gregory B Northcraft and Margaret A Neale. Experts, amateurs, and real estate: An anchoring-and-adjustment perspective on property pricing decisions. Organizational Behavior and Human Decision Processes, 39(1):84–97, 1987

[32] Nathan Novemsky and Daniel Kahneman. The boundaries of loss aversion. Journal of Marketing Research, 42(2):119–128, 2005

[33] Terrance Odean. Are investors reluctant to realize their losses? Journal of Finance, 53(5):1775–1798, 1998

[34] Joon Sung Park, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In UIST, 2023

[35] Hersh Shefrin and Meir Statman. The disposition to sell winners too early and ride losers too long: Theory and evidence. Journal of Finance, 40(3):777–790, 1985

[36] Amos Tversky and Daniel Kahneman. Judgment under uncertainty: Heuristics and biases. Science, 185(4157):1124–1131, 1974

[37] Amos Tversky and Daniel Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4):297–323, 1992
[38] Martin Weber and Colin F Camerer. The disposition effect in securities trading: An experimental analysis. Journal of Economic Behavior & Organization, 33(2):167–184, 1998

Appendix fragments picked up by the reference extractor (listed as entries [38]–[42]; these are not references):

Scenario text: "Asset M891 (+20%) just missed earnings 40%, lost major contract, deteriorating margins. Asset P234 (-10%) performing as expected. Which sell?"

Appendix A Human Benchmark Justification: This appendix provides detailed justification ...

Search each asset identifier on Google (exact phrase match)
Search on Bing, Yahoo Finance, Bloomberg Terminal
Search SEC EDGAR filings
Search financial news archives (WSJ, FT, Bloomberg News)
Zero exact matches confirm non-existence in accessible training data. Procedure documented and replicable.

E.2 Power Analysis Details: For each experiment, we compute power using simulation-based approach:
Disposition Effect:
• Null: DR = 1.0 (no bias)
• Alternative: DR = 1.6 (human benchmark)
• Samp...

[1] [1]

Using large language models to simulate multiple humans and replicate human subject studies.International Conference on Machine Learning, pages 337–371, 2023

Gati Aher, Rosa I Arriaga, and Adam Tauman Kalai. Using large language models to simulate multiple humans and replicate human subject studies.International Conference on Machine Learning, pages 337–371, 2023

work page 2023

[2] [2]

Information cascades in the laboratory.American Economic Review, 87(5): 847–862, 1997

Lisa R Anderson and Charles A Holt. Information cascades in the laboratory.American Economic Review, 87(5): 847–862, 1997

work page 1997

[3] [3]

Out of one, many: Using language models to simulate human samples.Political Analysis, 31(3):337–351, 2023

Lisa P Argyle, Ethan C Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, and David Wingate. Out of one, many: Using language models to simulate human samples.Political Analysis, 31(3):337–351, 2023

work page 2023

[4] [4]

Stocks as lotteries: The implications of probability weighting for security prices.American Economic Review, 98(5):2066–2100, 2008

Nicholas Barberis and Ming Huang. Stocks as lotteries: The implications of probability weighting for security prices.American Economic Review, 98(5):2066–2100, 2008

work page 2066

[5] [5]

A model of investor sentiment.Journal of Financial Economics, 49(3):307–343, 1998

Nicholas Barberis, Andrei Shleifer, and Robert Vishny. A model of investor sentiment.Journal of Financial Economics, 49(3):307–343, 1998

work page 1998

[6] [6]

Extrapolation and bubbles.Journal of Financial Economics, 129(2):203–227, 2018

Nicholas Barberis, Robin Greenwood, Lawrence Jin, and Andrei Shleifer. Extrapolation and bubbles.Journal of Financial Economics, 129(2):203–227, 2018

work page 2018

[7] [7]

Thirty years of prospect theory in economics: A review and assessment.Journal of Economic Perspectives, 27(1):173–196, 2013

Nicholas C Barberis. Thirty years of prospect theory in economics: A review and assessment.Journal of Economic Perspectives, 27(1):173–196, 2013

work page 2013

[8] [8]

Post-earnings-announcement drift: Delayed price response or risk premium?Journal of Accounting Research, 27:1–36, 1989

Victor L Bernard and Jacob K Thomas. Post-earnings-announcement drift: Delayed price response or risk premium?Journal of Accounting Research, 27:1–36, 1989

work page 1989

[9] [9]

A theory of fads, fashion, custom, and cultural change as informational cascades.Journal of Political Economy, 100(5):992–1026, 1992

Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades.Journal of Political Economy, 100(5):992–1026, 1992

work page 1992

[10] [10]

Predicting the next step of a random walk: Experimental evidence of regime-shifting beliefs.Journal of Financial Economics, 65(3):397–414, 2002

Robert Bloomfield and Jeffrey Hales. Predicting the next step of a random walk: Experimental evidence of regime-shifting beliefs.Journal of Financial Economics, 65(3):397–414, 2002

work page 2002

[11] [11]

Using gpt for market research

James Brand, Ayelet Israeli, and Donald Ngwe. Using gpt for market research. Marketing Unit Working Paper 23-062, Harvard Business School, 2023

work page 2023

[12] [12]

Heterogeneous beliefs and routes to chaos in a simple asset pricing model

William A Brock and Cars H Hommes. Heterogeneous beliefs and routes to chaos in a simple asset pricing model. Journal of Economic Dynamics and Control, 22(8-9):1235–1274, 1998

work page 1998

[13] [13]

The promise and success of lab-field generalizability in experimental economics: A critical reply to levitt and list.Available at SSRN 1977749, 2011

Colin F Camerer. The promise and success of lab-field generalizability in experimental economics: A critical reply to levitt and list.Available at SSRN 1977749, 2011

work page 2011

[14] Bogachan Celen and Shachar Kariv. Distinguishing informational cascades from herd behavior in the laboratory. American Economic Review, 94(3):484–498, 2004

[15] Kent Daniel, David Hirshleifer, and Avanidhar Subrahmanyam. Investor psychology and security market under- and overreactions. Journal of Finance, 53(6):1839–1885, 1998

[16] Thomas Dohmen, Armin Falk, David Huffman, Uwe Sunde, Jürgen Schupp, and Gert G Wagner. Individual risk attitudes: Measurement, determinants, and behavioral consequences. Journal of the European Economic Association, 9(3):522–550, 2011

[17] Robin Greenwood and Andrei Shleifer. Expectations of returns and expected returns. Review of Financial Studies, 27(3):714–746, 2014

[18] Charles A Holt and Susan K Laury. Risk aversion and incentive effects. American Economic Review, 92(5):1644–1655, 2002

[19] John J Horton. Large language models as simulated economic agents: What can we learn from homo silicus? National Bureau of Economic Research Working Paper, (31122), 2023

[20] John J Horton. Can large language models simulate human behavior in economic experiments? Working Paper, 2024

[21] Narasimhan Jegadeesh and Sheridan Titman. Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance, 48(1):65–91, 1993

[22] Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–291, 1979

[23] Daniel Kahneman, Jack L Knetsch, and Richard H Thaler. Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy, 98(6):1325–1348, 1990

[24] Michael P Keane. Structural vs. atheoretic approaches to econometrics. Journal of Econometrics, 156(1):3–20, 2011

[25] Josef Lakonishok, Andrei Shleifer, and Robert W Vishny. Contrarian investment, extrapolation, and risk. Journal of Finance, 49(5):1541–1578, 1994

[26] Blake LeBaron. Empirical regularities from interacting long- and short-memory investors in an agent-based stock market. IEEE Transactions on Evolutionary Computation, 5(5):442–455, 2001

[27] Thomas Lux and Michele Marchesi. Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397(6719):498–500, 1999

[28] Qing Mei et al. Quantifying and mitigating memorization in large language models. arXiv preprint, 2024

[29] Don A Moore and Paul J Healy. The trouble with overconfidence. Psychological Review, 115(2):502–517, 2008

[30] Thomas Mussweiler, Fritz Strack, and Tim Pfeiffer. Overcoming the inevitable anchoring effect: Considering the opposite compensates for selective accessibility. Personality and Social Psychology Bulletin, 26(9):1142–1150, 2000

[31] Gregory B Northcraft and Margaret A Neale. Experts, amateurs, and real estate: An anchoring-and-adjustment perspective on property pricing decisions. Organizational Behavior and Human Decision Processes, 39(1):84–97, 1987

[32] Nathan Novemsky and Daniel Kahneman. The boundaries of loss aversion. Journal of Marketing Research, 42(2):119–128, 2005

[33] Terrance Odean. Are investors reluctant to realize their losses? Journal of Finance, 53(5):1775–1798, 1998

[34] Joon Sung Park, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In UIST, 2023

[35] Hersh Shefrin and Meir Statman. The disposition to sell winners too early and ride losers too long: Theory and evidence. Journal of Finance, 40(3):777–790, 1985

[36] Amos Tversky and Daniel Kahneman. Judgment under uncertainty: Heuristics and biases. Science, 185(4157):1124–1131, 1974

[37] Amos Tversky and Daniel Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4):297–323, 1992

[38] Martin Weber and Colin F Camerer. The disposition effect in securities trading: An experimental analysis. Journal of Economic Behavior & Organization, 33(2):167–184, 1998

Appendix A Human Benchmark Justification

This appendix provides detailed justification ...

Asset M891 (+20%) just missed earnings 40%, lost major contract, deteriorating margins. Asset P234 (-10%) performing as expected. Which sell?

• Search each asset identifier on Google (exact phrase match)
• Search on Bing, Yahoo Finance, Bloomberg Terminal
• Search SEC EDGAR filings
• Search financial news archives (WSJ, FT, Bloomberg News)

Zero exact matches confirm non-existence in accessible training data. Procedure documented and replicable.

E.2 Power Analysis Details

For each experiment, we compute power using a simulation-based approach:

Disposition Effect:
• Null: DR = 1.0 (no bias)
• Alternative: DR = 1.6 (human benchmark)
• Samp...
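The simulation-based power calculation described in E.2 for the disposition effect can be illustrated with a minimal sketch. It is not the authors' procedure: the source text is truncated before the sample size, so every number other than DR = 1.0 (null) and DR = 1.6 (alternative) is a placeholder. The sketch assumes that DR is the ratio of the proportion of gains realized to the proportion of losses realized, that each simulated experiment has `n_per_arm` independent sell decisions on gains and on losses, that the baseline loss-realization rate `p_loss` is 0.3, and that the test is a two-sided z-test on log(DR) with a delta-method standard error. The function name `disposition_power` and all of its parameters are hypothetical.

```python
import math
import random


def draw_count(rng, n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def disposition_power(dr_alt=1.6, p_loss=0.3, n_per_arm=100,
                      n_sims=2000, z_crit=1.96, seed=0):
    """Estimate the rejection rate of H0: DR = 1 when the true ratio is dr_alt.

    p_loss, n_per_arm, n_sims and z_crit are illustrative assumptions,
    not values taken from the paper.
    """
    p_gain = dr_alt * p_loss  # realization rate for gains implied by the ratio
    if not 0.0 < p_gain < 1.0:
        raise ValueError("dr_alt * p_loss must lie strictly between 0 and 1")
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Smoothed proportions keep the log and the standard error finite.
        pg = (draw_count(rng, n_per_arm, p_gain) + 0.5) / (n_per_arm + 1)
        pl = (draw_count(rng, n_per_arm, p_loss) + 0.5) / (n_per_arm + 1)
        log_dr = math.log(pg / pl)
        # Delta-method standard error of the log ratio of two proportions.
        se = math.sqrt((1 - pg) / (n_per_arm * pg) + (1 - pl) / (n_per_arm * pl))
        if abs(log_dr / se) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("size under the null (DR = 1.0):", disposition_power(dr_alt=1.0))
    print("power at DR = 1.6:", disposition_power(dr_alt=1.6))
```

Calling the function with `dr_alt=1.0` gives the empirical size of the test, which should stay near the nominal level. Raising `n_per_arm` shows how power grows with the number of decisions per condition.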