From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement Attitudes

Jonne Kamphorst; Marcin Detyniecki; Michael Bernstein; Pauline Baron; Rubén Garzón; Vincent Grari

arXiv: 2605.16303 · v1 · pith:EVA5C43Onew · submitted 2026-04-24 · 💻 cs.CY · cs.AI · cs.CL

Rubén Garzón, Pauline Baron, Vincent Grari, Jonne Kamphorst, Michael Bernstein, Marcin Detyniecki

Pith reviewed 2026-05-21 01:05 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.CL

keywords LLM agents · survey prediction · retirement planning · demographics · SHARE survey · financial risk tolerance · interaction effects · attitude modeling

The pith

Demographic-only LLM agents reproduce main effects on retirement savings but miss the interactions among risk tolerance, time perspective, and planning knowledge that survey-anchored agents capture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares two ways of building LLM agents to predict answers on a large retirement survey. One type uses only basic demographics such as age, income, and education. The other adds responses to related survey questions as anchors. Demographic agents tend to give middle-of-the-road answers and rarely admit uncertainty, while anchored agents produce more human-like variation including don't-know replies. When both types are tested on a known regression model of what drives retirement savings, demographic agents recover the separate roles of risk tolerance, future outlook, and planning knowledge, yet only the anchored agents recover how those three factors modify one another. The result cautions against using thin demographic profiles alone when the goal is to simulate survey responses for policy or research.
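
To make the contrast between the two agent types concrete, here is a minimal sketch of how their prompts might be assembled. The `Respondent` class, the `build_prompt` helper, the field names, and the prompt wording are all hypothetical illustrations, not the paper's actual prompts. The only point is that the anchored agent receives the same demographic block plus earlier in-domain answers.

```python
# Hypothetical sketch: none of these names or prompt strings come from the paper.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Respondent:
    demographics: dict                            # e.g. country, age, education
    anchors: dict = field(default_factory=dict)   # earlier in-domain survey answers


def build_prompt(person: Respondent, question: str, use_anchors: bool) -> str:
    """Return a persona prompt; anchors are appended only for the anchored agent."""
    lines = ["You are answering a survey as the following person."]
    lines += [f"- {key}: {value}" for key, value in person.demographics.items()]
    if use_anchors and person.anchors:
        lines.append("Earlier answers this person gave in the same survey:")
        lines += [f"- {q} -> {a}" for q, a in person.anchors.items()]
    lines.append(f"Question: {question}")
    lines.append("Answer with one listed option, or 'don't know'.")
    return "\n".join(lines)


# Illustrative values only.
person = Respondent(
    demographics={"country": "France", "age": 67, "education": "secondary"},
    anchors={"Willingness to take financial risk": "no risk at all"},
)
question = "How far ahead do you plan your saving and spending?"
demographic_prompt = build_prompt(person, question, use_anchors=False)
anchored_prompt = build_prompt(person, question, use_anchors=True)
```

Keeping both prompts in one function with a single switch is a deliberate choice in this sketch: it makes the anchors the only thing that can differ between the two conditions.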

Core claim

Agents defined solely by demographics reproduce the finding that financial risk tolerance, future time perspective, and knowledge of retirement planning each predict retirement savings, yet only agents supplied with additional in-domain survey responses succeed in reproducing the statistical interaction among these three constructs; demographic agents also display central-tendency bias and unrealistically high accuracy that omits typical human error and don't-know responses.

What carries the argument

The direct comparison of demographic-only versus survey-anchored LLM agents when both are asked to replicate a hierarchical regression analysis on five variables from three retirement-planning constructs in the SHARE survey.
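
A hierarchical regression of this kind enters predictors in blocks and asks how much explained variance each block adds. The sketch below runs three steps (main effects, two-way products, then the three-way product) on synthetic data using only the standard library. The variable names, coefficients, and step layout are assumptions for illustration; the paper's exact specification is not reproduced here.

```python
# Minimal sketch on synthetic data; all names and coefficients are illustrative.
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def product(*cols):
    """Element-wise product of equally long columns, i.e. an interaction term."""
    return [math.prod(vals) for vals in zip(*cols)]


rng = random.Random(0)
n = 500
risk = [rng.gauss(0, 1) for _ in range(n)]
horizon = [rng.gauss(0, 1) for _ in range(n)]
knowledge = [rng.gauss(0, 1) for _ in range(n)]

main = [risk, horizon, knowledge]
two_way = [product(risk, horizon), product(risk, knowledge),
           product(horizon, knowledge)]
three_way = [product(risk, horizon, knowledge)]

# Synthetic outcome built to contain main effects and a three-way interaction.
savings = [0.5 * r + 0.4 * h + 0.6 * k + 0.5 * x + rng.gauss(0, 1)
           for r, h, k, x in zip(risk, horizon, knowledge, three_way[0])]

r2_step1 = r_squared(main, savings)                        # main effects only
r2_step2 = r_squared(main + two_way, savings)              # plus two-way terms
r2_step3 = r_squared(main + two_way + three_way, savings)  # plus three-way term
delta_three_way = r2_step3 - r2_step2  # variance added by the interaction step
```

Applied to real data, the same steps would be fitted separately to human responses and to each agent type's responses, and the final increment compared across the three.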

If this is right

Demographic agents skew answers toward population averages instead of matching the full distribution of human replies.
Demographic agents produce fewer incorrect or don't-know responses than actual survey participants.
Only survey-anchored agents recover the interaction term among risk tolerance, future time perspective, and planning knowledge.
Relying solely on demographics for LLM survey simulation risks missing how predictors combine in retirement attitudes.
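
The first two consequences can be checked with simple distributional summaries. The sketch below uses made-up answers on a 1-to-5 scale to compare the spread of agent answers with the human spread and to compare don't-know rates. The helper names and the data are illustrative assumptions, not the paper's metrics.

```python
# Illustrative only: toy data and hypothetical helper names, not the paper's metrics.
from statistics import pstdev

DONT_KNOW = None  # marker for a "don't know" reply


def dont_know_rate(answers):
    """Share of replies that are 'don't know'."""
    return sum(a is DONT_KNOW for a in answers) / len(answers)


def spread_ratio(agent_answers, human_answers):
    """Agent std / human std over substantive replies; below 1 hints at central tendency."""
    agent = [a for a in agent_answers if a is not DONT_KNOW]
    human = [a for a in human_answers if a is not DONT_KNOW]
    return pstdev(agent) / pstdev(human)


human = [1, 5, 2, None, 4, 1, 5, None, 3, 2]
demographic_agent = [3, 3, 2, 3, 4, 3, 3, 3, 3, 2]

ratio = spread_ratio(demographic_agent, human)
gap = dont_know_rate(human) - dont_know_rate(demographic_agent)
```

In this toy case the agent's answers cluster near the middle of the scale and contain no don't-know replies, so `ratio` falls below 1 and `gap` is positive, which is the pattern the list above describes.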

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Minimal additional anchor questions might be enough to restore interaction effects without requiring entire survey modules.
The same demographic-versus-anchored test could be applied to other domains where attitude interactions drive behavior, such as health or consumer finance.
If prompt engineering alone can close the gap, then survey anchoring may be less necessary than the current results suggest.

Load-bearing premise

Performance gaps arise specifically from the presence or absence of in-domain survey anchors rather than from differences in prompt wording, model choice, or how the five variables were selected.

What would settle it

Re-running the hierarchical regression on responses from new demographic-only agents that use identical prompt structure, model, and post-processing as the survey-anchored agents; if those new demographic agents now reproduce the three-factor interaction, the central claim is falsified.
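
One way to enforce the "identical except for anchors" condition in such a re-run is to describe each agent class with a single configuration record and verify that only one field differs. The sketch below is hypothetical: the `AgentConfig` fields and the placeholder values are assumptions, since the paper does not publish its setup in this form.

```python
# Hypothetical configuration fields; the paper does not publish this structure.
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentConfig:
    model: str
    temperature: float
    prompt_template: str
    postprocessing: str
    use_survey_anchors: bool


def differing_fields(a: AgentConfig, b: AgentConfig) -> set:
    """Names of the settings on which two agent configurations disagree."""
    da, db = asdict(a), asdict(b)
    return {name for name in da if da[name] != db[name]}


# Placeholder values, not the paper's settings.
demographic = AgentConfig("model-x", 0.7, "persona_v1", "map_to_option", False)
anchored = AgentConfig("model-x", 0.7, "persona_v1", "map_to_option", True)

# A clean comparison varies the anchors and nothing else.
is_controlled = differing_fields(demographic, anchored) == {"use_survey_anchors"}
```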

Original abstract

Large language models (LLM) agents may offer tools to predict human responses to surveys. A common technique for defining these agents uses only demographics, for example country, age, gender, employment status, income, education and marital status. We compare the predictive accuracy of demographic agents to that of survey agents defined with a larger set of in-domain survey responses. We test both approaches in predicting responses to the multidisciplinary, cross-national Survey of Health, Ageing and Retirement in Europe (SHARE), focusing on five variables from three policy-relevant constructs around personal finance. In these three constructs, we observe that, compared to survey agents trained on broader data, demographics-only agents (1) exhibited a central tendency bias, skewing answers toward population means, and (2) were unrealistically accurate, failing to reproduce the incorrect answers and "don't know" responses typical of human respondents. These performance differences are further substantiated through the replication of a hierarchical regression analysis from prior retirement planning research. Agents based solely on demographic information reproduce the outcome that financial risk tolerance, future time perspective, and knowledge of retirement planning each are predictive of retirement savings. However, only the survey-anchored agents succeed in reproducing the interaction among these three factors. These findings suggest caution in using only demographics to define LLM agents for predicting survey responses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Demographic-only LLM agents for retirement surveys show central tendency bias and miss key interactions that survey-anchored agents reproduce, but the comparison may have unstated differences in setup.

The core finding is that agents built from demographics alone tend to cluster answers around population averages and fail to match the spread of real responses, including incorrect or don't-know answers. When replicating a prior hierarchical regression on retirement savings, they recover the three main effects from risk tolerance, time perspective, and planning knowledge but miss the interaction term. Survey-anchored agents recover both the main effects and the interaction. That specific replication result on SHARE data is the clearest new piece here.

Referee Report

2 major / 2 minor

Summary. The manuscript compares LLM agents for predicting responses in the SHARE survey on retirement attitudes. Demographic-only agents (using country, age, gender, employment, income, education, marital status) are contrasted with survey-anchored agents that incorporate additional in-domain responses. The authors report that demographic agents exhibit central tendency bias and unrealistically high accuracy (avoiding 'don't know' or incorrect answers), while survey-anchored agents better match human patterns. In replicating a published hierarchical regression, demographic agents recover main effects of financial risk tolerance, future time perspective, and retirement planning knowledge on savings but miss their interaction; only survey-anchored agents recover both main effects and the interaction.

Significance. If the results hold after addressing controls for agent construction, the work would usefully caution against demographic-only LLM agents for survey simulation in policy domains and highlight the value of in-domain anchors for capturing interactions. The replication of an established hierarchical regression from prior retirement research is a clear strength, grounding the LLM evaluation in substantive findings rather than isolated accuracy metrics.

major comments (2)

[§3 and §4.2] §3 (Agent Construction) and §4.2 (Regression Replication): The headline claim that 'only the survey-anchored agents succeed in reproducing the interaction' requires that the two agent classes differ solely in the added survey responses. The manuscript provides no explicit statement that prompt wording, few-shot examples, temperature, model version, and the exact five-variable set were pre-fixed and applied uniformly. Without this control, differences in prompt engineering or variable selection could produce the observed pattern independently of the anchors. This is load-bearing for the central conclusion.
[§4.2] §4.2 (Hierarchical Regression): The paper reports that demographic agents recover the three main effects but not the interaction. To interpret this as evidence for the anchors' causal role, the selection of the three constructs (risk tolerance, time perspective, planning knowledge) and their interaction must be shown to have been pre-specified rather than identified post-hoc. Post-hoc focus risks inflating the apparent advantage of survey-anchored agents on the interaction term.

minor comments (2)

[Abstract] Abstract: The phrasing 'survey agents trained on broader data' is ambiguous and should be replaced with a precise description of the survey-anchored condition.
[Methods] Methods: Report exact numbers of agents, queries per agent, response exclusion criteria, and any statistical tests or error bars on accuracy and regression coefficients. These details are needed to assess the reliability of the directional findings.
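
As an illustration of the kind of error bar the second minor comment asks for, the following sketch computes a percentile bootstrap interval for an agent's accuracy. The data are synthetic placeholders and `bootstrap_ci` is a hypothetical helper. The paper may use a different uncertainty estimate.

```python
# Sketch of a percentile bootstrap; the data are synthetic placeholders.
import random


def bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a list of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    means = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# 1 = agent matched the respondent's answer, 0 = it did not (toy values).
matches = [1] * 70 + [0] * 30
accuracy = sum(matches) / len(matches)
low, high = bootstrap_ci(matches)
```

Resampling respondents rather than individual queries would be the natural extension when each agent is queried several times, since repeated queries to one agent are not independent.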

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the robustness of our findings. We address each major comment in turn below.

Referee: [§3 and §4.2] §3 (Agent Construction) and §4.2 (Regression Replication): The headline claim that 'only the survey-anchored agents succeed in reproducing the interaction' requires that the two agent classes differ solely in the added survey responses. The manuscript provides no explicit statement that prompt wording, few-shot examples, temperature, model version, and the exact five-variable set were pre-fixed and applied uniformly. Without this control, differences in prompt engineering or variable selection could produce the observed pattern independently of the anchors. This is load-bearing for the central conclusion.

Authors: We acknowledge the importance of this control for the validity of our comparison. Upon review, the manuscript describes the agent construction in §3 but does not include an explicit statement confirming uniformity of all other parameters. In the revised version, we will add a clear statement in §3 that prompt wording, few-shot examples, temperature, model version, and the variable set were pre-fixed and identical for both agent classes, with the sole difference being the inclusion of additional in-domain survey responses for the survey-anchored agents. This revision will be made to ensure the differences can be attributed to the anchors. revision: yes
Referee: [§4.2] §4.2 (Hierarchical Regression): The paper reports that demographic agents recover the three main effects but not the interaction. To interpret this as evidence for the anchors' causal role, the selection of the three constructs (risk tolerance, time perspective, planning knowledge) and their interaction must be shown to have been pre-specified rather than identified post-hoc. Post-hoc focus risks inflating the apparent advantage of survey-anchored agents on the interaction term.

Authors: The selection of the three constructs—financial risk tolerance, future time perspective, and retirement planning knowledge—and their interaction was not post-hoc but followed directly from replicating a specific hierarchical regression model established in prior retirement research, as cited in the manuscript. The analysis was designed to test whether LLM agents could reproduce both main effects and the interaction from this established finding. To make this explicit, we will revise §4.2 to state that the regression specification, including the constructs and interaction term, was pre-specified based on the referenced prior work. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external SHARE data and prior published regressions

The paper performs an empirical comparison of LLM agent outputs against actual responses in the external SHARE survey dataset. It replicates a hierarchical regression from prior published retirement planning research to evaluate whether demographic-only agents recover main effects but miss interactions that survey-anchored agents recover. No mathematical derivations, fitted parameters renamed as predictions, or self-citation chains are present that reduce the central claims to the paper's own inputs by construction. The evaluation is benchmarked against held-out human data and independent prior results, rendering the findings self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that LLMs can be prompted to simulate human survey responses and that the SHARE dataset provides representative in-domain anchors; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: LLM agents can be defined via prompting to simulate human survey responses when given demographic or survey context.
This underpins both agent types and is invoked throughout the comparison.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: Agents based solely on demographic information reproduce the outcome that financial risk tolerance, future time perspective, and knowledge of retirement planning each are predictive of retirement savings. However, only the survey-anchored agents succeed in reproducing the interaction among these three factors.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We compare the predictive accuracy of demographic agents to that of survey agents defined with a larger set of in-domain survey responses.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 9 internal anchors

[1]

In: Proceedings of the 36th Annual Acm Symposium on User Interface Software and Technology, pp

Park, J.S., O’Brien, J., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S.: Generative agents: Inter- active simulacra of human behavior. In: Proceedings of the 36th Annual Acm Symposium on User Interface Software and Technology, pp. 1–22 (2023)

work page 2023
[2]

LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

Park, J.S., Zou, C.Q., Shaw, A., Hill, B.M., Cai, C., Morris, M.R., Willer, R., Liang, P., Bernstein, M.S.: Generative agent simulations of 1,000 people. arXiv preprint arXiv:2411.10109 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[3]

In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp

Xu, S., Wen, H.-N., Pan, H., Dominguez, D., Hu, D., Zhang, X.: Classroom simulacra: Building contextual student generative agents in online education for learning behavioral simulation. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–26 (2025)

work page 2025
[4]

Xie, C., Chen, C., Jia, F., Ye, Z., Lai, S., Shu, K., Gu, J., Bibi, A., Hu, Z., Jurgens, D.,et al.: Can large language model agents simulate human trust behavior? Advances in neural information processing systems37, 15674–15729 (2024)

work page 2024
[5]

In: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp

Agnew, W., Bergman, A.S., Chien, J., D´ ıaz, M., El-Sayed, S., Pittman, J., Mohamed, S., McKee, K.R.: The illusion of artificial inclusion. In: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–12 (2024)

work page 2024
[6]

In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

H¨ am¨ al¨ ainen, P., Tavast, M., Kunnari, A.: Evaluating large language models in generating synthetic hci research data: a case study. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. CHI ’23. Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3544548.3580688 . https://doi.org/10.1145/...

work page doi:10.1145/3544548.3580688 2023
[7]

In: Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

Hwang, A.H.-C., Bernstein, M.S., Sundar, S.S., Zhang, R., Horta Ribeiro, M., Lu, Y., Chang, S., Wu, T., Yang, A., Williams, D., Park, J.S., Ognyanova, K., Xiao, Z., Shaw, A., Shamma, D.A.: Human subjects research in the age of generative ai: Opportunities and challenges of applying llm-simulated data to hci studies. In: Proceedings of the Extended Abstrac...

work page doi:10.1145/3706599.3716299 2025
[8]

Kapania, S.: Simulacrum of Stories: Examining Large Language Models as Qualitative Research Participants (2025)

work page 2025
[9]

Political Analysis31(3), 337–351 (2023)

Argyle, L.P., Busby, E.C., Fulda, N., Gubler, J.R., Rytting, C., Wingate, D.: Out of one, many: Using language models to simulate human samples. Political Analysis31(3), 337–351 (2023)

work page 2023
[10]

https://arxiv.org/abs/2301.07543

Horton, J.J.: Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (2023). https://arxiv.org/abs/2301.07543

work page arXiv 2023
[11]

URL: https://docsend

Ashokkumar, A., Hewitt, L., Ghezae, I., Willer, R.: Predicting results of social science experiments using large language models. URL: https://docsend. com/view/ity6yf2dansesucf [accessed 2025-02- 12] (2024)

work page 2025
[12]

Nature Machine Intelligence, 1–12 (2025)

Wang, A., Morgenstern, J., Dickerson, J.P.: Large language models that replace human participants can harmfully misportray and flatten identity groups. Nature Machine Intelligence, 1–12 (2025)

work page 2025
[13]

Release version: 9.0.0

SHARE-ERIC: Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 9. Release version: 9.0.0. SHARE-ERIC. Data set (2024). https://doi.org/10.6103/SHARE.w9.900 . https:// doi.org/10.6103/SHARE.w9.900 45

work page doi:10.6103/share.w9.900 2024
[14]

Munich: SHARE-ERIC (2024)

Bergmann, M., Wagner, M., B¨ orsch-Supan, A.: Share wave 9 methodology: From the share corona survey 2 to the share main wave 9 interview. Munich: SHARE-ERIC (2024)

work page 2024
[15]

International journal of epidemiology42(4), 992–1001 (2013)

B¨ orsch-Supan, A., Brandt, M., Hunkler, C., Kneip, T., Korbmacher, J., Malter, F., Schaan, B., Stuck, S., Zuber, S.: Data resource profile: the survey of health, ageing and retirement in europe (share). International journal of epidemiology42(4), 992–1001 (2013)

work page 2013
[16]

Financial services review14(4), 331 (2005)

Jacobs-Lawson, J.M., Hershey, D.A.: Influence of future time perspective, financial knowledge, and financial risk tolerance on retirement saving behaviors. Financial services review14(4), 331 (2005)

work page 2005
[17]

Publications Office of the European Union, ??? (2023)

Commission, E., Communication, D.-G., Affairs, I.E.P.: Monitoring the Level of Financial Literacy in the EU – Report. Publications Office of the European Union, ??? (2023). https://doi.org/10.2874/ 956514

work page 2023
[18]

Chicago, IL: GSS Project Report (31) (2016)

Marsden, P.V., Smith, T.W.: Overview: The general social survey project. Chicago, IL: GSS Project Report (31) (2016)

work page 2016
[19]

Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions

Suh, J., Jahanparast, E., Moon, S., Kang, M., Chang, S.: Language model fine-tuning on scaled survey data for predicting distributions of public opinions. arXiv preprint arXiv:2502.16761 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[20]

Technical report, National Bureau of Economic Research (2024)

Manning, B.S., Zhu, K., Horton, J.J.: Automated social science: Language models as scientist and subjects. Technical report, National Bureau of Economic Research (2024)

work page 2024
[21]

Nature, 1–8 (2025)

Binz, M., Akata, E., Bethge, M., Br¨ andle, F., Callaway, F., Coda-Forno, J., Dayan, P., Demircan, C., Eckstein, M.K., ´Eltet˝ o, N., et al.: A foundation model to predict and capture human cognition. Nature, 1–8 (2025)

work page 2025
[22]

Dillion, D., Tandon, N., Gu, Y., Gray, K.: Can ai language models replace human participants? Trends in Cognitive Sciences27(7), 597–600 (2023)

work page 2023
[23]

Ai & Society39(5), 2603–2605 (2024)

Harding, J., D’Alessandro, W., Laskowski, N., Long, R.: Ai language models cannot replace human research participants. Ai & Society39(5), 2603–2605 (2024)

work page 2024
[24]

Science380(6650), 1108–1109 (2023)

Grossmann, I., Feinberg, M., Parker, D.C., Christakis, N.A., Tetlock, P.E., Cunningham, W.A.: Ai and the transformation of social science research. Science380(6650), 1108–1109 (2023)

work page 2023
[25]

arXiv preprint arXiv:2406.01171 (2024)

Tseng, Y.-M., Huang, Y.-C., Hsiao, T.-Y., Chen, W.-L., Huang, C.-W., Meng, Y., Chen, Y.-N.: Two tales of persona in llms: A survey of role-playing and personalization. arXiv preprint arXiv:2406.01171 (2024)

work page arXiv 2024
[26]

AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

Kim, J., Lee, B.: Ai-augmented surveys: Leveraging large language models and surveys for opinion prediction. arXiv preprint arXiv:2305.09620 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[27]

In: International Conference on Machine Learning, pp

Aher, G.V., Arriaga, R.I., Kalai, A.T.: Using large language models to simulate multiple humans and replicate human subject studies. In: International Conference on Machine Learning, pp. 337–371 (2023). PMLR

work page 2023
[28]

SSRN Electronic Journal (2023) https://doi.org/10.2139/ssrn.4650172

Gui, G., Toubia, O.: The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective. SSRN Electronic Journal (2023) https://doi.org/10.2139/ssrn.4650172 . Accessed 2026- 04-21

work page doi:10.2139/ssrn.4650172 2023
[29]

Proceedings of the National Academy of Sciences120(51), 2316205120 (2023) 46

Chen, Y., Liu, T.X., Shan, Y., Zhong, S.: The emergence of economic rationality of gpt. Proceedings of the National Academy of Sciences120(51), 2316205120 (2023) 46

work page 2023
[30]

Proceedings of the National Academy of Sciences120(6), 2218523120 (2023)

Binz, M., Schulz, E.: Using cognitive psychology to understand gpt-3. Proceedings of the National Academy of Sciences120(6), 2218523120 (2023)

work page 2023
[31]

Wang, J., Zhao, Z., Ni, T., Wei, Z.: SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models. arXiv. arXiv:2510.11131 [cs] (2025). https://doi.org/10.48550/arXiv. 2510.11131 . http://arxiv.org/abs/2510.11131 Accessed 2025-12-29

work page internal anchor Pith review doi:10.48550/arxiv 2025
[32]

Zhao, J., Yuan, C., Luo, W., Xie, H., Zhang, G., Quan, S.J., Yuan, Z., Wang, P., Zhang, D.: Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation. arXiv. arXiv:2509.06337 [cs] (2025). https://doi.org/10.48550/arXiv.2509.06337 . http: //arxiv.org/abs/2509.06337 Accessed 2026-01-27

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.06337 2025
[33]

Political Analysis32(4), 401–416 (2024)

Bisbee, J., Clinton, J.D., Dorff, C., Kenkel, B., Larson, J.M.: Synthetic replacements for human survey data? the perils of large language models. Political Analysis32(4), 401–416 (2024)

work page 2024
[34]

Behavior Research Methods56(6), 5754–5770 (2024)

Park, P.S., Schoenegger, P., Zhu, C.: Diminished diversity-of-thought in a standard large language model. Behavior Research Methods56(6), 5754–5770 (2024)

work page 2024
[35]

arXiv preprint arXiv:2303.16779 (2023)

Chu, E., Andreas, J., Ansolabehere, S., Roy, D.: Language models trained on media diets can predict public opinion. arXiv preprint arXiv:2303.16779 (2023)

work page arXiv 2023
[36]

29971–30004 (2023)

Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., Hashimoto, T.: Whose opinions do language models reflect? In: International Conference on Machine Learning, pp. 29971–30004 (2023). PMLR

work page 2023
[37]

Anthis, J.R., Liu, R., Richardson, S.M., Kozlowski, A.C., Koch, B., Evans, J., Brynjolfsson, E., Bernstein, M.: LLM Social Simulations Are a Promising Research Method. arXiv. arXiv:2504.02234 [cs] (2025). https://doi.org/10.48550/arXiv.2504.02234 . http://arxiv.org/abs/2504.02234 Accessed 2026-01-27

work page doi:10.48550/arxiv.2504.02234 2025
[38]

Marketing Science (2025)

Toubia, O., Gui, G.Z., Peng, T., Merlau, D.J., Li, A., Chen, H.: Twin-2k-500: A data set for building digital twins of over 2,000 people based on their answers to over 500 questions. Marketing Science (2025)

work page 2025
[39]

In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems

Gunaratne, J., Nov, O.: Informing and improving retirement saving performance using behavioral economics theory-driven user interfaces. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. CHI ’15, pp. 917–920. Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2702123.2702408 . https...

work page doi:10.1145/2702123.2702408 2015
[40]

In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems

Lewis, M., Perry, M.: Follow the money: Managing personal finance digitally. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. CHI ’19, pp. 1–14. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3290605.3300620 . https://doi.org/10.1145/3290605.3300620

work page doi:10.1145/3290605.3300620 2019
[41]

In: Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

Freitas, C., Santos, A., Campos, P.F., Bala, P., Dionisio, M.: Exploring the impact of transmedia sto- rytelling on financial literacy: A pilot evaluation with young adults. In: Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. CHI EA ’25. Association for Computing Machinery, New York, NY, USA (2025). https...

work page doi:10.1145/3706599.3719934 2025
[42]

In: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems

Dove, G., Seals, A., Nov, O.: Socially-informed sorting for guiding personal finance choices. In: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. CHI EA ’20, pp. 1–9. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/ 47 10.1145/3334480.3382898 . https://doi.org/10.1145/3334480.3382898

work page doi:10.1145/3334480.3382898 2020
[43]

In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems

Dai, J., McGrenere, J.: Envisioning financial technology support for older adults through cognitive and life transitions. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. CHI ’25. Association for Computing Machinery, New York, NY, USA (2025). https://doi. org/10.1145/3706598.3713427 . https://doi.org/10.1145/3706598.3713427

work page doi:10.1145/3706598.3713427 2025
[44]

In: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, pp

Maqbool, S., Munteanu, C.: Understanding older adults’ long-term financial practices: Challenges and opportunities for design. In: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–6 (2018)

work page 2018
[45]

In: Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

Park, M., Lim, Y.-k.: Exploring the potential of generative ai for supporting middle-aged individuals in retirement transitions. In: Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. CHI EA ’25. Association for Computing Machinery, New York, NY, USA (2025). https://doi.org/10.1145/3706599.3720145 . https://...

work page doi:10.1145/3706599.3720145 2025
[46]

Monthly Labor Review116(3), 25–35 (1993)

Wiatrowski, W.J.: Factors affecting retirement income. Monthly Labor Review116(3), 25–35 (1993)

work page 1993
[47]

National Tax Journal51(2), 263–289 (1998)

Bassett, W.F., Fleming, M.J., Rodrigues, A.P.: How workers use 401 (k) plans: The participation, contribution, and withdrawal decisions. National Tax Journal51(2), 263–289 (1998)

work page 1998
[48]

Retirement: Reasons, processes, and results, 188 (2003)

Sterns, H.L., Kaplan, J.: Self-management of career and. Retirement: Reasons, processes, and results, 188 (2003)

work page 2003
[49]

Journal of gerontological social work5(3), 49–60 (1983)

Behling, J.H., Kilty, K.M., Foster, S.A.: Scarce resources for retirement planning: A dilemma for professional women. Journal of gerontological social work5(3), 49–60 (1983)

work page 1983
[50]

Social Indicators Research136(1), 247–268 (2018)

Rey-Ares, L., Fern´ andez-L´ opez, S., Vivel-B´ ua, M.: The influence of social models on retirement savings: Evidence for european countries. Social Indicators Research136(1), 247–268 (2018)

work page 2018
[51]

Journal of Population Ageing4(1), 97–117 (2011)

Gough, O., Niza, C.: Retirement saving choices: review of the literature and policy implications. Journal of Population Ageing4(1), 97–117 (2011)

work page 2011
[52]

CSA: Certified Senior Advisor22, 31–39 (2004)

Hershey, D.: Psychological influences on the retirement investor. CSA: Certified Senior Advisor22, 31–39 (2004)

work page 2004
[53] Thaler, R.H., Benartzi, S.: Save more tomorrow™: Using behavioral economics to increase employee saving. Journal of Political Economy 112(S1), 164–187 (2004)
[54] Benartzi, S., Thaler, R.H.: Heuristics and biases in retirement savings behavior. Journal of Economic Perspectives 21(3), 81–104 (2007)
[55] Benartzi, S., Thaler, R.H.: Risk aversion or myopia? Choices in repeated gambles and retirement investments. Management Science 45(3), 364–381 (1999)
[56] Moen, P.: A life course perspective on retirement, gender, and well-being. Journal of Occupational Health Psychology 1(2), 131 (1996)
[57] Stawski, R.S., Hershey, D.A., Jacobs-Lawson, J.M.: Goal clarity and financial planning activities as determinants of retirement savings contributions. The International Journal of Aging and Human Development 64(1), 13–32 (2007)
[58] Petkoska, J., Earl, J.K.: Understanding the influence of demographic and psychological variables on retirement planning. Psychology and Aging 24(1), 245 (2009)
[59] Hershey, D.A., Mowen, J.C.: Psychological determinants of financial preparedness for retirement. The Gerontologist 40(6), 687–697 (2000)
[60] Yang, T.-Y., DeVaney, S.A.: Determinants of retirement assets and the amount in stock in retirement assets. Family and Consumer Sciences Research Journal 41(1), 36–55 (2012)
[61] Rey-Ares, L., Fernández-López, S., Vivel-Búa, M.: The determinants of privately saving for retirement: the cases of Portugal and Spain. European Journal of Applied Business and Management 1(1) (2015)
[62] Banks, J., Oldfield, Z.: Understanding pensions: Cognitive function, numerical ability and retirement saving. Fiscal Studies 28(2), 143–170 (2007)
[63] Alessie, R.J., Van Rooij, M., Lusardi, A.: Financial literacy, retirement preparation and pension expectations in the Netherlands. Technical report, National Bureau of Economic Research (2011)
[64] Chou, K.-L., Yu, K.-M., Chan, W.-S., Chan, A.C., Lum, T.Y., Zhu, A.Y.: Social and psychological barriers to private retirement savings in Hong Kong. Journal of Aging & Social Policy 26(4), 308–323 (2014)
[65] Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al.: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)
[66] Ollama Team: Ollama: Get up and Running with Llama 2, Code Llama, and Other Large Language Models Locally. Software. https://github.com/ollama/ollama
[67] Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C.H., Gonzalez, J., Zhang, H., Stoica, I.: Efficient memory management for large language model serving with PagedAttention. In: Proceedings of the 29th Symposium on Operating Systems Principles, pp. 611–626 (2023)
[68] Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman and Hall/CRC (1994)
[69] Koehn, P.: Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 388–395 (2004)
[70] Berg-Kirkpatrick, T., Burkett, D., Klein, D.: An empirical investigation of statistical significance in NLP. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 995–1005 (2012)
[71] Kaiser, C., Kaiser, J., Manewitsch, V., Rau, L., Schallner, R.: Simulating human opinions with large language models: Opportunities and challenges for personalized survey data modeling. In: Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization, pp. 82–86 (2025)
[72] Murthy, S.K., Ullman, T., Hu, J.: One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 11241–11258 (2025)
[73] Li, X., Li, Y., Liu, L., Bing, L., Joty, S.: Is GPT-3 a psychopath? Evaluating large language models from a psychological perspective. arXiv preprint arXiv:2212.10529 (2022)
[74] Lyman, A., Hepner, B., Argyle, L.P., Busby, E.C., Gubler, J.R., Wingate, D.: Balancing large language model alignment and algorithmic fidelity in social science research. Sociological Methods & Research 54(3), 1110–1155 (2025)
[75] Team, G., Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ramé, A., Rivière, M., et al.: Gemma 3 technical report. arXiv preprint arXiv:2503.19786 (2025)
[76] Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al.: The Llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)
[77] Herel, D., Bartek, V., Jirak, J., Mikolov, T.: Time awareness in large language models: Benchmarking fact recall across time. arXiv preprint arXiv:2409.13338 (2024)
[78] Wallat, J., Jatowt, A., Anand, A.: Temporal blind spots in large language models. In: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pp. 683–692 (2024)
[79] Wang, Y., Zhao, Y.: TRAM: Benchmarking temporal reasoning for large language models. arXiv preprint arXiv:2310.00835 (2023)
[80] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al.: Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)
