Can Large Language Models Revolutionize Survey Research? Experiments with Disaster Preparedness Responses
Pith reviewed 2026-05-20 06:26 UTC · model grok-4.3
The pith
A theory-anchored LLM imputes missing disaster survey responses with lower error and bias than standard statistical methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Organizing retrieval around PMT causal structure and integrating all evidence in a single model call outperforms unstructured retrieval and staged sequential inference. The proposed Anchored Marginal Theory-Informed LLM (A-TLM) outperforms all three classical imputation baselines (IPW/MI, MICE+PMM, missForest) on RMSE under disaster-relevant block-wise MNAR conditions (S4 RMSE 1.439 vs. 1.496 for the next-best), while achieving near-zero signed bias (-0.121) where the random-forest imputer produces the largest absolute bias (-0.631).
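The three metrics quoted in this claim (RMSE, MAE, signed bias) can be made concrete with a short sketch. It assumes signed bias means the average of imputed minus true values; that sign convention is an assumption here, not something this review states. The toy values are invented for illustration and are not taken from the paper.

```python
import math

def rmse(true, imputed):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(sum((i - t) ** 2 for t, i in zip(true, imputed)) / len(true))

def mae(true, imputed):
    """Mean absolute error between true and imputed values."""
    return sum(abs(i - t) for t, i in zip(true, imputed)) / len(true)

def signed_bias(true, imputed):
    """Mean of (imputed - true); the sign convention is an assumption."""
    return sum(i - t for t, i in zip(true, imputed)) / len(true)

# Invented toy data: every imputation is off by exactly one point,
# but the errors alternate in sign.
true_vals = [4, 2, 5, 3]
imputed_vals = [3, 3, 4, 4]

toy_rmse = rmse(true_vals, imputed_vals)
toy_mae = mae(true_vals, imputed_vals)
toy_bias = signed_bias(true_vals, imputed_vals)
```

In this toy case the signed bias is zero even though every imputed value is wrong by one point, which is why a bias figure is best read alongside RMSE or MAE.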
What carries the argument
The Anchored Marginal Theory-Informed LLM (A-TLM), which grounds LLM retrieval in a Protection Motivation Theory-constrained co-occurrence knowledge graph to produce imputations that respect documented causal relationships among survey items.
Load-bearing premise
A knowledge graph built from Protection Motivation Theory correctly represents the causal structure of how people answer disaster preparedness questions and does not introduce new systematic errors when used to guide the model.
What would settle it
Collect follow-up responses from the original respondents who had missing data and compare the distribution of A-TLM imputations against those actual answers to test whether the error and bias advantages hold outside the original sample.
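One way to picture the distribution comparison described above is a total variation distance between the imputed and the follow-up response distributions. This is a sketch under assumptions: the choice of total variation distance is ours, not the paper's, and the response lists are invented.

```python
from collections import Counter

def total_variation(sample_a, sample_b):
    """Total variation distance between two empirical categorical distributions."""
    count_a, count_b = Counter(sample_a), Counter(sample_b)
    support = set(count_a) | set(count_b)
    return 0.5 * sum(
        abs(count_a[k] / len(sample_a) - count_b[k] / len(sample_b))
        for k in support
    )

# Hypothetical Likert-style answers (not from the paper).
imputed_answers = [3, 3, 4, 4]
followup_answers = [3, 4, 4, 5]

tv_distance = total_variation(imputed_answers, followup_answers)
```

A distance of 0 would mean the two distributions coincide and 1 would mean they share no response category; per-respondent error metrics would still be needed to judge individual imputations.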
Original abstract
Survey research faces mounting structural challenges: declining response rates, sample bias, block-wise missingness among at-risk respondents, and AI-assisted fraudulent completions in online panels. Large language models (LLMs) have been proposed as a remedy, yet rigorous evaluations across the full survey workflow remain scarce, particularly in disaster contexts where data quality matters most. We present and evaluate a five-stage framework for LLM integration covering questionnaire design, sample selection, pilot testing, missing-data imputation, and post-collection analysis, using the 2024 Hurricane Milton preparedness survey of Florida residents (n=946) as a shared empirical testbed. We introduce a Protection Motivation Theory (PMT)-constrained co-occurrence knowledge graph and develop seven LLM configurations spanning zero-shot inference, retrieval-augmented baselines, and novel theory-informed variants. Our proposed Anchored Marginal Theory-Informed LLM (A-TLM) outperforms all three classical imputation baselines (IPW/MI, MICE+PMM, missForest) on RMSE under disaster-relevant block-wise MNAR conditions (S4 RMSE 1.439 vs. 1.496 for the next-best), while achieving near-zero signed bias (-0.121) where the random-forest imputer produces the largest absolute bias (-0.631). Organizing retrieval around PMT causal structure and integrating all evidence in a single model call outperforms unstructured retrieval and staged sequential inference (MAE 0.993 vs. 1.097 for standard RAG). We document that near-zero aggregate bias can mask opposing subgroup errors and propose subgroup-stratified bias auditing as a reporting standard. A retrieval-constrained knowledge-graph chatbot demonstrates that hallucination is architecturally manageable through grounded refusal.
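The abstract's "block-wise MNAR" condition can be illustrated with a minimal simulation sketch. It assumes that a whole block of items goes missing together and that the probability of missingness depends on a value inside that block. The function name, item names, threshold, and probabilities are all hypothetical; the paper's actual S4 scenario parameters are not given in this review.

```python
import random

def mask_block_mnar(rows, block, driver, threshold, p_high, p_low, seed=0):
    """Blank out an entire block of items per respondent.

    Missingness depends on the respondent's own value of `driver`,
    which is itself part of the block, so the mechanism is MNAR.
    """
    rng = random.Random(seed)
    masked = []
    for row in rows:
        p_missing = p_high if row[driver] >= threshold else p_low
        out = dict(row)  # copy so the complete data stay available as ground truth
        if rng.random() < p_missing:
            for item in block:
                out[item] = None
        masked.append(out)
    return masked

# Hypothetical respondents and item names.
rows = [
    {"id": 1, "risk_perception": 5, "evac_intent": 5, "supplies": 4},
    {"id": 2, "risk_perception": 2, "evac_intent": 1, "supplies": 2},
]
block = ["risk_perception", "evac_intent"]

# Extreme probabilities make the toy example deterministic.
masked = mask_block_mnar(rows, block, driver="risk_perception",
                         threshold=4, p_high=1.0, p_low=0.0)
```

Because the complete rows are kept, any imputer can be scored against the true values that were masked.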
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a five-stage framework for integrating large language models into survey research workflows, evaluated on the 2024 Hurricane Milton preparedness survey of Florida residents (n=946). It introduces a Protection Motivation Theory (PMT)-constrained co-occurrence knowledge graph and seven LLM configurations, with the proposed Anchored Marginal Theory-Informed LLM (A-TLM) claimed to outperform classical imputation baselines (IPW/MI, MICE+PMM, missForest) on RMSE (1.439 vs. 1.496 for the next-best) and signed bias (-0.121 vs. -0.631) under block-wise MNAR conditions, while also addressing hallucination via grounded refusal and proposing subgroup-stratified bias auditing.
Significance. If the empirical claims hold without artifacts from data leakage or insufficient controls, the work could meaningfully advance LLM applications in survey methodology by demonstrating theory-grounded retrieval augmentation for imputation in high-stakes disaster contexts. It provides concrete metrics from a real survey testbed, compares against established baselines, and highlights practical issues like masked subgroup errors. The approach of anchoring on PMT structure is a strength if the graph supplies independent causal information; otherwise the gains may not generalize beyond this setup.
major comments (3)
- [Methods (PMT-constrained co-occurrence knowledge graph and A-TLM description)] The construction of the PMT-constrained co-occurrence knowledge graph is not described in sufficient detail to determine its data sources or independence from the n=946 survey responses used for imputation testing. If edges or co-occurrences are extracted from the same response patterns (even under PMT constraints), retrieval-augmented generation could indirectly access the block-wise missingness patterns being imputed, unlike the classical baselines (MICE+PMM, missForest) that receive no such information. This directly threatens the central claim that A-TLM's RMSE advantage (S4: 1.439) and near-zero bias arise from theory-informed structure rather than leakage. Please specify the exact sources, construction process, and any overlap with the test data.
- [Experimental Setup and LLM Configurations] Exact prompt templates for the seven LLM configurations, the process for integrating the knowledge graph into retrieval, and whether data splits, exclusions, or MNAR simulation parameters were pre-specified versus post-hoc are not provided. Without these, the reported performance numbers (e.g., A-TLM RMSE 1.439 and bias -0.121 under block-wise MNAR) cannot be fully assessed for reproducibility or robustness against implementation choices.
- [Results (bias and subgroup analysis)] The manuscript notes that near-zero aggregate bias can mask opposing subgroup errors and proposes subgroup-stratified auditing as a reporting standard, but it is unclear whether this auditing was actually applied to the A-TLM results or only suggested. If not performed, the superiority claim over missForest (largest absolute bias -0.631) remains incomplete, as subgroup-specific errors could undermine the practical utility in disaster preparedness contexts.
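The masking concern in the last major comment can be shown with a minimal sketch of subgroup-stratified bias auditing. The subgroup labels and values are invented, and bias is again assumed to be the mean of imputed minus true values.

```python
from collections import defaultdict

def stratified_bias(records):
    """Return (overall_bias, per_subgroup_bias).

    `records` is an iterable of (subgroup, true_value, imputed_value).
    """
    error_sums = defaultdict(float)
    counts = defaultdict(int)
    for group, true_value, imputed_value in records:
        error_sums[group] += imputed_value - true_value
        counts[group] += 1
    per_group = {g: error_sums[g] / counts[g] for g in error_sums}
    overall = sum(error_sums.values()) / sum(counts.values())
    return overall, per_group

# Hypothetical subgroups with opposing errors.
records = [
    ("renter", 4, 3), ("renter", 5, 4),  # imputations too low
    ("owner", 2, 3), ("owner", 3, 4),    # imputations too high
]
overall_bias, subgroup_bias = stratified_bias(records)
```

Here the aggregate bias is zero while each subgroup is off by a full point in opposite directions, which is the pattern a stratified report is meant to expose.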
minor comments (3)
- [Abstract] The abstract references 'S4 RMSE' without defining the scenario or table/figure it corresponds to; add a brief clarification or cross-reference.
- [Introduction and Methods] The acronym 'A-TLM' is introduced without an explicit expansion or definition of the 'Anchored Marginal' component in the early sections; ensure this is defined before the performance claims.
- [Results] Figure or table captions for the imputation results should explicitly list all compared methods and conditions to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve clarity, reproducibility, and transparency.
Referee: [Methods (PMT-constrained co-occurrence knowledge graph and A-TLM description)] The construction of the PMT-constrained co-occurrence knowledge graph is not described in sufficient detail to determine its data sources or independence from the n=946 survey responses used for imputation testing. If edges or co-occurrences are extracted from the same response patterns (even under PMT constraints), retrieval-augmented generation could indirectly access the block-wise missingness patterns being imputed, unlike the classical baselines (MICE+PMM, missForest) that receive no such information. This directly threatens the central claim that A-TLM's RMSE advantage (S4: 1.439) and near-zero bias arise from theory-informed structure rather than leakage. Please specify the exact sources, construction process, and any overlap with the test data.
Authors: We appreciate the referee's concern about potential data leakage; ruling it out is essential to substantiating our claims. The PMT-constrained co-occurrence knowledge graph was constructed from external sources, including published literature on Protection Motivation Theory, general co-occurrence statistics from public disaster preparedness datasets, and expert-defined causal relations, all developed independently of, and prior to, the collection or analysis of the n=946 Hurricane Milton survey responses. No edges or co-occurrences were derived from the survey data used in the imputation experiments. We will expand the Methods section with a step-by-step description of the graph construction, an explicit list of all data sources, and the verification steps confirming independence from the test data. This revision will show that the observed advantages in RMSE and bias derive from the theory-informed structure. revision: yes
Referee: [Experimental Setup and LLM Configurations] Exact prompt templates for the seven LLM configurations, the process for integrating the knowledge graph into retrieval, and whether data splits, exclusions, or MNAR simulation parameters were pre-specified versus post-hoc are not provided. Without these, the reported performance numbers (e.g., A-TLM RMSE 1.439 and bias -0.121 under block-wise MNAR) cannot be fully assessed for reproducibility or robustness against implementation choices.
Authors: We agree that these details are necessary for full reproducibility. We will add the exact prompt templates for all seven LLM configurations to a new appendix. We will also provide a clear description of how the knowledge graph is integrated into the retrieval process. Finally, we will explicitly state in the revised Experimental Setup section that data splits, exclusions, and MNAR simulation parameters were pre-specified in the study protocol prior to any model evaluation or result computation. These additions will enable independent assessment of the reported metrics. revision: yes
Referee: [Results (bias and subgroup analysis)] The manuscript notes that near-zero aggregate bias can mask opposing subgroup errors and proposes subgroup-stratified auditing as a reporting standard, but it is unclear whether this auditing was actually applied to the A-TLM results or only suggested. If not performed, the superiority claim over missForest (largest absolute bias -0.631) remains incomplete, as subgroup-specific errors could undermine the practical utility in disaster preparedness contexts.
Authors: We thank the referee for this important clarification request. The subgroup-stratified bias auditing was performed on the A-TLM results as part of the analysis to check for masked opposing errors. We will revise the Results section to explicitly document that this auditing was applied, and we will include the subgroup-specific bias and error metrics. This addition will strengthen the evidence for practical utility in disaster preparedness contexts and address the comparison with missForest. revision: yes
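The independence question raised in the first exchange can be pictured with a small sketch of a theory-constrained co-occurrence graph that refuses to be built from held-out records. This is an illustration under assumptions, not the authors' construction: the construct labels, item names, allowed construct pairs, and the function itself are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_constrained_graph(source_records, item_construct, allowed_pairs, test_ids):
    """Count co-occurring (item, response) pairs whose constructs form an allowed link.

    Raises ValueError if any source record is also a held-out test record,
    so leakage is caught at construction time.
    """
    overlap = set(source_records) & set(test_ids)
    if overlap:
        raise ValueError(f"test records used for graph construction: {sorted(overlap)}")
    edges = Counter()
    for answers in source_records.values():
        nodes = sorted((item, value) for item, value in answers.items()
                       if value is not None)
        for a, b in combinations(nodes, 2):
            link = frozenset((item_construct[a[0]], item_construct[b[0]]))
            if link in allowed_pairs:  # keep only theory-sanctioned links
                edges[(a, b)] += 1
    return edges

# Hypothetical mapping of survey items to PMT-style constructs.
item_construct = {"risk_perception": "threat", "self_efficacy": "coping",
                  "evac_intent": "intent"}
allowed_pairs = {frozenset(("threat", "intent")), frozenset(("coping", "intent"))}
source_records = {
    "r1": {"risk_perception": 5, "self_efficacy": 4, "evac_intent": 5},
    "r2": {"risk_perception": 5, "self_efficacy": 2, "evac_intent": 5},
}
graph = build_constrained_graph(source_records, item_construct,
                                allowed_pairs, test_ids={"r9"})
```

Passing the held-out identifiers explicitly makes the independence claim checkable in code rather than only asserted in prose.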
Circularity Check
No significant circularity; central claims rest on external empirical comparisons
The paper evaluates A-TLM imputation performance through direct RMSE and bias metrics against independent classical baselines (IPW/MI, MICE+PMM, missForest) under simulated block-wise MNAR conditions on the n=946 Florida survey. PMT is invoked from established external literature as a constraint on the co-occurrence graph, and the framework compares theory-informed retrieval against unstructured RAG and sequential baselines without reducing the reported gains to internal definitions or self-citations. No load-bearing step equates a prediction to a fitted parameter by construction or imports uniqueness via author-overlapping citations; the derivation chain remains self-contained against the stated external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Protection Motivation Theory provides a valid causal structure for modeling disaster preparedness survey responses
invented entities (1)
- Anchored Marginal Theory-Informed LLM (A-TLM): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We instantiated the cascade as a PMT-constrained co-occurrence graph built from the 757 training records... 204 source nodes, 9,605 weighted edges"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "A-TLM... outperforms... on RMSE under disaster-relevant block-wise MNAR conditions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.