Your Reviews Replicate You: LLM-Based Agents as Customer Digital Twins for Conjoint Analysis
Pith reviewed 2026-05-15 15:45 UTC · model grok-4.3
The pith
LLM-based customer digital twins built from Reddit review histories replicate individual user preferences in conjoint analysis tasks with 87.73 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Customer digital twins are LLM agents constructed by aggregating an individual's Reddit reviews into a vector database and applying retrieval-augmented generation with prompt engineering to answer conjoint questions. The agents complete pairwise comparisons on product profiles from a fractional factorial design; the resulting choice data is then analyzed via logistic regression to recover part-worth utilities. Empirical validation shows these agents match the original users' preferences at 87.73 percent accuracy, and a monitor category case study yields trade-off structures consistent with real market conditions.
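The fractional factorial step can be made concrete with a minimal sketch. It assumes four hypothetical two-level attributes and a half-fraction with generator D = ABC; the paper's actual attributes, levels, and design are not given on this page, so every name and level below is illustrative only.

```python
from itertools import combinations, product

# Hypothetical two-level attributes; not the paper's actual attribute list.
ATTRIBUTES = {
    "panel": ("IPS", "OLED"),
    "resolution": ("1440p", "4K"),
    "refresh_rate": ("60Hz", "144Hz"),
    "price": ("low", "high"),
}


def half_fraction():
    """Return the 2^(4-1) design with generator D = A*B*C, coded as -1/+1."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b * c))
    return runs


def decode(run):
    """Map a -1/+1 coded run to readable attribute levels."""
    return {
        name: levels[(code + 1) // 2]
        for (name, levels), code in zip(ATTRIBUTES.items(), run)
    }


design = half_fraction()
profiles = [decode(run) for run in design]
# Every unordered pair of profiles becomes one pairwise comparison task.
tasks = list(combinations(range(len(profiles)), 2))

print(len(design), "profiles,", len(tasks), "pairwise tasks")
```

The half-fraction keeps every attribute balanced and pairwise orthogonal while halving the 16 profiles of the full design, which is the usual reason to prefer it when each extra task costs an LLM call.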
What carries the argument
Retrieval-augmented generation over individualized vector databases of Reddit reviews, which lets each LLM agent retrieve and reason from a specific user's past preference statements while performing conjoint pairwise tasks.
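A minimal sketch of per-user retrieval follows. It is not the paper's pipeline: a toy bag-of-words vector stands in for a learned embedding model, the class and function names are hypothetical, the example reviews are invented, and the prompt wording is illustrative.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class UserMemory:
    """One store per user, so retrieval never mixes different users' reviews."""

    def __init__(self, reviews):
        self.items = [(review, embed(review)) for review in reviews]

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [review for review, _ in ranked[:k]]


def build_prompt(user_id, memories, option_a, option_b):
    """Assemble a persona prompt; the wording is illustrative only."""
    memory_block = "\n".join(f"- {m}" for m in memories)
    return (
        f"You are the Reddit user '{user_id}'.\n"
        f"Your past reviews:\n{memory_block}\n"
        f"Option A: {option_a}\nOption B: {option_b}\n"
        'Answer in JSON: {"choice": "A" or "B"}'
    )


# Invented example reviews for a hypothetical user.
memory = UserMemory([
    "I returned my IPS monitor because of backlight bleed.",
    "The 4K resolution is wasted on a 24 inch screen.",
    "Great mechanical keyboard, the switches feel crisp.",
])
hits = memory.retrieve("IPS panel at 4K resolution versus OLED panel at 1440p")
print(build_prompt("example_user", hits, "IPS, 4K", "OLED, 1440p"))
```

Keeping a separate store per user is the design choice that makes each agent individualized: the keyboard review is ranked last because it shares no terms with the monitor query, so only preference statements relevant to the trade-off reach the prompt.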
If this is right
- Conjoint studies can be run entirely with digital twins without recruiting or fatiguing real respondents.
- Choice data from the twins supports standard logistic regression to estimate part-worth utilities for product attributes.
- A monitor case study recovers realistic trade-offs such as panel type versus resolution that match external market observations.
- The method scales to large numbers of virtual respondents at low marginal cost once review histories are processed.
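The part-worth estimation mentioned in the list above can be sketched as a binary logit on attribute differences. This is an assumption-laden illustration, not the paper's specification: attributes are effects-coded as -1/+1, the "true" part-worths and the simulated choices are invented, and plain gradient ascent replaces whatever solver the authors used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_pairwise_logit(diffs, choices, lr=0.5, steps=300):
    """Fit P(A chosen) = sigmoid(beta . (x_A - x_B)) by gradient ascent."""
    beta = [0.0] * len(diffs[0])
    n = len(diffs)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for d, y in zip(diffs, choices):
            residual = y - sigmoid(sum(b * x for b, x in zip(beta, d)))
            for j, x in enumerate(d):
                grad[j] += residual * x
        # Average log-likelihood gradient keeps the step size independent of n.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


rng = random.Random(0)
# Invented part-worths for three hypothetical attributes (e.g. OLED, high price, 144Hz).
TRUE_BETA = [1.0, -0.8, 0.5]

diffs, choices = [], []
for _ in range(1000):
    a = [rng.choice((-1, 1)) for _ in TRUE_BETA]
    b = [rng.choice((-1, 1)) for _ in TRUE_BETA]
    d = [x - y for x, y in zip(a, b)]
    p_a = sigmoid(sum(t * x for t, x in zip(TRUE_BETA, d)))
    diffs.append(d)
    choices.append(1 if rng.random() < p_a else 0)

estimate = fit_pairwise_logit(diffs, choices)
print([round(b, 2) for b in estimate])
```

Under this coding, each coefficient is the part-worth of the +1 level and its negative is the part-worth of the -1 level, so the sign and relative size of the estimates describe the trade-off structure directly.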
Where Pith is reading between the lines
- If review data proves rich enough, similar digital twins could be constructed for other preference domains such as subscription services or durable goods without new data collection.
- Firms could prototype product concepts against large simulated populations before fielding any human surveys.
- Using public reviews to create persistent individual simulations raises questions about consent and data ownership that the method leaves unaddressed.
Load-bearing premise
That a user's collected Reddit review history contains sufficient stable information about their preferences and constraints to accurately simulate responses to new product attribute trade-offs in the conjoint experiment.
What would settle it
Administering the identical pairwise conjoint questions directly to the original Reddit users and finding that their choices diverge from the CDT predictions in substantially more than 12 percent of cases on average.
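The comparison described above reduces to an agreement rate between twin and human choices, and how decisive it is depends on how many tasks back it. The sketch below is illustrative: the function names are hypothetical, the task counts are invented because the sample size is not reported on this page, and the Wilson interval treats tasks as independent, which is optimistic when choices are clustered within users.

```python
import math


def hit_rate(twin_choices, human_choices):
    """Share of tasks where the twin picked the same option as the human."""
    if len(twin_choices) != len(human_choices) or not twin_choices:
        raise ValueError("choice lists must be non-empty and equally long")
    matches = sum(t == h for t, h in zip(twin_choices, human_choices))
    return matches / len(twin_choices)


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Invented task counts; only the 0.8773 point estimate comes from the paper.
for n in (50, 500, 5000):
    low, high = wilson_interval(0.8773, n)
    print(f"n={n}: 95% interval roughly [{low:.3f}, {high:.3f}]")
```

A divergence "substantially more than 12 percent" is only meaningful relative to an interval of this kind: with few tasks the interval is wide enough that a noticeably lower observed agreement would still be compatible with the reported figure.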
Original abstract
Conjoint analysis is a cornerstone of market research for estimating consumer preferences; however, traditional methods face persistent challenges regarding time, cost, and respondent fatigue. To address these limitations, this study proposes a framework that utilizes large language model (LLM)-based "customer digital twins (CDT)" as virtual respondents. We identified active users within the Reddit community and aggregated their comprehensive review histories to construct individualized vector databases. By integrating retrieval-augmented generation (RAG) with prompt engineering, this study developed customer agents capable of dynamically retrieving and reasoning upon their specific past preferences and constraints. These customer agents, called CDTs, performed pairwise comparison tasks on product profiles generated via fractional factorial design, and the resulting choice data was analyzed to estimate part-worth utilities by logistic regression. Empirical validation demonstrates that these CDTs predict the preferences of actual users with 87.73% accuracy. Furthermore, a case study on the computer monitor category successfully quantified trade-offs between attributes such as panel type and resolution, deriving preference structures consistent with market realities. Ultimately, this study contributes to marketing research by presenting a scalable alternative that significantly improves both agility and cost-efficiency to traditional methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LLM-based Customer Digital Twins (CDTs) built from aggregated Reddit review histories using RAG and prompt engineering. These agents simulate pairwise choices on fractional-factorial product profiles, with part-worth utilities estimated via logistic regression. The central claim is that the CDTs replicate real-user preferences at 87.73% accuracy, demonstrated in a computer-monitor case study, as a scalable, low-cost alternative to traditional conjoint analysis.
Significance. If the accuracy claim is substantiated with adequate validation details, the work could offer a practical advance in market research by replacing human respondents with individualized LLM agents, substantially lowering costs and respondent burden while preserving predictive fidelity. The external benchmark against held-out real-user choices is a methodological strength.
major comments (3)
- [Abstract] Abstract: The 87.73% accuracy figure is reported without any information on the number of users, number of conjoint tasks per user, held-out sample size, cross-validation procedure, or statistical tests, leaving the central empirical result impossible to evaluate for robustness or bias.
- [Methodology] Methodology section: The RAG pipeline retrieves Reddit reviews but provides no metric quantifying coverage of the specific conjoint attributes (e.g., panel type, resolution, refresh rate) in the retrieved context; without this, it is unclear whether the reported accuracy reflects user-specific data or LLM priors.
- [Empirical validation] Empirical validation: No controls or ablation are described for prompt sensitivity, Reddit data selection biases, or attribute sparsity, all of which directly affect whether the CDT predictions are driven by the constructed vector databases rather than general model knowledge.
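The coverage metric requested in the second major comment could be as simple as the share of conjoint attributes that appear in the retrieved context. The sketch below is one hypothetical way to compute it; the keyword lists are invented placeholders, and a real study would need a vetted lexicon or a classifier.

```python
import re

# Hypothetical keyword lists per conjoint attribute; placeholders, not a vetted lexicon.
ATTRIBUTE_TERMS = {
    "panel_type": ("ips", "oled", "va", "tn"),
    "resolution": ("1080p", "1440p", "4k"),
    "refresh_rate": ("hz", "60hz", "144hz", "refresh"),
}


def attribute_coverage(retrieved_texts):
    """Fraction of conjoint attributes mentioned at least once in the retrieved context."""
    tokens = set()
    for text in retrieved_texts:
        tokens.update(re.findall(r"[a-z0-9]+", text.lower()))
    covered = {name for name, terms in ATTRIBUTE_TERMS.items() if tokens & set(terms)}
    return len(covered) / len(ATTRIBUTE_TERMS), sorted(covered)


# Invented retrieved snippets for a hypothetical user.
share, covered = attribute_coverage([
    "My IPS panel has terrible backlight bleed.",
    "4K is overkill at this size.",
])
print(f"coverage={share:.2f}, covered={covered}")
```

Reporting accuracy separately for high-coverage and low-coverage users would show whether predictions depend on the retrieved reviews: similar accuracy in both groups would suggest the LLM's general priors are doing the work.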
minor comments (2)
- [Case study] The fractional-factorial design details and the exact logistic regression specification for part-worth estimation should be expanded with an equation or table for reproducibility.
- [Figures] Figure captions and axis labels in the preference-structure plots could be clarified to distinguish CDT-derived utilities from any real-user baselines.
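One specification that would satisfy the first minor comment is the textbook binary logit on attribute differences. It is offered here as an assumption about what such an equation could look like, not as the paper's stated model; all symbols are introduced for illustration.

```latex
P(y_{ij} = 1)
  = \frac{1}{1 + \exp\!\left(-\boldsymbol{\beta}^{\top}
      \left(\mathbf{x}^{A}_{ij} - \mathbf{x}^{B}_{ij}\right)\right)},
\qquad
\hat{\boldsymbol{\beta}}
  = \arg\max_{\boldsymbol{\beta}} \sum_{i,j}
    \Big[\, y_{ij} \log P(y_{ij} = 1)
      + (1 - y_{ij}) \log\big(1 - P(y_{ij} = 1)\big) \Big]
```

Here $y_{ij} = 1$ when respondent (or twin) $i$ chooses profile A in task $j$, $\mathbf{x}^{A}_{ij}$ and $\mathbf{x}^{B}_{ij}$ are the coded attribute levels of the two profiles, and $\boldsymbol{\beta}$ collects the part-worth utilities.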
Circularity Check
No significant circularity; validation uses held-out real-user choices as external benchmark
full rationale
The paper builds CDTs from aggregated Reddit review histories via RAG and prompt engineering, then runs them on fractional-factorial conjoint profiles to generate choice data for logistic regression utility estimation. The central claim of 87.73% accuracy is obtained by comparing CDT outputs directly to held-out choice responses from the same real users on identical profiles. This comparison is an independent external benchmark and does not reduce to any fitted parameter or self-referential definition within the CDT construction pipeline. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the provided derivation; the accuracy metric is statistically independent of the review-to-vector-database step.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM agents augmented with RAG on personal review histories can accurately simulate individual consumer preferences for unseen product profiles
invented entities (1)
- Customer Digital Twin (CDT): no independent evidence
Prompts (paper Appendix A, excerpt)
### ROLE & PERSONA You are the Reddit user '{user_id}'. The content provided below represents your OWN past memories, reviews, and opinions. You must simulate this specific user's preference logic, writing style, and decision-m...
- Analyze the memories to determine which option aligns better with your past self
- You MUST return the result in a valid JSON format