
arxiv: 2604.22756 · v1 · submitted 2026-03-06 · 💻 cs.IR · cs.AI

Your Reviews Replicate You: LLM-Based Agents as Customer Digital Twins for Conjoint Analysis

Pith reviewed 2026-05-15 15:45 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords customer digital twins · conjoint analysis · LLM agents · retrieval-augmented generation · preference prediction · market research · review data · virtual respondents

The pith

LLM-based customer digital twins built from Reddit review histories replicate individual user preferences in conjoint analysis tasks with 87.73 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes building LLM agents as customer digital twins that use a person's full Reddit review history to simulate responses in traditional conjoint studies. These twins turn reviews into a searchable vector database, then apply retrieval-augmented generation so the agent can retrieve and reason from the specific user's past statements when faced with new product profiles. When the twins complete the same pairwise comparison tasks given to real respondents, their choices match the actual users 87.73 percent of the time. The same process also produces part-worth utility estimates from logistic regression that align with observed market preferences in a computer-monitor case study. If the replication holds, the approach replaces slow and costly recruitment of human participants with scalable virtual ones derived from existing public data.

Core claim

Customer digital twins are LLM agents constructed by aggregating an individual's Reddit reviews into a vector database and applying retrieval-augmented generation with prompt engineering to answer conjoint questions. The agents complete pairwise comparisons on product profiles from a fractional factorial design; the resulting choice data is then analyzed via logistic regression to recover part-worth utilities. Empirical validation shows these agents match the original users' preferences at 87.73 percent accuracy, and a monitor category case study yields trade-off structures consistent with real market conditions.
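The design step can be sketched as follows. This is a minimal illustration, assuming three hypothetical two-level attributes and the textbook defining relation C = A·B for a half-fraction; the paper's actual attributes, levels, and design generator are not given on this page.

```python
from itertools import combinations, product

# Hypothetical attributes with two levels each (not taken from the paper).
ATTRIBUTES = {
    "panel": ("IPS", "OLED"),
    "resolution": ("1440p", "4K"),
    "refresh": ("60Hz", "144Hz"),
}

def half_fraction():
    """Return the 2^(3-1) design whose third column is the product of the first two."""
    runs = []
    for a, b in product((-1, 1), repeat=2):
        c = a * b  # assumed defining relation C = AB
        runs.append((a, b, c))
    return runs

def decode(run):
    """Map coded levels (-1/+1) to readable attribute levels."""
    return {
        name: levels[(code + 1) // 2]
        for (name, levels), code in zip(ATTRIBUTES.items(), run)
    }

# Four profiles instead of the eight in the full factorial.
profiles = [decode(run) for run in half_fraction()]

# Every unordered pair of profiles becomes one pairwise comparison task.
pairs = list(combinations(range(len(profiles)), 2))
```

Each attribute level appears equally often across the four runs, which is the balance property that lets main effects be estimated from the reduced set of profiles.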

What carries the argument

Retrieval-augmented generation over individualized vector databases of Reddit reviews, which lets each LLM agent retrieve and reason from a specific user's past preference statements while performing conjoint pairwise tasks.
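A minimal sketch of per-user retrieval, under loud assumptions: the paper's embedding model and vector store are not named on this page, so bag-of-words counts stand in for learned embeddings, and `UserStore` is a hypothetical class. The reviews are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase token counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

class UserStore:
    """Hypothetical per-user store: one user's reviews plus their vectors."""

    def __init__(self, reviews):
        self.items = [(review, embed(review)) for review in reviews]

    def retrieve(self, query, k=2):
        """Return the k reviews most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

# Invented reviews for one imaginary user.
store = UserStore([
    "I returned my TN monitor because the colors looked washed out.",
    "The 144Hz refresh rate matters more to me than 4K resolution.",
    "This keyboard has great switches.",
])
context = store.retrieve(
    "Which do you prefer: 4K resolution at 60Hz or 1440p at 144Hz refresh?", k=1
)
```

The retrieved text would then be placed in the agent's prompt as the user's "memories" before the pairwise question is asked.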

If this is right

  • Conjoint studies can be run entirely with digital twins without recruiting or fatiguing real respondents.
  • Choice data from the twins supports standard logistic regression to estimate part-worth utilities for product attributes.
  • A monitor case study recovers realistic trade-offs such as panel type versus resolution that match external market observations.
  • The method scales to large numbers of virtual respondents at low marginal cost once review histories are processed.
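To illustrate how choice data turns into part-worths, here is a minimal sketch of a binary logit fitted on attribute differences by plain gradient ascent. The data are synthetic, the two attribute contrasts are hypothetical, and the paper's exact coding and estimator are not specified on this page.

```python
import math

def fit_logit(diffs, choices, lr=0.5, steps=2000):
    """Fit P(A chosen) = sigmoid(beta . (x_A - x_B)) by batch gradient ascent."""
    beta = [0.0] * len(diffs[0])
    n = len(diffs)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for d, y in zip(diffs, choices):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, d))))
            for j, x in enumerate(d):
                grad[j] += (y - p) * x  # gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic toy data, NOT from the paper. Each row is x_A - x_B for two
# hypothetical dummy-coded contrasts: [OLED vs IPS, 4K vs 1440p].
diffs = [[1, 0]] * 5 + [[0, 1]] * 5
choices = [1, 1, 1, 1, 0] + [1, 1, 1, 0, 0]  # 1 = profile A chosen
part_worths = fit_logit(diffs, choices)
```

Because each toy row varies only one contrast, the maximum-likelihood estimates are simply the log-odds of choosing A: log 4 (about 1.39) for the first contrast and log 1.5 (about 0.41) for the second. In this made-up sample the panel contrast therefore carries more weight than the resolution contrast.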

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If review data proves rich enough, similar digital twins could be constructed for other preference domains such as subscription services or durable goods without new data collection.
  • Firms could prototype product concepts against large simulated populations before fielding any human surveys.
  • Using public reviews to create persistent individual simulations raises questions about consent and data ownership that the method leaves unaddressed.

Load-bearing premise

That a user's collected Reddit review history contains sufficient stable information about their preferences and constraints to accurately simulate responses to new product attribute trade-offs in the conjoint experiment.

What would settle it

Administering the identical pairwise conjoint questions directly to the original Reddit users and finding that their choices diverge from the customer digital twin (CDT) predictions in substantially more than 12.27 percent of cases on average, the disagreement rate implied by the reported 87.73 percent accuracy.
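Whether an observed disagreement rate really exceeds that threshold depends on sample size, which this page does not report. A minimal sketch of the check, using hypothetical toy choices and a Wilson score interval as one reasonable (assumed) way to express uncertainty:

```python
import math

def agreement_rate(predicted, actual):
    """Share of tasks where the twin's choice equals the user's choice."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for roughly 95%)."""
    phat = successes / n
    denom = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical toy data, NOT the paper's results.
predicted = ["A", "B", "A", "A", "B", "A", "B", "B"]
actual = ["A", "B", "B", "A", "B", "A", "B", "B"]
rate = agreement_rate(predicted, actual)  # 7 of 8 tasks match
low, high = wilson_interval(7, 8)
```

With only eight tasks the interval is very wide, which is why the number of users and tasks behind the headline accuracy matters for interpreting it.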

Figures

Figures reproduced from arXiv: 2604.22756 by Bin Xuan, Hakyeon Lee, Jungmin Hwang.

Figure 1: Framework Overview
Figure 2: Example of ground truth construction from user comparison statement
Figure 3: Example of CDT preference prediction based on user review history
Original abstract

Conjoint analysis is a cornerstone of market research for estimating consumer preferences; however, traditional methods face persistent challenges regarding time, cost, and respondent fatigue. To address these limitations, this study proposes a framework that utilizes large language model (LLM)-based "customer digital twins (CDT)" as virtual respondents. We identified active users within the Reddit community and aggregated their comprehensive review histories to construct individualized vector databases. By integrating retrieval-augmented generation (RAG) with prompt engineering, this study developed customer agents capable of dynamically retrieving and reasoning upon their specific past preferences and constraints. These customer agents, called CDTs, performed pairwise comparison tasks on product profiles generated via fractional factorial design, and the resulting choice data was analyzed to estimate part-worth utilities by logistic regression. Empirical validation demonstrates that these CDTs predict the preferences of actual users with 87.73% accuracy. Furthermore, a case study on the computer monitor category successfully quantified trade-offs between attributes such as panel type and resolution, deriving preference structures consistent with market realities. Ultimately, this study contributes to marketing research by presenting a scalable alternative that significantly improves both agility and cost-efficiency to traditional methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes LLM-based Customer Digital Twins (CDTs) built from aggregated Reddit review histories using RAG and prompt engineering. These agents simulate pairwise choices on fractional-factorial product profiles, with part-worth utilities estimated via logistic regression. The central claim is that the CDTs replicate real-user preferences at 87.73% accuracy, demonstrated in a computer-monitor case study, as a scalable, low-cost alternative to traditional conjoint analysis.

Significance. If the accuracy claim is substantiated with adequate validation details, the work could offer a practical advance in market research by replacing human respondents with individualized LLM agents, substantially lowering costs and respondent burden while preserving predictive fidelity. The external benchmark against held-out real-user choices is a methodological strength.

major comments (3)
  1. [Abstract] The 87.73% accuracy figure is reported without any information on the number of users, number of conjoint tasks per user, held-out sample size, cross-validation procedure, or statistical tests, leaving the central empirical result impossible to evaluate for robustness or bias.
  2. [Methodology] The RAG pipeline retrieves Reddit reviews but provides no metric quantifying coverage of the specific conjoint attributes (e.g., panel type, resolution, refresh rate) in the retrieved context; without this, it is unclear whether the reported accuracy reflects user-specific data or LLM priors.
  3. [Empirical validation] No controls or ablations are described for prompt sensitivity, Reddit data selection biases, or attribute sparsity, all of which directly affect whether the CDT predictions are driven by the constructed vector databases rather than general model knowledge.
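One simple form the coverage metric requested in major comment 2 could take is sketched below. The keyword lexicon and the function name `attribute_coverage` are assumptions for illustration; the paper does not define such a metric.

```python
import re

# Hypothetical keyword lexicon per conjoint attribute (an assumption, not from the paper).
ATTRIBUTE_TERMS = {
    "panel type": ("ips", "oled", "va", "tn"),
    "resolution": ("4k", "1440p", "1080p"),
    "refresh rate": ("refresh", "144hz", "60hz"),
}

def attribute_coverage(retrieved_chunks):
    """Return (share of attributes mentioned in the retrieved context, covered names)."""
    tokens = set()
    for chunk in retrieved_chunks:
        tokens.update(re.findall(r"[a-z0-9]+", chunk.lower()))
    covered = sorted(
        name for name, terms in ATTRIBUTE_TERMS.items()
        if any(term in tokens for term in terms)
    )
    return len(covered) / len(ATTRIBUTE_TERMS), covered

# Invented retrieved context for one imaginary user.
score, covered = attribute_coverage(
    ["The OLED panel is gorgeous but I only need 1440p."]
)
```

A low score on a task the twin still answers correctly would suggest the answer came from general model knowledge rather than from the user's own reviews.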
minor comments (2)
  1. [Case study] The fractional-factorial design details and the exact logistic regression specification for part-worth estimation should be expanded with an equation or table for reproducibility.
  2. [Figures] Figure captions and axis labels in the preference-structure plots could be clarified to distinguish CDT-derived utilities from any real-user baselines.
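For reference, one standard way to write the specification requested in minor comment 1 is a binary logit on attribute differences. The notation is assumed here, and the paper may use a different coding or include an intercept.

```latex
\[
P(y_{it} = 1)
  = \frac{1}{1 + \exp\!\bigl(-\boldsymbol{\beta}^{\top}
      (\mathbf{x}_{itA} - \mathbf{x}_{itB})\bigr)}
\]
```

Here $y_{it} = 1$ if respondent $i$ chooses profile A in task $t$, $\mathbf{x}_{itA}$ and $\mathbf{x}_{itB}$ are the dummy-coded attribute vectors of the two profiles, and the elements of $\boldsymbol{\beta}$ are the part-worths relative to each attribute's reference level.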

Circularity Check

0 steps flagged

No significant circularity; validation uses held-out real-user choices as external benchmark

full rationale

The paper builds CDTs from aggregated Reddit review histories via RAG and prompt engineering, then runs them on fractional-factorial conjoint profiles to generate choice data for logistic regression utility estimation. The central claim of 87.73% accuracy is obtained by comparing CDT outputs directly to held-out choice responses from the same real users on identical profiles. This comparison is an independent external benchmark and does not reduce to any fitted parameter or self-referential definition within the CDT construction pipeline. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the provided derivation; the accuracy metric is statistically independent of the review-to-vector-database step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that review text can serve as a sufficient proxy for future preference judgments in new product contexts; no explicit free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption: LLM agents augmented with RAG on personal review histories can accurately simulate individual consumer preferences for unseen product profiles
    This assumption underpins the entire CDT construction and accuracy claim.
invented entities (1)
  • Customer Digital Twin (CDT): no independent evidence
    purpose: Virtual respondent that replicates a specific user's preferences in conjoint tasks
    New conceptual entity introduced to frame the LLM agent as a digital proxy; no independent falsifiable evidence beyond the reported accuracy is provided.

pith-pipeline@v0.9.0 · 5506 in / 1269 out tokens · 45739 ms · 2026-05-15T15:45:25.578192+00:00 · methodology



A Prompts
(Excerpt as extracted; "..." marks text truncated at the source.)

### ROLE & PERSONA
You are the Reddit user '{user_id}'. The content provided below represents your OWN past memories, reviews, and opinions. You must simulate this specific user's preference logic, writing style, and decision-m...

...
Analyze the memories to determine which option aligns better with your past self
You MUST return the result in a valid JSON format
Do not include any markdown formatting (like ```json) or additional text. Just the raw JSON string.

### REQUIRED JSON OUPUT
{{ "choice": "A" // or "B" }}
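The `{user_id}` placeholder and doubled braces in the prompt suggest a Python format-string template. The sketch below shows how such a template might be filled and how the agent's reply might be parsed defensively. The shortened template and the helper `parse_choice` are hypothetical, not the authors' code.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker that a model may add despite instructions

# Shortened stand-in template; only the placeholder visible in the excerpt is used.
TEMPLATE = (
    "You are the Reddit user '{user_id}'.\n"
    'Return raw JSON only: {{ "choice": "A" }}'
)

def parse_choice(raw):
    """Return "A" or "B" from the agent's reply, or raise ValueError."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Tolerate a fenced reply even though the prompt forbids it.
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[4:]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    choice = data.get("choice") if isinstance(data, dict) else None
    if choice not in ("A", "B"):
        raise ValueError(f"unexpected choice: {choice!r}")
    return choice

prompt = TEMPLATE.format(user_id="example_user")
choice = parse_choice('{"choice": "B"}')
```

Note that the `// or "B"` comment in the prompt's example is not valid JSON, so a reply that echoes it verbatim would be rejected by this parser; a pipeline would need to decide whether to retry or discard such replies.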