Learning Preferences from Conjoint Data: A Structural Deep Learning Approach

Avidit Acharya; Jens Hainmueller; Yiqing Xu

arxiv: 2604.10845 · v2 · pith:6UJ4OZBZnew · submitted 2026-04-12 · 📊 stat.ME · econ.EM

Pith reviewed 2026-05-10 15:05 UTC · model grok-4.3

classification 📊 stat.ME econ.EM

keywords conjoint experiments · deep neural networks · random utility models · preference heterogeneity · double/debiased machine learning · political science methods

The pith

Embedding a deep neural network inside a random utility logit model recovers flexible preference heterogeneity from conjoint data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a structural method that places a deep neural network inside the standard random utility logit model, so that preference parameters can depend on respondent characteristics in a fully flexible way. The network guards against the risk that a rigid parametric form misses the true data-generating process, while double/debiased machine learning keeps inference valid for average preference parameters. Applied to three well-known conjoint studies, the approach shows that conventional reduced-form averages conceal substantial individual-level variation in political and policy preferences.

Core claim

By embedding a deep neural network within the random utility logit model, preference parameters become arbitrary functions of respondent characteristics, and double/debiased machine learning delivers valid inference on average effects; applications to existing conjoint data then uncover rich heterogeneity that reduced-form averages hide.

What carries the argument

A deep neural network embedded inside the random utility logit model that maps respondent characteristics to the vector of preference parameters.
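
The structure described above can be written compactly. The following is a sketch in notation chosen for illustration: \(\beta_k(Z_i)\) and \(\theta_k\) follow the figure captions, while \(U_{ij}\), \(X_{ij}\), \(\varepsilon_{ij}\), and \(f_{\mathrm{NN}}\) are assumed symbols, and reading \(\theta_k\) as the population mean of \(\beta_k(Z_i)\) is an interpretation of "average preference parameters" rather than a definition quoted from the paper.

```latex
% Utility of profile j for respondent i, with covariates Z_i and attributes X_{ij}
U_{ij} = X_{ij}^{\top} \beta(Z_i) + \varepsilon_{ij},
\qquad \beta(Z_i) = f_{\mathrm{NN}}(Z_i),
\qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} \text{Type I extreme value}

% Implied logit choice probability over the profiles shown in a task
\Pr(i \text{ chooses } j)
  = \frac{\exp\!\big(X_{ij}^{\top} \beta(Z_i)\big)}
         {\sum_{j'} \exp\!\big(X_{ij'}^{\top} \beta(Z_i)\big)}

% Average preference parameter targeted by the DML step (assumed reading)
\theta_k = \mathbb{E}\big[\beta_k(Z_i)\big]
```

With two profiles per task, the choice probability reduces to a logistic function of the attribute difference, \(\Pr(i \text{ chooses } 1) = \sigma\big((X_{i1} - X_{i2})^{\top} \beta(Z_i)\big)\).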

If this is right

A near-zero average gender effect can coexist with 83 percent of respondents preferring female candidates.
Opposition to undemocratic behavior is nearly universal yet varies sharply in intensity across individuals.
Support for progressive taxation cuts across every partisan subgroup rather than aligning neatly with party lines.
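
The first point is at bottom an arithmetic one: the share of respondents with a positive coefficient and the mean coefficient are different summaries of the same distribution. The sketch below uses made-up numbers (not the paper's estimates) in which a large majority holds a mild positive preference and a minority holds a strong negative one; the paper's own distribution may look quite different.

```python
import numpy as np

# Hypothetical individual-level coefficients, NOT the paper's estimates:
# 83% of respondents mildly favor the attribute, 17% strongly oppose it.
n = 10_000
n_pos = 8_300
beta = np.concatenate([
    np.full(n_pos, 0.17),        # mild positive preference (majority)
    np.full(n - n_pos, -0.83),   # strong negative preference (minority)
])

share_positive = np.mean(beta > 0)   # fraction of respondents with beta > 0
average_effect = beta.mean()         # what a population average reports

print(f"share with beta > 0: {share_positive:.2f}")
print(f"average beta:        {average_effect:.3f}")
```

The magnitudes were chosen so the two groups cancel exactly; any distribution whose positive mass is wide but shallow would show the same qualitative pattern.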

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same neural-network embedding could be applied to other discrete-choice surveys that collect respondent covariates.
Simpler flexible approximators such as random forests might be tested as lower-complexity alternatives for similar heterogeneity recovery.
Reduced-form conjoint analyses common in political science may systematically understate the nuance present in public preferences.

Load-bearing premise

The random utility logit model with neural-network parameters still correctly represents how respondents actually make choices.

What would settle it

A simulation in which choice data are generated from known preference functions of respondent characteristics and the method is checked for accurate recovery of both the full heterogeneity map and the average parameters.
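
A minimal version of such a check might look like the sketch below. It is an illustration under loud assumptions: one binary attribute, one respondent covariate, one paired choice task per respondent, and a linear form \(\beta(z) = a + bz\) fitted by Newton's method as a stand-in for the neural network. All names and values are hypothetical, and no DML correction is applied.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# --- Simulate a paired forced-choice conjoint (all values hypothetical) ---
n = 20_000                               # one choice task per respondent
z = rng.normal(size=n)                   # a single respondent covariate
x_left = rng.integers(0, 2, size=n)      # binary attribute, profile A
x_right = rng.integers(0, 2, size=n)     # binary attribute, profile B
d = (x_left - x_right).astype(float)     # attribute difference enters the logit

def beta_true(z):
    # Known preference function that the estimator should recover
    return 0.3 + 0.8 * z

y = rng.binomial(1, sigmoid(d * beta_true(z)))   # 1 if profile A is chosen

# --- Fit beta(z) = a + b*z by maximum likelihood (Newton's method) ---
# The linear form is a stand-in for the neural network (assumption).
F = np.column_stack([d, d * z])          # utility difference = F @ (a, b)
coef = np.zeros(2)
for _ in range(25):
    mu = sigmoid(F @ coef)
    grad = F.T @ (y - mu)                          # score of the log-likelihood
    hess = (F * (mu * (1 - mu))[:, None]).T @ F    # negative Hessian
    coef = coef + np.linalg.solve(hess, grad)

# --- Compare the recovered heterogeneity map and average with the truth ---
beta_hat = coef[0] + coef[1] * z
theta_hat, theta_true = beta_hat.mean(), beta_true(z).mean()
rmse = np.sqrt(np.mean((beta_hat - beta_true(z)) ** 2))

print("estimated (a, b):", coef)
print("average parameter: estimate", theta_hat, "truth", theta_true)
print("RMSE of the heterogeneity map:", rmse)
```

A fuller check would replace the linear stand-in with the network, use a preference function the stand-in cannot represent, and repeat over many draws to assess confidence-interval coverage.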

Figures

Figures reproduced from arXiv: 2604.10845 by Avidit Acharya, Jens Hainmueller, Yiqing Xu.

**Figure 1.** Saha & Weeks (2022) candidate choice conjoint. A: Average preference parameters \(\hat{\theta}_k\) on the logit scale with 95% DML confidence intervals. B: Average marginal effects on the probability scale; the structural estimate (green circles) and the linear probability model AMCE (orange triangles) nearly perfectly coincide, confirming that the structural model's average nests the standard reduced-form estimand. C: …

**Figure 2.** Full density of \(\hat{\beta}_k(Z_i)\) across respondents for each attribute level, ordered by variance. Complete Overhaul is the most heterogeneous: while nearly all respondents favor it, the intensity ranges from near zero to over 1.5 in logit units. The Empathetic distribution straddles zero, while Hard-Working is tightly concentrated in positive territory. These densities illustrate the direction-versus-in…

**Figure 3.** Individual-level preference parameters \(\hat{\beta}_k(Z_i)\) for preferring a female candidate (top) and an empathetic candidate (bottom), by respondent party (columns) and respondent gender (fill). Solid vertical lines mark subgroup means; dashed vertical line at zero. Gender preference is positive nearly everywhere with a pronounced respondent-gender gap; empathy preference is structured primarily by party, with Repu…

**Figure 4.** Distribution of individual-level attribute importance shares in the Saha & Weeks (2022) conjoint. Dashed lines and labels mark the mean. Policy agenda dominates on average (66%) but with considerable individual heterogeneity.

4.2 The Democracy Tradeoff. Graham and Svolik (2020) study how American voters trade off democratic principles against policy and partisan considerations. Their conjoint experiment pre…

**Figure 5.** Graham & Svolik (2020) candidate choice conjoint (profession dummies omitted). A: Average preference parameters \(\hat{\theta}_k\) with 95% DML CIs. B: Fraction favoring vs. opposing each attribute level. All undemocratic actions are opposed by >95% of respondents.

**Figure 6.** Ridgeline densities of \(\hat{\beta}_k(Z_i)\) in Graham & Svolik (2020), ordered by variance (profession dummies omitted). Undemocratic actions cluster entirely below zero; the spread captures heterogeneity in intensity.

**Figure 7.** Democracy sensitivity by 7-point ideology. The U-shape shows that both ideological poles penalize undemocratic behavior more heavily than the moderate middle. The marginal rate of substitution between undemocratic actions and co-partisanship quantifies how much partisan benefit a voter must sacrifice to avoid a democratic violation.

**Figure 8.** [caption not available]

**Figure 9.** Ballard-Rosa et al. (2017) tax-plan conjoint. A: Average preference parameters \(\hat{\theta}_k\) on the logit scale (per 1 percentage point of rate, or per unit of revenue). B: Fraction of respondents with \(\hat{\beta}_{i,k} > 0\) (favor raising that rate) vs. \(\hat{\beta}_{i,k} < 0\) (oppose). The $85–175k bracket is the most polarized dimension (55% favor, 45% oppose) despite its near-zero average effect.

… on the log of the bracket midpoint: s_i = P…

**Figure 10.** Distribution of individual-level \(\hat{\beta}_{i,k}\) schedules, by party. A: Party medians overlaid with interquartile (25–75) and 10–90 percentile bands. B: A random sample of 200 individual respondent schedules per party (semi-transparent), with the party mean in bold. Almost every individual line slopes upward, but the level varies enormously: within-party heterogeneity dwarfs the between-party gap.

**Figure 11.** Attribute importance shares (variance decomposition) by party. Democrats allocate nearly three times as much variance to the top bracket as Republicans; Republicans allocate roughly twice as much to the low and middle brackets as Democrats.

Original abstract

Conjoint experiments randomize multidimensional profiles, offering a powerful design for recovering structural preference parameters -- including marginal rates of substitution, willingness to pay, and the distribution of preferences across a population. Yet the dominant approach in political science has focused on nonparametric causal estimands that do not leverage this potential. We propose a structural approach that embeds a deep neural network within a random utility logit model, allowing preference parameters to vary as a fully flexible function of respondent characteristics. The neural network addresses the concern that a parametric specification may not capture the true data generating process, while double/debiased machine learning provides valid inference on average preference parameters. We apply our method to three prominent conjoint studies and find rich preference heterogeneity masked by reduced-form averages: a near-zero gender effect coexists with 83% preferring female candidates, opposition to undemocratic behavior is near-universal but varies sharply in intensity, and progressive tax preferences cut across every partisan subgroup.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

They embed a deep NN inside a structural logit for conjoint data and use double ML for average preference inference, but whether that inference is valid is the open question.

The paper's main contribution is putting a neural network inside the random utility logit so preference parameters become flexible functions of respondent traits, then applying double/debiased ML to recover average effects and heterogeneity. They run it on three existing conjoint studies and show cases where averages look small but most respondents actually lean one way, such as the gender preference example or the tax attitudes that cross party lines.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes embedding a deep neural network inside a random utility logit model for conjoint experiments, allowing preference parameters to vary flexibly as functions of respondent characteristics. Double/debiased machine learning is used to recover valid inference on population-average preference parameters, and the method is applied to three prominent conjoint studies to document rich heterogeneity masked by reduced-form averages (e.g., near-zero average gender effects coexisting with 83% of respondents preferring female candidates).

Significance. If the inference claims hold, the approach would usefully bridge reduced-form and structural traditions in conjoint analysis by delivering both flexible heterogeneity modeling and interpretable average structural quantities such as marginal rates of substitution. The three empirical applications illustrate how the method can revise substantive conclusions about voter preferences.
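
For concreteness, a marginal rate of substitution in a linear-in-attributes random utility model is a ratio of preference parameters. The display below is a sketch in assumed notation (attributes \(k\) and \(l\), individual-level parameters \(\beta_k(z)\)); the sign and normalization conventions used in the manuscript may differ.

```latex
% Individual-level marginal rate of substitution between attributes k and l
\mathrm{MRS}_{k,l}(z) = \frac{\beta_k(z)}{\beta_l(z)}

% The average of individual ratios generally differs from the ratio of averages
\mathbb{E}\!\left[\frac{\beta_k(Z_i)}{\beta_l(Z_i)}\right]
  \neq \frac{\mathbb{E}[\beta_k(Z_i)]}{\mathbb{E}[\beta_l(Z_i)]}
  \quad \text{in general}
```

The second line is why individual-level estimates matter here: a ratio built from average parameters need not describe the trade-off faced by a typical respondent.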

major comments (2)

[§3] §3 (Estimation): the claim that double/debiased ML delivers valid confidence intervals for the averaged structural parameters rests on Neyman orthogonality and rate conditions for the nuisance estimator. Because the neural network is embedded directly inside the choice probabilities rather than as a separate first-stage predictor, standard DML theory for partially linear models does not apply verbatim; the manuscript should supply either a tailored orthogonality argument or Monte Carlo evidence that first-order bias is controlled under the chosen regularization and cross-fitting scheme.
[§4] §4 (Applications): the headline heterogeneity statistics (83% preferring female candidates, near-universal but intensity-varying opposition to undemocratic behavior) are post-estimation functionals of the fitted neural network. The paper should report how these quantities change under alternative architectures, regularization strengths, or cross-fitting folds; without such checks the reported percentages risk being artifacts of the particular NN specification.
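
As a point of reference for the first comment, an orthogonal score for an average of heterogeneous parameters can take the form sketched below. This is a generic construction in assumed notation (\(W_i\) for the observed data on respondent \(i\), \(\ell\) for the per-observation negative log-likelihood, \(\Lambda\) for its conditional expected Hessian, \(k(i)\) for the cross-fitting fold containing \(i\)), offered as an illustration and not as the manuscript's own derivation.

```latex
% Candidate orthogonal score for theta = E[beta(Z_i)]
\psi(W_i; \beta, \Lambda)
  = \beta(Z_i) - \Lambda(Z_i)^{-1} \, \nabla_{\beta} \ell\big(W_i, \beta(Z_i)\big),
\qquad
\Lambda(z) = \mathbb{E}\big[\nabla^2_{\beta\beta} \ell\big(W_i, \beta(Z_i)\big) \,\big|\, Z_i = z\big]

% Orthogonality check: perturb beta in direction delta(Z_i); at the truth,
% E[grad ell | Z_i] = 0, so the perturbation of Lambda drops out and
\frac{\partial}{\partial t} \mathbb{E}\big[\psi(W_i; \beta + t\delta, \Lambda)\big]\Big|_{t=0}
  = \mathbb{E}[\delta(Z_i)]
    - \mathbb{E}\big[\Lambda(Z_i)^{-1} \Lambda(Z_i) \, \delta(Z_i)\big] = 0

% Cross-fitted estimator: nuisances for i are fitted on folds excluding k(i)
\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n}
  \psi\big(W_i; \hat{\beta}^{(-k(i))}, \hat{\Lambda}^{(-k(i))}\big)
```

Whether the manuscript's estimator matches this form, and whether the network meets the required convergence rates, is exactly what the comment asks the authors to establish.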

minor comments (2)

[§2] Notation for the neural-network-embedded utility function is introduced without an explicit equation number; adding a displayed equation would improve readability.
[Abstract] The abstract and introduction use the phrase 'parameter-free' for the average quantities recovered by DML; this is imprecise because the NN still contains many free parameters whose influence is only averaged out.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and describe the revisions we will undertake to strengthen the manuscript.

Referee: [§3] §3 (Estimation): the claim that double/debiased ML delivers valid confidence intervals for the averaged structural parameters rests on Neyman orthogonality and rate conditions for the nuisance estimator. Because the neural network is embedded directly inside the choice probabilities rather than as a separate first-stage predictor, standard DML theory for partially linear models does not apply verbatim; the manuscript should supply either a tailored orthogonality argument or Monte Carlo evidence that first-order bias is controlled under the chosen regularization and cross-fitting scheme.

Authors: We agree that the standard partially linear DML framework does not apply verbatim given the structural embedding of the neural network inside the choice probabilities. In the revision we will add a dedicated subsection deriving the Neyman orthogonality condition for the average structural parameters under our random-utility model with embedded NN nuisance function. We will also include Monte Carlo simulations calibrated to the sample sizes and regularization levels used in the applications, confirming that first-order bias remains negligible under the cross-fitting scheme.

Revision: yes

Referee: [§4] §4 (Applications): the headline heterogeneity statistics (83% preferring female candidates, near-universal but intensity-varying opposition to undemocratic behavior) are post-estimation functionals of the fitted neural network. The paper should report how these quantities change under alternative architectures, regularization strengths, or cross-fitting folds; without such checks the reported percentages risk being artifacts of the particular NN specification.

Authors: We acknowledge that the reported heterogeneity functionals could be sensitive to modeling choices. The revised manuscript will include a new robustness subsection that recomputes the key post-estimation quantities (including the 83% figure and intensity distributions) under (i) alternative network depths and widths, (ii) a grid of regularization strengths, and (iii) different numbers of cross-fitting folds. Results will be summarized in a table showing that the substantive conclusions remain stable.

Revision: yes

Circularity Check

0 steps flagged

No circularity: structural DNN-logit with DML produces data-driven estimates

The paper embeds a neural network inside a random-utility logit to allow flexible preference parameters, then applies double/debiased ML to recover average structural quantities from conjoint choice data. No equation or step equates a claimed result to its own fitted inputs by construction; the reported heterogeneity (e.g., 83% preferring female candidates) is an empirical output obtained by maximizing the model likelihood on observed profiles. The method relies on external DML theory and standard logit assumptions rather than on self-referential definitions or self-citation chains that would force the conclusions. The derivation therefore remains non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach relies on the domain assumption of the random utility model and treats the neural network as a flexible approximator without additional invented entities.

free parameters (1)

Neural network parameters
The weights and biases of the deep neural network are fitted to the data to allow flexible variation of preference parameters with respondent characteristics.

axioms (1)

Domain assumption: Choice follows a random utility maximization model with logit errors (i.i.d. type I extreme value, so that error differences are logistic).
This is the foundational assumption for the logit model structure.

pith-pipeline@v0.9.0 · 5460 in / 1262 out tokens · 49457 ms · 2026-05-10T15:05:00.691263+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Ranked-choice conjoint experiments
stat.ME · 2026-04 · unverdicted · novelty 7.0

Ranked-choice conjoint experiments produce AMCE estimates equivalent to forced-choice designs but with 12-55% smaller standard errors depending on the number of ranked profiles, recommending four profiles per vignette.