Ranked-choice conjoint experiments

Mats Ahrenshop; Spyros Kosmidis; Thomas S. Robinson

arxiv: 2604.15064 · v1 · submitted 2026-04-16 · 📊 stat.ME

Pith reviewed 2026-05-10 10:48 UTC · model grok-4.3

classification 📊 stat.ME

keywords: conjoint experiments, ranked choice, AMCE estimation, experimental efficiency, survey methods, political experiments, assumption testing

The pith

Ranked-choice conjoint experiments produce the same average marginal component effects as forced-choice designs but with substantially higher statistical efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes a method to expand ranked-choice responses into multiple observations for conjoint analysis. It proves that these rank-expanded estimators are mathematically equivalent to the standard average marginal component effect (AMCE) estimator from binary choices. A theoretical account shows that including more ranked profiles per vignette increases the number of effective observations, thereby improving precision. Pre-registered experiments across candidate and policy domains confirm that estimates remain substantively similar while standard errors shrink by 12-13 percent with one extra profile and up to 55 percent with six profiles. The approach includes design-based tests for the key assumptions of transitivity and independence of irrelevant alternatives, and recommends using four profiles for most applications based on efficiency-validity trade-offs.

Core claim

Rank expansion treats each pair of profiles within a ranking as a separate forced-choice comparison, and the resulting estimator is proven to recover the same AMCE as conventional designs, provided transitivity and independence of irrelevant alternatives (IIA) hold. Additional profiles per vignette multiply the number of pairwise observations without altering the causal estimand, so efficiency gains grow with the number of profiles. Design-based tests let researchers check whether transitivity and IIA hold in their data.

What carries the argument

Rank expansion, which converts a single ranked response into multiple binary choice observations that are equivalent to standard conjoint data.
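To make the expansion concrete, here is a minimal Python sketch of how one ranked response could be unfolded into pairwise binary observations. The function name `rank_expand`, the row layout, and the use of all pairs (rather than, say, adjacent ranks only) are assumptions made for illustration; they are not taken from the paper or from the cjrank package.

```python
from itertools import combinations


def rank_expand(ranking):
    """Expand one ranked response into pairwise binary observations.

    `ranking` lists profile ids from most to least preferred.
    Returns rows (profile, opponent, chosen), where chosen is 1 if
    `profile` is ranked above `opponent` and 0 otherwise.
    Hypothetical layout, for illustration only.
    """
    rows = []
    for higher, lower in combinations(ranking, 2):
        # combinations() preserves input order, so `higher` outranks `lower`.
        rows.append((higher, lower, 1))
        rows.append((lower, higher, 0))
    return rows


# One respondent ranks four profiles: B first, then D, A, C.
rows = rank_expand(["B", "D", "A", "C"])
for row in rows:
    print(row)
```

With K profiles this yields K(K-1)/2 pairs, and each profile appears in K-1 of them, which is where the extra observations per vignette come from.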

If this is right

AMCE point estimates remain substantively similar while precision improves with more profiles.
Standard errors shrink by up to 55 percent with six ranked profiles per vignette.
Tests for transitivity and IIA provide a way to validate the expansion in new contexts.
Four profiles per vignette offer a practical balance for most survey experiments.
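The standard-error reductions above can be translated into an equivalent forced-choice sample size with a little arithmetic. The sketch below assumes that standard errors scale as 1/sqrt(n), so a proportional reduction r corresponds to a multiplier of 1/(1 - r)^2. That scaling rule and the function name `fcc_sample_multiplier` are assumptions; the paper's own sample-size multipliers may be computed differently.

```python
def fcc_sample_multiplier(se_reduction):
    """Forced-choice sample multiplier implied by a proportional SE reduction.

    Assumes SE is proportional to 1/sqrt(n): matching an SE of
    (1 - r) * SE_fc then needs 1 / (1 - r)**2 times as many
    forced-choice respondents.
    """
    if not 0 <= se_reduction < 1:
        raise ValueError("se_reduction must be in [0, 1)")
    return 1.0 / (1.0 - se_reduction) ** 2


# SE reductions quoted in the review: 12-13% and up to 55%.
for r in (0.12, 0.13, 0.55):
    print(f"SE reduction {r:.0%}: multiplier {fcc_sample_multiplier(r):.2f}")
```

Under this assumption, a 12-13 percent reduction is worth roughly 1.3 times the forced-choice sample, and a 55 percent reduction roughly 4.9 times.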

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Survey designers could reduce sample sizes while maintaining statistical power by adopting ranked choices.
Similar expansions might apply to other experimental methods involving rankings or preferences.
Domains where transitivity fails could still use partial rankings or hybrid designs.

Load-bearing premise

The assumptions of transitivity and independence of irrelevant alternatives hold in the choice contexts where the ranked designs are applied.
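As an illustration of what a transitivity violation looks like in pairwise data, the sketch below searches one respondent's observed pairwise choices for preference cycles (a over b, b over c, c over a). This is a generic check written for this note, not the paper's design-based test, whose construction is not described here; the function name and data layout are hypothetical.

```python
from itertools import permutations


def intransitive_triples(prefers):
    """Return the sets of three profiles whose pairwise choices form a cycle.

    `prefers` is a set of (winner, loser) pairs observed for one respondent.
    Generic illustration; not the paper's design-based test.
    """
    items = sorted({p for pair in prefers for p in pair})
    cycles = set()
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            # frozenset collapses the rotations of the same cycle.
            cycles.add(frozenset((a, b, c)))
    return cycles


consistent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
print(intransitive_triples(consistent))
print(intransitive_triples(cyclic))
```

A single ranking cannot contain such a cycle by itself, so a check like this only becomes informative when the same profiles are compared across separate tasks.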

What would settle it

A large-scale experiment where the rank-expanded AMCE estimates diverge from those obtained via forced-choice designs, or where the provided tests reject transitivity or IIA.
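For a sense of what agreement looks like when the assumptions hold by construction, the following simulation compares a difference-in-means AMCE from forced-choice tasks (K = 2) with one from rank-expanded pairs (K = 4). Everything here is an assumption made for illustration: a single binary attribute, Gaussian noise, the value of `beta`, and the pairwise expansion. It is not the paper's proof or either of its experiments.

```python
import random
from itertools import combinations


def simulate_amce(k, rounds, beta=1.0, seed=0):
    """Difference-in-means AMCE of a binary attribute from pairwise outcomes.

    Each round shows k profiles with x drawn uniformly from {0, 1} and
    utility beta * x + Gaussian noise; the ranking orders profiles by
    utility, so transitivity and IIA hold by construction.
    With k = 2 this is an ordinary forced-choice task.
    """
    rng = random.Random(seed)
    wins = {0: 0, 1: 0}   # times a profile with attribute x won a pair
    count = {0: 0, 1: 0}  # pair appearances of profiles with attribute x
    for _ in range(rounds):
        profiles = []
        for _ in range(k):
            x = rng.randint(0, 1)
            profiles.append((beta * x + rng.gauss(0.0, 1.0), x))
        for (u1, x1), (u2, x2) in combinations(profiles, 2):
            first_wins = u1 > u2
            wins[x1] += first_wins
            wins[x2] += not first_wins
            count[x1] += 1
            count[x2] += 1
    return wins[1] / count[1] - wins[0] / count[0]


amce_fc = simulate_amce(k=2, rounds=20000, seed=1)
amce_rc = simulate_amce(k=4, rounds=20000, seed=2)
print(f"forced-choice AMCE: {amce_fc:.3f}")
print(f"rank-expanded AMCE: {amce_rc:.3f}")
```

In this setup the two estimates should differ only by sampling noise. A real divergence would have to come from respondents whose rankings do not behave like utility-ordered choices.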

Figures

Figures reproduced from arXiv: 2604.15064 by Mats Ahrenshop, Spyros Kosmidis, Thomas S. Robinson.

**Figure 2.** Estimated difference in power between RCC and FCC designs, holding constant …

**Figure 3.** Example of the ranked-choice budget vignette.

**Figure 4.** Estimated AMCEs from the candidate experiment. Left panel: Study 1 (FCC vs …

**Figure 5.** Study 2: Proportion of transitivity (left) and IIA (right) violations by number of …

**Figure 6.** Out-of-sample classification performance (ROC-AUC) across both studies. The …

**Figure 7.** AUC gain from ranking (K = 4 versus K = 2) as a function of baseline prediction difficulty, by attribute level. Each point represents a subset of the Study 2 randomized test profiles sharing a given attribute level (e.g. Party = Independent, Race = Hispanic); point size is proportional to subset size. The x-axis shows the AUC achieved by the forced-choice (K = 2) model on that subset, and the y-axis shows …

**Figure 8.** Summary of trade-offs across values of K. Panel (A) shows theoretical (Proposition 2) and empirical SE reductions relative to K = 2; bold labels indicate the FCC sample-size multiplier (number of forced-choice respondents needed per ranked-choice respondent to achieve equivalent precision). Panel (B) shows the mean absolute deviation in AMCE estimates when subjects who violated transitivity or IIA are exc…

Original abstract

Forced-choice conjoint designs have become a staple method in the experimentalist's toolkit. However, the forced-choice outcome is neither always consistent with the types of choices individuals make in real political contexts, nor is it statistically efficient. In this paper, we formalize how ranked outcomes can be integrated into the conjoint framework. We provide a proof that rank-expanded estimators are equivalent to conventional AMCE, a theoretical account of how additional profiles increase the efficiency of conjoint designs, and design-based tests for the transitivity and independence of irrelevant alternatives assumptions that underpin the expansion. Across two pre-registered survey experiments (the first comparing forced-choice and ranked-choice designs across candidate and policy domains, and the second varying the number of ranked profiles), we find that ranked-choice conjoints yield substantively similar but more precise AMCE estimates, shrinking standard errors by 12-13% with one additional profile and up to 55% with six profiles per vignette. Based on efficiency-validity trade-offs, we recommend K = 4 profiles for most applications. We provide an accompanying open-source R package, cjrank, that implements rank expansion, AMCE estimation, efficiency diagnostics, and the assumption tests described in this paper.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper shows that ranked-choice conjoints match standard AMCE estimates while cutting standard errors by 12-55%, depending on the number of profiles per vignette. It offers a direct equivalence proof, an efficiency derivation, assumption tests, and usable code.

The main thing to know is that this work formalizes ranked outcomes inside conjoint experiments, proves the expanded estimator equals the usual AMCE, derives the efficiency lift from extra profiles, and supplies design-based tests for transitivity and IIA. They back it with two pre-registered experiments and ship an R package that does the expansion, estimation, diagnostics, and tests in one place. That combination is the concrete advance over existing conjoint tools.

The experiments deliver the numbers they promise: 12-13% SE reduction with one added profile and up to 55% with six, leading to their K=4 recommendation as a practical balance. The package lowers the barrier for others to try it. The equivalence argument is direct rather than fitted, and the efficiency account comes from information per vignette rather than post-hoc fitting. Those pieces hold up on the terms the paper sets.

The soft spots sit with the assumptions. Transitivity and IIA have to hold for the rank expansion to stay unbiased, and while the paper gives tests, those tests are design-based and may miss violations that appear in messier political choice settings or with respondent fatigue. The efficiency gains are theoretically clean but the exact percentages will vary by domain and vignette length, so the K=4 suggestion is a reasonable starting point rather than a universal rule. Generalizability beyond the two survey experiments is left for users to check.

This is aimed at political scientists and applied researchers who already run conjoints and want tighter estimates without bigger samples. Methodologists who care about experimental design will get the most from the formal steps and the code. It is solid enough on its own terms to deserve a serious referee, even if some readers will want to stress-test the assumption checks on their own data. I would send it out for peer review.

Referee Report

0 major / 3 minor

Summary. The paper formalizes the integration of ranked outcomes into forced-choice conjoint experiments. It provides a proof that rank-expanded estimators are equivalent to the conventional AMCE estimator, a theoretical derivation of efficiency gains from including additional ranked profiles, and design-based tests for the transitivity and IIA assumptions. Two pre-registered experiments (one comparing forced-choice vs. ranked-choice across domains, one varying the number of profiles) show substantively similar AMCEs with reduced standard errors (12-13% for one extra profile, up to 55% for six profiles), leading to a recommendation of K=4 profiles per vignette. An open-source R package cjrank implements the estimator, diagnostics, and tests.

Significance. If the equivalence proof and efficiency derivation hold under the stated conditions, this contribution could meaningfully improve the precision of conjoint experiments in political science and related fields while preserving the interpretability of AMCEs. The design-based assumption tests, pre-registered experiments with concrete efficiency numbers, information-theoretic account of gains from additional profiles, and the reproducible cjrank package are explicit strengths that facilitate verification and adoption.

minor comments (3)

The abstract states efficiency reductions of 12-13% with one additional profile and up to 55% with six, but the main text should include the exact variance formula or information-theoretic derivation (likely in the theoretical section) to allow readers to reproduce the claimed gains without re-deriving from scratch.
The recommendation of K=4 is based on efficiency-validity trade-offs; a brief sensitivity table or figure showing SE reduction and assumption-test p-values across K=2 to K=6 would strengthen the practical guidance.
Notation for the rank-expanded estimator and the conventional AMCE should be aligned more explicitly (e.g., via a side-by-side equation display) to highlight the equivalence result.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of our manuscript, accurate summary of the contributions, and recommendation for minor revision. We are pleased that the equivalence proof, efficiency derivation, design-based tests, pre-registered experiments, and cjrank package were recognized as strengths.

Referee: The paper formalizes the integration of ranked outcomes into forced-choice conjoint experiments. It provides a proof that rank-expanded estimators are equivalent to the conventional AMCE estimator, a theoretical derivation of efficiency gains from including additional ranked profiles, and design-based tests for the transitivity and IIA assumptions. Two pre-registered experiments (one comparing forced-choice vs. ranked-choice across domains, one varying the number of profiles) show substantively similar AMCEs with reduced standard errors (12-13% for one extra profile, up to 55% for six profiles), leading to a recommendation of K=4 profiles per vignette. An open-source R package cjrank implements the estimator, diagnostics, and tests.

Authors: We appreciate the referee's concise and accurate summary of the paper's main elements. No specific concerns or requests for clarification were raised in the major comments, and we agree with this characterization of the work. We will incorporate any minor editorial suggestions during the revision process.

revision: no

Circularity Check

0 steps flagged

No significant circularity identified

The paper's core contributions (a direct algebraic proof that rank-expanded estimators equal conventional AMCE, a variance-based account of efficiency gains from additional ranked profiles, and design-based tests for transitivity and IIA) are established independently of fitted parameters, self-referential predictions, or load-bearing self-citations. The equivalence follows from an explicit mathematical identity with the standard estimator rather than by construction from the target result itself, while the efficiency derivations rely on standard variance-reduction principles without reference to the paper's own empirical estimates or prior author work. The recommendation of K=4 profiles is a post-hoc trade-off summary, not a derived prediction that collapses into its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard choice-modeling assumptions that the paper subjects to design-based tests rather than treating them as unexamined background.

axioms (2)

domain assumption: Preferences satisfy transitivity when respondents produce rankings.
Required for the rank-expansion estimator to be well-defined and equivalent to AMCE.
domain assumption: Independence of irrelevant alternatives holds for the ranked profiles.
Underpins validity of the expansion; the paper supplies design-based tests for it.

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages · 1 internal anchor

[1]

Learning Preferences from Conjoint Data: A Structural Deep Learning Approach

Abramson, S. F., Koçak, K. & Magazinnik, A. (2022), ‘What do we learn about voter preferences from conjoint experiments?’, American Journal of Political Science 66(4), 1008–1020.
Acharya, A., Hainmueller, J. & Xu, Y. (2026), ‘Learning preferences from conjoint data: A structural deep learning approach’. URL: https://arxiv.org/abs/2604.10845
Atsusaka, Y....

[2]

Let $P_i = \{p_1, \dots, p_K\}$ denote the set of profiles shown in a given round

We assume throughout that all attribute levels are independently and uniformly randomized across profiles, subjects, and rounds. Let $P_i = \{p_1, \dots, p_K\}$ denote the set of profiles shown in a given round. In a forced-choice design ($K = 2$), the observed outcome is $Y^{FC}_{ipp'} = \mathbb{1}(U_{ip} > U_{ip'})$. Hainmueller et al. (2014) show that, under independent randomization of...

[3]

Proposition 1 (Unbiasedness of AMCE under rank expansion). Under Assumptions 1 and 2, the OLS estimator applied to the rank-expanded dataset $\{(\tilde{Y}_{ipp'}, X_p, X_{p'})\}$ recovers the AMCE

The third is the definition of $Y^{FC}$. Proposition 1 (Unbiasedness of AMCE under rank expansion). Under Assumptions 1 and 2, the OLS estimator applied to the rank-expanded dataset $\{(\tilde{Y}_{ipp'}, X_p, X_{p'})\}$ recovers the AMCE. Proof. We show that the rank-expanded observations satisfy the same conditional moment restrictions as forced-choice data, so the identification r...

[4]

accurate

… $3K(K-1)$. This follows from $\mathrm{Var}(\tilde{R}) = (K+1)/[12(K-1)]$ (the variance of a rescaled discrete uniform), compared with $\mathrm{Var}(Y^{FC}) = 1/4$, and the $K/2$ ratio in effective observations per round. For $K = 2$, the normalized rank reduces to the binary choice indicator, nesting the standard forced-choice estimator as a special case. For the values of $K$ used in this paper, the...
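The variance expression quoted in this excerpt can be checked exactly with rational arithmetic. The sketch below assumes the normalized rank is (R - 1)/(K - 1) with R uniform on 1..K. The excerpt only says "rescaled discrete uniform", so that particular rescaling is an assumption, though reversing the rank order would give the same variance.

```python
from fractions import Fraction


def var_normalized_rank(k):
    """Exact variance of (R - 1) / (K - 1) for R uniform on 1..K.

    The rescaling is an assumption; the excerpt does not spell it out.
    """
    values = [Fraction(r - 1, k - 1) for r in range(1, k + 1)]
    mean = sum(values) / k
    return sum((v - mean) ** 2 for v in values) / k


def closed_form(k):
    """The expression quoted in the excerpt: (K + 1) / [12 (K - 1)]."""
    return Fraction(k + 1, 12 * (k - 1))


for k in range(2, 7):
    assert var_normalized_rank(k) == closed_form(k)
    print(k, var_normalized_rank(k))
```

At K = 2 both expressions equal 1/4, matching the forced-choice variance stated in the excerpt.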

[5]

K = 4, random 1-per-task: 3,687; 924; 0.694 [0.660, 0.729]
K = 4, top-vs-2nd ranked only: 3,687; 924; 0.677 [0.642, 0.713]
K = 6, random 1-per-task: 3,846; 962; 0.706 [0.673, 0.739]
K = 6, top-vs-2nd ranked only: 3,846; 962; 0.617 [0.581, 0.652]
Table A5: Held-out randomized vignette classification from Study 2, using the structural deep-learning random-utility model of ...

[6]

When placed on a comparable scale, the standard errors are of similar magnitude, reflecting the additional information carried by the continuous outcome

The point estimates are similar across approaches (consistent with the high correlations reported in Section 5.4). When placed on a comparable scale, the standard errors are of similar magnitude, reflecting the additional information carried by the continuous outcome. This validates the ranking approach while confirming that the choice-based framework of ...

[7]

The gain disappears: subsampled $K = 4$ and $K = 6$ AUCs return to within confidence intervals of the $K = 2$ baseline (0.69 and 0.71, respectively; lower panel). The estimator’s improvement under ranking therefore reflects pair quantity, with rank-expanded pairs behaving as equivalent in kind to forced-choice pairs under the random-utility model, consistent with our...
