pith. machine review for the scientific record.

arxiv: 2605.11240 · v1 · submitted 2026-05-11 · 💻 cs.GT · cs.CY · cs.LG

Recognition: no theorem link

When to Ask a Question: Understanding Communication Strategies in Generative AI Tools

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:08 UTC · model grok-4.3

classification 💻 cs.GT · cs.CY · cs.LG
keywords generative AI · preference elicitation · bias mitigation · user interaction · information inference · fairness in AI · stylized model

The pith

Generative AI can mitigate bias from preference inference by strategically asking users a few targeted questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a stylized model of how users interact with large language models: users give short prompts, and the model infers the rest from typical preference patterns. This inference often favors common viewpoints and underserves people whose tastes differ from the majority. Because preferences across users tend to be correlated, the paper shows it is possible to decide how much extra information to request before generating an answer, by optimizing an objective that weighs the cost of extra questions against the gain in accurately representing the user's actual preferences. When this balance is struck, the system produces outputs that reflect a wider range of perspectives without forcing every user to supply every detail.

Core claim

In the stylized user-LLM interaction model, an optimal policy for information elicitation exists that reduces the systematic bias introduced by pure preference inference; the policy uses observed correlations among user preferences to determine how many and which questions to ask before generating content, thereby improving representation of atypical preferences while limiting the burden placed on users.

What carries the argument

A stylized model of user-LLM interaction whose objective trades off user burden against the accuracy of preference representation, with the decision to elicit further information driven by the degree of correlation across users' preferences.

If this is right

  • Generative tools can be tuned to ask only the questions that most reduce representation error for atypical users.
  • The optimal number of questions rises when preference correlations weaken or when inference error is high.
  • Systems that follow the derived policy produce outputs closer to each user's true preferences than systems that always infer everything.
  • Efficiency is preserved because elicitation stops once additional questions yield diminishing returns on the representation objective.
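
To make the shape of this tradeoff concrete, the sketch below sets up a toy two-cluster version of the setting, in the spirit of the paper's figures rather than a reproduction of its model; every functional form and parameter value here (the cluster structure, the noise rate p, the per-question cost c, the fill-in rule) is an illustrative assumption. It asks: for a majority cluster of mass α and a niche cluster of mass 1 − α over n binary preference items, with each user's items flipped from their cluster's center independently with probability p, how many questions k should the system ask before filling in the rest from the inferred cluster?

```python
# Toy sketch only: two clusters of users over n binary preference items.
# A majority cluster (prior mass alpha) has center all-1s; a niche cluster
# (mass 1 - alpha) has center all-0s. A user's true preferences equal their
# cluster's center with each bit flipped independently with probability p.
# The system asks k questions (observing k true bits), forms a posterior
# over the two clusters, fills in the unasked bits from the MAP cluster,
# and pays an assumed cost c per question.
import itertools

def expected_utility(k, n, alpha, p, c):
    """Expected (representation accuracy - elicitation cost) for asking k questions."""
    centers = {(1,) * n: alpha, (0,) * n: 1 - alpha}  # cluster center -> prior mass
    total = 0.0
    for answers in itertools.product([0, 1], repeat=k):  # possible answers to the k asked bits
        # Posterior over clusters given the observed bits.
        post = {}
        for center, prior in centers.items():
            flips = sum(a != b for a, b in zip(answers, center[:k]))
            post[center] = prior * (p ** flips) * ((1 - p) ** (k - flips))
        z = sum(post.values())  # marginal probability of this answer pattern
        if z == 0:
            continue
        map_center = max(post, key=post.get)
        # Probability an unasked bit predicted from the MAP center matches the
        # user's true bit, averaged over the posterior on the true cluster.
        acc = sum((post[c] / z) * ((1 - p) if c == map_center else p) for c in centers)
        # Asked bits are represented exactly; unasked bits match with prob `acc`.
        total += z * (k + acc * (n - k)) / n
    return total - c * k

# Illustrative sweep: how the assumed-optimal k shifts with the noise rate p.
n, alpha, c = 10, 0.5, 0.05
for p in (0.05, 0.2, 0.4):
    best_k = max(range(n + 1), key=lambda k: expected_utility(k, n, alpha, p, c))
    print(f"p={p}: assumed-optimal number of questions k = {best_k}")
```

Sweeping k over a few noise rates is the same kind of comparison the paper's Figures 2–4 make analytically: under these assumptions, noisier answers sharpen the posterior over clusters less per question, so the assumed per-question cost eventually outweighs the gain and elicitation stops.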

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tradeoff logic could guide question-asking in non-generative AI assistants such as recommendation engines or decision-support tools.
  • Real-world deployment would require learning the correlation structure from past interactions rather than assuming it is known in advance.
  • If the correlation structure changes over time, the optimal elicitation policy would need periodic re-estimation from fresh user data.

Load-bearing premise

Individual user preferences are sufficiently correlated that inference from limited input remains reliable and selective elicitation adds value.

What would settle it

An empirical test showing that when user preferences are drawn from an uncorrelated distribution, the amount of elicitation recommended by the model fails to improve representation accuracy or increases user burden without reducing bias.
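
One way to run that test, sketched below under assumptions of our own (i.i.d. Bernoulli(0.5) preferences and a fixed predictor for unasked items, not the paper's experiment), is to check that when preferences carry no correlation, answers to the first k questions do nothing for the accuracy of whatever is filled in for the rest, so any recommended k > 0 is pure burden:

```python
# Sketch of the falsification test under assumed uncorrelated preferences:
# with i.i.d. Bernoulli(0.5) items, accuracy on the *unasked* items cannot
# improve with k, because the observed answers carry no information about
# the remaining items.
import random

def unasked_accuracy(k, n=10, users=20000, seed=0):
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(users):
        prefs = [rng.randint(0, 1) for _ in range(n)]
        # Any inference rule may condition only on the k observed answers;
        # with independent bits, every rule is a coin flip on the rest
        # (here: always predict 1).
        preds = [1] * (n - k)
        correct += sum(pred == true for pred, true in zip(preds, prefs[k:]))
        total += n - k
    return correct / total

for k in (0, 2, 5, 8):
    print(f"k={k}: accuracy on unasked items ≈ {unasked_accuracy(k):.3f}")
```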

Figures

Figures reproduced from arXiv: 2605.11240 by Charlotte Park, Kate Donahue, Manish Raghavan.

Figure 1. A screenshot of ChatGPT prompting the user for additional information when an under…
Figure 2. Expected utility E[U(k)] vs p in the case of α = 1/2 for n = 3, 6, 10, and 30. For each n, as the noise rate p increases, there comes a point where it optimizes expected utility to not ask any questions. In other words, as there is more uncertainty about the user's cluster of origin based on the information, there is less value to asking questions. Additional figures are provided in Appendix C. When the noise…
Figure 3. Expected utility E[U(k)] vs p in the case of α = 0.9 for n = 3, 6, 10, and 30. When n is smaller, k = 0 is the optimal querying policy regardless of p. When n is larger, however, it can sometimes be better to ask a single question. When α is either close to 0 or 1, the niche set of users is significantly smaller than those with majority preferences, leading the model to have a much stronger prior that any u…
Figure 4. An overview of the landscape of optimal k values for a range of choices of α ∈ [0.5, 1] and p ∈ [0, 0.5] for n = 30. We see here that for increased values of n, it can be useful to ask more than a single question, particularly for intermediate values of p and α. In these cases, the prior belief about cluster membership is strong but not decisive, so the additional clarification provided by a seco…
Figure 5. The number of 0s to output based on the optimal response policy.
Figure 6. Optimal choice of k vs γ for various values of p and α = 1/2. As γ initially increases, the optimal querying policy grows for small noise rates p. As γ continues to grow, however, the optimal querying policy begins to decrease again. This observation is not immediately clear: as mentioned above, one might expect that asking more questions would always be beneficial, as it provides the LLM with additional in…
Figure 7. Mean welfare vs k for increasing values of γ for per-query cost c = 0.3. As γ increases, the optimal number of queries k changes, increasing from 4 to 10 as γ increases from 0 to 10. The y-axis scale is omitted, as welfare is not comparable across γ; we consider only within-γ comparisons over k.
Figure 8. Mean utility vs Gini coefficient for different values of…
Figure 9. Welfare W(k, γ) vs flipping probability p in the case of α = 1/2 for various values of γ. The choice of k leading to the optimal query length is included in each plot.
Figure 10. Welfare W(k, γ) vs p in the case of α = 0.9 for various values of γ. The choice of k leading to the optimal query length is included in each plot.
Figure 11. The number of 0s to output based on the optimal response policy.
Figure 12. Optimal choice of k vs γ for various values of p and α = 1/2. (a) n = 10 (b) n = 20 (c) n = 40.
Figure 13. Optimal number of queries k for various values of p and α across different values of n.
read the original abstract

Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and develop an objective that captures tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a stylized model of user-LLM interaction that formalizes the tradeoff between the user burden of explicit preference elicitation and the gains from reduced bias in content generation. Building on the observation that preferences are often correlated, it characterizes the optimal amount of information to solicit before generation and shows that selective elicitation can mitigate systematic inference biases while preserving efficiency; the theoretical analysis is complemented by an empirical evaluation that illustrates the model's predictions.

Significance. If the central tradeoff result holds under realistic correlation structures, the work supplies a principled, efficiency-aware framework for designing generative tools that better serve users with atypical preferences. The explicit modeling of the burden-representation objective and the inclusion of an empirical component are strengths that could inform both mechanism design and practical interface choices in AI systems.

major comments (2)
  1. [Stylized Model and Theoretical Analysis] The bias-mitigation and efficiency-preserving claims rest on the assumption that individual preferences are sufficiently correlated for the LLM's prior to be informative for most users (abstract and stylized-model section). No sensitivity analysis across correlation strengths, no bounds on the correlation parameter, and no empirical measurement of correlation in user-LLM interactions are provided. When correlation is weak or context-dependent, the inference step adds little value, the derived solicitation threshold collapses to a corner solution, and the headline result no longer holds.
  2. [Theoretical Analysis] The objective that balances user burden against preference representation is introduced without an explicit functional form or derivation of the optimal solicitation quantity (theoretical analysis). It is therefore impossible to verify whether the claimed interior optimum is parameter-free or whether it reduces to a fitted correlation weight, as the stress-test concern anticipates.
minor comments (2)
  1. [Empirical Evaluation] The empirical evaluation is described as 'illustrating the model's predictions,' but the manuscript does not state which specific predictions are tested or report quantitative metrics linking theory to data.
  2. Notation for the key quantities (e.g., burden cost, representation error, correlation parameter) is introduced informally; a compact table or definitions section would improve readability.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments that highlight the importance of clarifying assumptions and derivations in our stylized model. We address each major comment below and outline planned revisions to improve robustness and transparency.

read point-by-point responses
  1. Referee: The bias-mitigation and efficiency-preserving claims rest on the assumption that individual preferences are sufficiently correlated for the LLM's prior to be informative for most users (abstract and stylized-model section). No sensitivity analysis across correlation strengths, no bounds on the correlation parameter, and no empirical measurement of correlation in user-LLM interactions are provided. When correlation is weak or context-dependent, the inference step adds little value, the derived solicitation threshold collapses to a corner solution, and the headline result no longer holds.

    Authors: We agree that the correlation assumption is central, as the model is stylized to demonstrate the tradeoff when preferences exhibit positive correlation (a standard modeling choice in preference aggregation). The headline result is conditional on this regime. In revision, we will add an explicit sensitivity analysis varying the correlation parameter across [0,1], derive the analytical bound on correlation strength required for an interior optimum (specifically, when marginal bias reduction exceeds elicitation cost), and show that the threshold collapses to full elicitation for weak correlation, consistent with the referee's observation. For empirical measurement of correlations, we cannot conduct a new large-scale study in this revision but will expand the discussion to propose estimation approaches using logged interactions. revision: yes

  2. Referee: The objective that balances user burden against preference representation is introduced without an explicit functional form or derivation of the optimal solicitation quantity (theoretical analysis). It is therefore impossible to verify whether the claimed interior optimum is parameter-free or whether it reduces to a fitted correlation weight, as the stress-test concern anticipates.

    Authors: We acknowledge that the presentation of the objective could be more explicit. The objective is defined as the sum of a linear elicitation cost term (c times number of questions solicited) and the expected representation error (posterior-weighted deviation from the user's true preference vector). The optimal quantity is obtained by comparing the marginal reduction in expected error from one additional question against the marginal cost c, yielding a closed-form threshold condition. In the revision we will include the full functional form, the complete first-order condition derivation, and a proof that the interior solution depends explicitly on the correlation and bias parameters rather than being a fitted weight. This will make verification straightforward. revision: yes
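
Read literally, the response above could be written as follows; the notation is introduced here only to make the threshold condition concrete and may not match the paper's own symbols.

```latex
% Hedged restatement of the objective described in the response above.
% Notation is assumed for illustration, not taken from the paper.
\[
  W(k) \;=\; -\,c\,k \;-\; \mathbb{E}\big[\varepsilon(k)\big],
  \qquad
  \varepsilon(k) \;=\; \big\lVert \hat{u}(q_{1:k}) - u^{*} \big\rVert ,
\]
% where $u^{*}$ is the user's true preference vector, $q_{1:k}$ the answers to
% the $k$ solicited questions, and $\hat{u}(q_{1:k})$ the posterior estimate
% used for generation. Asking one more question pays off exactly when the
% marginal drop in expected error exceeds the per-question cost $c$, giving
\[
  k^{*} \;=\; \min\Big\{ k \ge 0 \,:\,
    \mathbb{E}\big[\varepsilon(k)\big] - \mathbb{E}\big[\varepsilon(k+1)\big] \,<\, c \Big\}.
\]
```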

standing simulated objections not resolved
  • Empirical measurement of correlation strengths in real user-LLM interactions, which would require a separate data-collection study beyond the scope of the current theoretical and illustrative-empirical manuscript.

Circularity Check

0 steps flagged

No circularity: stylized model derives tradeoff result from stated assumptions

full rationale

The paper constructs a stylized model with an explicit objective trading off user burden against preference representation, then derives the optimal solicitation quantity and the bias-mitigation property from that model under the maintained assumption of correlated preferences. This is a standard forward derivation from assumptions to conclusions rather than any reduction of the claimed result to its inputs by construction. No equations, fitted parameters, or self-citations are shown to collapse the central claim; the correlation observation is treated as an external modeling premise, not a fitted or self-referential quantity. The empirical complement is presented separately and does not alter the theoretical chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only. The model rests on the stated observation that preferences are correlated and on an invented objective function that trades off burden against representation accuracy. No explicit free parameters or invented entities are named.

axioms (1)
  • domain assumption: Individual preferences are often correlated
    Invoked to justify that inference from distributional knowledge is feasible and that selective elicitation can improve outcomes.

pith-pipeline@v0.9.0 · 5514 in / 1199 out tokens · 40568 ms · 2026-05-13T01:08:31.486389+00:00 · methodology

