pith. machine review for the scientific record.

arxiv: 2605.11240 · v1 · submitted 2026-05-11 · 💻 cs.GT · cs.CY · cs.LG

Recognition: no theorem link

When to Ask a Question: Understanding Communication Strategies in Generative AI Tools

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:08 UTC · model grok-4.3

classification 💻 cs.GT · cs.CY · cs.LG
keywords generative AI · preference elicitation · bias mitigation · user interaction · information inference · fairness in AI · stylized model

The pith

Generative AI can mitigate bias from preference inference by strategically asking users a few targeted questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a stylized model of how users interact with large language models: users give short prompts, and the model infers the rest from typical preference patterns. This inference often favors common viewpoints and underserves people whose tastes differ from the majority. Because preferences across users tend to be correlated, the paper shows it is possible to decide how much extra information to request before generating an answer, by optimizing an objective that weighs the cost of extra questions against the gain in accurately representing the user's actual preferences. When this balance is struck, the system produces outputs that reflect a wider range of perspectives without forcing every user to supply every detail.

Core claim

In the stylized user-LLM interaction model, an optimal policy for information elicitation exists that reduces the systematic bias introduced by pure preference inference; the policy uses observed correlations among user preferences to determine how many and which questions to ask before generating content, thereby improving representation of atypical preferences while limiting the burden placed on users.

What carries the argument

A stylized model of user-LLM interaction whose objective trades off user burden against the accuracy of preference representation, with the decision to elicit further information driven by the degree of correlation across users' preferences.

If this is right

  • Generative tools can be tuned to ask only the questions that most reduce representation error for atypical users.
  • The optimal number of questions rises when preference correlations weaken or when inference error is high.
  • Systems that follow the derived policy produce outputs closer to each user's true preferences than systems that always infer everything.
  • Efficiency is preserved because elicitation stops once additional questions yield diminishing returns on the representation objective.
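
To make the shape of this tradeoff concrete, the sketch below sets up a toy two-cluster version of the setting, in the spirit of the paper's figures rather than a reproduction of its model; every functional form and parameter value here (the cluster structure, the noise rate p, the per-question cost c, the fill-in rule) is an illustrative assumption. It asks: for a majority cluster of mass α and a niche cluster of mass 1 − α over n binary preference items, with each user's items flipped from their cluster's center independently with probability p, how many questions k should the system ask before filling in the rest from the inferred cluster?

```python
# Toy sketch only: two clusters of users over n binary preference items.
# A majority cluster (prior mass alpha) has center all-1s; a niche cluster
# (mass 1 - alpha) has center all-0s. A user's true preferences equal their
# cluster's center with each bit flipped independently with probability p.
# The system asks k questions (observing k true bits), forms a posterior
# over the two clusters, fills in the unasked bits from the MAP cluster,
# and pays an assumed cost c per question.
import itertools

def expected_utility(k, n, alpha, p, c):
    """Expected (representation accuracy - elicitation cost) for asking k questions."""
    centers = {(1,) * n: alpha, (0,) * n: 1 - alpha}  # cluster center -> prior mass
    total = 0.0
    for answers in itertools.product([0, 1], repeat=k):  # possible answers to the k asked bits
        # Posterior over clusters given the observed bits.
        post = {}
        for center, prior in centers.items():
            flips = sum(a != b for a, b in zip(answers, center[:k]))
            post[center] = prior * (p ** flips) * ((1 - p) ** (k - flips))
        z = sum(post.values())  # marginal probability of this answer pattern
        if z == 0:
            continue
        map_center = max(post, key=post.get)
        # Probability an unasked bit predicted from the MAP center matches the
        # user's true bit, averaged over the posterior on the true cluster.
        acc = sum((post[c] / z) * ((1 - p) if c == map_center else p) for c in centers)
        # Asked bits are represented exactly; unasked bits match with prob `acc`.
        total += z * (k + acc * (n - k)) / n
    return total - c * k

# Illustrative sweep: how the assumed-optimal k shifts with the noise rate p.
n, alpha, c = 10, 0.5, 0.05
for p in (0.05, 0.2, 0.4):
    best_k = max(range(n + 1), key=lambda k: expected_utility(k, n, alpha, p, c))
    print(f"p={p}: assumed-optimal number of questions k = {best_k}")
```

Sweeping k over a few noise rates is the same kind of comparison the paper's Figures 2–4 make analytically: under these assumptions, noisier answers sharpen the posterior over clusters less per question, so the assumed per-question cost eventually outweighs the gain and elicitation stops.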

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tradeoff logic could guide question-asking in non-generative AI assistants such as recommendation engines or decision-support tools.
  • Real-world deployment would require learning the correlation structure from past interactions rather than assuming it is known in advance.
  • If the correlation structure changes over time, the optimal elicitation policy would need periodic re-estimation from fresh user data.

Load-bearing premise

Individual user preferences are sufficiently correlated that inference from limited input remains reliable and selective elicitation adds value.

What would settle it

An empirical test showing that when user preferences are drawn from an uncorrelated distribution, the amount of elicitation recommended by the model fails to improve representation accuracy or increases user burden without reducing bias.
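
One way to run that test, sketched below under assumptions of our own (i.i.d. Bernoulli(0.5) preferences and a fixed predictor for unasked items, not the paper's experiment), is to check that when preferences carry no correlation, answers to the first k questions do nothing for the accuracy of whatever is filled in for the rest, so any recommended k > 0 is pure burden:

```python
# Sketch of the falsification test under assumed uncorrelated preferences:
# with i.i.d. Bernoulli(0.5) items, accuracy on the *unasked* items cannot
# improve with k, because the observed answers carry no information about
# the remaining items.
import random

def unasked_accuracy(k, n=10, users=20000, seed=0):
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(users):
        prefs = [rng.randint(0, 1) for _ in range(n)]
        # Any inference rule may condition only on the k observed answers;
        # with independent bits, every rule is a coin flip on the rest
        # (here: always predict 1).
        preds = [1] * (n - k)
        correct += sum(pred == true for pred, true in zip(preds, prefs[k:]))
        total += n - k
    return correct / total

for k in (0, 2, 5, 8):
    print(f"k={k}: accuracy on unasked items ≈ {unasked_accuracy(k):.3f}")
```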

Figures

Figures reproduced from arXiv: 2605.11240 by Charlotte Park, Kate Donahue, Manish Raghavan.

Figure 1. A screenshot of ChatGPT prompting the user for additional information when an under…
Figure 2. Expected utility E[U(k)] vs p in the case of α = 1/2 for n = 3, 6, 10, and 30. For each n, as the noise rate p increases, there comes a point where it optimizes expected utility to not ask any questions. In other words, as there is more uncertainty about the user's cluster of origin based on the information, there is less value to asking questions. Additional figures are provided in Appendix C. When the noise…
Figure 3. Expected utility E[U(k)] vs p in the case of α = 0.9 for n = 3, 6, 10, and 30. When n is smaller, k = 0 is the optimal querying policy regardless of p. When n is larger, however, it can sometimes be better to ask a single question. When α is either close to 0 or 1, the niche set of users is significantly smaller than those with majority preferences, leading the model to have a much stronger prior that any u…
Figure 4. An overview of the landscape of optimal k values for a range of choices of α ∈ [0.5, 1] and p ∈ [0, 0.5] for n = 30. We see here that for increased values of n, it can be useful to ask more than a single question, particularly for intermediate values of p and α. In these cases, the prior belief about cluster membership is strong but not decisive, so the additional clarification provided by a seco…
Figure 5. The number of 0s to output based on the optimal response policy.
Figure 6. Optimal choice of k vs γ for various values of p and α = 1/2. As γ initially increases, the optimal querying policy grows for small noise rates p. As γ continues to grow, however, the optimal querying policy begins to decrease again. This observation is not immediately clear: as mentioned above, one might expect that asking more questions would always be beneficial, as it provides the LLM with additional in…
Figure 7. Mean welfare vs k for increasing values of γ for per-query cost c = 0.3. As γ increases, the optimal number of queries k changes, increasing from 4 to 10 as γ increases from 0 to 10. The y-axis scale is omitted, as welfare is not comparable across γ; we consider only within-γ comparisons over k.
Figure 8. Mean utility vs Gini coefficient for different values of…
Figure 9. Welfare W(k, γ) vs flipping probability p in the case of α = 1/2 for various values of γ. The choice of k leading to the optimal query length is included in each plot.
Figure 10. Welfare W(k, γ) vs p in the case of α = 0.9 for various values of γ. The choice of k leading to the optimal query length is included in each plot.
Figure 11. The number of 0s to output based on the optimal response policy.
Figure 12. Optimal choice of k vs γ for various values of p and α = 1/2. (a) n = 10 (b) n = 20 (c) n = 40.
Figure 13. Optimal number of queries k for various values of p and α across different values of n.
read the original abstract

Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and develop an objective that captures tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a stylized model of user-LLM interaction that formalizes the tradeoff between the user burden of explicit preference elicitation and the gains from reduced bias in content generation. Building on the observation that preferences are often correlated, it characterizes the optimal amount of information to solicit before generation and shows that selective elicitation can mitigate systematic inference biases while preserving efficiency; the theoretical analysis is complemented by an empirical evaluation that illustrates the model's predictions.

Significance. If the central tradeoff result holds under realistic correlation structures, the work supplies a principled, efficiency-aware framework for designing generative tools that better serve users with atypical preferences. The explicit modeling of the burden-representation objective and the inclusion of an empirical component are strengths that could inform both mechanism design and practical interface choices in AI systems.

major comments (2)
  1. [Stylized Model and Theoretical Analysis] The bias-mitigation and efficiency-preserving claims rest on the assumption that individual preferences are sufficiently correlated for the LLM's prior to be informative for most users (abstract and stylized-model section). No sensitivity analysis across correlation strengths, no bounds on the correlation parameter, and no empirical measurement of correlation in user-LLM interactions are provided. When correlation is weak or context-dependent, the inference step adds little value, the derived solicitation threshold collapses to a corner solution, and the headline result no longer holds.
  2. [Theoretical Analysis] The objective that balances user burden against preference representation is introduced without an explicit functional form or derivation of the optimal solicitation quantity (theoretical analysis). It is therefore impossible to verify whether the claimed interior optimum is parameter-free or whether it reduces to a fitted correlation weight, as the stress-test concern anticipates.
minor comments (2)
  1. [Empirical Evaluation] The empirical evaluation is described as 'illustrating the model's predictions,' but the manuscript does not state which specific predictions are tested or report quantitative metrics linking theory to data.
  2. Notation for the key quantities (e.g., burden cost, representation error, correlation parameter) is introduced informally; a compact table or definitions section would improve readability.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments that highlight the importance of clarifying assumptions and derivations in our stylized model. We address each major comment below and outline planned revisions to improve robustness and transparency.

read point-by-point responses
  1. Referee: The bias-mitigation and efficiency-preserving claims rest on the assumption that individual preferences are sufficiently correlated for the LLM's prior to be informative for most users (abstract and stylized-model section). No sensitivity analysis across correlation strengths, no bounds on the correlation parameter, and no empirical measurement of correlation in user-LLM interactions are provided. When correlation is weak or context-dependent, the inference step adds little value, the derived solicitation threshold collapses to a corner solution, and the headline result no longer holds.

    Authors: We agree that the correlation assumption is central, as the model is stylized to demonstrate the tradeoff when preferences exhibit positive correlation (a standard modeling choice in preference aggregation). The headline result is conditional on this regime. In revision, we will add an explicit sensitivity analysis varying the correlation parameter across [0,1], derive the analytical bound on correlation strength required for an interior optimum (specifically, when marginal bias reduction exceeds elicitation cost), and show that the threshold collapses to full elicitation for weak correlation, consistent with the referee's observation. For empirical measurement of correlations, we cannot conduct a new large-scale study in this revision but will expand the discussion to propose estimation approaches using logged interactions. revision: yes

  2. Referee: The objective that balances user burden against preference representation is introduced without an explicit functional form or derivation of the optimal solicitation quantity (theoretical analysis). It is therefore impossible to verify whether the claimed interior optimum is parameter-free or whether it reduces to a fitted correlation weight, as the stress-test concern anticipates.

    Authors: We acknowledge that the presentation of the objective could be more explicit. The objective is defined as the sum of a linear elicitation cost term (c times number of questions solicited) and the expected representation error (posterior-weighted deviation from the user's true preference vector). The optimal quantity is obtained by comparing the marginal reduction in expected error from one additional question against the marginal cost c, yielding a closed-form threshold condition. In the revision we will include the full functional form, the complete first-order condition derivation, and a proof that the interior solution depends explicitly on the correlation and bias parameters rather than being a fitted weight. This will make verification straightforward. revision: yes
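
Read literally, the response above could be written as follows; the notation is introduced here only to make the threshold condition concrete and may not match the paper's own symbols.

```latex
% Hedged restatement of the objective described in the response above.
% Notation is assumed for illustration, not taken from the paper.
\[
  W(k) \;=\; -\,c\,k \;-\; \mathbb{E}\big[\varepsilon(k)\big],
  \qquad
  \varepsilon(k) \;=\; \big\lVert \hat{u}(q_{1:k}) - u^{*} \big\rVert ,
\]
% where $u^{*}$ is the user's true preference vector, $q_{1:k}$ the answers to
% the $k$ solicited questions, and $\hat{u}(q_{1:k})$ the posterior estimate
% used for generation. Asking one more question pays off exactly when the
% marginal drop in expected error exceeds the per-question cost $c$, giving
\[
  k^{*} \;=\; \min\Big\{ k \ge 0 \,:\,
    \mathbb{E}\big[\varepsilon(k)\big] - \mathbb{E}\big[\varepsilon(k+1)\big] \,<\, c \Big\}.
\]
```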

standing simulated objections not resolved
  • Empirical measurement of correlation strengths in real user-LLM interactions, which would require a separate data-collection study beyond the scope of the current theoretical and illustrative-empirical manuscript.

Circularity Check

0 steps flagged

No circularity: stylized model derives tradeoff result from stated assumptions

full rationale

The paper constructs a stylized model with an explicit objective trading off user burden against preference representation, then derives the optimal solicitation quantity and the bias-mitigation property from that model under the maintained assumption of correlated preferences. This is a standard forward derivation from assumptions to conclusions rather than any reduction of the claimed result to its inputs by construction. No equations, fitted parameters, or self-citations are shown to collapse the central claim; the correlation observation is treated as an external modeling premise, not a fitted or self-referential quantity. The empirical complement is presented separately and does not alter the theoretical chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only. The model rests on the stated observation that preferences are correlated and on an invented objective function that trades off burden against representation accuracy. No explicit free parameters or invented entities are named.

axioms (1)
  • domain assumption: Individual preferences are often correlated
    Invoked to justify that inference from distributional knowledge is feasible and that selective elicitation can improve outcomes.

pith-pipeline@v0.9.0 · 5514 in / 1199 out tokens · 40568 ms · 2026-05-13T01:08:31.486389+00:00 · methodology

