Recognition: no theorem link
When to Ask a Question: Understanding Communication Strategies in Generative AI Tools
Pith reviewed 2026-05-13 01:08 UTC · model grok-4.3
The pith
Generative AI can mitigate bias from preference inference by strategically asking users a few targeted questions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The stylized model of user-LLM interaction admits an optimal information-elicitation policy that reduces the systematic bias introduced by pure preference inference. The policy uses observed correlations among user preferences to determine how many and which questions to ask before generating content, improving the representation of atypical preferences while limiting the burden placed on users.
What carries the argument
A stylized model of user-LLM interaction whose objective trades off user burden against the accuracy of preference representation, with the decision to elicit further information driven by the degree of correlation across users' preferences.
If this is right
- Generative tools can be tuned to ask only the questions that most reduce representation error for atypical users.
- The optimal number of questions rises when preference correlations weaken or when inference error is high.
- Systems that follow the derived policy produce outputs closer to each user's true preferences than systems that always infer everything.
- Efficiency is preserved because elicitation stops once additional questions yield diminishing returns on the representation objective.
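The bullets above follow from a simple cost-benefit calculation, which can be sketched in toy form. The model below is our illustrative assumption, not the paper's actual specification: preferences are Gaussian with unit variances and a common pairwise correlation rho, eliciting a component answers it exactly, and each unelicited component retains the conditional variance of an equicorrelated Gaussian. The helper names (`expected_error`, `optimal_k`) are ours.

```python
# Toy sketch (our assumption, not the paper's model): d preference
# components, unit variances, common pairwise correlation rho.
# Eliciting k components answers them exactly; each remaining component
# is inferred with conditional variance 1 - k*rho^2 / (1 + (k-1)*rho).

def expected_error(d: int, k: int, rho: float) -> float:
    """Total residual representation error after eliciting k of d components."""
    if k == 0:
        return d * 1.0  # no elicitation: every component keeps its prior variance
    cond_var = 1.0 - k * rho ** 2 / (1.0 + (k - 1) * rho)
    return (d - k) * cond_var

def optimal_k(d: int, rho: float, cost: float) -> int:
    """Number of questions minimizing linear burden cost*k plus expected error."""
    return min(range(d + 1), key=lambda k: cost * k + expected_error(d, k, rho))

# Strong correlation: a few questions pin down the rest (interior optimum).
# Weak correlation: inference adds little, so the optimum is a corner
# solution at full elicitation -- exactly the failure mode flagged below.
print(optimal_k(10, rho=0.9, cost=0.2))
print(optimal_k(10, rho=0.3, cost=0.2))
```

Consistent with the bullets: raising rho lowers the optimal number of questions, and elicitation stops once the marginal variance reduction from one more question falls below the per-question cost.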
Where Pith is reading between the lines
- The same tradeoff logic could guide question-asking in non-generative AI assistants such as recommendation engines or decision-support tools.
- Real-world deployment would require learning the correlation structure from past interactions rather than assuming it is known in advance.
- If the correlation structure changes over time, the optimal elicitation policy would need periodic re-estimation from fresh user data.
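The two deployment points above can be made concrete. A minimal sketch, assuming logged interactions yield fully specified preference vectors; the data, factor structure, and window size here are ours, purely for illustration:

```python
import numpy as np

# Hypothetical logs: rows are past users, columns are preference
# dimensions recovered from fully specified interactions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))                  # shared factor induces correlation
logged = 0.8 * latent + 0.6 * rng.normal(size=(500, 4))

# Learn the correlation structure the elicitation policy relies on,
# rather than assuming it is known in advance.
corr = np.corrcoef(logged, rowvar=False)

# Periodic re-estimation on a sliding window of fresh interactions
# handles drift in the correlation structure over time.
corr_recent = np.corrcoef(logged[-200:], rowvar=False)

print(corr.round(2))
```

Comparing `corr` against `corr_recent` (for example, by the norm of their difference) gives a cheap trigger for re-deriving the elicitation policy when the population drifts.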
Load-bearing premise
Individual user preferences are sufficiently correlated that inference from limited input remains reliable and selective elicitation adds value.
What would settle it
An empirical test showing that when user preferences are drawn from an uncorrelated distribution, the amount of elicitation recommended by the model fails to improve representation accuracy or increases user burden without reducing bias.
Original abstract
Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and propose an objective that captures the tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a stylized model of user-LLM interaction that formalizes the tradeoff between the user burden of explicit preference elicitation and the gains from reduced bias in content generation. Building on the observation that preferences are often correlated, it characterizes the optimal amount of information to solicit before generation and shows that selective elicitation can mitigate systematic inference biases while preserving efficiency; the theoretical analysis is complemented by an empirical evaluation that illustrates the model's predictions.
Significance. If the central tradeoff result holds under realistic correlation structures, the work supplies a principled, efficiency-aware framework for designing generative tools that better serve users with atypical preferences. The explicit modeling of the burden-representation objective and the inclusion of an empirical component are strengths that could inform both mechanism design and practical interface choices in AI systems.
major comments (2)
- [Stylized Model and Theoretical Analysis] The bias-mitigation and efficiency-preserving claims rest on the assumption that individual preferences are sufficiently correlated for the LLM's prior to be informative for most users (abstract and stylized-model section). No sensitivity analysis across correlation strengths, no bounds on the correlation parameter, and no empirical measurement of correlation in user-LLM interactions are provided. When correlation is weak or context-dependent, the inference step adds little value, the derived solicitation threshold collapses to a corner solution, and the headline result no longer holds.
- [Theoretical Analysis] The objective that balances user burden against preference representation is introduced without an explicit functional form or derivation of the optimal solicitation quantity (theoretical analysis). It is therefore impossible to verify whether the claimed interior optimum is parameter-free or whether it reduces to a fitted correlation weight, as the stress-test concern anticipates.
minor comments (2)
- [Empirical Evaluation] The empirical evaluation is described as 'illustrating the model's predictions,' but the manuscript does not state which specific predictions are tested or report quantitative metrics linking theory to data.
- Notation for the key quantities (e.g., burden cost, representation error, correlation parameter) is introduced informally; a compact table or definitions section would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments that highlight the importance of clarifying assumptions and derivations in our stylized model. We address each major comment below and outline planned revisions to improve robustness and transparency.
Point-by-point responses
- Referee: The bias-mitigation and efficiency-preserving claims rest on the assumption that individual preferences are sufficiently correlated for the LLM's prior to be informative for most users (abstract and stylized-model section). No sensitivity analysis across correlation strengths, no bounds on the correlation parameter, and no empirical measurement of correlation in user-LLM interactions are provided. When correlation is weak or context-dependent, the inference step adds little value, the derived solicitation threshold collapses to a corner solution, and the headline result no longer holds.
Authors: We agree that the correlation assumption is central, as the model is stylized to demonstrate the tradeoff when preferences exhibit positive correlation (a standard modeling choice in preference aggregation). The headline result is conditional on this regime. In revision, we will add an explicit sensitivity analysis varying the correlation parameter across [0,1], derive the analytical bound on correlation strength required for an interior optimum (specifically, when marginal bias reduction exceeds elicitation cost), and show that the threshold collapses to full elicitation for weak correlation, consistent with the referee's observation. For empirical measurement of correlations, we cannot conduct a new large-scale study in this revision but will expand the discussion to propose estimation approaches using logged interactions. revision: yes
- Referee: The objective that balances user burden against preference representation is introduced without an explicit functional form or derivation of the optimal solicitation quantity (theoretical analysis). It is therefore impossible to verify whether the claimed interior optimum is parameter-free or whether it reduces to a fitted correlation weight, as the stress-test concern anticipates.
Authors: We acknowledge that the presentation of the objective could be more explicit. The objective is defined as the sum of a linear elicitation cost term (c times number of questions solicited) and the expected representation error (posterior-weighted deviation from the user's true preference vector). The optimal quantity is obtained by comparing the marginal reduction in expected error from one additional question against the marginal cost c, yielding a closed-form threshold condition. In the revision we will include the full functional form, the complete first-order condition derivation, and a proof that the interior solution depends explicitly on the correlation and bias parameters rather than being a fitted weight. This will make verification straightforward. revision: yes
- Not addressed in this revision: empirical measurement of correlation strengths in real user-LLM interactions, which would require a separate data-collection study beyond the scope of the current theoretical and illustrative-empirical manuscript.
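The functional form described in the authors' second response can be written out compactly; the notation below is our reconstruction of what they describe, not a quotation from the manuscript:

```latex
% Objective: linear elicitation cost plus expected representation error,
% where k is the number of questions, c the per-question burden, and
% \theta the user's true preference vector.
J(k) \;=\; c\,k \;+\; \mathbb{E}\big[\lVert \hat{\theta}(k) - \theta \rVert^{2}\big],
\qquad k \in \{0, 1, \dots, d\}.

% Threshold (first-order) condition: ask question k+1 exactly when the
% marginal reduction in expected error still exceeds the marginal cost c.
\mathbb{E}[\mathrm{err}(k)] - \mathbb{E}[\mathrm{err}(k+1)] \;>\; c.
```

Under weak correlation the marginal error reduction stays above c for every k, so the condition never binds and the solution sits at the full-elicitation corner, matching the referee's observation.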
Circularity Check
No circularity: stylized model derives tradeoff result from stated assumptions
full rationale
The paper constructs a stylized model with an explicit objective trading off user burden against preference representation, then derives the optimal solicitation quantity and the bias-mitigation property from that model under the maintained assumption of correlated preferences. This is a standard forward derivation from assumptions to conclusions rather than any reduction of the claimed result to its inputs by construction. No equations, fitted parameters, or self-citations are shown to collapse the central claim; the correlation observation is treated as an external modeling premise, not a fitted or self-referential quantity. The empirical complement is presented separately and does not alter the theoretical chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: individual preferences are often correlated