pith. machine review for the scientific record.

arxiv: 2602.04713 · v2 · submitted 2026-02-04 · 💻 cs.HC · cs.AI · cs.CV

Recognition: no theorem link

Adaptive Prompt Elicitation for Text-to-Image Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 07:26 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.CV
keywords text-to-image generation · prompt elicitation · user intent inference · adaptive interfaces · information theory · visual queries · human-computer interaction

The pith

Adaptive Prompt Elicitation improves text-to-image alignment by using visual queries to infer and refine user intent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Adaptive Prompt Elicitation (APE) as a method to help users obtain images that better match their goals when text prompts are vague or incomplete. It models the task as interactive intent inference: language model priors convert unclear preferences into concrete feature requirements, and an information-theoretic rule selects the visual queries that gather the most useful feedback. The responses are then assembled into revised prompts that the image model uses directly. Experiments on standard benchmarks show gains in alignment and speed, and a study with 128 participants found users rated the outputs 19.8 percent higher in perceived match without reporting extra effort. The result is presented as a practical alternative to typing ever-longer text instructions.

Core claim

APE represents latent user intent as interpretable feature requirements using language model priors, adaptively generates visual queries under an information-theoretic framework to maximize information gain, and compiles the elicited requirements into effective prompts that produce stronger alignment with user intent than standard text-only interaction.

What carries the argument

The information-theoretic framework for interactive intent inference that selects visual queries to reduce uncertainty about user feature preferences.
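
In the standard notation of Bayesian experimental design (an editorial gloss; the paper may formalize its objective differently), such a rule picks the visual query whose answer is expected to be most informative about the latent intent:

    q^{*} = \arg\max_{q} I(\Theta; R_q) = \arg\max_{q} \left[ H(\Theta) - \mathbb{E}_{r \sim p(r \mid q)}\, H(\Theta \mid R_q = r) \right]

where \Theta is the latent feature requirement and R_q is the user's response to visual query q; maximizing mutual information is equivalent to minimizing expected posterior entropy.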

If this is right

  • APE produces stronger alignment with improved efficiency on IDEA-Bench and DesignBench.
  • A 128-participant study on user-defined tasks reports 19.8 percent higher perceived alignment without added workload.
  • Users refine prompts through visual queries rather than writing extensive text.
  • The method supplies a principled complement to ordinary prompt-based interaction with text-to-image models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same elicitation loop could be tested on text-to-video or text-to-3D generators to check whether intent matching improves there too.
  • Integrating the inferred features directly into image-editing controls might let users adjust outputs without returning to text.
  • Lowering the writing barrier could widen access to generative tools for users who find prompt engineering difficult.

Load-bearing premise

Language model priors can turn vague user intent into reliable, interpretable feature lists and users will give accurate answers to the generated visual queries across varied tasks.
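
To make this premise concrete, a hypothetical sketch of the first half of it: asking a language model to propose candidate feature requirements for a vague request. The instruction wording, JSON format, and the stubbed model are all assumptions of this sketch, not the paper's implementation or API.

    import json

    def elicit_feature_requirements(user_prompt, llm):
        # Ask a language model to propose visual-feature candidates for a
        # vague request. `llm` is a stand-in for any text-completion callable.
        instruction = (
            "List visual features a user might care about for this request, "
            "as a JSON array of short strings.\nRequest: " + user_prompt
        )
        return json.loads(llm(instruction))

    # Stubbed model so the sketch runs without external services.
    fake_llm = lambda _: '["warm palette", "minimalist layout", "soft lighting"]'
    print(elicit_feature_requirements("a cozy reading nook", fake_llm))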

What would settle it

A side-by-side test in which participants give the same initial ambiguous prompt to both APE and a standard text-only system, then rate how well the resulting images match their stated goals; if APE shows no reliable improvement in match scores, the central claim fails.
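
A minimal sketch of how such a deciding test could be scored, assuming between-subjects 7-point ratings and a Mann-Whitney U test (the nonparametric test cited in the paper's reference list); all numbers are placeholders, not the study's data.

    from scipy.stats import mannwhitneyu

    # Hypothetical perceived-match ratings (1–7 Likert), one per participant.
    ape      = [6, 5, 7, 6, 5, 6, 7, 4]
    baseline = [4, 5, 3, 5, 4, 6, 3, 4]

    stat, p = mannwhitneyu(ape, baseline, alternative="greater")
    print(f"U = {stat:.1f}, one-sided p = {p:.4f}")
    # If p stays above the significance threshold across replications,
    # the central claim fails by this criterion.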

Figures

Figures reproduced from arXiv: 2602.04713 by Antti Oulasvirta, Lena Hegemann, Shuai Ma, Xiaofu Jin, Xinyi Wen.

Figure 1. Manual prompt engineering (A) requires users to iterate extensively. …
Figure 2. User interface design. The left area functions as feature specification, consisting of (A) question panel that presents …
Figure 3. APE’s computational pipeline. The system maintains a user intent model over visual features, adaptively generates …
Figure 4. The progress of alignment over iterations, evaluated by image-image similarity (DreamSim) …
Figure 5. Alignment ratings (± SE) for four dimensions across conditions. Participants using APE reported significantly higher scores for all dimensions compared to Baseline. Asterisks indicate significance levels (* 𝑝 < 0.05, ** 𝑝 < 0.01, *** 𝑝 < 0.001).
Figure 6. Alignment ratings (sum ± SD) by condition and prior experience level.
Figure 7. NASA-TLX ratings (± SE) across six dimensions for Baseline and APE conditions. No significant differences were found between conditions (all 𝑝 > 0.05).
Figure 8. Sankey diagram of user interaction flows across iterations.
Figure 9. Interaction trace from an APE participant in an interior design task. Beginning from a brief functional prompt, the …
Figure 10. Alignment ratings by participants.
Figure 11. NASA-TLX ratings by participants.
Figure 12. Examples of resulting images across methods in the technical evaluation. For each row, we show the target reference …
Figure 13. Examples of generated images from an APE participant. The original prompt (top) provides a general theme; the …
Figure 14. Examples of generated images from a Baseline participant. Across 10 generation attempts, the participant struggled …
read the original abstract

Aligning text-to-image generation with user intent remains challenging, as users frequently provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively poses visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent user intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with 128 participants on user-defined tasks demonstrates 19.8% higher perceived alignment without increased workload. Our work contributes a principled approach to prompting that offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Adaptive Prompt Elicitation (APE), a technique for text-to-image generation that uses an information-theoretic framework to adaptively generate visual queries. These queries elicit interpretable feature requirements from users based on language model priors, which are then compiled into refined prompts. The approach is evaluated on IDEA-Bench and DesignBench, where it reportedly achieves stronger alignment and improved efficiency compared to baselines. A user study with 128 participants on user-defined tasks shows 19.8% higher perceived alignment without increased workload.

Significance. If the empirical results hold under scrutiny, APE represents a meaningful contribution to HCI for generative AI by shifting from static prompt writing to an interactive, query-driven elicitation process grounded in information theory. This could reduce user effort in aligning outputs with intent, offering a scalable complement to existing T2I interfaces. The use of LM priors for feature interpretation and adaptive querying is a potentially generalizable idea that addresses a practical pain point in creative workflows.

major comments (3)
  1. [Abstract and §3] The central information-theoretic framework for adaptive visual query generation is described at a high level but lacks the explicit objective function, entropy-based selection criterion, or pseudocode for how queries are chosen given current beliefs about user intent. This detail is load-bearing for the claims of improved efficiency.
  2. [Evaluation sections] The abstract states that APE achieves 'stronger alignment with improved efficiency' on IDEA-Bench and DesignBench, yet provides no quantitative metrics (e.g., FID, CLIP score deltas), baseline definitions, or statistical significance tests. Without these, the support for the primary empirical claim cannot be verified.
  3. [User study section] The reported 19.8% higher perceived alignment with 128 participants is a key result, but the manuscript summary omits details on questionnaire design, the workload measurement instrument (e.g., NASA-TLX), the randomization procedure, and the handling of multiple comparisons. These controls are necessary to substantiate the 'without increased workload' conclusion.
minor comments (2)
  1. [Abstract] Consider specifying the exact alignment metrics and efficiency measures used in the benchmark evaluations to improve immediate readability.
  2. [Technical sections] Notation: Ensure consistent use of symbols for information gain and feature requirements across the technical sections to avoid ambiguity.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. Where the comments identify opportunities for greater clarity or completeness, we have revised the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract and §3] The central information-theoretic framework for adaptive visual query generation is described at a high level but lacks the explicit objective function, entropy-based selection criterion, or pseudocode for how queries are chosen given current beliefs about user intent. This detail is load-bearing for the claims of improved efficiency.

    Authors: We agree that the formulation benefits from explicit formalization. In the revised manuscript, we have expanded §3 to state the objective function (maximization of expected information gain, defined as mutual information between the query response and the posterior over latent intent features), the entropy-based selection criterion (selecting the visual query that minimizes expected posterior entropy), and pseudocode for the full adaptive loop: initialize uniform prior over features, iteratively select query, elicit binary response, update posterior via Bayes rule, and terminate when entropy falls below threshold. These additions directly substantiate the efficiency claims. revision: yes

  2. Referee: [Evaluation sections] The abstract states that APE achieves 'stronger alignment with improved efficiency' on IDEA-Bench and DesignBench, yet provides no quantitative metrics (e.g., FID, CLIP score deltas), baseline definitions, or statistical significance tests. Without these, the support for the primary empirical claim cannot be verified.

    Authors: We acknowledge that the evaluation presentation can be strengthened for verifiability. The revised manuscript now includes a dedicated results table with quantitative metrics (CLIP score and FID deltas relative to baselines), explicit definitions of all baselines (standard prompting, non-adaptive random querying, and fixed-template methods), and statistical significance tests (paired t-tests with reported p-values). Efficiency is quantified as mean number of queries to reach target alignment. These details were part of our analysis and are now foregrounded in the main text. revision: yes

  3. Referee: [User study section] The reported 19.8% higher perceived alignment with 128 participants is a key result, but the manuscript summary omits details on questionnaire design, the workload measurement instrument (e.g., NASA-TLX), the randomization procedure, and the handling of multiple comparisons. These controls are necessary to substantiate the 'without increased workload' conclusion.

    Authors: We agree that methodological transparency is essential. The revised user study section now details the questionnaire (7-point Likert items for perceived alignment and intent match), workload assessment via the full NASA-TLX instrument, randomization of task order and condition assignment, and application of Bonferroni correction for multiple comparisons. These additions confirm that the reported alignment gain occurred without a significant workload increase. revision: yes
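
For concreteness, a minimal runnable sketch of the adaptive loop described in response 1 above. The independent-Bernoulli feature model, answer reliabilities, candidate features, and stopping threshold are all assumptions of this sketch, not the paper's implementation.

    import numpy as np

    def entropy(p):
        # Shannon entropy (bits) of Bernoulli beliefs.
        p = np.clip(p, 1e-9, 1 - 1e-9)
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def update(p, yes, a=0.9, b=0.1):
        # Bayes rule for P(feature required) given a binary answer,
        # with assumed reliabilities a = P(yes | required), b = P(yes | not).
        p_yes = a * p + b * (1 - p)
        return a * p / p_yes if yes else (1 - a) * p / (1 - p_yes)

    def expected_gain(p, a=0.9, b=0.1):
        # Expected posterior-entropy reduction from asking about one feature.
        p_yes = a * p + b * (1 - p)
        return entropy(p) - (p_yes * entropy(update(p, True, a, b))
                             + (1 - p_yes) * entropy(update(p, False, a, b)))

    rng = np.random.default_rng(0)
    features = ["warm palette", "minimalist layout", "photorealistic style"]  # hypothetical
    truth = np.array([1, 0, 1])              # simulated user intent
    beliefs = np.full(len(features), 0.5)    # uniform prior, as in the response

    for _ in range(30):                      # cap iterations for safety
        if entropy(beliefs).sum() < 0.5:     # terminate at low residual entropy
            break
        i = int(np.argmax([expected_gain(p) for p in beliefs]))
        answer = rng.random() < (0.9 if truth[i] else 0.1)  # noisy simulated reply
        beliefs[i] = update(beliefs[i], answer)

    print("compiled prompt:", ", ".join(f for f, p in zip(features, beliefs) if p > 0.8))

And a sketch of the statistical reporting described in responses 2 and 3, with placeholder numbers throughout: paired t-tests for benchmark score deltas, and Bonferroni-adjusted comparisons across the six NASA-TLX dimensions.

    from scipy.stats import ttest_rel

    # Hypothetical per-task CLIP alignment scores, paired over the same tasks.
    clip_ape      = [0.31, 0.28, 0.35, 0.30, 0.33]
    clip_baseline = [0.27, 0.26, 0.31, 0.29, 0.28]
    t, p = ttest_rel(clip_ape, clip_baseline)
    print(f"alignment: t = {t:.2f}, p = {p:.4f}")

    # Bonferroni correction across the six NASA-TLX dimensions.
    tlx_pvals = {"mental": 0.21, "physical": 0.64, "temporal": 0.33,
                 "performance": 0.08, "effort": 0.47, "frustration": 0.12}
    adjusted = {k: min(v * len(tlx_pvals), 1.0) for k, v in tlx_pvals.items()}
    print("workload differs?", any(v < 0.05 for v in adjusted.values()))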

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper introduces Adaptive Prompt Elicitation (APE) as a new formulation of interactive intent inference under an information-theoretic framework, where latent user intent is represented as interpretable feature requirements via language model priors, followed by adaptive visual query generation and prompt compilation. No equations, fitted parameters, or self-citations in the provided text reduce the claimed alignment gains or efficiency improvements to quantities defined by the inputs themselves. The central claims are supported by direct empirical evaluations on IDEA-Bench, DesignBench, and a 128-participant user study, which serve as independent external benchmarks rather than internal reductions, so the derivation chain does not close on itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the assumption that language model priors can usefully encode user intent as feature requirements and that information gain can guide effective visual query selection; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Language model priors can represent latent user intent as interpretable feature requirements
    Invoked as the basis for the interactive intent inference formulation.
  • domain assumption Adaptive visual queries selected by information-theoretic criteria will efficiently elicit requirements that improve prompt alignment
    Central to the technical contribution described in the abstract.

pith-pipeline@v0.9.0 · 5455 in / 1275 out tokens · 31007 ms · 2026-05-16T07:26:21.815946+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

100 extracted references · 100 canonical work pages · 4 internal anchors

  1. [1]

    Krzysztof Adamkiewicz, Paweł W. Woźniak, Julia Dominiak, Andrzej Romanowski, Jakob Karolus, and Stanislav Frolov. 2025. PromptMap: An Alternative Interaction Style for AI-Based Image Generation. In Proceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). Association for Computing Machinery, New York, NY, USA, 1162–1176...

  2. [2]

    Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2023. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23)...

  3. [3]

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. 2024. Training Diffusion Models with Reinforcement Learning. In The Twelfth International Conference on Learning Representations

  4. [4]

    Stephen Brade, Bryan Wang, Mauricio Sousa, Sageev Oore, and Tovi Grossman

  5. [5]

    Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3586183.3606725

  6. [6]

    Eric Brochu, Tyson Brochu, and Nando de Freitas. 2010. A Bayesian interactive optimization approach to procedural animation design. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 103–112

  7. [7]

    Chengkun Cai, Haoliang Liu, Xu Zhao, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, John Lee, Jenq-Neng Hwang, and Lei Li. 2025. Bayesian Optimization for Controlled Image Editing via LLMs. In Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Austria, 10045–10056. doi:10.18653/v1/2025.fin...

  8. [8]

    Tingfeng Cao, Chengyu Wang, Bingyan Liu, Ziheng Wu, Jinhui Zhu, and Jun Huang. 2023. BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track. Association for Computational Linguistics, Singapore, 1–11. doi:10.18653/v1/2023...

  9. [9]

    Jane Castleman and Aleksandra Korolova. 2025. Adultification Bias in LLMs and Text-to-Image Models. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY, USA, 2751–2767. doi:10.1145/3715275.3732178

  10. [10]

    Kathryn Chaloner and Isabella Verdinelli. 1995. Bayesian Experimental Design: A Review. Statist. Sci. 10, 3 (1995), 273–304

  11. [11]

    Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, and Zhenzhong Lan. 2024. Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7727–7736

  12. [12]

    Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep Reinforcement Learning from Human Preferences. Advances in Neural Information Processing Systems 30 (2017)

  13. [13]

    Wei Chu and Zoubin Ghahramani. 2005. Preference Learning with Gaussian Processes. In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05. ACM Press, Bonn, Germany, 137–144. doi:10.1145/1102351.1102369

  14. [14]

    John Joon Young Chung and Eytan Adar. 2023. PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Association for Computing Machinery, New York, NY, USA, 1–17. doi:10.1145/3586183.3606777

  15. [15]

    Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, and Alexandre Defossez. 2023. Simple and Controllable Music Generation. Advances in Neural Information Processing Systems 36 (2023), 47704–47720

  16. [16]

    Siddhartha Datta, Alexander Ku, Deepak Ramachandran, and Peter Anderson

  17. [17]

    Prompt Expansion for Adaptive Text-to-Image Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 3449–3476. doi:10.18653/v1/2024.acl-long.189

  18. [18]

    Sebastian Deterding, Jonathan Hook, Rebecca Fiebrink, Marco Gillies, Jeremy Gow, Memo Akten, Gillian Smith, Antonios Liapis, and Kate Compton. 2017. Mixed-Initiative Creative Interfaces. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA ’17). Association for Computing Machinery, New York, NY, USA,...

  19. [19]

    Moreno D’Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goe, Xingqian Xu, Zhangyang Wang, Humphrey Shi, and Nicu Sebe. 2024. OpenBias: Open-Set Bias Detection in Text-to-Image Generative Models. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12225–12235. doi:10.1109/CVPR52733.2024.01162

  20. [20]

    Jerry Alan Fails and Dan R. Olsen. 2003. Interactive Machine Learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces (IUI ’03). Association for Computing Machinery, New York, NY, USA, 39–45. doi:10.1145/604045.604056

  21. [21]

    Xianzhe Fan, Qing Xiao, Xuhui Zhou, Jiaxin Pei, Maarten Sap, Zhicong Lu, and Hong Shen. 2025. User-Driven Value Alignment: Understanding Users’ Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing...

  22. [22]

    Franz Faul, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior research methods 41, 4 (2009), 1149–1160

  23. [23]

    Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, and Wei Chen. 2024. PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation. IEEE Transactions on Visualization and Computer Graphics 30, 1 (Jan. 2024), 295–305. doi:10.1109/TVCG.2023.3327168

  24. [24]

    Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, and Phillip Isola. 2023. DreamSim: Learning New Dimensions of Human Visual Similarity Using Synthetic Data. Advances in Neural Information Processing Systems 36 (2023), 50742–50768

  25. [25]

    Yarin Gal, Riashat Islam, and Zoubin Ghahramani. 2017. Deep Bayesian Active Learning with Image Data. In Proceedings of the 34th International Conference on Machine Learning. PMLR, 1183–1192

  26. [26]

    Javier González, Zhenwen Dai, Andreas Damianou, and Neil D. Lawrence. 2017. Preferential Bayesian Optimization. In Proceedings of the 34th International Conference on Machine Learning. PMLR, 1282–1291

  27. [27]

    Yuhan Guo, Hanning Shao, Can Liu, Kai Xu, and Xiaoru Yuan. 2025. PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation. IEEE Transactions on Visualization and Computer Graphics 31, 9 (2025), 4547–4559. doi:10.1109/TVCG.2024.3408255

  28. [28]

    Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, and Tushar Khot. 2024. Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs. In The Twelfth International Conference on Learning Representations

  29. [29]

    Meera Hahn, Wenjun Zeng, Nithish Kannen, Rich Galt, Kartikeya Badola, Been Kim, and Zi Wang. 2025. Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty. In Forty-Second International Conference on Machine Learning

  30. [30]

    Shu-Jung Han and Susan R. Fussell. 2025. Understanding User Perceptions and the Role of AI Image Generators in Image Creation Workflows. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2025-04-25) (CHI ’25). Association for Computing Machinery, 1–17. doi:10.1145/3706598.3713227

  31. [31]

    Yaru Hao, Zewen Chi, Li Dong, and Furu Wei. 2023. Optimizing Prompts for Text-to-Image Generation. Advances in Neural Information Processing Systems 36 (2023), 66923–66939

  32. [32]

    Sandra G Hart and Lowell E Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology. Vol. 52. Elsevier, 139–183

  33. [33]

    Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Nathaniel Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, and J Zico Kolter. 2025. Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation. Transactions on Machine Learning Research (2025)

  34. [34]

    Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, and Wenqiang Zhang. 2024. A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence 38, 3 (March 2024), 2139–2147. doi:10.1609/aaai.v38i3.27986

  35. [35]

    Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel

  36. [36]

    Bayesian Active Learning for Classification and Preference Learning. arXiv:1112.5745 [stat.ML] https://arxiv.org/abs/1112.5745

  37. [37]

    Hsiu-Fang Hsieh and Sarah E Shannon. 2005. Three approaches to qualitative content analysis. Qualitative health research 15, 9 (2005), 1277–1288.

  38. [38]

    Daolang Huang, Yujia Guo, Luigi Acerbi, and Samuel Kaski. 2024. Amortized Bayesian Experimental Design for Decision-Making. Advances in Neural Information Processing Systems 37 (2024), 109460–109486

  39. [39]

    Rahul Jain, Amit Goel, Koichiro Niinuma, and Aakar Gupta. 2025. AdaptiveSliders: User-aligned Semantic Slider-based Editing of Text-to-Image Model Output. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, 1–27. doi:10.1145/3706598.3714292

  40. [40]

    Kaixuan Ji, Jiafan He, and Quanquan Gu. 2025. Reinforcement Learning from Human Feedback with Active Queries. Transactions on Machine Learning Research (2025)

  41. [41]

    Soomin Kim, Jinsu Eun, Changhoon Oh, and Joonhwan Lee. 2025. “Journey of Finding the Best Query”: Understanding the User Experience of AI Image Generation System. International Journal of Human–Computer Interaction 41, 2 (2025), 951–969. doi:10.1080/10447318.2024.2307670

  42. [42]

    Andreas Kirsch, Joost van Amersfoort, and Yarin Gal. 2019. BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning. Advances in Neural Information Processing Systems 32 (2019)

  43. [43]

    Kasia Kobalczyk, Nicolás Astorga, Tennison Liu, and Mihaela van der Schaar

  44. [44]

    Active Task Disambiguation with LLMs. In The Thirteenth International Conference on Learning Representations

  45. [45]

    Yuki Koyama and Masataka Goto. 2022. BO as Assistant: Using Bayesian Optimization for Asynchronously Generating Design Suggestions. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (UIST ’22). Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3526113.3545664

  46. [46]

    Yuki Koyama, Issei Sato, and Masataka Goto. 2020. Sequential Gallery for Interactive Visual Design Optimization. ACM Trans. Graph. 39, 4 (Aug. 2020), 88:1–88:12. doi:10.1145/3386569.3392444

  47. [47]

    Black Forest Labs. 2025. FLUX.1 - a Black-Forest-Labs Collection. https://huggingface.co/collections/black-forest-labs/flux1 Accessed: 2025-10-10

  48. [48]

    Daniel Lee, Nikhil Sharma, Donghoon Shin, DaEun Choi, Harsh Sharma, Jeonghwan Kim, and Heng Ji. 2025. ThematicPlane: Bridging Tacit User Intent and Latent Spaces for Image Generation. In Adjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST Adjunct ’25). Association for Computing Machinery, New York, NY, U...

  49. [49]

    Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, and Shixiang Shane Gu. 2023. Aligning Text-to-Image Models using Human Feedback. arXiv:2302.12192 [cs.LG] https://arxiv.org/abs/2302.12192

  50. [50]

    Belinda Z. Li, Alex Tamkin, Noah Goodman, and Jacob Andreas. 2025. Eliciting Human Preferences with Language Models. In The Thirteenth International Conference on Learning Representations

  51. [51]

    Chen Liang, Lianghua Huang, Jingwu Fang, Huanzhang Dou, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Junge Zhang, Xin Zhao, and Yu Liu. 2025. IDEA-Bench: How Far Are Generative Models from Professional Designing?. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18541–18551

  52. [52]

    Haichuan Lin, Yilin Ye, Jiazhi Xia, and Wei Zeng. 2025. SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, 1–19. doi:10.1145/3706598.3713801

  53. [53]

    Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, and Deva Ramanan. 2024. Evaluating Text-to-Visual Generation with Image-to-Text Generation. In European Conference on Computer Vision. Springer, 366–384

  54. [54]

    Shihong Liu, Samuel Yu, Zhiqiu Lin, Deepak Pathak, and Deva Ramanan. 2024. Language Models as Black-Box Optimizers for Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12687–12697

  55. [55]

    Vivian Liu and Lydia B Chilton. 2022. Design Guidelines for Prompt Engineering Text-to-Image Generative Models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2022-04-29) (CHI ’22). Association for Computing Machinery, 1–23. doi:10.1145/3491102.3501825

  56. [56]

    Ryan Louie, Andy Coenen, Cheng Zhi Huang, Michael Terry, and Carrie J. Cai

  57. [57]

    Novice-AI Music Co-Creation via AI-Steering Tools for Deep Generative Models. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3313831.3376739

  58. [58]

    Shuai Ma, Zijun Wei, Feng Tian, Xiangmin Fan, Jianming Zhang, Xiaohui Shen, Zhe Lin, Jin Huang, Radomír Měch, Dimitris Samaras, et al. 2019. SmartEye: assisting instant photo taking via integrating user preference with deep view proposal network. In Proceedings of the 2019 CHI conference on human factors in computing systems. 1–12

  59. [59]

    Atefeh Mahdavi Goloujeh, Anne Sullivan, and Brian Magerko. 2024. Is It AI or Is It Me? Understanding Users’ Prompt Journey with Text-to-Image Generative AI Tools. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2024-05-11) (CHI ’24). Association for Computing Machinery, 1–13. doi:10.1145/3613904.3642861

  60. [60]

    Saaduddin Mahmud, Mason Nakamura, and Shlomo Zilberstein. 2025. MAPLE: A Framework for Active Preference Learning Guided by Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence 39, 26 (April 2025), 27518–27528. doi:10.1609/aaai.v39i26.34964

  61. [61]

    Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, and Michal Drozdzal

  62. [62]

    Improving Text-to-Image Consistency via Automatic Prompt Optimization. Transactions on Machine Learning Research (2024)

  63. [63]

    Henry B Mann and Donald R Whitney. 1947. On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other. The Annals of Mathematical Statistics 18, 1 (1947), 50–60. jstor:2236101

  64. [64]

    OpenAI. 2025. Vector embeddings. https://platform.openai.com/docs/guides/embeddings Accessed: 2025-10-10

  65. [65]

    Jonas Oppenlaender. 2024. A Taxonomy of Prompt Modifiers for Text-to-Image Generation. Behaviour & Information Technology 43, 15 (Nov. 2024), 3763–3776. doi:10.1080/0144929X.2023.2286532

  66. [66]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick La...

  67. [67]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. Training Language Models to Follow Instructions with Huma...

  68. [68]

    Srishti Palani and Gonzalo Ramos. 2024. Evolving Roles and Workflows of Creative Practitioners in the Age of Generative AI. In Proceedings of the 16th Conference on Creativity & Cognition (C&C ’24). Association for Computing Machinery, New York, NY, USA, 170–184. doi:10.1145/3635636.3656190

  69. [69]

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. In Advances in Neural Information Processing Systems, Vol. 36. 53728–53741

  70. [70]

    Tom Rainforth, Adam Foster, Desi R Ivanova, and Freddie Bickford Smith. 2024. Modern Bayesian Experimental Design. Statist. Sci. 39, 1 (2024), 100–114

  71. [71]

    Philip Resnik. 2025. Large Language Models Are Biased Because They Are Large Language Models. Computational Linguistics 51, 3 (Sept. 2025), 885–906. doi:10.1162/coli_a_00558

  72. [72]

    Jeba Rezwana and Mary Lou Maher. 2023. Designing Creative AI Partners with COFI: A Framework for Modeling Interaction in Human-AI Co-Creative Systems. ACM Trans. Comput.-Hum. Interact. 30, 5, Article 67 (Sept. 2023), 28 pages. doi:10.1145/3519026

  73. [73]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695

  74. [74]

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. Advances in Neural Information Processing Systems 35 (2022)...

  75. [75]

    Téo Sanchez. 2023. Examining the Text-to-Image Community of Practice: Why and How Do People Prompt Generative AIs?. In Proceedings of the 15th Conference on Creativity and Cognition (New York, NY, USA, 2023-06-19) (C&C ’23). Association for Computing Machinery, 43–61. doi:10.1145/3591196.3593051

  76. [76]

    Jonathan W Schooler and Tonya Y Engstler-Schooler. 1990. Verbal overshadowing of visual memories: Some things are better left unsaid. Cognitive Psychology 22, 1 (1990), 36–71. doi:10.1016/0010-0285(90)90003-M

  77. [77]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov

  78. [78]

    Proximal Policy Optimization Algorithms. arXiv:1707.06347 [cs.LG] https://arxiv.org/abs/1707.06347

  79. [79]

    Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Chase Lipton, Qiaozhu Mei, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens. 2025. Position: Towards Bidirectional Human-...

  80. [80]

    Xinyu Shi, Yinghou Wang, Ryan Rossi, and Jian Zhao. 2025. Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, ...

Showing first 80 references.