Recognition: unknown
Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery
Pith reviewed 2026-05-08 10:54 UTC · model grok-4.3
The pith
Mathematicians using AI for discovery engage in an iterative cycle of intentmaking to refine goals and sensemaking to interpret results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Users enter a cycle of intentmaking, defined as the iterative process of discovering, defining, and refining one's experimental goals through active system interaction, and sensemaking, the cognitive process of building an understanding of complex or novel data. This cycle repeats many times during an investigation, suggesting that AI tools should be treated as collaborative instruments rather than opaque black-box assistants.
What carries the argument
The intentmaking-sensemaking cycle, in which intentmaking refines experimental goals iteratively via interaction and sensemaking interprets the outputs, driving the collaborative discovery process.
If this is right
- AI tools for scientific discovery benefit from supporting iterative goal refinement instead of assuming fixed questions.
- Treating AI systems as collaborative partners enables more effective mathematical exploration.
- Documentation of user workflows like intentmaking informs the creation of better discovery-oriented interfaces.
- The cycle suggests that AI interactions in science involve ongoing adjustment of objectives based on system feedback.
Where Pith is reading between the lines
- If this cycle holds, AI design in other sciences might similarly emphasize goal evolution over static queries.
- The pattern could imply that user training for AI tools should focus on adaptive intentmaking skills.
- Variations in tool design might change how frequently or deeply the cycle occurs in practice.
Load-bearing premise
The themes of intentmaking and the intentmaking-sensemaking cycle observed in interactions with this particular group and tool reflect a general pattern in human-AI interaction for mathematical discovery.
What would settle it
Finding that mathematicians typically maintain fixed experimental goals without iterative refinement when using AI tools would contradict the described cycle.
Figures
read the original abstract
Artificial intelligence offers powerful new tools for scientific discovery, but the interaction paradigms required to effectively harness these systems remain underexplored. In this paper, we present findings from a formative user study with 11 expert mathematicians who used AlphaEvolve, an evolutionary coding agent, to tackle advanced problems in their fields of expertise. We identify and characterize a distinct workflow we term intentmaking, the iterative process of discovering, defining, and refining one's experimental goals through active system interaction. We frame this as a natural extension to sensemaking, the cognitive process of building an understanding of complex or novel data. We suggest that users enter a cycle of intentmaking (defining and updating their experiment) and sensemaking (interpreting the results) which repeats many times during the course of an investigation. Our documentation of these themes suggests an approach to designing AI tools for scientific discovery that goes beyond the existing question/answer model of many current systems, treating them as collaborative instruments rather than opaque black-box assistants.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from a formative user study with 11 expert mathematicians who interacted with AlphaEvolve, an evolutionary AI coding agent, to solve advanced mathematical problems. It characterizes a workflow called 'intentmaking' involving iterative discovery, definition, and refinement of experimental goals via system interaction, frames it as an extension to sensemaking, and describes a repeating intentmaking-sensemaking cycle. Based on these observations, the authors propose that AI tools for scientific discovery should be designed as collaborative instruments rather than opaque question-answering systems.
Significance. If the identified themes prove robust and generalizable, the paper's contribution lies in shifting focus from purely assistive AI to interactive systems that support dynamic goal formation in mathematical discovery. This could inform HCI and AI design principles for scientific tools, emphasizing the value of iterative human-AI collaboration. The qualitative approach provides initial evidence for these interaction patterns, though further validation is needed.
major comments (3)
- [User Study section] The manuscript provides insufficient details on the study protocol, including how participants were recruited, the structure of the sessions with AlphaEvolve, data collection methods (e.g., think-aloud protocols, interviews), and the thematic analysis process used to identify 'intentmaking' and related themes. This information is essential to evaluate the validity and reproducibility of the findings that underpin the central claims.
- [Discussion section] The claim that the intentmaking-sensemaking cycle suggests a general approach to designing AI tools beyond the Q&A model is not adequately supported, as the study is limited to a single tool (AlphaEvolve) with specific affordances like evolutionary search and code-based outputs. The observed workflow may be contingent on these interface features rather than representing a stable pattern across different AI systems for mathematical discovery.
- [Findings section] The distinction between intentmaking and sensemaking is presented as a natural extension, but the paper does not provide concrete examples or quotes from participants that clearly separate the two processes or demonstrate the cycle's repetition in a way that rules out alternative interpretations of the data.
minor comments (3)
- The abstract and introduction could benefit from a brief mention of the study methodology to provide context for the claims.
- Consider adding references to prior work on sensemaking in HCI and information science to strengthen the framing.
- Ensure that all participant quotes or examples are anonymized consistently and clearly linked to the themes.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important areas for improving methodological transparency and the robustness of our claims. We address each major comment below and will incorporate revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [User Study section] The manuscript provides insufficient details on the study protocol, including how participants were recruited, the structure of the sessions with AlphaEvolve, data collection methods (e.g., think-aloud protocols, interviews), and the thematic analysis process used to identify 'intentmaking' and related themes. This information is essential to evaluate the validity and reproducibility of the findings that underpin the central claims.
Authors: We agree that greater detail on the study protocol is needed to support evaluation of validity and reproducibility. In the revised manuscript, we will expand the User Study section with: recruitment via targeted outreach to expert mathematicians through academic networks, math department lists, and conferences (resulting in 11 participants with specified expertise levels); session structure (90-minute individual sessions beginning with problem selection from the participant's research, followed by guided interaction with AlphaEvolve and a debrief); data collection (concurrent think-aloud protocols, screen and audio recordings, and post-session semi-structured interviews); and thematic analysis (inductive thematic analysis per Braun and Clarke, with two coders independently analyzing transcripts, achieving >85% agreement, and resolving differences through discussion). revision: yes
-
Referee: [Discussion section] The claim that the intentmaking-sensemaking cycle suggests a general approach to designing AI tools beyond the Q&A model is not adequately supported, as the study is limited to a single tool (AlphaEvolve) with specific affordances like evolutionary search and code-based outputs. The observed workflow may be contingent on these interface features rather than representing a stable pattern across different AI systems for mathematical discovery.
Authors: We acknowledge the limitation of observing the cycle with only AlphaEvolve and its particular features (evolutionary search and code outputs). While we maintain that the intentmaking-sensemaking cycle reflects a core human need to iteratively refine goals when engaging with complex, non-deterministic AI outputs—which is likely to appear in other discovery-oriented systems—we will revise the Discussion to include an explicit limitations paragraph noting the single-tool scope. We will also add a call for future multi-tool studies to test generalizability, thereby qualifying the design implications without overstating the current evidence. revision: partial
-
Referee: [Findings section] The distinction between intentmaking and sensemaking is presented as a natural extension, but the paper does not provide concrete examples or quotes from participants that clearly separate the two processes or demonstrate the cycle's repetition in a way that rules out alternative interpretations of the data.
Authors: We agree that additional participant quotes and explicit cycle illustrations would strengthen the distinction and help address alternative interpretations. In the revised Findings section, we will insert verbatim quotes differentiating intentmaking (e.g., participants stating goals such as 'I want the agent to evolve toward minimizing this invariant') from sensemaking (e.g., 'The generated examples show the pattern breaks here, so I need to reformulate the conjecture'), and we will describe at least two full cycles per participant across sessions to demonstrate repetition. These additions will be drawn from the existing data and will clarify why the processes are treated as distinct yet cyclical. revision: yes
Circularity Check
No circularity in empirical user study of intentmaking workflow
full rationale
The paper reports qualitative findings from a formative user study with 11 expert mathematicians interacting with AlphaEvolve. The characterization of intentmaking as an iterative goal-discovery process and its framing as an extension to sensemaking are derived inductively from thematic analysis of participant sessions and interviews. No equations, fitted parameters, predictions, or mathematical derivations exist that could reduce to inputs by construction. Any references to prior sensemaking literature are external and not self-citations that bear the load of the central claims. The analysis is self-contained against the study data and does not rely on self-referential definitions or author-prior uniqueness results.
Axiom & Free-Parameter Ledger
invented entities (1)
-
intentmaking
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. InProceedings of the 2019 chi conference on human factors in computing systems. 1–13
2019
-
[2]
Kenneth Appel, Wolfgang Haken, and John Koch. 1977. Every planar map is four colorable. Part II: Reducibility.Illinois Journal of Mathematics21, 3 (1977), 491–567
1977
- [3]
- [4]
-
[5]
Oussama Boussif, Mohammed Mahfoud, Younesse Kaddar, Moksh Jain, Sida Li, Damiano Fornasiere, Xiaoyin Chen, Yoshua Bengio, and Nikolay Malkin. 2025. Bayesian Symbolic Regression with Entropic Reinforcement Learning. (2025)
2025
-
[6]
2020.The alignment problem: Machine learning and human values
Brian Christian. 2020.The alignment problem: Machine learning and human values. WW Norton & Company
2020
- [7]
- [8]
- [9]
-
[10]
Google. 2026. PAIR Guidebook: Mental Models + Expectations. https: //pair.withgoogle.com/guidebook/chapters/mental-models-and-expectations Ac- cessed: 2026-03-04
2026
-
[11]
Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. 2025. Towards an AI co-scientist.arXiv preprint arXiv:2502.18864(2025)
work page internal anchor Pith review arXiv 2025
-
[12]
Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales-Pardo. 2020. A Bayesian machine scientist to aid in the solution of challenging scientific problems.Science advances 6, 5 (2020), eaav6971
2020
-
[13]
Thomas C Hales and Samuel P Ferguson. 2006. A formulation of the Kepler conjecture.Discrete & Computational Geometry36, 1 (2006), 21–69
2006
-
[14]
Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. InProceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166
1999
-
[15]
Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Ma- soom, et al. 2025. Olympiad-level formal mathematical reasoning with reinforce- ment learning.Nature(2025), 1–3
2025
-
[16]
Geoffrey Irving, Paul Christiano, and Dario Amodei. 2018. AI safety via debate. arXiv preprint arXiv:1805.00899(2018)
work page internal anchor Pith review arXiv 2018
-
[17]
Ellen Jiang, Kristen Olson, Edwin Toh, Alejandra Molina, Aaron Donsbach, Michael Terry, and Carrie J Cai. 2022. Promptmaker: Prompt-based prototyping with large language models. InCHI Conference on Human Factors in Computing Systems Extended Abstracts. 1–8
2022
-
[18]
Ishika Joshi, Simra Shahid, Shreeya Manasvi Venneti, Manushree Vasu, Yantao Zheng, Yunyao Li, Balaji Krishnamurthy, and Gromit Yeuk-Yin Chan. 2025. Co- prompter: User-centric evaluation of LLM instruction alignment for improved prompt engineering. InProceedings of the 30th International Conference on Intelli- gent User Interfaces. 341–365
2025
-
[19]
A Gilad Kusne, Heshan Yu, Changming Wu, Huairuo Zhang, Jason Hattrick- Simpers, Brian DeCost, Suchismita Sarker, Corey Oses, Cormac Toher, Stefano Curtarolo, et al. 2020. On-the-fly closed-loop materials discovery via Bayesian active learning.Nature communications11, 1 (2020), 5966
2020
-
[20]
Pat Langley, Gary L Bradshaw, and Herbert A Simon. 1981. BACON. 5: The discovery of conservation laws. InIJCAI, Vol. 81. 121–126
1981
- [21]
-
[22]
Robert K Lindsay, Bruce G Buchanan, Edward A Feigenbaum, and Joshua Leder- berg. 1993. DENDRAL: a case study of the first expert system for scientific hypothesis formation.Artificial intelligence61, 2 (1993), 209–261
1993
-
[23]
Yiping Liu, Liting Xu, Yuyan Han, Xiangxiang Zeng, Gary G Yen, and Hisao Ishibuchi. 2023. Evolutionary multimodal multiobjective optimization for trav- eling salesman problems.IEEE Transactions on Evolutionary Computation28, 2 (2023), 516–530
2023
-
[24]
Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha
-
[25]
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
The ai scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292(2024)
work page internal anchor Pith review arXiv 2024
- [26]
-
[27]
Meredith Ringel Morris. 2025. HCI for AGI.Interactions32, 2 (2025), 26–32
2025
-
[28]
Jakob Nielsen. 2023. AI: First new UI paradigm in 60 years.Nielsen Norman Group18, 06 (2023), 2023
2023
-
[29]
Alexander Novikov, Ngân V ˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. 2025. AlphaEvolve: A coding agent for scientific and algorithmic discovery.arXiv preprint arXiv:2506.13131(2025)
work page internal anchor Pith review arXiv 2025
- [30]
-
[31]
Daniel M Russell, Laura Koesten, Aniket Kittur, Nitesh Goyal, and Michael Xieyang Liu. 2024. Sensemaking: What is it today?. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–5
2024
-
[32]
Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. 2025. Agent laboratory: Using llm agents as research assistants.Findings of the Association for Computational Linguistics: EMNLP 2025(2025), 5977–6043
2025
-
[33]
2017.The reflective practitioner: How professionals think in action
Donald A Schön. 2017.The reflective practitioner: How professionals think in action. Routledge
2017
-
[34]
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Chenglei Si, et al. [n. d.]. Po- sition: Towards Bidirectional Human-AI Alignment. InThe Thirty-Ninth Annual Conference on Neural Information Processing Systems Position Paper Track
-
[35]
Hari Subramonyam, Roy Pea, Christopher Pondoc, Maneesh Agrawala, and Colleen Seifert. 2024. Bridging the gulf of envisioning: Cognitive challenges in prompt based interactions with LLMs. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–19
2024
-
[36]
Sangho Suh, Bryan Min, Srishti Palani, and Haijun Xia. 2023. Sensecape: En- abling multilevel exploration and sensemaking with large language models. In Proceedings of the 36th annual ACM symposium on user interface software and technology. 1–18
2023
- [37]
-
[38]
Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. 2024. Solving olympiad geometry without human demonstrations.Nature625, 7995 (2024), 476–482
2024
-
[39]
Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. 2023. Scientific discovery in the age of artificial intelligence.Nature620, 7972 (2023), 47–60
2023
-
[40]
Qian Yang, Aaron Steinfeld, Carolyn Rosé, and John Zimmerman. 2020. Re- examining whether, why, and how human-AI interaction is uniquely difficult to design. InProceedings of the 2020 chi conference on human factors in computing systems. 1–13
2020
-
[41]
JD Zamfirescu-Pereira, Heather Wei, Amy Xiao, Kitty Gu, Grace Jung, Matthew G Lee, Bjoern Hartmann, and Qian Yang. 2023. Herding AI cats: Lessons from de- signing a chatbot by prompting GPT-3. InProceedings of the 2023 ACM Designing Interactive Systems Conference. 2206–2220
2023
-
[42]
J Diego Zamfirescu-Pereira, Richmond Y Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts. InProceedings of the 2023 CHI conference on human factors in computing systems. 1–21
2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.