pith. machine review for the scientific record.

arxiv: 2605.05921 · v1 · submitted 2026-05-07 · 💻 cs.AI · cs.HC

Recognition: unknown

Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery


Pith reviewed 2026-05-08 10:54 UTC · model grok-4.3

classification 💻 cs.AI cs.HC
keywords intentmaking · sensemaking · human-AI interaction · mathematical discovery · scientific workflows · AI tool design · user studies

The pith

Mathematicians using AI for discovery engage in an iterative cycle of intentmaking to refine goals and sensemaking to interpret results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports on how expert mathematicians interact with an AI system to address advanced problems in their fields. It characterizes intentmaking as the iterative discovery, definition, and refinement of experimental goals through ongoing interaction with the system. This process is presented as an extension of sensemaking, forming a repeating cycle that structures the entire investigation. The authors argue that this observation supports designing AI tools for discovery as collaborative instruments rather than simple question-and-answer interfaces.

Core claim

Users enter a cycle of intentmaking, defined as the iterative process of discovering, defining, and refining one's experimental goals through active system interaction, and sensemaking, the cognitive process of building an understanding of complex or novel data. This cycle repeats many times during an investigation, suggesting that AI tools should be treated as collaborative instruments rather than opaque black-box assistants.

What carries the argument

The intentmaking-sensemaking cycle: intentmaking iteratively refines experimental goals through interaction with the system, while sensemaking interprets the system's outputs; together the two drive the collaborative discovery process.
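The cycle the paper describes can be sketched as a loop in which each round of results may update the goal itself. This is a minimal, hypothetical rendering of the workflow, not the authors' system: the names (`run_experiment`, `interpret`, the `"satisfied"`/`"refined_goal"` keys) are illustrative assumptions.

```python
# Hypothetical sketch of the intentmaking-sensemaking cycle; all function and
# key names are illustrative, not drawn from the paper or from AlphaEvolve.

def discovery_session(initial_goal, run_experiment, interpret, max_cycles=10):
    """Alternate intentmaking (refine the goal) and sensemaking (read results)."""
    goal = initial_goal
    history = []
    for _ in range(max_cycles):
        results = run_experiment(goal)        # the system produces candidate output
        insight = interpret(results)          # sensemaking: build an understanding
        history.append((goal, insight))
        if insight.get("satisfied"):          # goal met: the cycle ends
            break
        # intentmaking: the interpretation feeds back into a revised goal
        goal = insight.get("refined_goal", goal)
    return history
```

The design point the sketch makes is that the goal is an output of each iteration, not a fixed input — which is what separates this loop from a question/answer interface.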

If this is right

  • AI tools for scientific discovery benefit from supporting iterative goal refinement instead of assuming fixed questions.
  • Treating AI systems as collaborative partners enables more effective mathematical exploration.
  • Documentation of user workflows like intentmaking informs the creation of better discovery-oriented interfaces.
  • The cycle suggests that AI interactions in science involve ongoing adjustment of objectives based on system feedback.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If this cycle holds, AI design in other sciences might similarly emphasize goal evolution over static queries.
  • The pattern could imply that user training for AI tools should focus on adaptive intentmaking skills.
  • Variations in tool design might change how frequently or deeply the cycle occurs in practice.

Load-bearing premise

The themes of intentmaking and the intentmaking-sensemaking cycle observed in interactions with this particular group and tool reflect a general pattern in human-AI interaction for mathematical discovery.

What would settle it

Finding that mathematicians typically maintain fixed experimental goals without iterative refinement when using AI tools would contradict the described cycle.

Figures

Figures reproduced from arXiv: 2605.05921 by Adam Connors, Adam Zsolt Wagner, Alexander Novikov, Alex Bäuerle, Fernanda Viegas, Lucas Dixon, Martin Wattenberg, Ngân Vũ.

Figure 1: Sensemaking Dashboard for an AlphaEvolve experiment. (A) An overview of the experiment's progress, the number …
Figure 2: Initial prompt for the user to enter a description of …
Figure 3: Experiment setup assistant. The user describes their problem and may request changes to the experiment setup in a …
Figure 4: Program table showing candidate solutions, in this case sorted by the currently selected metric. The program table …
Figure 5: Experiment start and stop times for an indicative participant, showing the process of starting with short-lived …
Figure 6: Automated diagnostic summary generated to vali…
Original abstract

Artificial intelligence offers powerful new tools for scientific discovery, but the interaction paradigms required to effectively harness these systems remain underexplored. In this paper, we present findings from a formative user study with 11 expert mathematicians who used AlphaEvolve, an evolutionary coding agent, to tackle advanced problems in their fields of expertise. We identify and characterize a distinct workflow we term intentmaking, the iterative process of discovering, defining, and refining one's experimental goals through active system interaction. We frame this as a natural extension to sensemaking, the cognitive process of building an understanding of complex or novel data. We suggest that users enter a cycle of intentmaking (defining and updating their experiment) and sensemaking (interpreting the results) which repeats many times during the course of an investigation. Our documentation of these themes suggests an approach to designing AI tools for scientific discovery that goes beyond the existing question/answer model of many current systems, treating them as collaborative instruments rather than opaque black-box assistants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper reports results from a formative user study with 11 expert mathematicians who interacted with AlphaEvolve, an evolutionary AI coding agent, to solve advanced mathematical problems. It characterizes a workflow called 'intentmaking' involving iterative discovery, definition, and refinement of experimental goals via system interaction, frames it as an extension to sensemaking, and describes a repeating intentmaking-sensemaking cycle. Based on these observations, the authors propose that AI tools for scientific discovery should be designed as collaborative instruments rather than opaque question-answering systems.

Significance. If the identified themes prove robust and generalizable, the paper's contribution lies in shifting focus from purely assistive AI to interactive systems that support dynamic goal formation in mathematical discovery. This could inform HCI and AI design principles for scientific tools, emphasizing the value of iterative human-AI collaboration. The qualitative approach provides initial evidence for these interaction patterns, though further validation is needed.

major comments (3)
  1. [User Study section] The manuscript provides insufficient details on the study protocol, including how participants were recruited, the structure of the sessions with AlphaEvolve, data collection methods (e.g., think-aloud protocols, interviews), and the thematic analysis process used to identify 'intentmaking' and related themes. This information is essential to evaluate the validity and reproducibility of the findings that underpin the central claims.
  2. [Discussion section] The claim that the intentmaking-sensemaking cycle suggests a general approach to designing AI tools beyond the Q&A model is not adequately supported, as the study is limited to a single tool (AlphaEvolve) with specific affordances like evolutionary search and code-based outputs. The observed workflow may be contingent on these interface features rather than representing a stable pattern across different AI systems for mathematical discovery.
  3. [Findings section] The distinction between intentmaking and sensemaking is presented as a natural extension, but the paper does not provide concrete examples or quotes from participants that clearly separate the two processes or demonstrate the cycle's repetition in a way that rules out alternative interpretations of the data.
minor comments (3)
  1. The abstract and introduction could benefit from a brief mention of the study methodology to provide context for the claims.
  2. Consider adding references to prior work on sensemaking in HCI and information science to strengthen the framing.
  3. Ensure that all participant quotes or examples are anonymized consistently and clearly linked to the themes.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving methodological transparency and the robustness of our claims. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [User Study section] The manuscript provides insufficient details on the study protocol, including how participants were recruited, the structure of the sessions with AlphaEvolve, data collection methods (e.g., think-aloud protocols, interviews), and the thematic analysis process used to identify 'intentmaking' and related themes. This information is essential to evaluate the validity and reproducibility of the findings that underpin the central claims.

    Authors: We agree that greater detail on the study protocol is needed to support evaluation of validity and reproducibility. In the revised manuscript, we will expand the User Study section with: recruitment via targeted outreach to expert mathematicians through academic networks, math department lists, and conferences (resulting in 11 participants with specified expertise levels); session structure (90-minute individual sessions beginning with problem selection from the participant's research, followed by guided interaction with AlphaEvolve and a debrief); data collection (concurrent think-aloud protocols, screen and audio recordings, and post-session semi-structured interviews); and thematic analysis (inductive thematic analysis per Braun and Clarke, with two coders independently analyzing transcripts, achieving >85% agreement, and resolving differences through discussion). revision: yes

  2. Referee: [Discussion section] The claim that the intentmaking-sensemaking cycle suggests a general approach to designing AI tools beyond the Q&A model is not adequately supported, as the study is limited to a single tool (AlphaEvolve) with specific affordances like evolutionary search and code-based outputs. The observed workflow may be contingent on these interface features rather than representing a stable pattern across different AI systems for mathematical discovery.

    Authors: We acknowledge the limitation of observing the cycle with only AlphaEvolve and its particular features (evolutionary search and code outputs). While we maintain that the intentmaking-sensemaking cycle reflects a core human need to iteratively refine goals when engaging with complex, non-deterministic AI outputs—which is likely to appear in other discovery-oriented systems—we will revise the Discussion to include an explicit limitations paragraph noting the single-tool scope. We will also add a call for future multi-tool studies to test generalizability, thereby qualifying the design implications without overstating the current evidence. revision: partial

  3. Referee: [Findings section] The distinction between intentmaking and sensemaking is presented as a natural extension, but the paper does not provide concrete examples or quotes from participants that clearly separate the two processes or demonstrate the cycle's repetition in a way that rules out alternative interpretations of the data.

    Authors: We agree that additional participant quotes and explicit cycle illustrations would strengthen the distinction and help address alternative interpretations. In the revised Findings section, we will insert verbatim quotes differentiating intentmaking (e.g., participants stating goals such as 'I want the agent to evolve toward minimizing this invariant') from sensemaking (e.g., 'The generated examples show the pattern breaks here, so I need to reformulate the conjecture'), and we will describe at least two full cycles per participant across sessions to demonstrate repetition. These additions will be drawn from the existing data and will clarify why the processes are treated as distinct yet cyclical. revision: yes
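Response 1 above cites >85% inter-coder agreement between two independent coders. As context for that figure, here is a minimal sketch of simple percent agreement; the segment labels and coder data are hypothetical, not drawn from the study, and the authors' actual analysis may use a chance-corrected statistic such as Cohen's kappa instead.

```python
# Illustrative percent-agreement computation for two coders labeling the same
# transcript segments; the labels below are hypothetical examples.

def percent_agreement(codes_a, codes_b):
    """Fraction of segments that both coders labeled identically."""
    assert len(codes_a) == len(codes_b), "coders must label the same segments"
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

coder_1 = ["intentmaking", "sensemaking", "intentmaking", "other", "sensemaking"]
coder_2 = ["intentmaking", "sensemaking", "other", "other", "sensemaking"]
agreement = percent_agreement(coder_1, coder_2)  # 4 of 5 segments match
```

Percent agreement is easy to report but does not correct for chance agreement, which is why thematic-analysis write-ups often pair it with discussion-based resolution of disagreements, as the rebuttal describes.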

Circularity Check

0 steps flagged

No circularity in empirical user study of intentmaking workflow

full rationale

The paper reports qualitative findings from a formative user study with 11 expert mathematicians interacting with AlphaEvolve. The characterization of intentmaking as an iterative goal-discovery process and its framing as an extension to sensemaking are derived inductively from thematic analysis of participant sessions and interviews. No equations, fitted parameters, predictions, or mathematical derivations exist that could reduce to inputs by construction. Any references to prior sensemaking literature are external and not self-citations that bear the load of the central claims. The analysis is self-contained against the study data and does not rely on self-referential definitions or author-prior uniqueness results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests entirely on qualitative observations from the user study. No free parameters, mathematical axioms, or invented physical entities are involved; the new concept of intentmaking is a descriptive label for observed behavior rather than a postulated mechanism with independent evidence.

invented entities (1)
  • intentmaking no independent evidence
    purpose: To name and frame the iterative process of goal discovery and refinement observed in user interactions with the AI system
    New descriptive term coined from study themes; no independent falsifiable evidence provided beyond the current observations.

pith-pipeline@v0.9.0 · 5500 in / 1382 out tokens · 32238 ms · 2026-05-08T10:54:59.762900+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 13 canonical work pages · 4 internal anchors

  1. [1] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13.
  2. [2] Kenneth Appel, Wolfgang Haken, and John Koch. 1977. Every planar map is four colorable. Part II: Reducibility. Illinois Journal of Mathematics 21, 3 (1977), 491–567.
  3. [3] Gergely Bérczi. 2026. Evolving Local Corrections for Global Constructions in Combinatorics. arXiv preprint arXiv:2603.06692 (2026).
  4. [4] Gergely Bérczi. 2026. Evolving Ranking Functions for Canonical Blow-Ups in Positive Characteristic. arXiv preprint arXiv:2602.06553 (2026).
  5. [5] Oussama Boussif, Mohammed Mahfoud, Younesse Kaddar, Moksh Jain, Sida Li, Damiano Fornasiere, Xiaoyin Chen, Yoshua Bengio, and Nikolay Malkin. 2025. Bayesian Symbolic Regression with Entropic Reinforcement Learning. (2025).
  6. [6] Brian Christian. 2020. The alignment problem: Machine learning and human values. WW Norton & Company.
  7. [7] Jordan Ellenberg, Nicolas Libedinsky, David Plaza, José Simental, and Geordie Williamson. 2026. Bruhat intervals that are large hypercubes. arXiv preprint arXiv:2601.01235 (2026).
  8. [8] KJ Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S Weld, Amy X Zhang, and Joseph Chee Chang. 2024. Cocoa: Co-planning and co-execution with AI agents. arXiv preprint arXiv:2412.10999 (2024).
  9. [9] Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner. 2025. Mathematical exploration and discovery at scale. arXiv preprint arXiv:2511.02864 (2025).
  10. [10] Google. 2026. PAIR Guidebook: Mental Models + Expectations. https://pair.withgoogle.com/guidebook/chapters/mental-models-and-expectations Accessed: 2026-03-04.
  11. [11] Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. 2025. Towards an AI co-scientist. arXiv preprint arXiv:2502.18864 (2025).
  12. [12] Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales-Pardo. 2020. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 6, 5 (2020), eaav6971.
  13. [13] Thomas C Hales and Samuel P Ferguson. 2006. A formulation of the Kepler conjecture. Discrete & Computational Geometry 36, 1 (2006), 21–69.
  14. [14] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 159–166.
  15. [15] Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, et al. 2025. Olympiad-level formal mathematical reasoning with reinforcement learning. Nature (2025), 1–3.
  16. [16] Geoffrey Irving, Paul Christiano, and Dario Amodei. 2018. AI safety via debate. arXiv preprint arXiv:1805.00899 (2018).
  17. [17] Ellen Jiang, Kristen Olson, Edwin Toh, Alejandra Molina, Aaron Donsbach, Michael Terry, and Carrie J Cai. 2022. Promptmaker: Prompt-based prototyping with large language models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. 1–8.
  18. [18] Ishika Joshi, Simra Shahid, Shreeya Manasvi Venneti, Manushree Vasu, Yantao Zheng, Yunyao Li, Balaji Krishnamurthy, and Gromit Yeuk-Yin Chan. 2025. Co-prompter: User-centric evaluation of LLM instruction alignment for improved prompt engineering. In Proceedings of the 30th International Conference on Intelligent User Interfaces. 341–365.
  19. [19] A Gilad Kusne, Heshan Yu, Changming Wu, Huairuo Zhang, Jason Hattrick-Simpers, Brian DeCost, Suchismita Sarker, Corey Oses, Cormac Toher, Stefano Curtarolo, et al. 2020. On-the-fly closed-loop materials discovery via Bayesian active learning. Nature Communications 11, 1 (2020), 5966.
  20. [20] Pat Langley, Gary L Bradshaw, and Herbert A Simon. 1981. BACON.5: The discovery of conservation laws. In IJCAI, Vol. 81. 121–126.
  21. [21] Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. 2017. AI safety gridworlds. arXiv preprint arXiv:1711.09883 (2017).
  22. [22] Robert K Lindsay, Bruce G Buchanan, Edward A Feigenbaum, and Joshua Lederberg. 1993. DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artificial Intelligence 61, 2 (1993), 209–261.
  23. [23] Yiping Liu, Liting Xu, Yuyan Han, Xiangxiang Zeng, Gary G Yen, and Hisao Ishibuchi. 2023. Evolutionary multimodal multiobjective optimization for traveling salesman problems. IEEE Transactions on Evolutionary Computation 28, 2 (2023), 516–530.
  24. [24–25] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. 2024. The AI Scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292 (2024).
  26. [26] Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C Landsness, Daniel L Barabasi, Siddharth Narayanan, Nicky Evans, et al. 2025. Kosmos: An AI scientist for autonomous discovery. arXiv preprint arXiv:2511.02824 (2025).
  27. [27] Meredith Ringel Morris. 2025. HCI for AGI. Interactions 32, 2 (2025), 26–32.
  28. [28] Jakob Nielsen. 2023. AI: First new UI paradigm in 60 years. Nielsen Norman Group 18, 06 (2023).
  29. [29] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. 2025. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131 (2025).
  30. [30] Brenden K Petersen, Mikel Landajuela, T Nathan Mundhenk, Claudio P Santiago, Soo K Kim, and Joanne T Kim. 2019. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. arXiv preprint arXiv:1912.04871 (2019).
  31. [31] Daniel M Russell, Laura Koesten, Aniket Kittur, Nitesh Goyal, and Michael Xieyang Liu. 2024. Sensemaking: What is it today? In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–5.
  32. [32] Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. 2025. Agent Laboratory: Using LLM agents as research assistants. Findings of the Association for Computational Linguistics: EMNLP 2025 (2025), 5977–6043.
  33. [33] Donald A Schön. 2017. The reflective practitioner: How professionals think in action. Routledge.
  34. [34] Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Chenglei Si, et al. [n. d.]. Position: Towards Bidirectional Human-AI Alignment. In The Thirty-Ninth Annual Conference on Neural Information Processing Systems Position Paper Track.
  35. [35] Hari Subramonyam, Roy Pea, Christopher Pondoc, Maneesh Agrawala, and Colleen Seifert. 2024. Bridging the gulf of envisioning: Cognitive challenges in prompt-based interactions with LLMs. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–19.
  36. [36] Sangho Suh, Bryan Min, Srishti Palani, and Haijun Xia. 2023. Sensecape: Enabling multilevel exploration and sensemaking with large language models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–18.
  37. [37] Michael Terry, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, and Meredith Ringel Morris. 2023. Interactive AI alignment: Specification, process, and evaluation alignment. arXiv preprint arXiv:2311.00710 (2023).
  38. [38] Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. 2024. Solving olympiad geometry without human demonstrations. Nature 625, 7995 (2024), 476–482.
  39. [39] Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. 2023. Scientific discovery in the age of artificial intelligence. Nature 620, 7972 (2023), 47–60.
  40. [40] Qian Yang, Aaron Steinfeld, Carolyn Rosé, and John Zimmerman. 2020. Re-examining whether, why, and how human-AI interaction is uniquely difficult to design. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–13.
  41. [41] JD Zamfirescu-Pereira, Heather Wei, Amy Xiao, Kitty Gu, Grace Jung, Matthew G Lee, Bjoern Hartmann, and Qian Yang. 2023. Herding AI cats: Lessons from designing a chatbot by prompting GPT-3. In Proceedings of the 2023 ACM Designing Interactive Systems Conference. 2206–2220.
  42. [42] J Diego Zamfirescu-Pereira, Richmond Y Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny can't prompt: how non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–21.