Semantic Prompting: Agentic Incremental Narrative Refinement through Spatial Semantic Interaction
Pith reviewed 2026-05-10 01:16 UTC · model grok-4.3
The pith
Semantic Prompting lets LLMs interpret spatial layout changes to make targeted narrative revisions instead of full regenerations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Semantic Prompting is a framework for spatial refinement that perceives semantic interactions, reasons about refinement intent, and performs targeted positional revisions. Implemented as S-PRISM, the system improves the precision of interaction-revision refinement and supports incremental formalization through interactive steering. In an empirical evaluation and a fourteen-participant user study, participants valued its efficient, adaptable, and trustworthy support for strengthening human-LLM intent alignment.
What carries the argument
Semantic Prompting framework that perceives semantic interactions from spatial layouts, reasons about the user's refinement intent, and executes targeted positional revisions to the generated narrative.
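The perceive-reason-revise loop can be sketched as a minimal pipeline. Everything below is hypothetical: the paper does not disclose its spatial encoding scheme or prompt template, so the `Element` record, the diff rules, and the prompt wording are illustrative stand-ins, not S-PRISM's implementation.

```python
from dataclasses import dataclass

@dataclass
class Element:
    """One spatial element on the canvas (hypothetical encoding)."""
    id: str
    text: str
    x: float
    y: float

def perceive(before: dict, after: dict) -> list[dict]:
    """Detect semantic interactions by diffing two layout snapshots."""
    interactions = []
    for eid, b in before.items():
        a = after.get(eid)
        if a is None:
            interactions.append({"type": "removed", "id": eid})
        elif (a.x, a.y) != (b.x, b.y):
            interactions.append({"type": "moved", "id": eid,
                                 "from": (b.x, b.y), "to": (a.x, a.y)})
    for eid in after.keys() - before.keys():
        interactions.append({"type": "added", "id": eid})
    return interactions

def reason(interactions: list[dict]) -> list[str]:
    """Map each interaction to a refinement intent (toy rules, not the
    paper's reasoning step)."""
    intents = []
    for it in interactions:
        if it["type"] == "moved":
            intents.append(f"reorder or re-emphasize element {it['id']}")
        elif it["type"] == "removed":
            intents.append(f"drop content tied to element {it['id']}")
        else:
            intents.append(f"integrate new element {it['id']}")
    return intents

def build_revision_prompt(narrative: str, intents: list[str]) -> str:
    """Compose a targeted-revision prompt instead of a full regeneration."""
    bullets = "\n".join(f"- {i}" for i in intents)
    return (f"Revise ONLY the affected passages of the narrative below.\n"
            f"Refinement intents:\n{bullets}\n---\n{narrative}")
```

The design point carried by the sketch: the LLM receives a narrow revision instruction derived from the layout diff, which is what keeps revisions local rather than regenerating the whole narrative.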
If this is right
- Interaction-revision refinement achieves higher precision than collage-based or full regeneration methods.
- Users perform incremental formalization of narratives through direct interactive steering of spatial elements.
- Human-LLM intent alignment improves because revisions stay local and responsive to layout semantics.
- The resulting support feels efficient, adaptable, and trustworthy to participants in sensemaking workflows.
Where Pith is reading between the lines
- Similar spatial-semantic steering could apply to non-narrative tasks such as refining data summaries or organizing research notes.
- Over repeated sessions the spatial history might serve as a persistent record of how the narrative evolved.
- Integration with existing visualization or mind-mapping tools could let users treat layout changes as the primary control surface for AI assistance.
Load-bearing premise
LLMs can accurately perceive semantic interactions from spatial layouts and reason about refinement intent without persistent human-LLM misalignment.
What would settle it
A test measuring the percentage of cases where specific spatial adjustments by users produce narrative updates that match their stated refinement intentions, compared against full text regeneration baselines.
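Such a test reduces to a simple harness once each case has been externally judged for whether the narrative update matched the user's stated intent. The functions below are a hypothetical evaluation sketch, not a protocol from the paper; `matches_intent` stands in for a human or rubric judgment.

```python
def intent_match_rate(cases: list[dict]) -> float:
    """Fraction of spatial adjustments whose resulting narrative update
    was judged to match the user's stated refinement intent."""
    matched = sum(1 for c in cases if c["matches_intent"])
    return matched / len(cases)

def precision_gain(treatment_cases: list[dict],
                   baseline_cases: list[dict]) -> float:
    """Difference in intent-match rate between targeted revision and a
    full-regeneration baseline (hypothetical comparison harness)."""
    return intent_match_rate(treatment_cases) - intent_match_rate(baseline_cases)
```

A positive `precision_gain` over a matched set of layout adjustments would be the kind of result that settles the claim; a statistical test over per-case pairs would still be needed on top.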
Original abstract
Interactive spatial layouts empower users to synthesize information and organize findings for sensemaking. While Large Language Models (LLMs) can automate narrative generation from spatial layouts, current collage-based and re-generation methods struggle to support the incremental spatial refinements inherent to the sensemaking process. We identify three critical gaps in existing spatial-textual generation: interaction-revision misalignment, human-LLM intent misalignment, and lack of granular customization. To address these, we introduce Semantic Prompting, a framework for spatial refinement that perceives semantic interactions, reasons about refinement intent, and performs targeted positional revisions. We implemented S-PRISM to realize this framework. The empirical evaluation demonstrated that S-PRISM effectively enhanced the precision of interaction-revision refinement. A user study ($N=14$) highlighted how participants leveraged S-PRISM for incremental formalization through interactive steering. Results showed that users valued its efficient, adaptable, and trustworthy support, which effectively strengthens human-LLM intent alignment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Semantic Prompting, a framework implemented as S-PRISM, to enable agentic incremental narrative refinement from spatial layouts. It identifies three gaps in existing collage-based and regeneration methods (interaction-revision misalignment, human-LLM intent misalignment, and lack of granular customization), then claims that perceiving semantic interactions, reasoning about refinement intent, and performing targeted positional revisions addresses them. An empirical evaluation is said to demonstrate enhanced precision of interaction-revision refinement, while a user study (N=14) reports that participants leveraged the system for incremental formalization and valued its efficient, adaptable, and trustworthy support.
Significance. If the central claims hold, the work could meaningfully advance HCI research on LLM-assisted sensemaking by shifting from one-shot generation to incremental, spatially steered refinement. The framework's emphasis on targeted revisions and intent alignment offers a concrete alternative to current spatial-textual pipelines; a reproducible implementation and falsifiable user-study protocol would strengthen its contribution.
major comments (2)
- [Abstract and Evaluation] Abstract and Evaluation section: the claim that S-PRISM 'effectively enhanced the precision of interaction-revision refinement' is unsupported by any quantitative metric (e.g., edit distance to target narrative, semantic similarity, revision count, or inter-rater agreement), baseline condition, or statistical test. The N=14 study reports only subjective preference and 'leveraged for incremental formalization,' which does not establish a measurable precision gain over collage or regeneration methods.
- [Framework and User Study] Framework and User Study sections: the core assumption that the LLM can reliably extract semantic interactions from spatial layouts and infer refinement intent without persistent misalignment is never independently tested. The study records only post-hoc user valuation of 'trustworthy support'; no objective probe (e.g., alignment error rate or comparison of LLM-inferred vs. user-intended revisions) is described, leaving the headline result vulnerable to interface-novelty confounds.
minor comments (2)
- [Abstract] The abstract contains LaTeX markup ($N=14$) that should be rendered consistently for journal submission.
- [Implementation] No explicit description of the spatial encoding scheme or prompting template used in S-PRISM is provided, leaving the implementation difficult to reproduce.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which has helped us identify areas where the manuscript can be strengthened. We address each major comment point by point below, indicating where revisions will be made to the next version of the paper.
Point-by-point responses
Referee: [Abstract and Evaluation] Abstract and Evaluation section: the claim that S-PRISM 'effectively enhanced the precision of interaction-revision refinement' is unsupported by any quantitative metric (e.g., edit distance to target narrative, semantic similarity, revision count, or inter-rater agreement), baseline condition, or statistical test. The N=14 study reports only subjective preference and 'leveraged for incremental formalization,' which does not establish a measurable precision gain over collage or regeneration methods.
Authors: We agree that the current abstract and evaluation presentation does not include explicit quantitative metrics, baselines, or statistical tests to support the precision enhancement claim. The empirical evaluation is grounded in the user study's demonstration of incremental formalization, but we acknowledge this leaves the claim open to the concerns raised. In the revised manuscript, we will expand the Evaluation section to report quantitative measures including edit distances to target narratives, semantic similarity scores, revision counts, and direct comparisons to collage-based and regeneration baselines, accompanied by statistical tests. The abstract will be updated to reflect these additions accurately without overstating the current results. revision: yes
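Two of the promised measures can be prototyped directly from the standard library. The sketch below uses `difflib.SequenceMatcher` as a stand-in for a proper edit-distance metric and adds a locality measure; the authors' actual instrumentation is not described, so this is an assumed harness, not theirs.

```python
import difflib

def edit_distance_ratio(a: str, b: str) -> float:
    """Dissimilarity in [0, 1]: 0 for identical texts, approaching 1 for
    disjoint ones. A stdlib proxy for normalized edit distance."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def revision_locality(before: str, after: str) -> float:
    """Share of the original text left untouched by a revision. High
    values mean the revision stayed targeted; full regeneration would
    typically score much lower."""
    sm = difflib.SequenceMatcher(None, before, after)
    unchanged = sum(size for _, _, size in sm.get_matching_blocks())
    return unchanged / max(len(before), 1)
```

Reporting these per revision, for S-PRISM and for collage and regeneration baselines, would give the quantitative footing the referee asks for.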
Referee: [Framework and User Study] Framework and User Study sections: the core assumption that the LLM can reliably extract semantic interactions from spatial layouts and infer refinement intent without persistent misalignment is never independently tested. The study records only post-hoc user valuation of 'trustworthy support'; no objective probe (e.g., alignment error rate or comparison of LLM-inferred vs. user-intended revisions) is described, leaving the headline result vulnerable to interface-novelty confounds.
Authors: The referee is correct that the manuscript validates the framework's assumptions on semantic interaction extraction and intent inference primarily through post-hoc user feedback on trustworthiness rather than through dedicated, independent objective probes. This indirect approach supports the practical utility for incremental formalization but does not fully isolate alignment performance or rule out novelty effects. To address this, the revised version will add an objective evaluation subsection describing alignment error rates (via comparison of LLM-inferred revisions to user-specified ground truth) and a baseline condition to control for interface novelty. These additions will be integrated into the Framework and User Study sections. revision: yes
Circularity Check
No circularity; the framework and user study are self-contained, without derivations or self-referential reductions.
Full rationale
The paper identifies three gaps in spatial-textual generation, introduces the Semantic Prompting framework to perceive semantic interactions and perform targeted revisions, implements it as S-PRISM, and evaluates via a descriptive user study (N=14) reporting subjective valuation of efficiency and alignment. No equations, parameter fittings, uniqueness theorems, or load-bearing self-citations appear in the provided text. The central claims about precision enhancement and incremental formalization rest on the framework description and study observations rather than reducing by construction to inputs or prior author work. This is a standard non-circular HCI framework paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs can perceive semantic interactions from spatial layouts and reason about refinement intent to perform targeted revisions.
invented entities (1)
- Semantic Prompting framework (no independent evidence)