pith. machine review for the scientific record.

arxiv: 2604.19971 · v1 · submitted 2026-04-21 · 💻 cs.HC · cs.AI

Recognition: unknown

Semantic Prompting: Agentic Incremental Narrative Refinement through Spatial Semantic Interaction

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 01:16 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords semantic prompting · spatial layouts · narrative refinement · human-AI interaction · incremental sensemaking · LLM steering · intent alignment · interactive refinement

The pith

Semantic Prompting lets LLMs interpret spatial layout changes to make targeted narrative revisions instead of full regenerations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Semantic Prompting to close three gaps in using LLMs with spatial layouts for narrative generation: misalignment between user spatial interactions and model revisions, misalignment between human intent and model output, and insufficient control over fine details. The approach works by having the model detect semantic relationships among positioned elements, infer what refinement the user wants, and apply only the necessary changes to the text. A user study with fourteen participants found that people could steer the process incrementally through spatial moves and that the results felt more precise and aligned with their goals. This matters for sensemaking tasks where information organization happens gradually in space rather than in one-shot text prompts. If the method holds, it keeps the evolving spatial structure intact while updating only the relevant story parts.

Core claim

Semantic Prompting is a framework for spatial refinement that perceives semantic interactions, reasons about refinement intent, and performs targeted positional revisions. Implemented as S-PRISM, the system enhances the precision of interaction-revision refinement and supports incremental formalization through interactive steering, as shown in an empirical evaluation and a user study with fourteen participants who valued its efficient, adaptable, and trustworthy support for strengthening human-LLM intent alignment.

What carries the argument

Semantic Prompting framework that perceives semantic interactions from spatial layouts, reasons about the user's refinement intent, and executes targeted positional revisions to the generated narrative.
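The paper does not publish its pipeline code, but the three-stage loop described above (perceive layout semantics, infer refinement intent, revise only the affected sections) can be sketched as follows. All names, the distance threshold, and the string-level "revision" are illustrative assumptions; S-PRISM presumably delegates the inference and revision steps to an LLM rather than to the toy heuristics used here.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A positioned document element in the spatial workspace."""
    id: str
    text: str
    x: float
    y: float


def detect_relations(elements, threshold=120.0):
    """Stage 1 (perceive): pair elements whose centers fall within
    `threshold` units — a stand-in for semantic-interaction detection."""
    pairs = []
    for i, a in enumerate(elements):
        for b in elements[i + 1:]:
            dist = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
            if dist <= threshold:
                pairs.append((a.id, b.id))
    return pairs


def infer_intent(old_pairs, new_pairs):
    """Stage 2 (reason): diff the relation sets. Newly close pairs suggest
    a merge intent; newly separated pairs suggest a split intent."""
    old, new = set(old_pairs), set(new_pairs)
    return {"merge": sorted(new - old), "split": sorted(old - new)}


def targeted_revision(narrative, intent):
    """Stage 3 (revise): touch only the sections named by the intent,
    leaving every other section of the narrative intact."""
    revised = dict(narrative)
    for a, b in intent["merge"]:
        revised[a] = revised[a] + " [merged with " + b + "]"
    return revised
```

The point of the sketch is the contract between stages: a layout change becomes a small, named intent, and the revision step receives only that intent, which is what keeps edits local instead of regenerating the whole narrative.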

If this is right

  • Interaction-revision refinement achieves higher precision than collage-based or full regeneration methods.
  • Users perform incremental formalization of narratives through direct interactive steering of spatial elements.
  • Human-LLM intent alignment improves because revisions stay local and responsive to layout semantics.
  • The resulting support feels efficient, adaptable, and trustworthy to participants in sensemaking workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar spatial-semantic steering could apply to non-narrative tasks such as refining data summaries or organizing research notes.
  • Over repeated sessions the spatial history might serve as a persistent record of how the narrative evolved.
  • Integration with existing visualization or mind-mapping tools could let users treat layout changes as the primary control surface for AI assistance.

Load-bearing premise

LLMs can accurately perceive semantic interactions from spatial layouts and reason about refinement intent without persistent human-LLM misalignment.

What would settle it

A test measuring the percentage of cases where specific spatial adjustments by users produce narrative updates that match their stated refinement intentions, compared against full text regeneration baselines.
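One minimal way to score such a test, assuming each trial records the intent the system acted on alongside the user's stated intent (the labels and trial structure here are hypothetical, not the paper's protocol):

```python
def intent_match_rate(trials):
    """Fraction of trials where the system's revision matched the user's
    stated intent; each trial is a (system_intent, stated_intent) pair."""
    if not trials:
        return 0.0
    hits = sum(1 for system, stated in trials if system == stated)
    return hits / len(trials)


def compare_conditions(targeted_trials, regeneration_trials):
    """Side-by-side match rates for the targeted-revision condition and a
    full-regeneration baseline."""
    return {
        "targeted": intent_match_rate(targeted_trials),
        "regeneration": intent_match_rate(regeneration_trials),
    }
```

A difference between the two rates, with an appropriate significance test, would bear directly on the precision claim.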

Figures

Figures reproduced from arXiv: 2604.19971 by Chris North, Eric Krokos, Ibrahim Tahmid, Kirsten Whitley, Xuan Wang, Xuxin Tang.

Figure 1: Comparison of Space-to-Text Frameworks.
Figure 2: S-PRISM’s multi-agent pipeline.
Figure 3: The S-PRISM interface. (A) A direct-manipulation zoomable workspace for spatial document organization.
Figure 4: Phase I: an overview of the four tasks (T).
Figure 5: Phase I behavior patterns.
Figure 6: Subjective ratings (7-point Likert scores).
Figure 7: P4 refined the workspace for refining reports.
Original abstract

Interactive spatial layouts empower users to synthesize information and organize findings for sensemaking. While Large Language Models (LLMs) can automate narrative generation from spatial layouts, current collage-based and re-generation methods struggle to support the incremental spatial refinements inherent to the sensemaking process. We identify three critical gaps in existing spatial-textual generation: interaction-revision misalignment, human-LLM intent misalignment, and lack of granular customization. To address these, we introduce Semantic Prompting, a framework for spatial refinement that perceives semantic interactions, reasons about refinement intent, and performs targeted positional revisions. We implemented S-PRISM to realize this framework. The empirical evaluation demonstrated that S-PRISM effectively enhanced the precision of interaction-revision refinement. A user study ($N=14$) highlighted how participants leveraged S-PRISM for incremental formalization through interactive steering. Results showed that users valued its efficient, adaptable, and trustworthy support, which effectively strengthens human-LLM intent alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Semantic Prompting, a framework implemented as S-PRISM, to enable agentic incremental narrative refinement from spatial layouts. It identifies three gaps in existing collage-based and regeneration methods (interaction-revision misalignment, human-LLM intent misalignment, and lack of granular customization), then claims that perceiving semantic interactions, reasoning about refinement intent, and performing targeted positional revisions addresses them. An empirical evaluation is said to demonstrate enhanced precision of interaction-revision refinement, while a user study (N=14) reports that participants leveraged the system for incremental formalization and valued its efficient, adaptable, and trustworthy support.

Significance. If the central claims hold, the work could meaningfully advance HCI research on LLM-assisted sensemaking by shifting from one-shot generation to incremental, spatially steered refinement. The framework's emphasis on targeted revisions and intent alignment offers a concrete alternative to current spatial-textual pipelines; a reproducible implementation and falsifiable user-study protocol would strengthen its contribution.

major comments (2)
  1. [Abstract and Evaluation] The claim that S-PRISM 'effectively enhanced the precision of interaction-revision refinement' is unsupported by any quantitative metric (e.g., edit distance to a target narrative, semantic similarity, revision count, or inter-rater agreement), by a baseline condition, or by a statistical test. The N=14 study reports only subjective preference and 'leveraged for incremental formalization,' which does not establish a measurable precision gain over collage or regeneration methods.
  2. [Framework and User Study] The core assumption that the LLM can reliably extract semantic interactions from spatial layouts and infer refinement intent without persistent misalignment is never independently tested. The study records only post-hoc user valuation of 'trustworthy support'; no objective probe (e.g., an alignment error rate or a comparison of LLM-inferred vs. user-intended revisions) is described, leaving the headline result vulnerable to interface-novelty confounds.
minor comments (2)
  1. [Abstract] The abstract contains LaTeX markup ($N=14$) that should be rendered consistently for journal submission.
  2. [Implementation] No explicit description of the spatial encoding scheme or prompting template used in S-PRISM is provided, making the implementation details difficult to reproduce.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas where the manuscript can be strengthened. We address each major comment point by point below, indicating where revisions will be made to the next version of the paper.

Point-by-point responses
  1. Referee: [Abstract and Evaluation] The claim that S-PRISM 'effectively enhanced the precision of interaction-revision refinement' is unsupported by any quantitative metric (e.g., edit distance to a target narrative, semantic similarity, revision count, or inter-rater agreement), by a baseline condition, or by a statistical test. The N=14 study reports only subjective preference and 'leveraged for incremental formalization,' which does not establish a measurable precision gain over collage or regeneration methods.

    Authors: We agree that the current abstract and evaluation presentation does not include explicit quantitative metrics, baselines, or statistical tests to support the precision enhancement claim. The empirical evaluation is grounded in the user study's demonstration of incremental formalization, but we acknowledge this leaves the claim open to the concerns raised. In the revised manuscript, we will expand the Evaluation section to report quantitative measures including edit distances to target narratives, semantic similarity scores, revision counts, and direct comparisons to collage-based and regeneration baselines, accompanied by statistical tests. The abstract will be updated to reflect these additions accurately without overstating the current results. revision: yes

  2. Referee: [Framework and User Study] The core assumption that the LLM can reliably extract semantic interactions from spatial layouts and infer refinement intent without persistent misalignment is never independently tested. The study records only post-hoc user valuation of 'trustworthy support'; no objective probe (e.g., an alignment error rate or a comparison of LLM-inferred vs. user-intended revisions) is described, leaving the headline result vulnerable to interface-novelty confounds.

    Authors: The referee is correct that the manuscript validates the framework's assumptions on semantic interaction extraction and intent inference primarily through post-hoc user feedback on trustworthiness rather than through dedicated, independent objective probes. This indirect approach supports the practical utility for incremental formalization but does not fully isolate alignment performance or rule out novelty effects. To address this, the revised version will add an objective evaluation subsection describing alignment error rates (via comparison of LLM-inferred revisions to user-specified ground truth) and a baseline condition to control for interface novelty. These additions will be integrated into the Framework and User Study sections. revision: yes
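The objective probes the rebuttal promises (an alignment error rate, plus evidence that revisions stay local rather than reflecting full rewrites) are straightforward to operationalize. The sketch below is one hedged reading of those measures, not the authors' planned protocol; the locality measure substitutes a generic sequence-similarity ratio for whatever text metric the revision would actually use.

```python
import difflib


def alignment_error_rate(inferred_targets, intended_targets):
    """Fraction of spatial moves whose LLM-inferred revision target differs
    from the user-specified ground-truth target."""
    if not inferred_targets:
        return 0.0
    misses = sum(1 for inf, gt in zip(inferred_targets, intended_targets)
                 if inf != gt)
    return misses / len(inferred_targets)


def revision_locality(before, after):
    """Similarity of the narrative before and after a revision; values near
    1.0 indicate a local edit, values near 0.0 a full rewrite."""
    return difflib.SequenceMatcher(None, before, after).ratio()
```

Reporting locality for S-PRISM against a regeneration baseline would also help separate the effect of targeted revision from interface novelty.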

Circularity Check

0 steps flagged

No circularity; framework and user study are self-contained without derivations or self-referential reductions

full rationale

The paper identifies three gaps in spatial-textual generation, introduces the Semantic Prompting framework to perceive semantic interactions and perform targeted revisions, implements it as S-PRISM, and evaluates via a descriptive user study (N=14) reporting subjective valuation of efficiency and alignment. No equations, parameter fittings, uniqueness theorems, or load-bearing self-citations appear in the provided text. The central claims about precision enhancement and incremental formalization rest on the framework description and study observations rather than reducing by construction to inputs or prior author work. This is a standard non-circular HCI framework paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract; the framework rests on an assumption about LLM capabilities for semantic perception and intent reasoning, with no free parameters explicitly quantified and the framework itself as the one newly introduced entity.

axioms (1)
  • domain assumption: LLMs can perceive semantic interactions from spatial layouts and reason about refinement intent to perform targeted revisions.
    This is the core premise enabling the Semantic Prompting framework as described.
invented entities (1)
  • Semantic Prompting framework (no independent evidence)
    purpose: To bridge spatial interactions with LLM narrative refinements for incremental sensemaking.
    Newly proposed method without independent evidence outside the paper's evaluation.

pith-pipeline@v0.9.0 · 5472 in / 1179 out tokens · 42379 ms · 2026-05-10T01:16:43.451059+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

52 extracted references · 10 canonical work pages

  1. [1]

    Christopher Andrews, Alex Endert, and Chris North. 2010. Space to think: large high-resolution displays for sensemaking. In Proceedings of the SIGCHI conference on human factors in computing systems . 55–64

  2. [2]

    Anthropic. 2023. Claude 3.5 Sonnet - Anthropic Language Model. Internet: https://www.anthropic.com (2023)

  3. [3]

    Yali Bian and Chris North. 2021. Deepsi: Interactive deep learning for seman- tic interaction. In 26th International Conference on Intelligent User Interfaces . 197–207

  4. [4]

    Oloff C Biermann, Ning F Ma, and Dongwook Yoon. 2022. From tool to companion: Storywriters want AI writers to respect their personal values and writing strategies. In Proceedings of the 2022 ACM Designing Interactive Systems Conference. 1209–1227

  5. [5]

    Jeremy Birnholtz and Steven Ibara. 2012. Tracking changes in collaborative writing: edits, visibility and group maintenance. In Proceedings of the ACM 2012 conference on computer supported cooperative work . 809–818

  6. [6]

    Lauren Bradel, Chris North, and Leanna House. 2014. Multi-model semantic interaction for text analytics. In 2014 IEEE Conference on Visual Analytics Science and Technology (V AST). IEEE, 163–172

  7. [7]

    Daniel Buschek. 2024. Collage is the New Writing: Exploring the Fragmen- tation of Text and User Interfaces in AI Tools. InProceedings of the 2024 ACM Designing Interactive Systems Conference . 2719–2737

  8. [8]

    Daniel Buschek, Martin Zürn, and Malin Eiband. 2021. The impact of mul- tiple parallel phrase suggestions on email input and composition behaviour of native and non-native english writers. In Proceedings of the 2021 CHI Con- ference on Human Factors in Computing Systems . 1–13

  9. [9]

    Yining Cao, Yiyi Huang, Anh Truong, Hijung Valentina Shin, and Haijun Xia. 2025. Compositional Structures as Substrates for Human-AI Co-creation Environment: A Design Approach and A Case Study. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems . 1–25

  10. [10]

    John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Sketching stories with generative pretrained language models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems . 1–19

  11. [11]

    Hai Dang, Karim Benharrak, Florian Lehmann, and Daniel Buschek. 2022. Beyond text generation: Supporting writers with continuous automatic text summaries. In Proceedings of the 35th Annual ACM Symposium on User Inter- face Software and Technology. 1–13

  12. [12]

    Hai Dang, Sven Goller, Florian Lehmann, and Daniel Buschek. 2023. Choice over control: How users write with large language models using diegetic and non-diegetic prompting. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems . 1–17

  13. [13]

    Michelle Dowling, Nathan Wycoff, Brian Mayer, John Wenskovitch, Leanna House, Nicholas Polys, Chris North, and Peter Hauck. 2019. Interactive vi- sual analytics for sensemaking with big text. Big Data Research 16 (2019), 49–58

  14. [14]

    Wanyu Du, Zae Myung Kim, Vipul Raheja, Dhruv Kumar, and Dongyeop Kang. 2022. Read, revise, repeat: A system demonstration for human-in-the- loop iterative text revision. arXiv preprint arXiv:2204.03685 (2022)

  15. [15]

    Alex Endert, Patrick Fiaux, and Chris North. 2012. Semantic interaction for sensemaking: inferring analytical reasoning for model steering. IEEE Trans- actions on Visualization and Computer Graphics 18, 12 (2012), 2879–2888

  16. [16]

    Yu Fu, Dennis Bromley, and Vidya Setlur. 2025. DataWeaver: Authoring Data-Driven Narratives through the Integrated Composition of Visualiza- tion and Text. In Computer Graphics Forum. Wiley Online Library, e70098

  17. [17]

    Katy Ilonka Gero, Vivian Liu, and Lydia Chilton. 2022. Sparks: Inspiration for science writing using language models. In Proceedings of the 2022 ACM Designing Interactive Systems Conference . 1002–1019

  18. [18]

    Katy Ilonka Gero, Tao Long, and Lydia B Chilton. 2023. Social dynamics of AI support in creative writing. In Proceedings of the 2023 CHI conference on human factors in computing systems . 1–15

  19. [19]

    Steven M Goodman, Erin Buehler, Patrick Clary, Andy Coenen, Aaron Donsbach, Tiffanie N Horne, Michal Lahav, Robert MacDonald, Rain Breaw Michaels, Ajit Narayanan, et al. 2022. Lampost: Design and evaluation of an ai-assisted email writing prototype for adults with dyslexia. In Proceed- ings of the 24th International ACM SIGACCESS Conference on Computers a...

  20. [20]

    Mary Hegarty. 2010. Components of spatial intelligence. In Psychology of learning and motivation . Vol. 52. Elsevier, 265–297

  21. [21]

    Tae Soo Kim, Yoonjoo Lee, Minsuk Chang, and Juho Kim. 2023. Cells, gen- erators, and lenses: Design framework for object-oriented interaction with large language models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology . 1–18

  22. [22]

    Yewon Kim, Mina Lee, Donghwi Kim, and Sung-Ju Lee. 2023. Towards explainable ai writing assistants for non-native english speakers. arXiv preprint arXiv:2304.02625 (2023)

  23. [23]

    Mina Lee, Percy Liang, and Qian Yang. 2022. Coauthor: Designing a human- ai collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI conference on human factors in computing systems. 1–19

  24. [24]

    Florian Lehmann and Daniel Buschek. 2024. Functional Flexibility in Genera- tive AI Interfaces: Text Editing with LLMs through Conversations, Toolbars, and Prompts. arXiv preprint arXiv:2410.10644 (2024)

  25. [25]

    Zhuoyan Li, Chen Liang, Jing Peng, and Ming Yin. 2024. The value, benefits, and concerns of generative ai-powered assistance in writing. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems . 1–25

  26. [26]

    Q Vera Liao and Jennifer Wortman Vaughan. 2023. Ai transparency in the age of llms: A human-centered research roadmap. arXiv preprint arXiv:2306.01941 (2023)

  27. [27]

    Tao Long, Dorothy Zhang, Grace Li, Batool Taraif, Samia Menon, Kynnedy Simone Smith, Sitong Wang, Katy Ilonka Gero, and Lydia B Chilton

  28. [28]

    arXiv preprint arXiv:2305.12265 (2023)

    Tweetorial hooks: generative AI tools to motivate science on social media. arXiv preprint arXiv:2305.12265 (2023)

  29. [29]

    Microsoft. 2025. Microsoft Teams. https://www.microsoft.com/en-us/ microsoft-teams/group-chat-software

  30. [30]

    Miro. 2025. Miro. https://miro.com/

  31. [31]

    Jason Obeid and Enamul Hoque. 2020. Chart-to-text: Generating natural language descriptions for charts by adapting the transformer model. arXiv preprint arXiv:2010.09142 (2020)

  32. [32]

    OpenAI. 2023. ChatGPT-4: A Conversational Language Model by OpenAI. https://openai.com/chatgpt. Accessed on 6 March 2024

  33. [33]

    Reinhard Oppermann. 2002. User-interface design. In Handbook on informa- tion technologies for education and training . Springer, 233–248

  34. [34]

    Peter Pirolli and Stuart Card. 2005. The sensemaking process and leverage points for analyst technology as identified through cognitive task analy- sis. In Proceedings of international conference on intelligence analysis , Vol. 5. McLean, V A, USA, 2–4

  35. [35]

    Ahmed Y Radwan, Khaled M Alasmari, Omar A Abdulbagi, and Emad A Alghamdi. 2024. SARD: A Human-AI Collaborative Story Generation. In International Conference on Human-Computer Interaction . Springer, 94–105

  36. [36]

    Laria Reynolds and Kyle McDonell. 2021. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended abstracts of the 2021 CHI conference on human factors in computing systems . 1–7

  37. [37]

    Frank M Shipman and Catherine C Marshall. 1999. Formality considered harmful: Experiences, emerging themes, and directions on the use of for- mal representations in interactive systems. Computer Supported Cooperative Work (CSCW) 8 (1999), 333–352

  38. [38]

    Momin N Siddiqui, Roy D Pea, and Hari Subramonyam. 2025. Script&Shift: A layered interface paradigm for integrating content development and rhetor- ical strategy with llm writing assistants. In Proceedings of the 2025 CHI Con- ference on Human Factors in Computing Systems . 1–19

  39. [39]

    Nikhil Singh, Guillermo Bernal, Daria Savchenko, and Elena L Glassman

  40. [40]

    ACM Transactions on Computer-Human Inter- action 30, 5 (2023), 1–57

    Where to hide a stolen elephant: Leaps in creative writing with mul- timodal machine intelligence. ACM Transactions on Computer-Human Inter- action 30, 5 (2023), 1–57

  41. [41]

    Xuxin Tang, Eric Krokos, Can Liu, Kylie Davidson, Kirsten Whitley, Naren Ramakrishnan, and Chris North. 2024. Steering LLM Summarization with Visual Workspaces for Sensemaking. arXiv preprint arXiv:2409.17289 (2024)

  42. [42]

    Xuxin Tang, Eric Krokos, Kirsten Whitley, et al. 2025. ReSPIRE: Transparent and Steerable Human-AI Sensemaking through Shared Workspace.TechRxiv (11 April 2025). doi:10.36227/techrxiv.174438673.31381875/v1

  43. [43]

    Varun Vasudevan, Faezeh Akhavizadegan, Abhinav Prakash, Yokila Arora, Jason Cho, Tanya Mendiratta, Sushant Kumar, and Kannan Achan. 2025. LLM-driven Constrained Copy Generation through Iterative Refinement. arXiv preprint arXiv:2504.10391 (2025)

  44. [44]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits rea- soning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837

  45. [45]

    John Wenskovitch and Chris North. 2020. Interactive artificial intelligence: designing for the” two black boxes” problem. Computer 53, 8 (2020), 29–39

  46. [46]

    Christopher D Wickens, Michelle Vincow, and Michelle Yeh. 2005. Design applications of visual spatial thinking. The Cambridge handbook of visuospa- tial thinking (2005), 383–425

  47. [47]

    Wikipedia contributors. 2025. Yellowstone National Park — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Yellowstone_National_ Park [Online; accessed 11-September-2025]

  48. [48]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR)

  49. [49]

    Ryan Yen and Jian Zhao. 2024. Memolet: Reifying the reuse of user-ai con- versational memories. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology . 1–22

  50. [50]

    J Diego Zamfirescu-Pereira, Richmond Y Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI conference on human factors in computing systems . 1–21

  51. [51]

    Wenshuo Zhang, Leixian Shen, Shuchang Xu, Jindu Wang, Jian Zhao, Huamin Qu, and Lin-Ping Yuan. 2025. NeuroSync: Intent-Aware Code-Based Problem Solving via Direct LLM Understanding Modification. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technol- ogy. 1–19

  52. [52]

    Zheng Zhang, Jie Gao, Ranjodh Singh Dhaliwal, and Toby Jia-Jun Li. 2023. VISAR: A Human-AI Argumentative Writing Assistant with Visual Program- ming and Rapid Draft Prototyping. arXiv preprint arXiv:2304.07810 (2023)