Chat Modeling: Interaction-Enhanced Agent Framework for Visualizing Literature-Grounded Biological Structures
Pith reviewed 2026-05-24 02:28 UTC · model grok-4.3
The pith
Collaborative LLM agents convert natural language from biology papers into structured 3D modeling operations and final models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework transforms user inputs including natural language descriptions, research publication content, and textual descriptions of the existing scene into modeling operations in a structured JSON format and final 3D results. The major technical contribution lies in the collaborative agent design that simultaneously supports model planning, execution, and novel user interaction design such as interactive modeling execution and dynamic widget generation that fuse text and mouse interaction within the chat window. The framework further incorporates a customized modeling memory to enhance user interaction, featuring components such as personalized memory management, feedback collection, and skill library design.
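To make "modeling operations in a structured JSON format" concrete, here is a minimal sketch of how such operations could be applied to a scene and how the result could be turned back into a textual scene description. The review does not reproduce the paper's schema, so the operation names (`create`, `translate`), every field name, and the helpers `apply_operations` and `describe_scene` are assumptions made only for illustration.

```python
import json

# Hypothetical operation format. The review does not reproduce the paper's
# actual JSON schema, so every field name here is an assumption.
OPERATIONS_JSON = """
[
  {"op": "create", "id": "membrane", "shape": "sphere", "radius": 50.0, "position": [0, 0, 0]},
  {"op": "create", "id": "nucleus", "shape": "sphere", "radius": 10.0, "position": [5, 0, 0]},
  {"op": "translate", "id": "nucleus", "offset": [0, 2.5, 0]}
]
"""


def apply_operations(scene, operations):
    """Return a new scene (dict of id -> object) with the operations applied in order."""
    scene = {obj_id: dict(obj) for obj_id, obj in scene.items()}
    for op in operations:
        kind = op["op"]
        if kind == "create":
            if op["id"] in scene:
                raise ValueError(f"object {op['id']!r} already exists")
            scene[op["id"]] = {
                "shape": op["shape"],
                "radius": float(op["radius"]),
                "position": [float(c) for c in op["position"]],
            }
        elif kind == "translate":
            if op["id"] not in scene:
                raise ValueError(f"object {op['id']!r} does not exist")
            obj = scene[op["id"]]
            # Build a new list rather than mutating, so the input scene is untouched.
            obj["position"] = [p + float(d) for p, d in zip(obj["position"], op["offset"])]
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return scene


def describe_scene(scene):
    """Render the scene as text, the kind of description that could be fed back to an agent."""
    lines = []
    for obj_id, obj in sorted(scene.items()):
        x, y, z = obj["position"]
        lines.append(f"{obj_id}: {obj['shape']} of radius {obj['radius']} at ({x}, {y}, {z})")
    return "\n".join(lines)


scene = apply_operations({}, json.loads(OPERATIONS_JSON))
print(describe_scene(scene))
```

Keeping the executor strict (unknown operations and dangling ids raise errors) is a design choice of this sketch: it turns a hallucinated field or reference into a visible failure instead of a silently wrong model.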
What carries the argument
Collaborative agent design for simultaneous model planning and execution plus customized modeling memory with personalized management, feedback collection, and skill library.
If this is right
- Bioscientists can produce 3D visualizations without first mastering complex modeling software.
- Interactive modeling execution and dynamic widget generation become available inside the chat window.
- Modeling performance improves over time as the memory accumulates personalized data and skills.
- The system supports direct use of publication content as input for model construction.
- Quantitative results on the collected dataset confirm the framework produces usable 3D models.
Where Pith is reading between the lines
- The memory and skill-library design could support sharing of modeling expertise across multiple users or research groups.
- The same agent pattern might transfer to other domains that require turning textual descriptions into geometric models.
- Live connection to experimental data streams could let the system update models when new measurements appear in the literature.
- Automated checks against known geometric constraints of biological structures could be added to catch errors before rendering.
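The last point, automated geometric checks before rendering, can be illustrated with a small sketch. It assumes every object is approximated by a sphere and that the expected relationships are given as (inner, outer) containment pairs; the scene layout, the object names, and the helpers `is_contained` and `check_containment` are hypothetical and not taken from the paper.

```python
import math


def is_contained(inner, outer):
    """True if sphere `inner` lies entirely inside sphere `outer`."""
    gap = math.dist(inner["position"], outer["position"])
    return gap + inner["radius"] <= outer["radius"]


def check_containment(scene, expected_pairs):
    """Return a list of human-readable violations for the expected (inner, outer) pairs."""
    violations = []
    for inner_id, outer_id in expected_pairs:
        missing = [obj_id for obj_id in (inner_id, outer_id) if obj_id not in scene]
        if missing:
            violations.append(f"missing object(s): {', '.join(missing)}")
            continue
        if not is_contained(scene[inner_id], scene[outer_id]):
            violations.append(f"{inner_id} is not contained in {outer_id}")
    return violations


# Illustrative scene: the ribosome pokes through the membrane.
scene = {
    "membrane": {"position": (0.0, 0.0, 0.0), "radius": 50.0},
    "nucleus": {"position": (5.0, 0.0, 0.0), "radius": 10.0},
    "ribosome": {"position": (48.0, 0.0, 0.0), "radius": 5.0},
}
violations = check_containment(scene, [("nucleus", "membrane"), ("ribosome", "membrane")])
print(violations)
```

A real system would need richer shapes and constraints than sphere containment; the point of the sketch is only that such checks can run on the structured scene before anything is rendered.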
Load-bearing premise
LLM-based agents can reliably turn natural language descriptions of biological structures, publication content, and current scene states into correct structured JSON modeling operations without hallucinations or errors.
What would settle it
Provide the framework with a specific, well-documented biological structure from a publication and check whether the output 3D model contains geometry errors or invalid operations traceable to incorrect JSON commands.
Original abstract
Bioscientists frequently seek to visualize the biological systems they have empirically characterized and reported in the literature. Realizing such visualizations requires biological structure modeling, an inherently complex process that demands both biological and geometric understanding. This paper addresses the problem of constructing such 3D models for visualization. In this paper, we introduce a novel agent framework that mitigates the challenges of operating 3D modeling software by transforming user inputs, including natural language descriptions, research publication content, and textual descriptions of the existing objects and structures in the current scene, into modeling operations in a structured JSON format and final 3D results. The major technical contribution lies in the collaborative agent design that simultaneously supports model planning, execution, and novel user interaction design, such as interactive modeling execution and dynamic widget generation that fuse text and mouse interaction within the chat window. The framework further incorporates a customized modeling memory to enhance user interaction, featuring components such as personalized memory management, feedback collection, and skill library design. This modeling memory is leveraged to enable improved 3D modeling performance over time. The quantitative evaluation on our collected dataset showcases the effectiveness of our framework. We also develop a prototype tool, Chat Modeling, and demonstrate its usage through two modeling case studies. Our user study and expert interviews highlight the potential of our approach for use in scientific workflows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Chat Modeling, a collaborative multi-agent LLM framework that converts natural-language descriptions of biological structures from publications and scene states into structured JSON modeling operations for 3D visualization. Key components include agents for planning and execution, dynamic widget generation fusing text and mouse input in the chat interface, and a customized modeling memory (personalized management, feedback collection, skill library) intended to improve performance over time. The work is supported by quantitative evaluation on a collected dataset, two case studies, a user study, and expert interviews.
Significance. If the core LLM-to-JSON pipeline proves reliable, the system could meaningfully reduce the expertise barrier for bioscientists to produce literature-grounded 3D models. The interactive design elements and memory-augmented agents represent a concrete advance in applied agent frameworks for scientific HCI. The provision of a working prototype and multi-method evaluation (dataset, cases, users, experts) strengthens the applied contribution.
major comments (3)
- [Abstract and §4] Abstract and §4 (Evaluation): The central claim that the modeling memory yields 'improved 3D modeling performance over time' rests on the untested assumption that the LLM agents produce sufficiently accurate JSON outputs. No metrics, baselines, error rates, or robustness analysis are reported for JSON schema adherence, coordinate accuracy, or biological validity; without these, the memory improvement cannot be isolated from initial hallucination failures.
- [§3] §3 (System Design): The framework description does not specify any JSON schema validation, execution sandboxing, or repair loops for malformed or hallucinated outputs. Given that a single incorrect field or reference produces an invalid 3D model, the absence of these safeguards is load-bearing for the claimed reliability of the planning-execution pipeline.
- [§4] §4 (Quantitative Evaluation): The abstract states that a 'collected dataset' is used to showcase effectiveness, yet no details are provided on dataset construction, size, annotation process, chosen metrics (e.g., JSON validity rate, geometric error), or comparison against non-agent baselines. This omission prevents assessment of whether the collaborative-agent design actually outperforms simpler prompting approaches.
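The safeguards the second major comment asks about, schema validation and a repair loop, could look roughly like the sketch below. It is not the prototype's implementation: the required-field table, the operation names, and the `generate` callable (a stand-in for an LLM call, stubbed here with canned outputs) are all assumptions.

```python
import json

# Hypothetical schema: required fields per operation type.
REQUIRED = {
    "create": {"id", "shape", "radius", "position"},
    "translate": {"id", "offset"},
}


def validate_operations(text, existing_ids=()):
    """Parse and validate model output. Returns (operations or None, list of errors)."""
    try:
        ops = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(ops, list):
        return None, ["top level must be a list of operations"]
    errors = []
    known_ids = set(existing_ids)
    for i, op in enumerate(ops):
        kind = op.get("op") if isinstance(op, dict) else None
        if not isinstance(kind, str) or kind not in REQUIRED:
            errors.append(f"operation {i}: unknown or missing 'op'")
            continue
        missing = REQUIRED[kind] - set(op)
        if missing:
            errors.append(f"operation {i}: missing fields {sorted(missing)}")
            continue
        if not isinstance(op["id"], str):
            errors.append(f"operation {i}: 'id' must be a string")
        elif kind == "create":
            known_ids.add(op["id"])
        elif op["id"] not in known_ids:
            errors.append(f"operation {i}: reference to unknown id {op['id']!r}")
    return (ops if not errors else None), errors


def generate_with_repair(generate, request, max_attempts=3):
    """Call `generate` until its output validates, feeding errors back into the prompt."""
    prompt = request
    errors = []
    for attempt in range(1, max_attempts + 1):
        ops, errors = validate_operations(generate(prompt))
        if ops is not None:
            return ops, attempt
        prompt = (
            f"{request}\n\nYour previous output was rejected:\n"
            + "\n".join(errors)
            + "\nReturn corrected JSON only."
        )
    raise ValueError(f"no valid operations after {max_attempts} attempts: {errors}")


def make_stub(outputs):
    """Stand-in for an LLM: returns the canned outputs one by one."""
    remaining = iter(outputs)
    return lambda prompt: next(remaining)


stub = make_stub([
    '[{"op": "create", "id": "nucleus"}]',  # rejected: missing fields
    '[{"op": "create", "id": "nucleus", "shape": "sphere", "radius": 10, "position": [0, 0, 0]}]',
])
ops, attempts = generate_with_repair(stub, "Add a nucleus to the scene.")
print(attempts, ops)
```

Reporting how many attempts each request needed would also give the per-output error rates that the first major comment finds missing.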
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a concise statement of the exact JSON schema fields and an example of a successful transformation to ground the technical contribution.
- [Case Studies] Figure captions and the case-study section should explicitly link each illustrated widget or memory component to the corresponding system module described in §3.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which identifies key areas where additional detail and analysis will strengthen the manuscript. We address each major comment below and will incorporate revisions to improve rigor and transparency.
- Referee: [Abstract and §4] Abstract and §4 (Evaluation): The central claim that the modeling memory yields 'improved 3D modeling performance over time' rests on the untested assumption that the LLM agents produce sufficiently accurate JSON outputs. No metrics, baselines, error rates, or robustness analysis are reported for JSON schema adherence, coordinate accuracy, or biological validity; without these, the memory improvement cannot be isolated from initial hallucination failures.
  Authors: We agree that the evaluation would be strengthened by explicit metrics isolating JSON output quality. The reported quantitative results measure end-to-end task success on the dataset, which requires correct JSON generation, but a separate breakdown of schema adherence, coordinate accuracy, and error rates was not included. We will revise §4 to add these metrics, robustness analysis, and non-agent baselines so the contribution of modeling memory can be more clearly isolated.
  Revision: yes
- Referee: [§3] §3 (System Design): The framework description does not specify any JSON schema validation, execution sandboxing, or repair loops for malformed or hallucinated outputs. Given that a single incorrect field or reference produces an invalid 3D model, the absence of these safeguards is load-bearing for the claimed reliability of the planning-execution pipeline.
  Authors: The current manuscript emphasizes the agent collaboration and interaction mechanisms rather than low-level error handling. The prototype does perform JSON parsing with basic validation and error recovery, but these were not described. We will expand §3 to document the JSON schema, validation procedures, any sandboxing, and repair loops used to mitigate malformed outputs.
  Revision: yes
- Referee: [§4] §4 (Quantitative Evaluation): The abstract states that a 'collected dataset' is used to showcase effectiveness, yet no details are provided on dataset construction, size, annotation process, chosen metrics (e.g., JSON validity rate, geometric error), or comparison against non-agent baselines. This omission prevents assessment of whether the collaborative-agent design actually outperforms simpler prompting approaches.
  Authors: We acknowledge that dataset and evaluation details were insufficiently specified. The dataset was built from literature-derived natural-language descriptions of biological structures paired with ground-truth modeling operations. We will revise §4 to include dataset size, construction and annotation process, full metric definitions (including JSON validity and geometric error), and direct comparisons against simpler prompting baselines.
  Revision: yes
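The two metrics named in this exchange, JSON validity rate and geometric error, could be defined in several ways. The sketch below shows one possible reading, not the authors' definitions: validity is taken as "parses as JSON", and geometric error as the mean Euclidean distance between predicted and ground-truth positions of objects with matching ids. The function names and the sample data are hypothetical.

```python
import json
import math


def json_validity_rate(outputs):
    """Fraction of raw model outputs that parse as JSON."""
    if not outputs:
        raise ValueError("no outputs to score")
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            continue
        valid += 1
    return valid / len(outputs)


def mean_position_error(predicted, reference):
    """Mean Euclidean distance over shared ids, plus the reference ids that were not produced.

    Both arguments map object id -> (x, y, z) position.
    """
    shared = sorted(set(predicted) & set(reference))
    missing = sorted(set(reference) - set(predicted))
    if not shared:
        raise ValueError("no objects in common")
    total = sum(math.dist(predicted[obj_id], reference[obj_id]) for obj_id in shared)
    return total / len(shared), missing


# Illustrative data only, not results from the paper.
outputs = ["[]", '{"op": "create"}', '[{"op": ', "not json"]
predicted = {"a": (0.0, 0.0, 0.0), "b": (3.0, 4.0, 0.0)}
reference = {"a": (0.0, 0.0, 0.0), "b": (0.0, 0.0, 0.0), "c": (1.0, 1.0, 1.0)}

rate = json_validity_rate(outputs)
error, missing = mean_position_error(predicted, reference)
print(rate, error, missing)
```

Reporting missing objects separately from positional error keeps an output that omits half the structure from scoring well merely because the objects it did place are accurate.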
Circularity Check
No circularity: applied systems paper with no derivations or fitted predictions
The paper describes a built prototype (Chat Modeling) that uses LLM agents to map natural-language inputs to JSON modeling commands for 3D biological structures. No equations, first-principles derivations, parameter fitting, or predictions appear anywhere in the manuscript. The claimed improvements from modeling memory and collaborative agents are presented as engineering outcomes evaluated via case studies and user interviews, not as quantities derived from or equivalent to the framework's own inputs. No self-citation chains or ansatzes are invoked to justify core claims. This matches the default expectation for non-theoretical systems work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models can be reliably prompted to convert complex biological descriptions and scene states into valid structured JSON modeling operations.
invented entities (2)
- Collaborative multi-agent system with planning, execution, and interaction components plus dynamic widget generation (no independent evidence)
- Customized modeling memory with personalized management, feedback collection, and skill library (no independent evidence)
Forward citations
Cited by 1 Pith paper
- Raiven: LLM-Based Visualization Authoring via Domain-Specific Language Mediation
  Raiven mediates LLM visualization authoring via a formally defined DSL that unifies scientific and information visualization, producing deterministic, verifiable code from metadata-only inputs.