VOICE: Visual Oracle for Interaction, Conversation, and Explanation
Pith reviewed 2026-05-24 09:09 UTC · model grok-4.3
The pith
The VOICE framework pairs arbitrary voice commands with real-time verbal responses and matching 3D visual flythroughs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VOICE relies on a pack-of-bots whose members take distinct roles: task assignment, instruction extraction, and content generation. After fine-tuning and prompt engineering, these bots let the system accept arbitrary voice commands, deliver verbal responses, and generate tightly coupled visual representations, including flythrough sequences for 3D molecular models.
What carries the argument
A pack-of-bots architecture in which specialized bots handle task assignment, instruction extraction, and coherent content generation, customized via fine-tuning and prompt engineering.
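A minimal sketch of how such a pack might be wired together, assuming a simple three-stage pipeline. The keyword rules below are hypothetical stand-ins for fine-tuned or prompt-engineered LLM calls, and none of the names (`assign_task`, `extract_instruction`, `generate_content`, `handle_command`) come from the paper.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: in a real system each function would wrap a
# fine-tuned or prompt-engineered LLM call. Keyword rules are used here
# only so the sketch runs without a model.


@dataclass
class Response:
    task: str      # which role handled the command
    payload: dict  # structured instruction or generated text


def assign_task(command: str) -> str:
    """Manager bot: route a transcribed command to 'navigate' or 'explain'."""
    navigation_verbs = ("zoom", "rotate", "go to", "show me")
    lowered = command.lower()
    if any(verb in lowered for verb in navigation_verbs):
        return "navigate"
    return "explain"


def extract_instruction(command: str) -> dict:
    """Instruction bot: pull a structured action out of free-form text."""
    lowered = command.lower()
    action = "zoom" if "zoom" in lowered else "focus"
    # Assumption: the target is whatever follows the last ' the '.
    target = lowered.rsplit(" the ", 1)[-1].strip(" .?!")
    return {"action": action, "target": target}


def generate_content(command: str) -> dict:
    """Content bot: produce the explanation text (placeholder here)."""
    return {"text": f"[explanation for: {command}]"}


def handle_command(command: str) -> Response:
    """Run the pack: assign the task, then call the matching specialist."""
    task = assign_task(command)
    if task == "navigate":
        return Response(task, extract_instruction(command))
    return Response(task, generate_content(command))


if __name__ == "__main__":
    print(handle_command("Zoom in on the spike protein"))
    print(handle_command("What does the membrane do?"))
```

Keeping the routing step separate from the specialists means each role can be tuned or replaced on its own, which is the property the summary attributes to the pack-of-bots design.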
If this is right
- Natural language inputs allow real-time navigation and manipulation of 3D models.
- Text-to-visualization produces flythrough sequences that match the verbal explanation content.
- The framework maintains low latency and high accuracy when coupling voice responses to visual changes.
- The method applies to molecular models that include multi-scale and multi-instance attributes.
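One way to picture a flythrough that matches the verbal explanation is to emit a camera keyframe for each spoken sentence that names a structure in the scene. The sketch below is illustrative only: the `SCENE` index, the coordinates, the speaking rate, and `build_flythrough` are all assumptions, not the paper's method.

```python
import re

# Hypothetical scene index: structure name -> camera look-at point (x, y, z).
SCENE = {
    "spike protein": (0.0, 12.0, 0.0),
    "membrane": (0.0, 8.0, 0.0),
    "rna": (0.0, 0.0, 0.0),
}

WORDS_PER_SECOND = 2.5  # assumed speaking rate of the synthesized voice


def build_flythrough(explanation: str) -> list:
    """Return one camera keyframe per sentence that names a known structure."""
    keyframes = []
    clock = 0.0  # seconds of narration elapsed before the current sentence
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", explanation) if s.strip()
    ]
    for sentence in sentences:
        lowered = sentence.lower()
        for name, position in SCENE.items():
            if name in lowered:
                keyframes.append(
                    {"start": round(clock, 2), "target": name, "look_at": position}
                )
                break  # at most one keyframe per sentence
        # Advance the clock by the estimated time needed to speak the sentence.
        clock += len(sentence.split()) / WORDS_PER_SECOND
    return keyframes


if __name__ == "__main__":
    text = "The spike protein binds to host cells. Inside the membrane sits the RNA genome."
    for frame in build_flythrough(text):
        print(frame)
```

Because each keyframe's start time is derived from the narration itself, the camera move and the sentence that motivates it begin together, which is the kind of coupling the bullets above describe.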
Where Pith is reading between the lines
- The same bot-pack structure could be adapted to visualization domains other than molecules by changing the fine-tuning data.
- Educational settings might benefit from voice-driven sessions that reduce reliance on mouse-and-keyboard controls.
- Direct comparison of latency and error rates against existing visualization interfaces would quantify the claimed gains.
- Extending the system to handle multi-user conversations could support group explanations without additional engineering.
Load-bearing premise
Fine-tuning and prompt engineering of the bots will produce accurate, coherent responses to arbitrary user queries in molecular visualization without hallucinations or task failures.
What would settle it
Run user tests with ambiguous, complex, or out-of-domain voice commands on the molecular models and check whether responses remain accurate, coherent, and free of hallucinations or failures.
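Such a test could be scored per command category. The following is a rough sketch under stated assumptions: `failure_rates`, the toy system, the toy judge, and the example commands are all hypothetical, and a real study would call the deployed system and use expert ratings as the judge.

```python
from collections import defaultdict
from typing import Callable, Iterable, Tuple


def failure_rates(
    cases: Iterable[Tuple[str, str]],
    system: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> dict:
    """Return, per category, the fraction of responses the judge rejects."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for category, command in cases:
        totals[category] += 1
        if not judge(command, system(command)):
            failures[category] += 1
    return {category: failures[category] / totals[category] for category in totals}


# Toy stand-ins so the sketch runs without a model or human raters.
def toy_system(command: str) -> str:
    return "" if "thing" in command else f"response to: {command}"


def toy_judge(command: str, response: str) -> bool:
    return bool(response)  # accept any non-empty answer


CASES = [
    ("ambiguous", "show me that thing"),
    ("complex", "zoom to the spike protein, then explain the membrane"),
    ("out_of_domain", "what is the weather today"),
]

if __name__ == "__main__":
    print(failure_rates(CASES, toy_system, toy_judge))
```

The toy judge is deliberately naive; a realistic judge would also reject confident answers to out-of-domain questions, since those are exactly the hallucinations the test is meant to expose.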
Original abstract
We present VOICE, a novel approach to science communication that connects large language models' (LLM) conversational capabilities with interactive exploratory visualization. VOICE introduces several innovative technical contributions that drive our conversational visualization framework. Our foundation is a pack-of-bots that can perform specific tasks, such as assigning tasks, extracting instructions, and generating coherent content. We employ fine-tuning and prompt engineering techniques to tailor bots' performance to their specific roles and accurately respond to user queries. Our interactive text-to-visualization method generates a flythrough sequence matching the content explanation. Besides, natural language interaction provides capabilities to navigate and manipulate the 3D models in real-time. The VOICE framework can receive arbitrary voice commands from the user and respond verbally, tightly coupled with corresponding visual representation with low latency and high accuracy. We demonstrate the effectiveness of our approach by applying it to the molecular visualization domain: analyzing three 3D molecular models with multi-scale and multi-instance attributes. We finally evaluate VOICE with the identified educational experts to show the potential of our approach. All supplemental materials are available at https://osf.io/g7fbr.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents VOICE, a novel framework for science communication that integrates large language models with interactive exploratory visualization. It introduces a pack-of-bots for task-specific roles using fine-tuning and prompt engineering, a text-to-visualization method for flythrough sequences, and natural language interaction for real-time 3D model navigation and manipulation. The system is demonstrated on three 3D molecular models with multi-scale and multi-instance attributes, evaluated by educational experts, and claims to handle arbitrary voice commands with low latency and high accuracy.
Significance. If the performance claims hold, VOICE could advance conversational interfaces for visualization in education and science communication by enabling natural voice-driven exploration of complex 3D models. The integration of LLMs with visualization is timely, but the lack of quantitative benchmarks makes it difficult to assess its novelty or superiority over prior systems.
major comments (2)
- [Abstract] Abstract: The central claim that 'the VOICE framework can receive arbitrary voice commands from the user and respond verbally, tightly coupled with corresponding visual representation with low latency and high accuracy' is unsupported by any quantitative metrics, error rates, latency measurements, hallucination rates, or baseline comparisons. The expert evaluation is mentioned but supplies no details on methodology, tasks, or outcomes.
- [Evaluation] Evaluation (implied by expert review description): The assumption that fine-tuning and prompt engineering of the pack-of-bots will yield coherent, hallucination-free responses to arbitrary queries in the molecular visualization domain is load-bearing for the contribution but receives no empirical validation or test coverage description.
minor comments (1)
- The manuscript should specify what supplemental materials (e.g., code, prompts, or evaluation data) are provided at the OSF link to support reproducibility of the pack-of-bots implementation.
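The latency measurements the major comments ask for could be collected with a small timing harness. This is a sketch only: `measure_latency` and `summarize` are hypothetical helpers, and `system` stands for whatever callable runs one voice command end to end.

```python
import statistics
import time


def measure_latency(system, commands):
    """Time one end-to-end call per command; returns durations in seconds."""
    samples = []
    for command in commands:
        start = time.perf_counter()
        system(command)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report sample count, median, and 95th-percentile latency."""
    return {
        "n": len(samples),
        "median_s": statistics.median(samples),
        # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
        "p95_s": statistics.quantiles(samples, n=20)[18],
    }


if __name__ == "__main__":
    def fake_system(command):
        time.sleep(0.01)  # placeholder for speech recognition, LLM, and rendering

    durations = measure_latency(fake_system, ["zoom in", "explain the capsid"] * 5)
    print(summarize(durations))
```

Reporting a tail percentile alongside the median matters for a conversational interface, because occasional slow responses are what users notice.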
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate where revisions will be made to strengthen the presentation of our claims and evaluation.
- Referee: [Abstract] Abstract: The central claim that 'the VOICE framework can receive arbitrary voice commands from the user and respond verbally, tightly coupled with corresponding visual representation with low latency and high accuracy' is unsupported by any quantitative metrics, error rates, latency measurements, hallucination rates, or baseline comparisons. The expert evaluation is mentioned but supplies no details on methodology, tasks, or outcomes.
  Authors: We agree that the abstract asserts low latency and high accuracy without accompanying quantitative evidence in the current manuscript. In revision we will qualify this claim to describe observed behavior from our implementation and expand the evaluation section with details on the expert review protocol, specific tasks, participant feedback, and any available latency or accuracy observations from the molecular model demonstrations.
  Revision: yes
- Referee: [Evaluation] Evaluation (implied by expert review description): The assumption that fine-tuning and prompt engineering of the pack-of-bots will yield coherent, hallucination-free responses to arbitrary queries in the molecular visualization domain is load-bearing for the contribution but receives no empirical validation or test coverage description.
  Authors: The manuscript relies on fine-tuning and prompt engineering for the pack-of-bots but does not supply a dedicated description of test coverage or hallucination mitigation results. We will add a short subsection outlining the validation steps performed during development, including example query sets used for the molecular domain and observed coherence outcomes, to make this aspect more transparent.
  Revision: yes
Circularity Check
No circularity: system description with no derivations or self-referential reductions
The paper is a descriptive account of an implemented conversational visualization system (pack-of-bots, fine-tuning, prompt engineering, text-to-visualization flythroughs, real-time 3D manipulation). No equations, first-principles derivations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. Claims rest on system construction and expert evaluation rather than any chain that reduces to its own inputs by definition. This matches the default case of a non-circular engineering paper.