InvestChat: Exploring Multimodal Interaction via Natural Language, Touch, and Pen in an Investment Dashboard
Pith reviewed 2026-05-10 01:22 UTC · model grok-4.3
The pith
Combining natural language, touch, and pen input in a stock dashboard increases engagement for novice investors by letting them choose the right tool for each task.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
InvestChat shows that an investment interface supporting natural language chat alongside touch and pen input on multiple linked views enables users to apply each modality to complementary parts of stock exploration, resulting in greater engagement, enjoyment of input freedom, and preference for natural language when communicating analytical needs.
What carries the argument
The coordinated multimodal system that routes natural language, touch, and pen inputs through an LLM chat and synchronized data views so each modality updates the same underlying stock information.
If this is right
- Users select natural language for open-ended questions about market trends and touch or pen for selecting or annotating specific chart elements.
- The ability to switch modalities without losing context keeps users actively exploring data instead of stopping to adapt to one fixed input.
- Natural language is the modality participants found most effective for expressing analytical intent, while pen and touch preserve precision for spatial tasks.
- Coordinated views ensure that an action in one modality immediately reflects across the dashboard regardless of how the command arrived.
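The coordination described in the last point can be sketched as a shared state object that every view subscribes to. This is a minimal illustration, not the InvestChat implementation: the class names, the view names (`price_chart`, `news_panel`), and the tickers are all hypothetical, and chat input is assumed to have already been resolved into a structured command by the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DashboardState:
    """Shared selection that every view renders from (hypothetical)."""
    ticker: Optional[str] = None
    date_range: Optional[Tuple[str, str]] = None
    last_modality: Optional[str] = None


@dataclass
class Dashboard:
    """Single source of truth; all modalities write here, all views read."""
    state: DashboardState = field(default_factory=DashboardState)
    _listeners: List[Callable[[DashboardState], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[DashboardState], None]) -> None:
        self._listeners.append(listener)

    def apply(self, modality: str, ticker: Optional[str] = None,
              date_range: Optional[Tuple[str, str]] = None) -> None:
        # Update only the fields the command carries, then notify every view.
        if ticker is not None:
            self.state.ticker = ticker
        if date_range is not None:
            self.state.date_range = date_range
        self.state.last_modality = modality
        for listener in self._listeners:
            listener(self.state)


# Two stand-in views that just record what they would render.
rendered = []
dash = Dashboard()
dash.subscribe(lambda s: rendered.append(("price_chart", s.ticker, s.date_range)))
dash.subscribe(lambda s: rendered.append(("news_panel", s.ticker, s.date_range)))

dash.apply("touch", ticker="ACME")                           # tap on a stock
dash.apply("pen", date_range=("2024-01-01", "2024-03-31"))   # circle a period
dash.apply("chat", ticker="GLOBEX")                          # typed request
```

Because every command goes through `apply`, both views see the same ticker and date range after each call, regardless of which modality issued it.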
Where Pith is reading between the lines
- Similar multimodal designs could be tested in other exploratory domains such as portfolio rebalancing or economic data analysis where users need both high-level questions and fine visual control.
- Adding voice as a fourth input might further reduce barriers for users who prefer hands-free operation during mobile use.
- The observed preference for natural language may shift if the underlying LLM accuracy changes, suggesting that reliability of the chat component is a practical limiter.
- Real-time market volatility could alter modality use, as quick pen marks might become more valuable than typed queries under time pressure.
Load-bearing premise
The interaction patterns and preferences seen with twelve novice investors in a lab setting will appear in larger groups of users and in actual daily investment work.
What would settle it
A follow-up study that tracks whether participants still switch between all three modalities when performing the same tasks over multiple sessions or instead settle on one primary method.
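One way such a follow-up could quantify "switching" versus "settling" is sketched below. It assumes an interaction log of (session, modality) pairs in chronological order; the original study collected no such logs, so the function name, the log format, and the sample events are all made up for illustration.

```python
from collections import Counter, defaultdict


def summarize_modalities(events):
    """Summarize modality use per session.

    events: iterable of (session_id, modality) in chronological order.
    """
    by_session = defaultdict(list)
    for session_id, modality in events:
        by_session[session_id].append(modality)

    summary = {}
    for session_id, seq in by_session.items():
        counts = Counter(seq)
        dominant, dominant_n = counts.most_common(1)[0]
        # A switch is any pair of consecutive events with different modalities.
        switches = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
        summary[session_id] = {
            "modalities_used": len(counts),
            "dominant": dominant,
            "dominant_share": dominant_n / len(seq),
            "switches": switches,
        }
    return summary


# Illustrative, invented log for one participant across two sessions.
log = [
    (1, "chat"), (1, "touch"), (1, "pen"), (1, "chat"),
    (2, "chat"), (2, "chat"), (2, "chat"), (2, "touch"),
]
result = summarize_modalities(log)
```

A dominant share that rises across sessions while the switch count falls would suggest participants are settling on one primary method; stable values would suggest sustained multimodal use.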
Original abstract
We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock market exploration facilitates user engagement. Participants leveraged the modalities in complementary ways, enjoying the freedom of choice and finding natural language most effective.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents InvestChat, a tablet-based multimodal application for stock market exploration that integrates an LLM-powered natural language chat with touch and pen inputs across coordinated visualizations. It reports results from a user study with 12 novice investors, claiming that the combination of modalities facilitates engagement, that users leverage them complementarily, and that natural language is perceived as most effective.
Significance. If the reported observations hold under more rigorous conditions, the work offers a concrete case study of multimodal interaction in a financial analytics context, potentially informing interface designs that support flexible exploration. The emphasis on user freedom of choice and modality complementarity aligns with broader HCI interests in adaptive input, though the small-scale qualitative focus restricts immediate generalizability to expert users or real-world trading scenarios.
major comments (2)
- [User Study] User Study section: The evaluation is conducted in a single-condition prototype without a control or baseline (e.g., touch-only or chat-only interface), and no objective metrics such as task completion time, insight count, or logged interaction patterns are referenced; this makes it impossible to isolate multimodal benefits from novelty or overall interface quality when claiming facilitated engagement.
- [Findings] Findings / Results section: Assertions that modalities were used 'in complementary ways' and that natural language was 'most effective' rest exclusively on post-study interviews and observed behaviors with N=12 novices; no details on qualitative analysis procedure, coding scheme, or any quantitative triangulation are provided, leaving the central claim without verifiable support.
minor comments (1)
- [Abstract] Abstract: The summary of outcomes is stated clearly but omits any mention of study scale or qualitative nature, which could be added in one sentence to set reader expectations.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. Below we respond point-by-point to the major comments, indicating where we agree and the specific revisions we will make.
- Referee: [User Study] User Study section: The evaluation is conducted in a single-condition prototype without a control or baseline (e.g., touch-only or chat-only interface), and no objective metrics such as task completion time, insight count, or logged interaction patterns are referenced; this makes it impossible to isolate multimodal benefits from novelty or overall interface quality when claiming facilitated engagement.
  Authors: We agree that the single-condition, exploratory design limits our ability to isolate the specific benefits of multimodality or to make comparative claims. The study was intended as a formative investigation of integrated use rather than a controlled comparison. In the revised manuscript we will moderate the language throughout the abstract, findings, and conclusion (e.g., replacing 'facilitates user engagement' with descriptions of observed behaviors and reported experiences). We will also add an explicit limitations subsection that acknowledges the absence of baseline conditions and objective performance measures. We cannot, however, introduce new logged metrics or a between-subjects baseline without running a follow-up study.
  Revision: partial
- Referee: [Findings] Findings / Results section: Assertions that modalities were used 'in complementary ways' and that natural language was 'most effective' rest exclusively on post-study interviews and observed behaviors with N=12 novices; no details on qualitative analysis procedure, coding scheme, or any quantitative triangulation are provided, leaving the central claim without verifiable support.
  Authors: We accept that greater methodological transparency is required. In the revised User Study section we will describe the data sources (screen recordings, think-aloud protocols, and semi-structured interviews), the transcription process, and the thematic analysis procedure (following Braun & Clarke), including how the coding scheme for modality complementarity and perceived effectiveness was developed and applied. We will also note the absence of quantitative triangulation as a limitation. These additions will allow readers to evaluate the grounding of our claims while preserving the qualitative nature of the work.
  Revision: yes
- We cannot supply objective metrics (task times, insight counts, or interaction logs) because these data were not collected in the original study; adding them would require a new experiment.
Circularity Check
No circularity: empirical user study with no derivations or fitted predictions
The paper describes the design of InvestChat and reports qualitative observations from a single-condition user study with 12 novice investors. No equations, parameters, predictions, or theoretical derivations appear in the provided text or abstract. Claims about modality complementarity and engagement rest directly on post-study interviews and observed behaviors rather than any reduction to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked to support a derivation chain. The work is self-contained empirical reporting without internal logical loops that would qualify under any of the enumerated circularity patterns.