arxiv: 2604.19537 · v2 · submitted 2026-04-21 · 💻 cs.HC

InvestChat: Exploring Multimodal Interaction via Natural Language, Touch, and Pen in an Investment Dashboard

Pith reviewed 2026-05-10 01:22 UTC · model grok-4.3

classification 💻 cs.HC
keywords multimodal interaction · investment dashboard · natural language input · touch and pen · stock market exploration · user engagement · LLM chat interface · novice investors

The pith

Combining natural language, touch, and pen input in a stock dashboard increases engagement for novice investors by letting them choose the right tool for each task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces InvestChat, a tablet application with coordinated charts and an LLM chat that accepts queries and commands through typed or spoken natural language, direct touch gestures, and pen marks. In a study with twelve novice investors, participants switched between these inputs depending on the action, such as using natural language for trend questions and pen for marking specific data points; the authors report that this freedom of choice supported engagement, although the study included no single-modality baseline for comparison. If the pattern holds, investment tools could move away from touch-only designs toward systems that preserve choice, making financial exploration feel less rigid and more conversational. Natural language stood out as the most effective channel for expressing complex requests, while the other two modalities handled precise visual adjustments without interrupting flow.

Core claim

InvestChat shows that an investment interface supporting natural language chat alongside touch and pen input on multiple linked views enables users to apply each modality to complementary parts of stock exploration, resulting in greater engagement, enjoyment of input freedom, and preference for natural language when communicating analytical needs.

What carries the argument

The coordinated multimodal system that routes natural language, touch, and pen inputs through an LLM chat and synchronized data views so each modality updates the same underlying stock information.
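The coordination idea described above can be illustrated with a minimal sketch. It assumes a single shared state object and a dispatcher that notifies every subscribed view. All names here (`DashboardState`, `Action`, `Coordinator`) are hypothetical and are not taken from the paper; the LLM step is omitted, so a natural language request is represented by an already-parsed action.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DashboardState:
    """Shared state that every view renders from (hypothetical fields)."""
    ticker: str = "AAPL"
    date_range: tuple = ("2025-01-01", "2025-12-31")
    annotations: List[str] = field(default_factory=list)


@dataclass
class Action:
    """A modality-independent command; `modality` is kept only for logging."""
    modality: str   # "nl", "touch", or "pen"
    kind: str       # "select_ticker", "set_range", or "annotate"
    payload: object


class Coordinator:
    """Applies actions to the shared state, then notifies all views."""

    def __init__(self) -> None:
        self.state = DashboardState()
        self._views: List[Callable[[DashboardState], None]] = []

    def subscribe(self, view: Callable[[DashboardState], None]) -> None:
        self._views.append(view)

    def dispatch(self, action: Action) -> None:
        # The same branch runs regardless of which modality produced the action.
        if action.kind == "select_ticker":
            self.state.ticker = action.payload
        elif action.kind == "set_range":
            self.state.date_range = action.payload
        elif action.kind == "annotate":
            self.state.annotations.append(action.payload)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
        for view in self._views:
            view(self.state)


# Usage: two stand-in views record what they would redraw.
rendered = []
coordinator = Coordinator()
coordinator.subscribe(lambda s: rendered.append(("chart", s.ticker)))
coordinator.subscribe(lambda s: rendered.append(("kpi", s.ticker)))
coordinator.dispatch(Action("nl", "select_ticker", "MSFT"))
coordinator.dispatch(Action("pen", "annotate", "circle near 2025-03-14"))
```

Because every input is reduced to the same `Action` type before it touches the state, the views never need to know which modality issued a command. That is one plausible way to obtain the synchronization the summary describes, not a claim about how InvestChat is built.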

If this is right

  • Users select natural language for open-ended questions about market trends and touch or pen for selecting or annotating specific chart elements.
  • The ability to switch modalities without losing context keeps users actively exploring data instead of stopping to adapt to one fixed input.
  • Natural language input handles the majority of analytical intent most efficiently, while pen and touch preserve precision for spatial tasks.
  • Coordinated views ensure that an action in one modality immediately reflects across the dashboard regardless of how the command arrived.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar multimodal designs could be tested in other exploratory domains such as portfolio rebalancing or economic data analysis where users need both high-level questions and fine visual control.
  • Adding voice as a fourth input might further reduce barriers for users who prefer hands-free operation during mobile use.
  • The observed preference for natural language may shift if the underlying LLM accuracy changes, suggesting that reliability of the chat component is a practical limiter.
  • Real-time market volatility could alter modality use, as quick pen marks might become more valuable than typed queries under time pressure.

Load-bearing premise

The interaction patterns and preferences seen with twelve novice investors in a lab setting will appear in larger groups of users and in actual daily investment work.

What would settle it

A follow-up study that tracks whether participants still switch between all three modalities when performing the same tasks over multiple sessions or instead settle on one primary method.
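A follow-up of this kind would need per-session interaction logs, which the original study did not collect. The sketch below assumes a hypothetical log format in which each session is an ordered list of modality labels, and shows two simple measures: the share of events per modality and the rate of switching between consecutive events. The function names and the sample sessions are illustrative only, not data from the paper.

```python
from collections import Counter
from typing import Dict, List


def modality_shares(events: List[str]) -> Dict[str, float]:
    """Fraction of logged events issued through each modality."""
    counts = Counter(events)
    total = sum(counts.values())
    return {m: n / total for m, n in counts.items()} if total else {}


def switch_rate(events: List[str]) -> float:
    """Fraction of consecutive event pairs where the modality changes."""
    if len(events) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(events, events[1:]))
    return switches / (len(events) - 1)


# Made-up sessions for one participant (illustrative values only).
first_session = ["nl", "touch", "pen", "nl", "touch"]
later_session = ["nl", "nl", "nl", "touch", "nl"]

print(switch_rate(first_session), modality_shares(first_session))
print(switch_rate(later_session), modality_shares(later_session))
```

A switch rate that falls across sessions while one modality's share rises would indicate that a participant is settling on a primary input method; a stable switch rate would indicate continued multimodal use.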

Figures

Figures reproduced from arXiv: 2604.19537 by Adson Lucas de Paiva Sales, Gabriela Molina León, Henrik Østergaard, Sarah Lykke Tost, Vaishali Dhanoa.

Figure 1: InvestChat Areas: (a) Stock overview with KPI and three different visualizations for the selected stock. (b) Prediction

Original abstract

We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock market exploration facilitates user engagement. Participants leveraged the modalities in complementary ways, enjoying the freedom of choice and finding natural language most effective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents InvestChat, a tablet-based multimodal application for stock market exploration that integrates an LLM-powered natural language chat with touch and pen inputs across coordinated visualizations. It reports results from a user study with 12 novice investors, claiming that the combination of modalities facilitates engagement, that users leverage them complementarily, and that natural language is perceived as most effective.

Significance. If the reported observations hold under more rigorous conditions, the work offers a concrete case study of multimodal interaction in a financial analytics context, potentially informing interface designs that support flexible exploration. The emphasis on user freedom of choice and modality complementarity aligns with broader HCI interests in adaptive input, though the small-scale qualitative focus restricts immediate generalizability to expert users or real-world trading scenarios.

major comments (2)
  1. [User Study] User Study section: The evaluation is conducted in a single-condition prototype without a control or baseline (e.g., touch-only or chat-only interface), and no objective metrics such as task completion time, insight count, or logged interaction patterns are referenced; this makes it impossible to isolate multimodal benefits from novelty or overall interface quality when claiming facilitated engagement.
  2. [Findings] Findings / Results section: Assertions that modalities were used 'in complementary ways' and that natural language was 'most effective' rest exclusively on post-study interviews and observed behaviors with N=12 novices; no details on qualitative analysis procedure, coding scheme, or any quantitative triangulation are provided, leaving the central claim without verifiable support.
minor comments (1)
  1. [Abstract] Abstract: The summary of outcomes is stated clearly but omits any mention of study scale or qualitative nature, which could be added in one sentence to set reader expectations.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. Below we respond point-by-point to the major comments, indicating where we agree and the specific revisions we will make.

  1. Referee: [User Study] User Study section: The evaluation is conducted in a single-condition prototype without a control or baseline (e.g., touch-only or chat-only interface), and no objective metrics such as task completion time, insight count, or logged interaction patterns are referenced; this makes it impossible to isolate multimodal benefits from novelty or overall interface quality when claiming facilitated engagement.

    Authors: We agree that the single-condition, exploratory design limits our ability to isolate the specific benefits of multimodality or to make comparative claims. The study was intended as a formative investigation of integrated use rather than a controlled comparison. In the revised manuscript we will moderate the language throughout the abstract, findings, and conclusion (e.g., replacing 'facilitates user engagement' with descriptions of observed behaviors and reported experiences). We will also add an explicit limitations subsection that acknowledges the absence of baseline conditions and objective performance measures. We cannot, however, introduce new logged metrics or a between-subjects baseline without running a follow-up study. revision: partial

  2. Referee: [Findings] Findings / Results section: Assertions that modalities were used 'in complementary ways' and that natural language was 'most effective' rest exclusively on post-study interviews and observed behaviors with N=12 novices; no details on qualitative analysis procedure, coding scheme, or any quantitative triangulation are provided, leaving the central claim without verifiable support.

    Authors: We accept that greater methodological transparency is required. In the revised User Study section we will describe the data sources (screen recordings, think-aloud protocols, and semi-structured interviews), the transcription process, and the thematic analysis procedure (following Braun & Clarke), including how the coding scheme for modality complementarity and perceived effectiveness was developed and applied. We will also note the absence of quantitative triangulation as a limitation. These additions will allow readers to evaluate the grounding of our claims while preserving the qualitative nature of the work. revision: yes

standing simulated objections not resolved
  • We cannot supply objective metrics (task times, insight counts, or interaction logs) because these data were not collected in the original study; adding them would require a new experiment.

Circularity Check

0 steps flagged

No circularity: empirical user study with no derivations or fitted predictions

The paper describes the design of InvestChat and reports qualitative observations from a single-condition user study with 12 novice investors. No equations, parameters, predictions, or theoretical derivations appear in the provided text or abstract. Claims about modality complementarity and engagement rest directly on post-study interviews and observed behaviors rather than any reduction to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked to support a derivation chain. The work is self-contained empirical reporting without internal logical loops that would qualify under any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical HCI design and evaluation paper. No mathematical models, free parameters, or invented entities are present.

pith-pipeline@v0.9.0 · 5381 in / 948 out tokens · 34518 ms · 2026-05-10T01:22:25.285093+00:00 · methodology
