
arxiv: 2509.26557 · v1 · submitted 2025-09-30 · 💻 cs.HC

The Invisible Mentor: Inferring User Actions from Screen Recordings to Recommend Better Workflows

Pith reviewed 2026-05-18 11:49 UTC · model grok-4.3

classification 💻 cs.HC
keywords screen recordings · workflow recommendations · vision-language models · AI assistants · user action inference · software efficiency · human-computer interaction

The pith

Screen recordings can be turned into precise workflow suggestions using a vision-language model pipeline

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces InvisibleMentor, a system that processes screen recordings of users working in tools like Excel to automatically spot inefficient patterns such as repetitive edits and then recommend better alternatives. It does this through a two-stage process without requiring the user to describe their goals or problems. A vision-language model first extracts actions and context from the raw video, after which a language model produces structured suggestions. This matters because many users miss more efficient methods in complex software, and current AI assistants depend on imprecise or effortful prompts from the user. In testing, the system correctly identified workflow issues and participants rated its output as more actionable, tailored, and useful for learning than suggestions from a prompt-based spreadsheet assistant.

Core claim

InvisibleMentor turns screen recordings of task completion into vision-grounded reflections on tasks. It detects issues such as repetitive edits and recommends more efficient alternatives based on observed behavior. Unlike prior systems that rely on logs, APIs, or user prompts, InvisibleMentor operates directly on screen recordings. It uses a two-stage pipeline: a vision-language model reconstructs actions and context, and a language model generates structured, high-fidelity suggestions.

What carries the argument

Two-stage pipeline in which a vision-language model reconstructs user actions and task context directly from screen recordings, followed by a language model that produces structured suggestions for more efficient workflows.
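The two stages can be sketched as a pair of injected callables. This is a minimal illustration, not the authors' implementation: `InferredAction`, `run_pipeline`, and both stub models are hypothetical names, and the stubs stand in for real vision-language and language model calls, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class InferredAction:
    """One user action reconstructed from the recording (hypothetical schema)."""
    timestamp_s: float
    description: str


# Hypothetical stage signatures; a real model client would be wrapped to fit them.
ActionExtractor = Callable[[Sequence[bytes]], List[InferredAction]]  # stage 1: VLM
SuggestionWriter = Callable[[List[InferredAction]], List[str]]       # stage 2: LM


def run_pipeline(
    frames: Sequence[bytes],
    extract_actions: ActionExtractor,
    write_suggestions: SuggestionWriter,
) -> List[str]:
    """Run stage 1 on the frames, then stage 2 on the reconstructed actions."""
    actions = extract_actions(frames)
    if not actions:
        # Nothing was reconstructed, so there is nothing to reflect on.
        return []
    return write_suggestions(actions)


def fake_vlm(frames: Sequence[bytes]) -> List[InferredAction]:
    """Stand-in for the vision-language model: one identical action per frame."""
    return [InferredAction(float(i), "typed a value into a cell")
            for i, _ in enumerate(frames)]


def fake_lm(actions: List[InferredAction]) -> List[str]:
    """Stand-in for the language model: flags repeated action descriptions."""
    repeated = len({a.description for a in actions}) < len(actions)
    return ["Repeated manual edits detected."] if repeated else []
```

Keeping the stages as separate callables means the reconstruction step can be inspected, or swapped for manual annotations, without touching the suggestion step.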

If this is right

  • Users receive tailored efficiency suggestions without having to articulate their goals or problems.
  • Repetitive or inefficient actions visible in behavior can be automatically detected and addressed.
  • Suggestions are judged more actionable and helpful for learning than those produced by prompt-based spreadsheet assistants.
  • The approach works directly from video input and does not require access to application logs or APIs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same video-based reconstruction could be tested in other feature-rich applications such as presentation or data-visualization tools.
  • Running the pipeline on continuous screen capture might support ongoing rather than post-task guidance.
  • Errors in action reconstruction could be reduced by allowing users to correct the inferred steps before suggestions are generated.

Load-bearing premise

A vision-language model can reliably reconstruct precise user actions and task context from raw screen recordings without substantial errors or loss of detail.

What would settle it

A study that supplies screen recordings of tasks with known optimal versus suboptimal workflows, then measures how often the reconstructed actions match independent manual annotations and whether the generated suggestions correctly address the identified inefficiencies.
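One way such a comparison could be scored is to align the reconstructed action sequence against the manual annotation and report precision and recall. The sketch below is an assumption about how this might be done, not a metric taken from the paper: `action_agreement` is a hypothetical helper, and it presumes that both sequences use the same normalised action labels.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence


def action_agreement(predicted: Sequence[str], annotated: Sequence[str]) -> Dict[str, float]:
    """Align two action-label sequences in order and score their overlap."""
    matcher = SequenceMatcher(a=list(predicted), b=list(annotated), autojunk=False)
    # Each matching block is a run of identical labels that appears in both sequences.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(annotated) if annotated else 0.0
    return {"matched": matched, "precision": precision, "recall": recall}


# Illustrative labels only; the model missed one annotated step ("copy A1").
predicted = ["select A1", "type value", "select A2", "type value"]
annotated = ["select A1", "type value", "copy A1", "select A2", "type value"]
scores = action_agreement(predicted, annotated)
```

Because the alignment is order-preserving, a model that recovers the right actions in the wrong order scores lower than one that preserves the sequence, which matters when the downstream suggestions depend on detecting repetition.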

Figures

Figures reproduced from arXiv: 2509.26557 by Andrew Head, Chris Parnin, Emerson Murphy-Hill, Ken Milne, Litao Yan, Sumit Gulwani, Vu Le.

Figure 1: InvisibleMentor transforms ordinary screen recordings into more than just sequences of user actions. It interprets the ...
Figure 2: InvisibleMentor’s pipeline for generating suggestions from a screen recording.
Figure 3: User interface of a spreadsheet assistant that provides structured workflow guidance.
Figure 4: Participant ratings of InvisibleMentor’s suggestions.
Figure 5: Participants’ comparative preferences between In...
Figure 6: Relationship between video duration and VLM pro...
Original abstract

Many users struggle to notice when a more efficient workflow exists in feature-rich tools like Excel. Existing AI assistants offer help only after users describe their goals or problems, which can be effortful and imprecise. We present InvisibleMentor, a system that turns screen recordings of task completion into vision-grounded reflections on tasks. It detects issues such as repetitive edits and recommends more efficient alternatives based on observed behavior. Unlike prior systems that rely on logs, APIs, or user prompts, InvisibleMentor operates directly on screen recordings. It uses a two-stage pipeline: a vision-language model reconstructs actions and context, and a language model generates structured, high-fidelity suggestions. In evaluation, InvisibleMentor accurately identified inefficient workflows, and participants found its suggestions more actionable, tailored, and more helpful for learning and improvement compared to a prompt-based spreadsheet assistant.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces InvisibleMentor, a two-stage pipeline that processes screen recordings of user tasks in feature-rich applications such as Excel. A vision-language model first reconstructs actions and task context from raw video, after which a language model produces structured suggestions for more efficient workflows. The abstract claims that the system accurately detects issues like repetitive edits and that, in an evaluation, participants rated its suggestions as more actionable, tailored, and helpful for learning than those from a prompt-based spreadsheet assistant.

Significance. If the evaluation claims are substantiated with rigorous controls and metrics, the work could offer a meaningful advance in passive, observation-driven AI assistance for workflow discovery, reducing reliance on explicit user prompts or application logs. The approach aligns with HCI goals of lowering the effort required to identify inefficiencies in complex tools. However, the absence of any quantitative results, participant counts, or protocol details in the provided manuscript prevents a firm assessment of its potential impact or generalizability.

major comments (2)
  1. [Abstract] The central claim that 'InvisibleMentor accurately identified inefficient workflows' is presented without any supporting metrics, error rates, participant numbers, or description of the evaluation protocol. This omission is load-bearing because the paper's contribution rests on demonstrating superior performance over the prompt-based baseline.
  2. [Abstract] The weakest assumption, that a vision-language model can reliably reconstruct precise user actions and task context from raw screen recordings without substantial errors, is stated but not accompanied by any fidelity measures, failure cases, or validation against ground-truth action logs. This directly affects the credibility of the downstream suggestions.
minor comments (1)
  1. [Abstract] The comparison baseline is described only as 'a prompt-based spreadsheet assistant'; clarifying its exact capabilities and prompting strategy would help readers understand the strength of the reported advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that additional details are needed to substantiate the claims and will revise the manuscript accordingly.

  1. Referee: [Abstract] The central claim that 'InvisibleMentor accurately identified inefficient workflows' is presented without any supporting metrics, error rates, participant numbers, or description of the evaluation protocol. This omission is load-bearing because the paper's contribution rests on demonstrating superior performance over the prompt-based baseline.

    Authors: We agree that the abstract would benefit from more specific details to support the central claim. The current version summarizes the evaluation outcomes at a high level for brevity. In the revised manuscript we will update the abstract to include participant counts, key comparative metrics against the prompt-based baseline, and a concise description of the evaluation protocol.
    Revision: yes

  2. Referee: [Abstract] The weakest assumption, that a vision-language model can reliably reconstruct precise user actions and task context from raw screen recordings without substantial errors, is stated but not accompanied by any fidelity measures, failure cases, or validation against ground-truth action logs. This directly affects the credibility of the downstream suggestions.

    Authors: We acknowledge that explicit validation of the VLM reconstruction step strengthens the paper. The abstract currently focuses on the end-to-end system rather than intermediate fidelity metrics. We will revise the abstract to reference the VLM validation approach and will ensure the full manuscript includes quantitative fidelity measures, selected failure cases, and comparison to ground-truth logs.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The provided abstract describes a two-stage VLM+LM pipeline for inferring actions from screen recordings and generating workflow suggestions, along with qualitative evaluation results from a user study. No equations, parameters, derivations, or self-citations appear in the text. The claims rest on system behavior and external participant feedback rather than any reduction of outputs to fitted inputs or self-referential premises by construction. This is a standard non-circular system paper whose central assertions are evaluated independently of internal fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the unverified performance of current vision-language models for action reconstruction and on the assumption that observed behavior in recordings is representative of real workflow problems.

axioms (1)
  • [domain assumption] Vision-language models can accurately reconstruct user actions and context from screen recordings
    The first stage of the pipeline explicitly relies on this capability of VLMs.
invented entities (1)
  • InvisibleMentor two-stage pipeline (no independent evidence)
    purpose: To convert screen recordings into structured workflow suggestions
    The system itself is the primary new artifact introduced by the paper.

pith-pipeline@v0.9.0 · 5662 in / 1140 out tokens · 32280 ms · 2026-05-18T11:49:41.992075+00:00 · methodology




A Prompt Templates

This appendix provides the prompt templates used in each phase of InvisibleMentor’s architecture. The prompts were constructed to guide the vision-languag...

  • Group related actions into workflows (steps accomplishing a specific task)
  • For each workflow, set "Optimal" to true/false based on efficiency
  • For suboptimal workflows ("Optimal": false):
      - "ActionList": List actions starting with "It looks like you..."
      - "Reason": Main inefficiency (be specific) starting with "You ..."
      - "Suggestion": Provide ONE actionable solution using Excel features:
          - Give step-by-step instructions with exact Ribbon paths/shortcuts
          - Include detailed examples with realisti...
  • Focus on efficiency and maintainability, not just task completion
  • Only include 3 most impactful suboptimal workflows and rank them by importance
  • Use proper formatting: backticks (`) around Excel functions, formulas, keyboard shortcuts, and feature names, and triple backticks (```) for multi-line formulas or step-by-step code examples
  • Create plausible placeholders for unclear data references

Output JSON format:

    { "Workflows": [ { "ActionList": ["Action 1", "Action 2"], "Optimal": true/false, "Reason": "Brief explanation", "Suggestion": "Step-by-step actionable solution" } ] }
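A consumer of the Output JSON format quoted in the prompt template above might validate the model's reply and keep only the suboptimal workflows. The following is a hedged sketch: `top_suboptimal_workflows` is a hypothetical helper, the key names are taken from the quoted format, and it assumes the model has already ranked the workflows by importance as the prompt requests.

```python
import json
from typing import Any, Dict, List


def top_suboptimal_workflows(raw: str, limit: int = 3) -> List[Dict[str, Any]]:
    """Parse the model reply and return at most `limit` suboptimal workflows."""
    payload = json.loads(raw)
    selected: List[Dict[str, Any]] = []
    for workflow in payload.get("Workflows", []):
        # Every workflow needs its action list and an explicit optimality flag.
        required = ["ActionList", "Optimal"]
        if workflow.get("Optimal") is False:
            # Only suboptimal workflows must carry a reason and a suggestion.
            required += ["Reason", "Suggestion"]
        missing = [key for key in required if key not in workflow]
        if missing:
            raise ValueError(f"workflow is missing keys: {missing}")
        if workflow["Optimal"] is False:
            selected.append(workflow)
    # Assumption: the reply is already ordered by importance, so truncation keeps the top items.
    return selected[:limit]


# Illustrative reply with one optimal and four suboptimal workflows.
sample_reply = json.dumps({
    "Workflows": [
        {"ActionList": ["Action 1"], "Optimal": True},
        *[
            {
                "ActionList": [f"Action {i}"],
                "Optimal": False,
                "Reason": f"Reason {i}",
                "Suggestion": f"Suggestion {i}",
            }
            for i in range(1, 5)
        ],
    ]
})
top = top_suboptimal_workflows(sample_reply)
```

Failing loudly on a missing key, rather than silently skipping the entry, makes malformed model output visible before any suggestion reaches the user.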