Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes
Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3
The pith
Inferring capture-time intent from photos lets large language models create structured visual notes that reflect what users meant to capture.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models and, when applied to conference presentation captures, produce structured visual notes on a spatial canvas that users can further manipulate.
What carries the argument
Intent Lenses: reusable interactive objects that encode the function to perform, the information sources to focus on, and the level of representation detail, generated dynamically by large language models from the visual content of the photos.
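The paper does not publish a data model for lenses, but the three components named above suggest an obvious object shape. A minimal sketch in Python, assuming nothing beyond that description; every name here (`IntentLens`, `Function`, `detail_level`) is hypothetical rather than the authors' API:

```python
from dataclasses import dataclass, field
from enum import Enum


class Function(Enum):
    """Hypothetical note-generation functions a lens might encode."""
    SUMMARIZE_CORE_IDEA = "summarize core idea"
    TRACK_RELATED_WORK = "track related work"
    CAPTURE_EVIDENCE = "capture empirical evidence"
    COLLECT_REFERENCES = "collect references"


@dataclass
class IntentLens:
    """One inferred capture-time intent, reified as a reusable object.

    Mirrors the three components the review names: the function to
    perform, the information sources to focus on, and the level of
    detail at which results are represented.
    """
    function: Function                                 # what to do with the capture
    sources: list[str] = field(default_factory=list)   # slide regions/elements to focus on
    detail_level: str = "overview"                     # e.g., "overview" or "detailed"

    def describe(self) -> str:
        return f"{self.function.value} over {self.sources} at {self.detail_level} detail"
```

Reifying intent as a plain value object is what would make lenses reusable: the same lens can be re-applied to other captures, or edited and rearranged on the canvas.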
If this is right
- Intent-mediated notes align with users' expectations for what they intended to capture.
- The notes provide effective overviews of captures while facilitating deeper sensemaking.
- Users can add, link, and arrange lenses across captures to support exploration.
- The system transforms generic photo collections into personalized structured notes.
Where Pith is reading between the lines
- Similar lenses could be applied in non-academic settings such as capturing product information during shopping or artifact details at museums.
- Over time, patterns in inferred intents might inform better default lenses or user-specific models.
- Combining lenses with other data sources like timestamps or location could further refine the inference process.
Load-bearing premise
That large language models can accurately and consistently infer users' specific capture-time intent solely from the visual content of opportunistic photos without additional context or user input.
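That premise is cheap to probe in isolation. A hedged sketch of a single inference call, assuming an OpenAI-style multimodal chat API; the model name and prompt wording are illustrative, since the paper's model, parameters, and prompts are not specified here:

```python
import base64
from openai import OpenAI  # assumes the openai Python package; the paper's actual stack is unspecified

client = OpenAI()

INTENT_PROMPT = (
    "You are shown a photo taken opportunistically at a conference talk. "
    "From the visual content alone, infer the single most likely capture-time "
    "intent (e.g., summarize core idea, track related work, capture empirical "
    "evidence, collect references) and name the slide regions that support it."
)


def infer_intent(image_path: str) -> str:
    """Ask a multimodal LLM to infer capture-time intent from one photo."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice, not the paper's model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": INTENT_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```

Whether such a call recovers the user's actual intent, rather than a plausible-sounding one, is exactly what the premise leaves open.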
What would settle it
A study in which participants compare generated notes against their own recalled intent and find no alignment advantage for intent-inferred lenses over generic summaries, or no gain in sensemaking ratings.
Original abstract
Opportunistic photo capture (e.g., slides, exhibits, or artifacts) is a common strategy for preserving information encountered in information-rich environments for later revisitation. While fast and minimally disruptive, such photo collections rarely become meaningful notes. Existing automatic note-generation approaches provide some support but often produce generic summaries that fail to reflect what users intended to capture. We introduce Intent Lenses, a conceptual primitive for intent-mediated note generation and sensemaking. Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models. To investigate this concept, we instantiate Intent Lenses in the context of academic conference photos and present an interactive system that infers lenses from presentation captures to generate structured visual notes on a spatial canvas. Users can further add, link, and arrange lenses across captures to support exploration and sensemaking. A study with nine academics showed that intent-mediated notes aligned with users' expectations, providing effective overviews of their captures while facilitating deeper sensemaking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Intent Lenses as a conceptual primitive for intent-mediated note generation: LLMs infer capture-time intent from opportunistic photos (e.g., conference slides) to produce reusable interactive objects that specify functions, information sources, and representation detail. These are instantiated in an interactive system for academic photos on a spatial canvas where users can add, link, and arrange lenses. A qualitative study with nine academics reports that the resulting notes aligned with user expectations, provided effective overviews, and supported deeper sensemaking.
Significance. If the core inference mechanism can be shown to reliably recover user intent, the work offers a promising HCI primitive for turning ad-hoc photo collections into personalized, explorable notes. The reification of inferred intent into dynamic, composable lenses on a canvas is a concrete design contribution that extends beyond generic summarization and could influence future sensemaking tools that integrate LLM reasoning with direct manipulation.
major comments (1)
- Evaluation section: The n=9 qualitative study reports post-use subjective alignment with expectations but contains no pre-capture user statements of intent, no quantitative accuracy/precision metrics of the LLM inference against those statements, and no baseline comparison (e.g., generic summarization without intent lenses). This leaves the central claim—that the notes succeed because intent was accurately inferred from visual content alone—untested; observed benefits could arise from the structured canvas format regardless of inference quality.
minor comments (2)
- The manuscript should provide concrete examples of LLM prompts, the exact model and parameters used, and any observed inference failures or edge cases to support reproducibility and allow readers to assess the reliability of the dynamic lens generation.
- Figure captions and system description would benefit from clearer distinction between automatically generated lenses and user-added or edited elements to help readers understand the boundary between inference and manual sensemaking.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the evaluation. We address the major comment below and will revise the manuscript to strengthen the discussion of study limitations and design rationale while preserving the qualitative focus appropriate to this early-stage conceptual contribution.
Point-by-point responses
- Referee: Evaluation section: The n=9 qualitative study reports post-use subjective alignment with expectations but contains no pre-capture user statements of intent, no quantitative accuracy/precision metrics of the LLM inference against those statements, and no baseline comparison (e.g., generic summarization without intent lenses). This leaves the central claim—that the notes succeed because intent was accurately inferred from visual content alone—untested; observed benefits could arise from the structured canvas format regardless of inference quality.
- Authors: We agree that a quantitative evaluation of inference accuracy against explicit pre-capture intent would provide stronger evidence. However, the opportunistic nature of photo capture means users rarely articulate precise intent before taking a photo; the study instead measured whether the resulting notes aligned with participants' post-review expectations, which serves as a proxy for intent fidelity in this context. We will revise the Evaluation and Discussion sections to explicitly acknowledge this limitation, clarify that benefits may partly derive from the spatial canvas, and include a qualitative comparison of lens-generated notes versus generic LLM summaries based on participant comments. A controlled quantitative baseline study lies beyond the scope of the current exploratory work but is noted as future research.
- Revision: partial
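For scale, the quantitative check the referee asks for reduces to a simple comparison: one stated intent per capture collected before review, one inferred intent from the system, and an agreement score (with a generic-summary condition as control). A minimal sketch in Python; the labels and numbers are hypothetical, not data from the paper:

```python
def intent_accuracy(stated: list[str], inferred: list[str]) -> float:
    """Fraction of captures whose inferred intent matches the user's
    pre-capture statement (exact label match; a real study would also
    report per-label precision/recall)."""
    assert len(stated) == len(inferred)
    hits = sum(s == i for s, i in zip(stated, inferred))
    return hits / len(stated)


# Hypothetical labels for nine captures, not data from the paper.
stated = ["related_work", "core_idea", "evidence", "core_idea",
          "references", "evidence", "core_idea", "related_work", "methods"]
inferred = ["related_work", "core_idea", "core_idea", "core_idea",
            "references", "evidence", "evidence", "related_work", "methods"]

print(f"intent accuracy: {intent_accuracy(stated, inferred):.2f}")  # 0.78
```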
Circularity Check
No circularity; claims rest on an independent user study and external LLM capabilities.
Full rationale
The paper presents a conceptual system (Intent Lenses) that uses LLMs to infer capture-time intent from photos and generates structured notes, evaluated via a separate n=9 user study. No equations, parameters, or derivations exist that could reduce outputs to inputs by construction. Central claims are supported by the external reasoning of LLMs and post-study subjective feedback rather than any self-referential fit or definition. No load-bearing self-citations or ansatzes are present. This is a standard non-circular HCI systems paper with empirical grounding outside its own definitions.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Users form a specific, inferable intent when opportunistically capturing photos of information-rich scenes.
- Domain assumption: Large language models possess sufficient reasoning capabilities to translate photo content into structured note-generation functions.
invented entities (1)
- Intent Lenses (no independent evidence)
Intent-inference rubric
The criteria the intent-inference prompt weighs before committing to a single capture intent.
- Information Density & Cognitive Load: Is the slide dense, technical, or hard to parse quickly? Does it contain equations, multiple plots, diagrams, or many concepts at once? Dense slides are often captured to revisit or study later rather than to remember a single message.
- Role in Research Workflow: Would this slide help the user later when writing, designing, or positioning their own work? Is it more useful for context-setting, evidence, inspiration, or technical reference?
- Ownership of Content: Does the slide summarize prior work (external citations, older papers), or present the speaker's core contribution or vision? Slides about others' work are often captured for literature mapping; slides about contributions are often captured for conceptual understanding.
- Type of Knowledge Captured: Is the knowledge conceptual (ideas, framing, agenda), empirical (results, benchmarks, performance), procedural (methods, fabrication, pipeline), or speculative (limitations, future work, open questions)?
- Likely Annotation Behavior: Would a user label this "look later," "important," "reference," or "idea"? Is this something they would quote, compare against, or build on?
- Slide Position in Talk: Early (motivation / problem framing)? Middle (methods / results)? Late (discussion / future work / agenda)? Slides later in the talk are more often captured for inspiration or direction rather than understanding.
- Decide the Primary Intent: Choose one clear intent, not a mixture. Examples include (but are not limited to): summarize core idea, track related work, capture empirical evidence, understand methodology, note research agenda / vision, identify future directions, collect references, design inspiration, mark title slides, section title slides, or similar slide…
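These criteria map naturally onto a single classification prompt. A minimal sketch of how such a rubric might be assembled; the function name and exact phrasing are illustrative, not the paper's published prompt:

```python
RUBRIC = {
    "Information density & cognitive load":
        "Is the slide dense, technical, or hard to parse quickly?",
    "Role in research workflow":
        "Would this slide help the user later when writing, designing, "
        "or positioning their own work?",
    "Ownership of content":
        "Does the slide summarize prior work, or present the speaker's "
        "core contribution?",
    "Type of knowledge":
        "Is the knowledge conceptual, empirical, procedural, or speculative?",
    "Likely annotation behavior":
        "Would a user label this 'look later', 'important', 'reference', or 'idea'?",
    "Slide position in talk":
        "Early (motivation), middle (methods/results), or late (future work)?",
}


def build_intent_prompt() -> str:
    """Concatenate the rubric into one intent-classification prompt."""
    lines = ["Assess the captured slide against each criterion:"]
    lines += [f"- {name}: {question}" for name, question in RUBRIC.items()]
    lines.append("Then decide ONE primary intent, not a mixture "
                 "(e.g., summarize core idea, track related work, "
                 "capture empirical evidence, collect references).")
    return "\n".join(lines)
```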