Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes
Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3
The pith
Inferring capture-time intent from photos lets large language models create structured visual notes that reflect what users meant to capture.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models and, when applied to conference presentation captures, produce structured visual notes on a spatial canvas that users can further manipulate.
What carries the argument
Intent Lenses: reusable interactive objects that encode the function to perform, the information sources to focus on, and the level of representation detail, generated dynamically by large language models from the visual content of the photos.
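The paper does not publish a data model for lenses, but the three components named above suggest an obvious object shape. A minimal sketch in Python, assuming nothing beyond that description; every name here (`IntentLens`, `Function`, `detail_level`) is hypothetical rather than the authors' API:

```python
from dataclasses import dataclass, field
from enum import Enum


class Function(Enum):
    """Hypothetical note-generation functions a lens might encode."""
    SUMMARIZE_CORE_IDEA = "summarize core idea"
    TRACK_RELATED_WORK = "track related work"
    CAPTURE_EVIDENCE = "capture empirical evidence"
    COLLECT_REFERENCES = "collect references"


@dataclass
class IntentLens:
    """One inferred capture-time intent, reified as a reusable object.

    Mirrors the three components the review names: the function to
    perform, the information sources to focus on, and the level of
    detail at which results are represented.
    """
    function: Function                                 # what to do with the capture
    sources: list[str] = field(default_factory=list)   # slide regions/elements to focus on
    detail_level: str = "overview"                     # e.g., "overview" or "detailed"

    def describe(self) -> str:
        return f"{self.function.value} over {self.sources} at {self.detail_level} detail"
```

Reifying intent as a plain value object is what would make lenses reusable: the same lens can be re-applied to other captures, or edited and rearranged on the canvas.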
If this is right
- Intent-mediated notes align with users' expectations for what they intended to capture.
- The notes provide effective overviews of captures while facilitating deeper sensemaking.
- Users can add, link, and arrange lenses across captures to support exploration.
- The system transforms generic photo collections into personalized structured notes.
Where Pith is reading between the lines
- Similar lenses could be applied in non-academic settings such as capturing product information during shopping or artifact details at museums.
- Over time, patterns in inferred intents might inform better default lenses or user-specific models.
- Combining lenses with other data sources like timestamps or location could further refine the inference process.
Load-bearing premise
That large language models can accurately and consistently infer users' specific capture-time intent solely from the visual content of opportunistic photos without additional context or user input.
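That premise is cheap to probe in isolation. A hedged sketch of a single inference call, assuming an OpenAI-style multimodal chat API; the model name and prompt wording are illustrative, since the paper's model, parameters, and prompts are not specified here:

```python
import base64
from openai import OpenAI  # assumes the openai Python package; the paper's actual stack is unspecified

client = OpenAI()

INTENT_PROMPT = (
    "You are shown a photo taken opportunistically at a conference talk. "
    "From the visual content alone, infer the single most likely capture-time "
    "intent (e.g., summarize core idea, track related work, capture empirical "
    "evidence, collect references) and name the slide regions that support it."
)


def infer_intent(image_path: str) -> str:
    """Ask a multimodal LLM to infer capture-time intent from one photo."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice, not the paper's model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": INTENT_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```

Whether such a call recovers the user's actual intent, rather than a plausible-sounding one, is exactly what the premise leaves open.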
What would settle it
A study in which participants compare generated notes against their own recalled intent and find no alignment advantage for intent-inferred lenses over generic summaries, or no gain in sensemaking ratings.
Original abstract
Opportunistic photo capture (e.g., slides, exhibits, or artifacts) is a common strategy for preserving information encountered in information-rich environments for later revisitation. While fast and minimally disruptive, such photo collections rarely become meaningful notes. Existing automatic note-generation approaches provide some support but often produce generic summaries that fail to reflect what users intended to capture. We introduce Intent Lenses, a conceptual primitive for intent-mediated note generation and sensemaking. Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models. To investigate this concept, we instantiate Intent Lenses in the context of academic conference photos and present an interactive system that infers lenses from presentation captures to generate structured visual notes on a spatial canvas. Users can further add, link, and arrange lenses across captures to support exploration and sensemaking. A study with nine academics showed that intent-mediated notes aligned with users' expectations, providing effective overviews of their captures while facilitating deeper sensemaking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Intent Lenses as a conceptual primitive for intent-mediated note generation: LLMs infer capture-time intent from opportunistic photos (e.g., conference slides) to produce reusable interactive objects that specify functions, information sources, and representation detail. These are instantiated in an interactive system for academic photos on a spatial canvas where users can add, link, and arrange lenses. A qualitative study with nine academics reports that the resulting notes aligned with user expectations, provided effective overviews, and supported deeper sensemaking.
Significance. If the core inference mechanism can be shown to reliably recover user intent, the work offers a promising HCI primitive for turning ad-hoc photo collections into personalized, explorable notes. The reification of inferred intent into dynamic, composable lenses on a canvas is a concrete design contribution that extends beyond generic summarization and could influence future sensemaking tools that integrate LLM reasoning with direct manipulation.
major comments (1)
- Evaluation section: The n=9 qualitative study reports post-use subjective alignment with expectations but contains no pre-capture user statements of intent, no quantitative accuracy/precision metrics of the LLM inference against those statements, and no baseline comparison (e.g., generic summarization without intent lenses). This leaves the central claim—that the notes succeed because intent was accurately inferred from visual content alone—untested; observed benefits could arise from the structured canvas format regardless of inference quality.
minor comments (2)
- The manuscript should provide concrete examples of LLM prompts, the exact model and parameters used, and any observed inference failures or edge cases to support reproducibility and allow readers to assess the reliability of the dynamic lens generation.
- Figure captions and system description would benefit from clearer distinction between automatically generated lenses and user-added or edited elements to help readers understand the boundary between inference and manual sensemaking.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the evaluation. We address the major comment below and will revise the manuscript to strengthen the discussion of study limitations and design rationale while preserving the qualitative focus appropriate to this early-stage conceptual contribution.
Point-by-point responses
- Referee: Evaluation section: The n=9 qualitative study reports post-use subjective alignment with expectations but contains no pre-capture user statements of intent, no quantitative accuracy/precision metrics of the LLM inference against those statements, and no baseline comparison (e.g., generic summarization without intent lenses). This leaves the central claim—that the notes succeed because intent was accurately inferred from visual content alone—untested; observed benefits could arise from the structured canvas format regardless of inference quality.
- Authors: We agree that a quantitative evaluation of inference accuracy against explicit pre-capture intent would provide stronger evidence. However, the opportunistic nature of photo capture means users rarely articulate precise intent before taking a photo; the study instead measured whether the resulting notes aligned with participants' post-review expectations, which serves as a proxy for intent fidelity in this context. We will revise the Evaluation and Discussion sections to explicitly acknowledge this limitation, clarify that benefits may partly derive from the spatial canvas, and include a qualitative comparison of lens-generated notes versus generic LLM summaries based on participant comments. A controlled quantitative baseline study lies beyond the scope of the current exploratory work but is noted as future research.
- Revision: partial
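For scale, the quantitative check the referee asks for reduces to a simple comparison: one stated intent per capture collected before review, one inferred intent from the system, and an agreement score (with a generic-summary condition as control). A minimal sketch in Python; the labels and numbers are hypothetical, not data from the paper:

```python
def intent_accuracy(stated: list[str], inferred: list[str]) -> float:
    """Fraction of captures whose inferred intent matches the user's
    pre-capture statement (exact label match; a real study would also
    report per-label precision/recall)."""
    assert len(stated) == len(inferred)
    hits = sum(s == i for s, i in zip(stated, inferred))
    return hits / len(stated)


# Hypothetical labels for nine captures, not data from the paper.
stated = ["related_work", "core_idea", "evidence", "core_idea",
          "references", "evidence", "core_idea", "related_work", "methods"]
inferred = ["related_work", "core_idea", "core_idea", "core_idea",
            "references", "evidence", "evidence", "related_work", "methods"]

print(f"intent accuracy: {intent_accuracy(stated, inferred):.2f}")  # 0.78
```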
Circularity Check
No circularity; claims rest on an independent user study and external LLM capabilities.
Full rationale
The paper presents a conceptual system (Intent Lenses) that uses LLMs to infer capture-time intent from photos and generates structured notes, evaluated via a separate n=9 user study. No equations, parameters, or derivations exist that could reduce outputs to inputs by construction. Central claims are supported by the external reasoning of LLMs and post-study subjective feedback rather than any self-referential fit or definition. No load-bearing self-citations or ansatzes are present. This is a standard non-circular HCI systems paper with empirical grounding outside its own definitions.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Users form a specific, inferable intent when opportunistically capturing photos of information-rich scenes.
- Domain assumption: Large language models possess sufficient reasoning capabilities to translate photo content into structured note-generation functions.
invented entities (1)
- Intent Lenses (no independent evidence)
Intent-inference rubric
The criteria the intent-inference prompt weighs before committing to a single capture intent.
- Information Density & Cognitive Load: Is the slide dense, technical, or hard to parse quickly? Does it contain equations, multiple plots, diagrams, or many concepts at once? Dense slides are often captured to revisit or study later rather than to remember a single message.
- Role in Research Workflow: Would this slide help the user later when writing, designing, or positioning their own work? Is it more useful for context-setting, evidence, inspiration, or technical reference?
- Ownership of Content: Does the slide summarize prior work (external citations, older papers), or present the speaker's core contribution or vision? Slides about others' work are often captured for literature mapping; slides about contributions are often captured for conceptual understanding.
- Type of Knowledge Captured: Is the knowledge conceptual (ideas, framing, agenda), empirical (results, benchmarks, performance), procedural (methods, fabrication, pipeline), or speculative (limitations, future work, open questions)?
- Likely Annotation Behavior: Would a user label this "look later," "important," "reference," or "idea"? Is this something they would quote, compare against, or build on?
- Slide Position in Talk: Early (motivation / problem framing)? Middle (methods / results)? Late (discussion / future work / agenda)? Slides later in the talk are more often captured for inspiration or direction rather than understanding.
- Decide the Primary Intent: Choose one clear intent, not a mixture. Examples include (but are not limited to): summarize core idea, track related work, capture empirical evidence, understand methodology, note research agenda / vision, identify future directions, collect references, design inspiration, mark title slides, section title slides, or similar slide…
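These criteria map naturally onto a single classification prompt. A minimal sketch of how such a rubric might be assembled; the function name and exact phrasing are illustrative, not the paper's published prompt:

```python
RUBRIC = {
    "Information density & cognitive load":
        "Is the slide dense, technical, or hard to parse quickly?",
    "Role in research workflow":
        "Would this slide help the user later when writing, designing, "
        "or positioning their own work?",
    "Ownership of content":
        "Does the slide summarize prior work, or present the speaker's "
        "core contribution?",
    "Type of knowledge":
        "Is the knowledge conceptual, empirical, procedural, or speculative?",
    "Likely annotation behavior":
        "Would a user label this 'look later', 'important', 'reference', or 'idea'?",
    "Slide position in talk":
        "Early (motivation), middle (methods/results), or late (future work)?",
}


def build_intent_prompt() -> str:
    """Concatenate the rubric into one intent-classification prompt."""
    lines = ["Assess the captured slide against each criterion:"]
    lines += [f"- {name}: {question}" for name, question in RUBRIC.items()]
    lines.append("Then decide ONE primary intent, not a mixture "
                 "(e.g., summarize core idea, track related work, "
                 "capture empirical evidence, collect references).")
    return "\n".join(lines)
```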