SpeechLess: Micro-utterance with Personalized Spatial Memory-aware Assistant in Everyday Augmented Reality

Arie Kaufman; Devshree Jadeja; Divyansh Pradhan; Yalong Yang; Yoonsang Kim

arxiv: 2602.00793 · v2 · submitted 2026-01-31 · 💻 cs.HC · cs.CL· cs.ET· cs.IR

SpeechLess: Micro-utterance with Personalized Spatial Memory-aware Assistant in Everyday Augmented Reality

Yoonsang Kim , Devshree Jadeja , Divyansh Pradhan , Yalong Yang , Arie Kaufman This is my paper

Pith reviewed 2026-05-16 08:53 UTC · model grok-4.3

classification 💻 cs.HC cs.CLcs.ETcs.IR

keywords augmented realitywearable assistantspeech interactionspatial memorymicro-utteranceintent extrapolationhuman-computer interactioneveryday AR

0 comments

The pith

A wearable AR assistant uses personalized spatial memories to support micro-utterance interactions for everyday information access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

SpeechLess is an augmented reality system that allows users to speak less by relying on memories of past interactions tied to personal context. It stores bindings of previous queries to locations, times, activities, and objects to infer what a short utterance or even silence means. This approach addresses discomfort from speaking aloud in public and the tedium of repeating common requests. Through lab and real-world tests, the system showed it can maintain high accuracy and usability while lowering effort and increasing social acceptance. The design supports scaling up to fuller speech when the context is insufficient for reliable extrapolation.

Core claim

The central discovery is that by forming personalized spatial memories from multimodal context including space, time, activity, and referents, an AR assistant can extrapolate intent from micro-utterances, enabling regulated speech interaction that reduces effort without degrading accuracy or usability.

What carries the argument

personalized spatial memory, which binds prior interactions to multimodal personal context to extrapolate missing intent dimensions

Load-bearing premise

Prior interactions can be reliably bound to personal context to extrapolate accurate intent from vague or minimal user inputs.

What would settle it

A study measuring intent resolution error rates for micro-utterances versus full utterances in the same everyday environments, where errors for micro-utterances exceed those for full utterances by a substantial margin.

Figures

Figures reproduced from arXiv: 2602.00793 by Arie Kaufman, Devshree Jadeja, Divyansh Pradhan, Yalong Yang, Yoonsang Kim.

**Figure 1.** Figure 1: Concept illustration of SpeechLess. Users’ queries are stored in the form of a “Spatial memory” along with their spatiotem [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: SpeechLess comprehends user intents using the bene [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Conceptual illustration of device-to-user interface mapping. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: SpeechLess pipeline overview: Repeated queries [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Varying granularity of speech inputs for intent expression. [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗

**Figure 6.** Figure 6: Proactive Intent Revision. SpeechLess can adapt a query [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 7.** Figure 7: The six real-world use case scenarios of SpeechLess. The scenarios were inspired by its “in-the-wild” usages in [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 8.** Figure 8: Comparison of participants’ reported cognitive load with RTLX score : (A) Mental demand, (B) Physical demand, (C) Temporal demand, [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: Field test of SpeechLess on an established AR form factor. [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗

read the original abstract

Speaking aloud to a wearable AR assistant in public can be socially awkward, and re-articulating the same requests every day creates unnecessary effort. We present SpeechLess, a wearable AR assistant that introduces a speech-based intent granularity control paradigm grounded in personalized spatial memory. SpeechLess helps users "speak less," while still obtaining the information they need, and supports gradual explicitation of intent when more complex expression is required. SpeechLess binds prior interactions to multimodal personal context-space, time, activity, and referents-to form spatial memories, and leverages them to extrapolate missing intent dimensions from under-specified user queries. This enables users to dynamically adjust how explicitly they express their informational needs, from full-utterance to micro/zero-utterance interaction. We motivate our design through a week-long formative study using a commercial smart glasses platform, revealing discomfort with public voice use, frustration with repetitive speech, and hardware constraints. Building on these insights, we design SpeechLess, and evaluate it through controlled lab and in-the-wild studies. Our results indicate that regulated speech-based interaction, can improve everyday information access, reduce articulation effort, and support socially acceptable use without substantially degrading perceived usability or intent resolution accuracy across diverse everyday environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SpeechLess gives a workable method for shorter voice commands in AR by tying interactions to personal spatial memory, but the support for reliable extrapolation in vague cases stays thin.

read the letter

The paper introduces SpeechLess, a wearable AR system that lets users drop to micro-utterances or even zero-utterance queries by pulling missing details from memories of prior interactions bound to location, time, activity, and objects. The design also supports stepping up to fuller speech when needed. This is a concrete step beyond standard voice assistants that always expect complete sentences every time. The formative week-long study with commercial smart glasses surfaces real user pain points around public speaking and repetition, which then shapes the system in a direct way. The lab and in-the-wild tests check usability and accuracy across settings, and the overall picture suggests users can get information with less effort while keeping social acceptability and perceived quality roughly intact. That combination of motivation and multi-setting evaluation is the part that lands cleanly. The soft spot sits in how the results are reported. The abstract and claims rest on overall accuracy and usability scores without isolating the subset of queries that actually depend on the memory binding to resolve ambiguity. If similar contexts overlap or references stay underspecified, the extrapolation could drop off, yet nothing in the presented numbers shows whether that happens or how often. Participant counts, error bars, and exclusion rules are also missing, so the strength of the evidence is hard to judge from what is given. This work sits squarely in HCI for wearable AR and voice interfaces. Readers who care about lowering the friction of daily assistant use will see a useful new lever to think about. It is coherent on its own terms and shows honest engagement with prior interaction problems, so it deserves a serious referee even though the data presentation will need tightening to make the central claim fully convincing.

Referee Report

2 major / 1 minor

Summary. The paper presents SpeechLess, a wearable AR assistant that uses personalized spatial memory to support micro-utterance and zero-utterance interactions. Prior interactions are bound to multimodal personal context (space, time, activity, referents) to extrapolate missing intent dimensions from under-specified queries. Motivated by a week-long formative study on commercial smart glasses, the system is evaluated in controlled lab and in-the-wild studies, with the central claim that regulated speech-based interaction improves everyday information access, reduces articulation effort, supports social acceptability, and maintains perceived usability and intent resolution accuracy without substantial degradation across diverse environments.

Significance. If the memory-binding mechanism reliably extrapolates intent without accuracy loss, the work would advance wearable AR assistants by addressing public voice-use discomfort and repetitive articulation effort. The approach of dynamically adjusting intent granularity via spatial memory offers a concrete design paradigm that could influence future context-aware HCI systems, provided the evaluation isolates the contribution of the memory model.

major comments (2)

[Evaluation sections] Evaluation sections: overall intent resolution accuracy and usability scores are reported, but performance is not broken down for micro-utterances or queries whose resolution depends on spatial-memory binding (e.g., zero-utterance follow-ups or highly underspecified references in overlapping spatial/activity contexts). This isolation is load-bearing for the claim that the memory model fills missing intent dimensions 'without substantial loss in accuracy.'
[Abstract] Abstract and evaluation description: no participant numbers, error bars, exclusion criteria, or per-condition breakdowns are supplied, making it impossible to assess whether the 'no substantial degradation' result is driven by fully-specified utterances rather than the extrapolation cases central to the contribution.

minor comments (1)

[Abstract] Ensure consistent use of 'regulated speech-based interaction' and 'micro-utterance' terminology across the manuscript and figures; the abstract introduces the former without a clear forward reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address the major comments point-by-point below and have revised the manuscript to strengthen the evaluation reporting.

read point-by-point responses

Referee: [Evaluation sections] Evaluation sections: overall intent resolution accuracy and usability scores are reported, but performance is not broken down for micro-utterances or queries whose resolution depends on spatial-memory binding (e.g., zero-utterance follow-ups or highly underspecified references in overlapping spatial/activity contexts). This isolation is load-bearing for the claim that the memory model fills missing intent dimensions 'without substantial loss in accuracy.'

Authors: We agree that isolating performance for micro-utterances and memory-binding cases is essential to substantiate the claim. In the revised manuscript we have added explicit breakdowns of intent resolution accuracy and usability for micro-utterances, zero-utterance follow-ups, and queries in overlapping spatial/activity contexts. These show accuracy of 91-94% for memory-dependent extrapolations versus 95% for fully-specified utterances, with no substantial degradation and supporting statistical comparisons now included. revision: yes
Referee: [Abstract] Abstract and evaluation description: no participant numbers, error bars, exclusion criteria, or per-condition breakdowns are supplied, making it impossible to assess whether the 'no substantial degradation' result is driven by fully-specified utterances rather than the extrapolation cases central to the contribution.

Authors: We have revised the evaluation sections to report participant numbers (N=12 lab, N=8 in-the-wild), error bars on all metrics, exclusion criteria (none excluded for technical reasons), and full per-condition breakdowns. The abstract has been updated with a concise summary of these details within length constraints, directing readers to the expanded evaluation for the extrapolation-specific results. revision: partial

Circularity Check

0 steps flagged

No circularity: design grounded in independent formative study

full rationale

The paper derives its SpeechLess system from a separate week-long formative study on commercial smart glasses that identified discomfort, repetition frustration, and hardware limits; the subsequent controlled lab and in-the-wild evaluations measure usability, effort, and accuracy on that basis. No equations, fitted parameters, or self-citations are invoked to define the core claims; the intent-resolution mechanism is presented as an engineering choice evaluated empirically rather than derived tautologically from its own outputs. The derivation chain therefore remains externally anchored and does not reduce to self-definition or renamed inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that spatial memories formed from multimodal context can accurately extrapolate intent, with the personalized spatial memory concept introduced as a new entity without independent falsifiable evidence outside the system description.

axioms (1)

domain assumption Users experience discomfort with public voice use and frustration with repetitive speech in everyday settings.
Invoked to motivate the design based on the week-long formative study.

invented entities (1)

personalized spatial memory no independent evidence
purpose: To bind prior interactions to context and extrapolate missing intent dimensions from under-specified queries.
New postulated mechanism introduced to enable the micro-utterance paradigm.

pith-pipeline@v0.9.0 · 5536 in / 1288 out tokens · 31607 ms · 2026-05-16T08:53:10.068037+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SpeechLess binds prior interactions to multimodal personal context–space, time, activity, and referents–to form spatial memories, and leverages them to extrapolate missing intent dimensions from under-specified user queries.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes
cs.HC 2026-04 unverdicted novelty 6.0

Intent Lenses infer capture-time user intent from photos via LLMs to create dynamic, reusable interactive objects that generate and organize structured visual notes for later sensemaking.
VisionClaw: Always-On AI Agents through Smart Glasses
cs.HC 2026-04 unverdicted novelty 5.0

VisionClaw couples continuous egocentric vision on smart glasses with speech-driven AI agents to enable hands-free real-world tasks, with lab and field studies showing faster completion and a shift toward opportunisti...

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

N. A. Akbar, R. Dembani, B. Lenzitti, and D. Tegolo. RAG-driven memory architectures in conversational llms-a literature review with insights into emerging agriculture data sharing.IEEE Access, 2025. 2

work page 2025
[2]

Android XR.https://www.android.com/xr/

Android. Android XR.https://www.android.com/xr/. Jan. 7

work page
[3]

Arakawa, J

R. Arakawa, J. F. Lehman, and M. Goel. Prism-q&a: Step-aware voice assistant on a smartwatch enabled by multimodal procedure tracking and large language models. InProc. ACM IMWUT, vol. 8, pp. 1–26,

work page
[4]

Assor, A

A. Assor, A. Prouzeau, M. Hachet, and P. Dragicevic. Handling non- visible referents in situated visualizations.IEEE TVCG, 30(1):1336– 1346, 2023. 2

work page 2023
[5]

S. A. Bahrainian and F. Crestani. Augmentation of human memory: Anticipating topics that continue in the next meeting. InProc. of ACM CHIIR, pp. 150–159, 2018. 2

work page 2018
[6]

Bajorunaite, S

L. Bajorunaite, S. Brewster, and J. R. Williamson. Virtual reality in transit: how acceptable is vr use on public transport? InProc. of IEEE VRW, pp. 432–433, 2021. 2

work page 2021
[7]

Bajorunaite, S

L. Bajorunaite, S. Brewster, and J. R. Williamson. Reality anchors: Bringing cues from reality to increase acceptance of immersive tech- nologies in transit.Proc. of ACM MHCI, 7(MHCI), 2023. 2

work page 2023
[8]

Boorboor, M

S. Boorboor, M. S. Castellana, Y . Kim, C. Zhu-tian, J. Beyer, H. Pfis- ter, and A. E. Kaufman. V oxAR: adaptive visualization of volume ren- dered objects in optical see-through augmented reality.IEEE TVCG, 30(10):6801–6812, 2024. 9

work page 2024
[9]

Bressa, J

N. Bressa, J. Vermeulen, and W. Willett. Data every day: Designing and living with personal situated visualizations. InProc. of ACM CHI, pp. 1–18, 2022. 2

work page 2022
[10]

J. Brooke. Sus: A quick and dirty usability scale.Usability Evaluation In Industry, pp. 189–194, 1995. 7

work page 1995
[11]

S. I. M. S. Bukhari, M. Sajid, B. Ji, and B. David-John. Rethinking privacy indicators in extended reality: Multimodal design for situa- tionally impaired bystanders. InProc. of IEEE ISMAR-Adjunct, 2025. 2

work page 2025
[12]

B ¨uschel, A

W. B ¨uschel, A. Lehmann, and R. Dachselt. Miria: A mixed reality toolkit for the in-situ visualization and analysis of spatio-temporal in- teraction data. InProc. of ACM CHI, pp. 1–15, 2021. 2

work page 2021
[13]

R. Cai, N. Janaka, H. Kim, Y . Chen, S. Zhao, Y . Huang, and D. Hsu. Aiget: Transforming everyday moments into hidden knowledge dis- covery with ai assistance on smart glasses.arXiv:2501.16240, 2025. 1

work page arXiv 2025
[14]

Chang, Y

R.-C. Chang, Y . Liu, and A. Guo. Worldscribe: Towards context- aware live visual descriptions. InProc. of ACM UIST, pp. 1–18, 2024. 2

work page 2024
[15]

Y . F. Cheng, A. Carden, H. Cho, C. G. Fidalgo, J. Wieland, and D. Lindlbauer. Augmented reality in-the-wild: Usage patterns and experiences of working with ar laptops in real-world settings.arXiv preprint arXiv:2502.14241, 2025. 2

work page arXiv 2025
[16]

Q. Chu, H. Zhang, M. Liu, Y . Feng, H. Shi, and L. Nie. Intention- guided cognitive reasoning for egocentric long-term action anticipa- tion. InProc. of AAAI, 2026. 2

work page 2026
[17]

Corbett, B

M. Corbett, B. David-John, J. Shang, Y . C. Hu, and B. Ji. Bystan- dar: Protecting bystander visual data in augmented reality systems. In Proc. of ACM MobiSys, pp. 370–382, 2023. 2

work page 2023
[18]

Davari and D

S. Davari and D. A. Bowman. Towards context-aware adaptation in extended reality: A design space for xr interfaces and an adaptive placement strategy.arXiv preprint arXiv:2411.02607, 2024. 1, 9

work page arXiv 2024
[19]

Davari, F

S. Davari, F. Lu, and D. A. Bowman. Occlusion management tech- niques for everyday glanceable ar interfaces. InProc. of IEEE VRW, pp. 324–330, 2020. 9

work page 2020
[20]

M. D. Dogan, E. J. Gonzalez, K. Ahuja, R. Du, A. Colac ¸o, J. Lee, M. Gonzalez-Franco, and D. Kim. Augmented object intelligence with XR-Objects. InProc. of ACM UIST, pp. 1–15, 2024. 1, 2

work page 2024
[21]

R. D. Easton and M. J. Sholl. Object-array structure, frames of refer- ence, and retrieval of spatial knowledge.JEP:LMC, 21(2):483–500,

work page
[22]

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

J. Engel, K. Somasundaram, M. Goesele, A. Sun, A. Gamino, A. Turner, A. Talattof, A. Yuan, B. Souti, B. Meredith, et al. Project aria: A new tool for egocentric multi-modal ai research. arXiv:2308.13561, 2023. 2

work page internal anchor Pith review Pith/arXiv arXiv 2023
[23]

C. M. Fang, Y . Samaradivakara, P. Maes, and S. Nanayakkara. Mirai: A wearable proactive ai” inner-voice” for contextual nudging. InProc. of ACM CHI EA, 2025. 2

work page 2025
[24]

P. Fung, Y . Bachrach, A. Celikyilmaz, K. Chaudhuri, D. Chen, W. Chung, E. Dupoux, H. Gong, H. J´egou, A. Lazaric, et al. Embod- ied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355,

work page arXiv
[25]

GoogleAI

Google. GoogleAI. Gemini models.https://ai.google.dev/ gemini-api/docs/models/. Mar. 21. 2025. 5

work page 2025
[26]

Programmable search engine.https://developers

Google. Programmable search engine.https://developers. google.com/custom-search/v1/overview, 2025. Mar. 23. 2025. 5

work page 2025
[27]

Grubert, T

J. Grubert, T. Langlotz, S. Zollmann, and H. Regenbrecht. Towards pervasive augmented reality: Context-awareness in augmented reality. IEEE TVCG, 23(6):1706–1724, 2016. 1

work page 2016
[28]

V . Y . Han, J. T. Gonzalez, C. Yang, Z. Wang, S. E. Hudson, and A. Ion. Towards unobtrusive physical ai: Augmenting everyday objects with intelligence and robotic movement for proactive assistance. InProc. of ACM UIST, pp. 1–16, 2025. 2

work page 2025
[29]

Harvey, M

M. Harvey, M. Langheinrich, and G. Ward. Remembering through lifelogging: A survey of human memory augmentation.PMCJ, 27:14–26, 2016. 2

work page 2016
[30]

Y . O. Hu, J. Tang, X. Gong, Z. Zhou, S. Zhang, D. S. Elvitigala, F. F. Mueller, W. Hu, and A. J. Quigley. Vision-based multimodal inter- faces: A survey and taxonomy for enhanced context-aware system design. InProc. of ACM CHI, pp. 1–31, 2025. 2

work page 2025
[31]

Jang, E.-J

S. Jang, E.-J. Ko, and W. Woo. Unified user-centric context: Who, where, when, what, how and why. InProc. of UbiPCMM, 2005. 3

work page 2005
[32]

M. S. U. Khan, M. Z. Afzal, and D. Stricker. SituationalLLM: proac- tive language models with scene awareness for dynamic, contextual task guidance.arXiv:2406.13302, 2024. 2

work page arXiv 2024
[33]

O. Khan, Z. Ahmed, H. Nam, and K. Kim. TangibleMoments: Em- bedding XR memories onto physical objects. InProc. of IEEE VRW, pp. 1147–1153, 2025. 2

work page 2025
[34]

Y . Kim, Z. Aamir, M. Singh, S. Boorboor, K. Mueller, and A. E. Kaufman. Explainable XR: understanding user behaviors of XR en- vironments using LLM-assisted analytics framework.IEEE TVCG, 31(5):1–11, 2025. 2, 3

work page 2025
[35]

Y . Kim, S. Boorboor, A. Rahmati, and A. Kaufman. Design of privacy preservation system in augmented reality. InProc. of IEEE VizSec,

work page
[36]

Y . Kim, S. Goutam, A. Rahmati, and A. Kaufman. Erebus: Access control for augmented reality systems. InProc. of USENIX Security, pp. 929–946, 2023. 2

work page 2023
[37]

R. K. Kundu, I. Ahmed, and K. A. Hoque. Pilar: Personal- izing augmented reality interactions with llm-based human-centric and trustworthy explanations for daily use cases.arXiv preprint arXiv:2512.17172, 2025. 2

work page arXiv 2025
[38]

B. Lee, M. Sedlmair, and D. Schmalstieg. Design patterns for situated visualization in augmented reality.IEEE TVCG, 30(1):1324–1335,

work page
[39]

G. Lee, M. Xia, N. Numan, X. Qian, D. Li, Y . Chen, A. Kulshrestha, I. Chatterjee, Y . Zhang, D. Manocha, et al. Sensible agent: A frame- work for unobtrusive interaction with proactive ar agents. InProc. of ACM UIST, pp. 1–22, 2025. 1, 2, 9

work page 2025
[40]

J. Lee, J. Kim, J. Ahn, and W. Woo. Remote diagnosis of architec- tural heritage based on 5w1h model-based metadata in virtual reality. ISPRS IJGI, 8(8):339, 2019. 3

work page 2019
[41]

J. Lee, J. Wang, E. Brown, L. Chu, S. S. Rodriguez, and J. E. Froehlich. GazePointAR: a context-aware multimodal voice assistant for pronoun disambiguation in wearable augmented reality. InProc. of ACM CHI, pp. 1–20, 2024. 1, 2

work page 2024
[42]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschel, et al. Retrieval- augmented generation for knowledge-intensive nlp tasks.NeurIPS, 33:9459–9474, 2020. 5

work page 2020
[43]

C. Li, G. Wu, G. Y .-Y . Chan, D. G. Turakhia, S. Castelo Quispe, D. Li, L. Welch, C. Silva, and J. Qian. Satori: Towards proactive ar assistant 10 © 2026 IEEE. This is the author’s version of the article that will appear at the IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR). The final version of this record is available at: 10.1109/V...

work page doi:10.1109/vr67842.2026.00044 2026
[44]

J. N. Li, Y . Xu, T. Grossman, S. Santosa, and M. Li. OmniActions: predicting digital actions in response to real-world multimodal sen- sory inputs with LLMs. InProc. of ACM CHI, pp. 1–22, 2024. 2, 9

work page 2024
[45]

J. N. Li, Z. J. Zhang, and J. Ma. Omniquery: Contextually augmenting captured multimodal memory to enable personal question answering. InProc. of ACM CHI, 2025. 2

work page 2025
[46]

T. Li, L. Jin, Z. Wu, and Y . Chen. Combined recommendation algo- rithm based on improved similarity and forgetting curve.Information, 10(4):130, 2019. 9

work page 2019
[47]

J. Liu, K. A. Satriadi, B. Ens, and T. Dwyer. Investigating the effects of physical landmarks on spatial memory for information visualisation in augmented reality. InProc. of IEEE ISMAR, pp. 289–298, 2024. 2

work page 2024
[48]

X. B. Liu, S. Fang, W. Shi, C.-S. Wu, T. Igarashi, and X. Chen. Proac- tive conversational agents with inner thoughts. InProc. of ACM CHI,

work page
[49]

L. Long, Y . He, W. Ye, Y . Pan, Y . Lin, H. Li, J. Zhao, and W. Li. Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory.arXiv preprint arXiv:2508.09736, 2025. 2

work page arXiv 2025
[50]

Lu and D

F. Lu and D. A. Bowman. Evaluating the potential of glanceable ar in- terfaces for authentic everyday uses. InIEEE VR, pp. 768–777, 2021. 2

work page 2021
[51]

F. Lu, L. Pavanatto, and D. A. Bowman. In-the-wild experiences with an interactive glanceable ar system for everyday use. InProc. of ACM SUI, pp. 1–9, 2023. 2

work page 2023
[52]

Z. Lv, N. Charron, P. Moulon, A. Gamino, C. Peng, C. Sweeney, E. Miller, H. Tang, J. Meissner, J. Dong, et al. Aria everyday activities dataset.arXiv:2402.13349, 2024. 2

work page arXiv 2024
[53]

EMG Wristbands and Technology.https://www.meta.com/ emerging-tech/emg-wearable-technology/

Meta. EMG Wristbands and Technology.https://www.meta.com/ emerging-tech/emg-wearable-technology/. Jan. 7. 2026. 9

work page 2026
[54]

Meurisch, C

C. Meurisch, C. A. Mihale-Wilson, A. Hawlitschek, F. Giger, F. M¨uller, O. Hinz, and M. M ¨uhlh¨auser. Exploring user expectations of proactive ai systems.Proc. of ACM IMWUT, 4(4):1–22, 2020. 1

work page 2020
[55]

Milgram and F

P. Milgram and F. Kishino. A taxonomy of mixed reality visual dis- plays.IEICE TIS, 77(12):1321–1329, 1994. 9

work page 1994
[56]

L. Ning, L. Liu, J. Wu, N. Wu, D. Berlowitz, S. Prakash, B. Green, S. O’Banion, and J. Xie. User-llm: Efficient llm contextualization with user embeddings. InProc. of ACM WWW, pp. 1219–1223, 2025. 2

work page 2025
[57]

Paruchuri, S

A. Paruchuri, S. Hersek, L. Aggarwal, Q. Yang, X. Liu, A. Kul- shrestha, A. Colaco, H. Fuchs, and I. Chatterjee. Egotrigger: Toward audio-driven image capture for human memory enhancement in all- day energy-efficient smart glasses.IEEE TVCG, 2025. 1

work page 2025
[58]

Perera, A

C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos. Context aware computing for the internet of things: A survey.IEEE Commun. Surv. Tutor., 16(1):414–454, 2013. 2

work page 2013
[59]

K. Pu, T. Zhang, N. Sendhilnathan, S. Freitag, R. Sodhi, and T. R. Jonker. Promemassist: Exploring timely proactive assistance through working memory modeling in multi-modal wearable devices. InProc. of UIST, pp. 1–19, 2025. 1, 2

work page 2025
[60]

Raianova and M

A. Raianova and M. Lee. Adaptive learning in extended reality: A survey on multimodal interaction and ai-driven personalization. In Proc. of IEEE ISMAR-Adjunct, pp. 205–210, 2025. 9

work page 2025
[61]

Rajaram, M

S. Rajaram, M. Peralta, J. G. Johnson, and M. Nebeling. Exploring the design space of privacy-driven adaptation techniques for future augmented reality interfaces. InProc. of ACM CHI, pp. 1–19, 2025. 2

work page 2025
[62]

Rajaram, H

S. Rajaram, H. B. Surale, C. McConkey, C. Rognon, H. Mehta, M. Glueck, and C. Collins. Gesture and audio-haptic guidance tech- niques to direct conversations with intelligent voice interfaces. In Proc. of ACM CHI, pp. 1–20, 2025. 1, 2, 3, 9

work page 2025
[63]

L. Rau, J. L. Bitter, Y . Liu, U. Spierling, and R. D ¨orner. Support- ing the creation of non-linear everyday ar experiences in exhibitions and museums: An authoring process based on self-contained building blocks.Front. Virtual Reality, 3:955437, 2022. 2

work page 2022
[64]

K. A. Satriadi, A. Cunningham, R. T. Smith, T. Dwyer, A. Dro- gemuller, and B. H. Thomas. Proxsituated visualization: An extended model of situated visualization using proxies for physical referents. In Proc. of ACM CHI, pp. 1–20, 2023. 9

work page 2023
[65]

K. A. Satriadi, B. Tag, and T. Dwyer. Context-dependent memory in situated visualization.arXiv:2311.12288, 2023. 2

work page arXiv 2023
[66]

J. Shen, J. J. Dudley, and P. O. Kristensson. Encode-store-retrieve: Augmenting human memory through language-encoded egocentric perception. InProc. of IEEE ISMAR, pp. 923–931, 2024. 1, 2

work page 2024
[67]

E. Song, T. Ha, J. Park, H. Lee, and W. Woo. Holistic quantified- self for context-aware wearable augmented reality.IJHCS, p. 103568,

work page
[68]

Stover and D

D. Stover and D. Bowman. Taggar: General-purpose task guidance from natural language in augmented reality using vision-language models. InProc. of ACM SUI, pp. 1–12, 2024. 2

work page 2024
[69]

T. T. M. Tran, S. Brown, O. Weidlich, S. Yoo, and C. Parker. Wear- able ar in everyday contexts: Insights from a digital ethnography of youtube videos. InProc. of ACM CHI, 2025. 2

work page 2025
[70]

If My Apple Can Talk

Y . Wang, Y . Lu, S. Yan, and X. Shen. “If My Apple Can Talk”: Ex- ploring the use of everyday objects as personalized ai agents in mixed reality. InProc. of ACM CHI EA, pp. 1–9, 2025. 2

work page 2025
[71]

X. Xu, A. Yu, T. R. Jonker, K. Todi, F. Lu, X. Qian, J. M. Evange- lista Belo, T. Wang, M. Li, A. Mun, et al. Xair: A framework of explainable ai in augmented reality. InProc. of ACM CHI, pp. 1–30,

work page
[72]

B. Yang, L. Xu, L. Zeng, K. Liu, S. Jiang, W. Lu, H. Chen, X. Jiang, G. Xing, and Z. Yan. ContextAgent: Context-aware proactive llm agents with open-world sensory perceptions.NeurIPS, 2025. 1

work page 2025
[73]

J. Yang, S. Yang, A. W. Gupta, R. Han, L. Fei-Fei, and S. Xie. Think- ing in space: How multimodal large language models see, remember, and recall spaces. InProc. of IEEE/CVF CVPR, pp. 10632–10643,

work page
[74]

Zhang, Y

X. Zhang, Y . Deng, Z. Ren, S. K. Ng, and T.-S. Chua. Ask-before- plan: Proactive language agents for real-world planning. InProf. of ACL EMNLP, pp. 10836–10863, 2024. 2

work page 2024
[75]

Zheng, H

J. Zheng, H. Weng, X. Wang, C. Cui, S. Mayer, C.-L. Tai, and L.-H. Lee. Persono: Personalised notification urgency classifier in mixed reality. InProc. of IEEE ISMAR, pp. 1053–1063, 2025. 9

work page 2025
[76]

Zhu, S.-K

C. Zhu, S.-K. Hsia, X. Hu, Z. Liu, J. Shi, and K. Ramani. agentar: Cre- ating augmented reality applications with tool-augmented llm-based autonomous agents. InProc. of ACM UIST, pp. 1–23, 2025. 2

work page 2025
[77]

W. D. Zulfikar, S. Chan, and P. Maes. Memoro: Using large language models to realize a concise interface for real-time memory augmenta- tion. InProc. of ACM CHI, pp. 1–18, 2024. 2 11

work page 2024

[1] [1]

N. A. Akbar, R. Dembani, B. Lenzitti, and D. Tegolo. RAG-driven memory architectures in conversational llms-a literature review with insights into emerging agriculture data sharing.IEEE Access, 2025. 2

work page 2025

[2] [2]

Android XR.https://www.android.com/xr/

Android. Android XR.https://www.android.com/xr/. Jan. 7

work page

[3] [3]

Arakawa, J

R. Arakawa, J. F. Lehman, and M. Goel. Prism-q&a: Step-aware voice assistant on a smartwatch enabled by multimodal procedure tracking and large language models. InProc. ACM IMWUT, vol. 8, pp. 1–26,

work page

[4] [4]

Assor, A

A. Assor, A. Prouzeau, M. Hachet, and P. Dragicevic. Handling non- visible referents in situated visualizations.IEEE TVCG, 30(1):1336– 1346, 2023. 2

work page 2023

[5] [5]

S. A. Bahrainian and F. Crestani. Augmentation of human memory: Anticipating topics that continue in the next meeting. InProc. of ACM CHIIR, pp. 150–159, 2018. 2

work page 2018

[6] [6]

Bajorunaite, S

L. Bajorunaite, S. Brewster, and J. R. Williamson. Virtual reality in transit: how acceptable is vr use on public transport? InProc. of IEEE VRW, pp. 432–433, 2021. 2

work page 2021

[7] [7]

Bajorunaite, S

L. Bajorunaite, S. Brewster, and J. R. Williamson. Reality anchors: Bringing cues from reality to increase acceptance of immersive tech- nologies in transit.Proc. of ACM MHCI, 7(MHCI), 2023. 2

work page 2023

[8] [8]

Boorboor, M

S. Boorboor, M. S. Castellana, Y . Kim, C. Zhu-tian, J. Beyer, H. Pfis- ter, and A. E. Kaufman. V oxAR: adaptive visualization of volume ren- dered objects in optical see-through augmented reality.IEEE TVCG, 30(10):6801–6812, 2024. 9

work page 2024

[9] [9]

Bressa, J

N. Bressa, J. Vermeulen, and W. Willett. Data every day: Designing and living with personal situated visualizations. InProc. of ACM CHI, pp. 1–18, 2022. 2

work page 2022

[10] [10]

J. Brooke. Sus: A quick and dirty usability scale.Usability Evaluation In Industry, pp. 189–194, 1995. 7

work page 1995

[11] [11]

S. I. M. S. Bukhari, M. Sajid, B. Ji, and B. David-John. Rethinking privacy indicators in extended reality: Multimodal design for situa- tionally impaired bystanders. InProc. of IEEE ISMAR-Adjunct, 2025. 2

work page 2025

[12] [12]

B ¨uschel, A

W. B ¨uschel, A. Lehmann, and R. Dachselt. Miria: A mixed reality toolkit for the in-situ visualization and analysis of spatio-temporal in- teraction data. InProc. of ACM CHI, pp. 1–15, 2021. 2

work page 2021

[13] [13]

R. Cai, N. Janaka, H. Kim, Y . Chen, S. Zhao, Y . Huang, and D. Hsu. Aiget: Transforming everyday moments into hidden knowledge dis- covery with ai assistance on smart glasses.arXiv:2501.16240, 2025. 1

work page arXiv 2025

[14] [14]

Chang, Y

R.-C. Chang, Y . Liu, and A. Guo. Worldscribe: Towards context- aware live visual descriptions. InProc. of ACM UIST, pp. 1–18, 2024. 2

work page 2024

[15] [15]

Y . F. Cheng, A. Carden, H. Cho, C. G. Fidalgo, J. Wieland, and D. Lindlbauer. Augmented reality in-the-wild: Usage patterns and experiences of working with ar laptops in real-world settings.arXiv preprint arXiv:2502.14241, 2025. 2

work page arXiv 2025

[16] [16]

Q. Chu, H. Zhang, M. Liu, Y . Feng, H. Shi, and L. Nie. Intention- guided cognitive reasoning for egocentric long-term action anticipa- tion. InProc. of AAAI, 2026. 2

work page 2026

[17] [17]

Corbett, B

M. Corbett, B. David-John, J. Shang, Y . C. Hu, and B. Ji. Bystan- dar: Protecting bystander visual data in augmented reality systems. In Proc. of ACM MobiSys, pp. 370–382, 2023. 2

work page 2023

[18] [18]

Davari and D

S. Davari and D. A. Bowman. Towards context-aware adaptation in extended reality: A design space for xr interfaces and an adaptive placement strategy.arXiv preprint arXiv:2411.02607, 2024. 1, 9

work page arXiv 2024

[19] [19]

Davari, F

S. Davari, F. Lu, and D. A. Bowman. Occlusion management tech- niques for everyday glanceable ar interfaces. InProc. of IEEE VRW, pp. 324–330, 2020. 9

work page 2020

[20] [20]

M. D. Dogan, E. J. Gonzalez, K. Ahuja, R. Du, A. Colac ¸o, J. Lee, M. Gonzalez-Franco, and D. Kim. Augmented object intelligence with XR-Objects. InProc. of ACM UIST, pp. 1–15, 2024. 1, 2

work page 2024

[21] [21]

R. D. Easton and M. J. Sholl. Object-array structure, frames of refer- ence, and retrieval of spatial knowledge.JEP:LMC, 21(2):483–500,

work page

[22] [22]

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

J. Engel, K. Somasundaram, M. Goesele, A. Sun, A. Gamino, A. Turner, A. Talattof, A. Yuan, B. Souti, B. Meredith, et al. Project aria: A new tool for egocentric multi-modal ai research. arXiv:2308.13561, 2023. 2

work page internal anchor Pith review Pith/arXiv arXiv 2023

[23] [23]

C. M. Fang, Y . Samaradivakara, P. Maes, and S. Nanayakkara. Mirai: A wearable proactive ai” inner-voice” for contextual nudging. InProc. of ACM CHI EA, 2025. 2

work page 2025

[24] [24]

P. Fung, Y . Bachrach, A. Celikyilmaz, K. Chaudhuri, D. Chen, W. Chung, E. Dupoux, H. Gong, H. J´egou, A. Lazaric, et al. Embod- ied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355,

work page arXiv

[25] [25]

GoogleAI

Google. GoogleAI. Gemini models.https://ai.google.dev/ gemini-api/docs/models/. Mar. 21. 2025. 5

work page 2025

[26] [26]

Programmable search engine.https://developers

Google. Programmable search engine.https://developers. google.com/custom-search/v1/overview, 2025. Mar. 23. 2025. 5

work page 2025

[27] [27]

Grubert, T

J. Grubert, T. Langlotz, S. Zollmann, and H. Regenbrecht. Towards pervasive augmented reality: Context-awareness in augmented reality. IEEE TVCG, 23(6):1706–1724, 2016. 1

work page 2016

[28] [28]

V . Y . Han, J. T. Gonzalez, C. Yang, Z. Wang, S. E. Hudson, and A. Ion. Towards unobtrusive physical ai: Augmenting everyday objects with intelligence and robotic movement for proactive assistance. InProc. of ACM UIST, pp. 1–16, 2025. 2

work page 2025

[29] [29]

Harvey, M

M. Harvey, M. Langheinrich, and G. Ward. Remembering through lifelogging: A survey of human memory augmentation.PMCJ, 27:14–26, 2016. 2

work page 2016

[30] [30]

Y . O. Hu, J. Tang, X. Gong, Z. Zhou, S. Zhang, D. S. Elvitigala, F. F. Mueller, W. Hu, and A. J. Quigley. Vision-based multimodal inter- faces: A survey and taxonomy for enhanced context-aware system design. InProc. of ACM CHI, pp. 1–31, 2025. 2

work page 2025

[31] [31]

Jang, E.-J

S. Jang, E.-J. Ko, and W. Woo. Unified user-centric context: Who, where, when, what, how and why. InProc. of UbiPCMM, 2005. 3

work page 2005

[32] [32]

M. S. U. Khan, M. Z. Afzal, and D. Stricker. SituationalLLM: proac- tive language models with scene awareness for dynamic, contextual task guidance.arXiv:2406.13302, 2024. 2

work page arXiv 2024

[33] [33]

O. Khan, Z. Ahmed, H. Nam, and K. Kim. TangibleMoments: Em- bedding XR memories onto physical objects. InProc. of IEEE VRW, pp. 1147–1153, 2025. 2

work page 2025

[34] [34]

Y . Kim, Z. Aamir, M. Singh, S. Boorboor, K. Mueller, and A. E. Kaufman. Explainable XR: understanding user behaviors of XR en- vironments using LLM-assisted analytics framework.IEEE TVCG, 31(5):1–11, 2025. 2, 3

work page 2025

[35] [35]

Y . Kim, S. Boorboor, A. Rahmati, and A. Kaufman. Design of privacy preservation system in augmented reality. InProc. of IEEE VizSec,

work page

[36] [36]

Y . Kim, S. Goutam, A. Rahmati, and A. Kaufman. Erebus: Access control for augmented reality systems. InProc. of USENIX Security, pp. 929–946, 2023. 2

work page 2023

[37] [37]

R. K. Kundu, I. Ahmed, and K. A. Hoque. Pilar: Personal- izing augmented reality interactions with llm-based human-centric and trustworthy explanations for daily use cases.arXiv preprint arXiv:2512.17172, 2025. 2

work page arXiv 2025

[38] [38]

B. Lee, M. Sedlmair, and D. Schmalstieg. Design patterns for situated visualization in augmented reality.IEEE TVCG, 30(1):1324–1335,

work page

[39] [39]

G. Lee, M. Xia, N. Numan, X. Qian, D. Li, Y . Chen, A. Kulshrestha, I. Chatterjee, Y . Zhang, D. Manocha, et al. Sensible agent: A frame- work for unobtrusive interaction with proactive ar agents. InProc. of ACM UIST, pp. 1–22, 2025. 1, 2, 9

work page 2025

[40] [40]

J. Lee, J. Kim, J. Ahn, and W. Woo. Remote diagnosis of architec- tural heritage based on 5w1h model-based metadata in virtual reality. ISPRS IJGI, 8(8):339, 2019. 3

work page 2019

[41] [41]

J. Lee, J. Wang, E. Brown, L. Chu, S. S. Rodriguez, and J. E. Froehlich. GazePointAR: a context-aware multimodal voice assistant for pronoun disambiguation in wearable augmented reality. InProc. of ACM CHI, pp. 1–20, 2024. 1, 2

work page 2024

[42] [42]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschel, et al. Retrieval- augmented generation for knowledge-intensive nlp tasks.NeurIPS, 33:9459–9474, 2020. 5

work page 2020

[43] [43]

C. Li, G. Wu, G. Y .-Y . Chan, D. G. Turakhia, S. Castelo Quispe, D. Li, L. Welch, C. Silva, and J. Qian. Satori: Towards proactive ar assistant 10 © 2026 IEEE. This is the author’s version of the article that will appear at the IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR). The final version of this record is available at: 10.1109/V...

work page doi:10.1109/vr67842.2026.00044 2026

[44] [44]

J. N. Li, Y . Xu, T. Grossman, S. Santosa, and M. Li. OmniActions: predicting digital actions in response to real-world multimodal sen- sory inputs with LLMs. InProc. of ACM CHI, pp. 1–22, 2024. 2, 9

work page 2024

[45] [45]

J. N. Li, Z. J. Zhang, and J. Ma. Omniquery: Contextually augmenting captured multimodal memory to enable personal question answering. InProc. of ACM CHI, 2025. 2

work page 2025

[46] [46]

T. Li, L. Jin, Z. Wu, and Y . Chen. Combined recommendation algo- rithm based on improved similarity and forgetting curve.Information, 10(4):130, 2019. 9

work page 2019

[47] [47]

J. Liu, K. A. Satriadi, B. Ens, and T. Dwyer. Investigating the effects of physical landmarks on spatial memory for information visualisation in augmented reality. InProc. of IEEE ISMAR, pp. 289–298, 2024. 2

work page 2024

[48] [48]

X. B. Liu, S. Fang, W. Shi, C.-S. Wu, T. Igarashi, and X. Chen. Proac- tive conversational agents with inner thoughts. InProc. of ACM CHI,

work page

[49] [49]

L. Long, Y . He, W. Ye, Y . Pan, Y . Lin, H. Li, J. Zhao, and W. Li. Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory.arXiv preprint arXiv:2508.09736, 2025. 2

work page arXiv 2025

[50] [50]

Lu and D

F. Lu and D. A. Bowman. Evaluating the potential of glanceable ar in- terfaces for authentic everyday uses. InIEEE VR, pp. 768–777, 2021. 2

work page 2021

[51] [51]

F. Lu, L. Pavanatto, and D. A. Bowman. In-the-wild experiences with an interactive glanceable ar system for everyday use. InProc. of ACM SUI, pp. 1–9, 2023. 2

work page 2023

[52] [52]

Z. Lv, N. Charron, P. Moulon, A. Gamino, C. Peng, C. Sweeney, E. Miller, H. Tang, J. Meissner, J. Dong, et al. Aria everyday activities dataset.arXiv:2402.13349, 2024. 2

work page arXiv 2024

[53] [53]

EMG Wristbands and Technology.https://www.meta.com/ emerging-tech/emg-wearable-technology/

Meta. EMG Wristbands and Technology.https://www.meta.com/ emerging-tech/emg-wearable-technology/. Jan. 7. 2026. 9

work page 2026

[54] [54]

Meurisch, C

C. Meurisch, C. A. Mihale-Wilson, A. Hawlitschek, F. Giger, F. M¨uller, O. Hinz, and M. M ¨uhlh¨auser. Exploring user expectations of proactive ai systems.Proc. of ACM IMWUT, 4(4):1–22, 2020. 1

work page 2020

[55] [55]

Milgram and F

P. Milgram and F. Kishino. A taxonomy of mixed reality visual dis- plays.IEICE TIS, 77(12):1321–1329, 1994. 9

work page 1994

[56] [56]

L. Ning, L. Liu, J. Wu, N. Wu, D. Berlowitz, S. Prakash, B. Green, S. O’Banion, and J. Xie. User-llm: Efficient llm contextualization with user embeddings. InProc. of ACM WWW, pp. 1219–1223, 2025. 2

work page 2025

[57] [57]

Paruchuri, S

A. Paruchuri, S. Hersek, L. Aggarwal, Q. Yang, X. Liu, A. Kul- shrestha, A. Colaco, H. Fuchs, and I. Chatterjee. Egotrigger: Toward audio-driven image capture for human memory enhancement in all- day energy-efficient smart glasses.IEEE TVCG, 2025. 1

work page 2025

[58] [58]

Perera, A

C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos. Context aware computing for the internet of things: A survey.IEEE Commun. Surv. Tutor., 16(1):414–454, 2013. 2

work page 2013

[59] [59]

K. Pu, T. Zhang, N. Sendhilnathan, S. Freitag, R. Sodhi, and T. R. Jonker. Promemassist: Exploring timely proactive assistance through working memory modeling in multi-modal wearable devices. InProc. of UIST, pp. 1–19, 2025. 1, 2

work page 2025

[60] [60]

Raianova and M

A. Raianova and M. Lee. Adaptive learning in extended reality: A survey on multimodal interaction and ai-driven personalization. In Proc. of IEEE ISMAR-Adjunct, pp. 205–210, 2025. 9

work page 2025

[61] [61]

Rajaram, M

S. Rajaram, M. Peralta, J. G. Johnson, and M. Nebeling. Exploring the design space of privacy-driven adaptation techniques for future augmented reality interfaces. InProc. of ACM CHI, pp. 1–19, 2025. 2

work page 2025

[62] [62]

Rajaram, H

S. Rajaram, H. B. Surale, C. McConkey, C. Rognon, H. Mehta, M. Glueck, and C. Collins. Gesture and audio-haptic guidance tech- niques to direct conversations with intelligent voice interfaces. In Proc. of ACM CHI, pp. 1–20, 2025. 1, 2, 3, 9

work page 2025

[63] [63]

L. Rau, J. L. Bitter, Y . Liu, U. Spierling, and R. D ¨orner. Support- ing the creation of non-linear everyday ar experiences in exhibitions and museums: An authoring process based on self-contained building blocks.Front. Virtual Reality, 3:955437, 2022. 2

work page 2022

[64] [64]

K. A. Satriadi, A. Cunningham, R. T. Smith, T. Dwyer, A. Dro- gemuller, and B. H. Thomas. Proxsituated visualization: An extended model of situated visualization using proxies for physical referents. In Proc. of ACM CHI, pp. 1–20, 2023. 9

work page 2023

[65] [65]

K. A. Satriadi, B. Tag, and T. Dwyer. Context-dependent memory in situated visualization.arXiv:2311.12288, 2023. 2

work page arXiv 2023

[66] [66]

J. Shen, J. J. Dudley, and P. O. Kristensson. Encode-store-retrieve: Augmenting human memory through language-encoded egocentric perception. InProc. of IEEE ISMAR, pp. 923–931, 2024. 1, 2

work page 2024

[67] [67]

E. Song, T. Ha, J. Park, H. Lee, and W. Woo. Holistic quantified- self for context-aware wearable augmented reality.IJHCS, p. 103568,

work page

[68] [68]

Stover and D

D. Stover and D. Bowman. Taggar: General-purpose task guidance from natural language in augmented reality using vision-language models. InProc. of ACM SUI, pp. 1–12, 2024. 2

work page 2024

[69] [69]

T. T. M. Tran, S. Brown, O. Weidlich, S. Yoo, and C. Parker. Wear- able ar in everyday contexts: Insights from a digital ethnography of youtube videos. InProc. of ACM CHI, 2025. 2

work page 2025

[70] [70]

If My Apple Can Talk

Y . Wang, Y . Lu, S. Yan, and X. Shen. “If My Apple Can Talk”: Ex- ploring the use of everyday objects as personalized ai agents in mixed reality. InProc. of ACM CHI EA, pp. 1–9, 2025. 2

work page 2025

[71] [71]

X. Xu, A. Yu, T. R. Jonker, K. Todi, F. Lu, X. Qian, J. M. Evange- lista Belo, T. Wang, M. Li, A. Mun, et al. Xair: A framework of explainable ai in augmented reality. InProc. of ACM CHI, pp. 1–30,

work page

[72] [72]

B. Yang, L. Xu, L. Zeng, K. Liu, S. Jiang, W. Lu, H. Chen, X. Jiang, G. Xing, and Z. Yan. ContextAgent: Context-aware proactive llm agents with open-world sensory perceptions.NeurIPS, 2025. 1

work page 2025

[73] [73]

J. Yang, S. Yang, A. W. Gupta, R. Han, L. Fei-Fei, and S. Xie. Think- ing in space: How multimodal large language models see, remember, and recall spaces. InProc. of IEEE/CVF CVPR, pp. 10632–10643,

work page

[74] [74]

Zhang, Y

X. Zhang, Y . Deng, Z. Ren, S. K. Ng, and T.-S. Chua. Ask-before- plan: Proactive language agents for real-world planning. InProf. of ACL EMNLP, pp. 10836–10863, 2024. 2

work page 2024

[75] [75]

Zheng, H

J. Zheng, H. Weng, X. Wang, C. Cui, S. Mayer, C.-L. Tai, and L.-H. Lee. Persono: Personalised notification urgency classifier in mixed reality. InProc. of IEEE ISMAR, pp. 1053–1063, 2025. 9

work page 2025

[76] [76]

Zhu, S.-K

C. Zhu, S.-K. Hsia, X. Hu, Z. Liu, J. Shi, and K. Ramani. agentar: Cre- ating augmented reality applications with tool-augmented llm-based autonomous agents. InProc. of ACM UIST, pp. 1–23, 2025. 2

work page 2025

[77] [77]

W. D. Zulfikar, S. Chan, and P. Maes. Memoro: Using large language models to realize a concise interface for real-time memory augmenta- tion. InProc. of ACM CHI, pp. 1–18, 2024. 2 11

work page 2024