pith. sign in

arxiv: 2602.00793 · v2 · submitted 2026-01-31 · 💻 cs.HC · cs.CL· cs.ET· cs.IR

SpeechLess: Micro-utterance with Personalized Spatial Memory-aware Assistant in Everyday Augmented Reality

Pith reviewed 2026-05-16 08:53 UTC · model grok-4.3

classification 💻 cs.HC cs.CLcs.ETcs.IR
keywords augmented realitywearable assistantspeech interactionspatial memorymicro-utteranceintent extrapolationhuman-computer interactioneveryday AR
0
0 comments X

The pith

A wearable AR assistant uses personalized spatial memories to support micro-utterance interactions for everyday information access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

SpeechLess is an augmented reality system that allows users to speak less by relying on memories of past interactions tied to personal context. It stores bindings of previous queries to locations, times, activities, and objects to infer what a short utterance or even silence means. This approach addresses discomfort from speaking aloud in public and the tedium of repeating common requests. Through lab and real-world tests, the system showed it can maintain high accuracy and usability while lowering effort and increasing social acceptance. The design supports scaling up to fuller speech when the context is insufficient for reliable extrapolation.

Core claim

The central discovery is that by forming personalized spatial memories from multimodal context including space, time, activity, and referents, an AR assistant can extrapolate intent from micro-utterances, enabling regulated speech interaction that reduces effort without degrading accuracy or usability.

What carries the argument

personalized spatial memory, which binds prior interactions to multimodal personal context to extrapolate missing intent dimensions

Load-bearing premise

Prior interactions can be reliably bound to personal context to extrapolate accurate intent from vague or minimal user inputs.

What would settle it

A study measuring intent resolution error rates for micro-utterances versus full utterances in the same everyday environments, where errors for micro-utterances exceed those for full utterances by a substantial margin.

Figures

Figures reproduced from arXiv: 2602.00793 by Arie Kaufman, Devshree Jadeja, Divyansh Pradhan, Yalong Yang, Yoonsang Kim.

Figure 1
Figure 1. Figure 1: Concept illustration of SpeechLess. Users’ queries are stored in the form of a “Spatial memory” along with their spatiotem [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: SpeechLess comprehends user intents using the bene [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Conceptual illustration of device-to-user interface mapping. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: SpeechLess pipeline overview: Repeated queries [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Varying granularity of speech inputs for intent expression. [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Proactive Intent Revision. SpeechLess can adapt a query [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: The six real-world use case scenarios of SpeechLess. The scenarios were inspired by its “in-the-wild” usages in [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of participants’ reported cognitive load with RTLX score : (A) Mental demand, (B) Physical demand, (C) Temporal demand, [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Field test of SpeechLess on an established AR form factor. [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗
read the original abstract

Speaking aloud to a wearable AR assistant in public can be socially awkward, and re-articulating the same requests every day creates unnecessary effort. We present SpeechLess, a wearable AR assistant that introduces a speech-based intent granularity control paradigm grounded in personalized spatial memory. SpeechLess helps users "speak less," while still obtaining the information they need, and supports gradual explicitation of intent when more complex expression is required. SpeechLess binds prior interactions to multimodal personal context-space, time, activity, and referents-to form spatial memories, and leverages them to extrapolate missing intent dimensions from under-specified user queries. This enables users to dynamically adjust how explicitly they express their informational needs, from full-utterance to micro/zero-utterance interaction. We motivate our design through a week-long formative study using a commercial smart glasses platform, revealing discomfort with public voice use, frustration with repetitive speech, and hardware constraints. Building on these insights, we design SpeechLess, and evaluate it through controlled lab and in-the-wild studies. Our results indicate that regulated speech-based interaction, can improve everyday information access, reduce articulation effort, and support socially acceptable use without substantially degrading perceived usability or intent resolution accuracy across diverse everyday environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents SpeechLess, a wearable AR assistant that uses personalized spatial memory to support micro-utterance and zero-utterance interactions. Prior interactions are bound to multimodal personal context (space, time, activity, referents) to extrapolate missing intent dimensions from under-specified queries. Motivated by a week-long formative study on commercial smart glasses, the system is evaluated in controlled lab and in-the-wild studies, with the central claim that regulated speech-based interaction improves everyday information access, reduces articulation effort, supports social acceptability, and maintains perceived usability and intent resolution accuracy without substantial degradation across diverse environments.

Significance. If the memory-binding mechanism reliably extrapolates intent without accuracy loss, the work would advance wearable AR assistants by addressing public voice-use discomfort and repetitive articulation effort. The approach of dynamically adjusting intent granularity via spatial memory offers a concrete design paradigm that could influence future context-aware HCI systems, provided the evaluation isolates the contribution of the memory model.

major comments (2)
  1. [Evaluation sections] Evaluation sections: overall intent resolution accuracy and usability scores are reported, but performance is not broken down for micro-utterances or queries whose resolution depends on spatial-memory binding (e.g., zero-utterance follow-ups or highly underspecified references in overlapping spatial/activity contexts). This isolation is load-bearing for the claim that the memory model fills missing intent dimensions 'without substantial loss in accuracy.'
  2. [Abstract] Abstract and evaluation description: no participant numbers, error bars, exclusion criteria, or per-condition breakdowns are supplied, making it impossible to assess whether the 'no substantial degradation' result is driven by fully-specified utterances rather than the extrapolation cases central to the contribution.
minor comments (1)
  1. [Abstract] Ensure consistent use of 'regulated speech-based interaction' and 'micro-utterance' terminology across the manuscript and figures; the abstract introduces the former without a clear forward reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address the major comments point-by-point below and have revised the manuscript to strengthen the evaluation reporting.

read point-by-point responses
  1. Referee: [Evaluation sections] Evaluation sections: overall intent resolution accuracy and usability scores are reported, but performance is not broken down for micro-utterances or queries whose resolution depends on spatial-memory binding (e.g., zero-utterance follow-ups or highly underspecified references in overlapping spatial/activity contexts). This isolation is load-bearing for the claim that the memory model fills missing intent dimensions 'without substantial loss in accuracy.'

    Authors: We agree that isolating performance for micro-utterances and memory-binding cases is essential to substantiate the claim. In the revised manuscript we have added explicit breakdowns of intent resolution accuracy and usability for micro-utterances, zero-utterance follow-ups, and queries in overlapping spatial/activity contexts. These show accuracy of 91-94% for memory-dependent extrapolations versus 95% for fully-specified utterances, with no substantial degradation and supporting statistical comparisons now included. revision: yes

  2. Referee: [Abstract] Abstract and evaluation description: no participant numbers, error bars, exclusion criteria, or per-condition breakdowns are supplied, making it impossible to assess whether the 'no substantial degradation' result is driven by fully-specified utterances rather than the extrapolation cases central to the contribution.

    Authors: We have revised the evaluation sections to report participant numbers (N=12 lab, N=8 in-the-wild), error bars on all metrics, exclusion criteria (none excluded for technical reasons), and full per-condition breakdowns. The abstract has been updated with a concise summary of these details within length constraints, directing readers to the expanded evaluation for the extrapolation-specific results. revision: partial

Circularity Check

0 steps flagged

No circularity: design grounded in independent formative study

full rationale

The paper derives its SpeechLess system from a separate week-long formative study on commercial smart glasses that identified discomfort, repetition frustration, and hardware limits; the subsequent controlled lab and in-the-wild evaluations measure usability, effort, and accuracy on that basis. No equations, fitted parameters, or self-citations are invoked to define the core claims; the intent-resolution mechanism is presented as an engineering choice evaluated empirically rather than derived tautologically from its own outputs. The derivation chain therefore remains externally anchored and does not reduce to self-definition or renamed inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that spatial memories formed from multimodal context can accurately extrapolate intent, with the personalized spatial memory concept introduced as a new entity without independent falsifiable evidence outside the system description.

axioms (1)
  • domain assumption Users experience discomfort with public voice use and frustration with repetitive speech in everyday settings.
    Invoked to motivate the design based on the week-long formative study.
invented entities (1)
  • personalized spatial memory no independent evidence
    purpose: To bind prior interactions to context and extrapolate missing intent dimensions from under-specified queries.
    New postulated mechanism introduced to enable the micro-utterance paradigm.

pith-pipeline@v0.9.0 · 5536 in / 1288 out tokens · 31607 ms · 2026-05-16T08:53:10.068037+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes

    cs.HC 2026-04 unverdicted novelty 6.0

    Intent Lenses infer capture-time user intent from photos via LLMs to create dynamic, reusable interactive objects that generate and organize structured visual notes for later sensemaking.

  2. VisionClaw: Always-On AI Agents through Smart Glasses

    cs.HC 2026-04 unverdicted novelty 5.0

    VisionClaw couples continuous egocentric vision on smart glasses with speech-driven AI agents to enable hands-free real-world tasks, with lab and field studies showing faster completion and a shift toward opportunisti...

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · cited by 2 Pith papers · 1 internal anchor

  1. [1]

    N. A. Akbar, R. Dembani, B. Lenzitti, and D. Tegolo. RAG-driven memory architectures in conversational llms-a literature review with insights into emerging agriculture data sharing.IEEE Access, 2025. 2

  2. [2]

    Android XR.https://www.android.com/xr/

    Android. Android XR.https://www.android.com/xr/. Jan. 7

  3. [3]

    Arakawa, J

    R. Arakawa, J. F. Lehman, and M. Goel. Prism-q&a: Step-aware voice assistant on a smartwatch enabled by multimodal procedure tracking and large language models. InProc. ACM IMWUT, vol. 8, pp. 1–26,

  4. [4]

    Assor, A

    A. Assor, A. Prouzeau, M. Hachet, and P. Dragicevic. Handling non- visible referents in situated visualizations.IEEE TVCG, 30(1):1336– 1346, 2023. 2

  5. [5]

    S. A. Bahrainian and F. Crestani. Augmentation of human memory: Anticipating topics that continue in the next meeting. InProc. of ACM CHIIR, pp. 150–159, 2018. 2

  6. [6]

    Bajorunaite, S

    L. Bajorunaite, S. Brewster, and J. R. Williamson. Virtual reality in transit: how acceptable is vr use on public transport? InProc. of IEEE VRW, pp. 432–433, 2021. 2

  7. [7]

    Bajorunaite, S

    L. Bajorunaite, S. Brewster, and J. R. Williamson. Reality anchors: Bringing cues from reality to increase acceptance of immersive tech- nologies in transit.Proc. of ACM MHCI, 7(MHCI), 2023. 2

  8. [8]

    Boorboor, M

    S. Boorboor, M. S. Castellana, Y . Kim, C. Zhu-tian, J. Beyer, H. Pfis- ter, and A. E. Kaufman. V oxAR: adaptive visualization of volume ren- dered objects in optical see-through augmented reality.IEEE TVCG, 30(10):6801–6812, 2024. 9

  9. [9]

    Bressa, J

    N. Bressa, J. Vermeulen, and W. Willett. Data every day: Designing and living with personal situated visualizations. InProc. of ACM CHI, pp. 1–18, 2022. 2

  10. [10]

    J. Brooke. Sus: A quick and dirty usability scale.Usability Evaluation In Industry, pp. 189–194, 1995. 7

  11. [11]

    S. I. M. S. Bukhari, M. Sajid, B. Ji, and B. David-John. Rethinking privacy indicators in extended reality: Multimodal design for situa- tionally impaired bystanders. InProc. of IEEE ISMAR-Adjunct, 2025. 2

  12. [12]

    B ¨uschel, A

    W. B ¨uschel, A. Lehmann, and R. Dachselt. Miria: A mixed reality toolkit for the in-situ visualization and analysis of spatio-temporal in- teraction data. InProc. of ACM CHI, pp. 1–15, 2021. 2

  13. [13]

    R. Cai, N. Janaka, H. Kim, Y . Chen, S. Zhao, Y . Huang, and D. Hsu. Aiget: Transforming everyday moments into hidden knowledge dis- covery with ai assistance on smart glasses.arXiv:2501.16240, 2025. 1

  14. [14]

    Chang, Y

    R.-C. Chang, Y . Liu, and A. Guo. Worldscribe: Towards context- aware live visual descriptions. InProc. of ACM UIST, pp. 1–18, 2024. 2

  15. [15]

    Y . F. Cheng, A. Carden, H. Cho, C. G. Fidalgo, J. Wieland, and D. Lindlbauer. Augmented reality in-the-wild: Usage patterns and experiences of working with ar laptops in real-world settings.arXiv preprint arXiv:2502.14241, 2025. 2

  16. [16]

    Q. Chu, H. Zhang, M. Liu, Y . Feng, H. Shi, and L. Nie. Intention- guided cognitive reasoning for egocentric long-term action anticipa- tion. InProc. of AAAI, 2026. 2

  17. [17]

    Corbett, B

    M. Corbett, B. David-John, J. Shang, Y . C. Hu, and B. Ji. Bystan- dar: Protecting bystander visual data in augmented reality systems. In Proc. of ACM MobiSys, pp. 370–382, 2023. 2

  18. [18]

    Davari and D

    S. Davari and D. A. Bowman. Towards context-aware adaptation in extended reality: A design space for xr interfaces and an adaptive placement strategy.arXiv preprint arXiv:2411.02607, 2024. 1, 9

  19. [19]

    Davari, F

    S. Davari, F. Lu, and D. A. Bowman. Occlusion management tech- niques for everyday glanceable ar interfaces. InProc. of IEEE VRW, pp. 324–330, 2020. 9

  20. [20]

    M. D. Dogan, E. J. Gonzalez, K. Ahuja, R. Du, A. Colac ¸o, J. Lee, M. Gonzalez-Franco, and D. Kim. Augmented object intelligence with XR-Objects. InProc. of ACM UIST, pp. 1–15, 2024. 1, 2

  21. [21]

    R. D. Easton and M. J. Sholl. Object-array structure, frames of refer- ence, and retrieval of spatial knowledge.JEP:LMC, 21(2):483–500,

  22. [22]

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    J. Engel, K. Somasundaram, M. Goesele, A. Sun, A. Gamino, A. Turner, A. Talattof, A. Yuan, B. Souti, B. Meredith, et al. Project aria: A new tool for egocentric multi-modal ai research. arXiv:2308.13561, 2023. 2

  23. [23]

    C. M. Fang, Y . Samaradivakara, P. Maes, and S. Nanayakkara. Mirai: A wearable proactive ai” inner-voice” for contextual nudging. InProc. of ACM CHI EA, 2025. 2

  24. [24]

    P. Fung, Y . Bachrach, A. Celikyilmaz, K. Chaudhuri, D. Chen, W. Chung, E. Dupoux, H. Gong, H. J´egou, A. Lazaric, et al. Embod- ied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355,

  25. [25]

    GoogleAI

    Google. GoogleAI. Gemini models.https://ai.google.dev/ gemini-api/docs/models/. Mar. 21. 2025. 5

  26. [26]

    Programmable search engine.https://developers

    Google. Programmable search engine.https://developers. google.com/custom-search/v1/overview, 2025. Mar. 23. 2025. 5

  27. [27]

    Grubert, T

    J. Grubert, T. Langlotz, S. Zollmann, and H. Regenbrecht. Towards pervasive augmented reality: Context-awareness in augmented reality. IEEE TVCG, 23(6):1706–1724, 2016. 1

  28. [28]

    V . Y . Han, J. T. Gonzalez, C. Yang, Z. Wang, S. E. Hudson, and A. Ion. Towards unobtrusive physical ai: Augmenting everyday objects with intelligence and robotic movement for proactive assistance. InProc. of ACM UIST, pp. 1–16, 2025. 2

  29. [29]

    Harvey, M

    M. Harvey, M. Langheinrich, and G. Ward. Remembering through lifelogging: A survey of human memory augmentation.PMCJ, 27:14–26, 2016. 2

  30. [30]

    Y . O. Hu, J. Tang, X. Gong, Z. Zhou, S. Zhang, D. S. Elvitigala, F. F. Mueller, W. Hu, and A. J. Quigley. Vision-based multimodal inter- faces: A survey and taxonomy for enhanced context-aware system design. InProc. of ACM CHI, pp. 1–31, 2025. 2

  31. [31]

    Jang, E.-J

    S. Jang, E.-J. Ko, and W. Woo. Unified user-centric context: Who, where, when, what, how and why. InProc. of UbiPCMM, 2005. 3

  32. [32]

    M. S. U. Khan, M. Z. Afzal, and D. Stricker. SituationalLLM: proac- tive language models with scene awareness for dynamic, contextual task guidance.arXiv:2406.13302, 2024. 2

  33. [33]

    O. Khan, Z. Ahmed, H. Nam, and K. Kim. TangibleMoments: Em- bedding XR memories onto physical objects. InProc. of IEEE VRW, pp. 1147–1153, 2025. 2

  34. [34]

    Y . Kim, Z. Aamir, M. Singh, S. Boorboor, K. Mueller, and A. E. Kaufman. Explainable XR: understanding user behaviors of XR en- vironments using LLM-assisted analytics framework.IEEE TVCG, 31(5):1–11, 2025. 2, 3

  35. [35]

    Y . Kim, S. Boorboor, A. Rahmati, and A. Kaufman. Design of privacy preservation system in augmented reality. InProc. of IEEE VizSec,

  36. [36]

    Y . Kim, S. Goutam, A. Rahmati, and A. Kaufman. Erebus: Access control for augmented reality systems. InProc. of USENIX Security, pp. 929–946, 2023. 2

  37. [37]

    R. K. Kundu, I. Ahmed, and K. A. Hoque. Pilar: Personal- izing augmented reality interactions with llm-based human-centric and trustworthy explanations for daily use cases.arXiv preprint arXiv:2512.17172, 2025. 2

  38. [38]

    B. Lee, M. Sedlmair, and D. Schmalstieg. Design patterns for situated visualization in augmented reality.IEEE TVCG, 30(1):1324–1335,

  39. [39]

    G. Lee, M. Xia, N. Numan, X. Qian, D. Li, Y . Chen, A. Kulshrestha, I. Chatterjee, Y . Zhang, D. Manocha, et al. Sensible agent: A frame- work for unobtrusive interaction with proactive ar agents. InProc. of ACM UIST, pp. 1–22, 2025. 1, 2, 9

  40. [40]

    J. Lee, J. Kim, J. Ahn, and W. Woo. Remote diagnosis of architec- tural heritage based on 5w1h model-based metadata in virtual reality. ISPRS IJGI, 8(8):339, 2019. 3

  41. [41]

    J. Lee, J. Wang, E. Brown, L. Chu, S. S. Rodriguez, and J. E. Froehlich. GazePointAR: a context-aware multimodal voice assistant for pronoun disambiguation in wearable augmented reality. InProc. of ACM CHI, pp. 1–20, 2024. 1, 2

  42. [42]

    Lewis, E

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschel, et al. Retrieval- augmented generation for knowledge-intensive nlp tasks.NeurIPS, 33:9459–9474, 2020. 5

  43. [43]

    C. Li, G. Wu, G. Y .-Y . Chan, D. G. Turakhia, S. Castelo Quispe, D. Li, L. Welch, C. Silva, and J. Qian. Satori: Towards proactive ar assistant 10 © 2026 IEEE. This is the author’s version of the article that will appear at the IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR). The final version of this record is available at: 10.1109/V...

  44. [44]

    J. N. Li, Y . Xu, T. Grossman, S. Santosa, and M. Li. OmniActions: predicting digital actions in response to real-world multimodal sen- sory inputs with LLMs. InProc. of ACM CHI, pp. 1–22, 2024. 2, 9

  45. [45]

    J. N. Li, Z. J. Zhang, and J. Ma. Omniquery: Contextually augmenting captured multimodal memory to enable personal question answering. InProc. of ACM CHI, 2025. 2

  46. [46]

    T. Li, L. Jin, Z. Wu, and Y . Chen. Combined recommendation algo- rithm based on improved similarity and forgetting curve.Information, 10(4):130, 2019. 9

  47. [47]

    J. Liu, K. A. Satriadi, B. Ens, and T. Dwyer. Investigating the effects of physical landmarks on spatial memory for information visualisation in augmented reality. InProc. of IEEE ISMAR, pp. 289–298, 2024. 2

  48. [48]

    X. B. Liu, S. Fang, W. Shi, C.-S. Wu, T. Igarashi, and X. Chen. Proac- tive conversational agents with inner thoughts. InProc. of ACM CHI,

  49. [49]

    L. Long, Y . He, W. Ye, Y . Pan, Y . Lin, H. Li, J. Zhao, and W. Li. Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory.arXiv preprint arXiv:2508.09736, 2025. 2

  50. [50]

    Lu and D

    F. Lu and D. A. Bowman. Evaluating the potential of glanceable ar in- terfaces for authentic everyday uses. InIEEE VR, pp. 768–777, 2021. 2

  51. [51]

    F. Lu, L. Pavanatto, and D. A. Bowman. In-the-wild experiences with an interactive glanceable ar system for everyday use. InProc. of ACM SUI, pp. 1–9, 2023. 2

  52. [52]

    Z. Lv, N. Charron, P. Moulon, A. Gamino, C. Peng, C. Sweeney, E. Miller, H. Tang, J. Meissner, J. Dong, et al. Aria everyday activities dataset.arXiv:2402.13349, 2024. 2

  53. [53]

    EMG Wristbands and Technology.https://www.meta.com/ emerging-tech/emg-wearable-technology/

    Meta. EMG Wristbands and Technology.https://www.meta.com/ emerging-tech/emg-wearable-technology/. Jan. 7. 2026. 9

  54. [54]

    Meurisch, C

    C. Meurisch, C. A. Mihale-Wilson, A. Hawlitschek, F. Giger, F. M¨uller, O. Hinz, and M. M ¨uhlh¨auser. Exploring user expectations of proactive ai systems.Proc. of ACM IMWUT, 4(4):1–22, 2020. 1

  55. [55]

    Milgram and F

    P. Milgram and F. Kishino. A taxonomy of mixed reality visual dis- plays.IEICE TIS, 77(12):1321–1329, 1994. 9

  56. [56]

    L. Ning, L. Liu, J. Wu, N. Wu, D. Berlowitz, S. Prakash, B. Green, S. O’Banion, and J. Xie. User-llm: Efficient llm contextualization with user embeddings. InProc. of ACM WWW, pp. 1219–1223, 2025. 2

  57. [57]

    Paruchuri, S

    A. Paruchuri, S. Hersek, L. Aggarwal, Q. Yang, X. Liu, A. Kul- shrestha, A. Colaco, H. Fuchs, and I. Chatterjee. Egotrigger: Toward audio-driven image capture for human memory enhancement in all- day energy-efficient smart glasses.IEEE TVCG, 2025. 1

  58. [58]

    Perera, A

    C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos. Context aware computing for the internet of things: A survey.IEEE Commun. Surv. Tutor., 16(1):414–454, 2013. 2

  59. [59]

    K. Pu, T. Zhang, N. Sendhilnathan, S. Freitag, R. Sodhi, and T. R. Jonker. Promemassist: Exploring timely proactive assistance through working memory modeling in multi-modal wearable devices. InProc. of UIST, pp. 1–19, 2025. 1, 2

  60. [60]

    Raianova and M

    A. Raianova and M. Lee. Adaptive learning in extended reality: A survey on multimodal interaction and ai-driven personalization. In Proc. of IEEE ISMAR-Adjunct, pp. 205–210, 2025. 9

  61. [61]

    Rajaram, M

    S. Rajaram, M. Peralta, J. G. Johnson, and M. Nebeling. Exploring the design space of privacy-driven adaptation techniques for future augmented reality interfaces. InProc. of ACM CHI, pp. 1–19, 2025. 2

  62. [62]

    Rajaram, H

    S. Rajaram, H. B. Surale, C. McConkey, C. Rognon, H. Mehta, M. Glueck, and C. Collins. Gesture and audio-haptic guidance tech- niques to direct conversations with intelligent voice interfaces. In Proc. of ACM CHI, pp. 1–20, 2025. 1, 2, 3, 9

  63. [63]

    L. Rau, J. L. Bitter, Y . Liu, U. Spierling, and R. D ¨orner. Support- ing the creation of non-linear everyday ar experiences in exhibitions and museums: An authoring process based on self-contained building blocks.Front. Virtual Reality, 3:955437, 2022. 2

  64. [64]

    K. A. Satriadi, A. Cunningham, R. T. Smith, T. Dwyer, A. Dro- gemuller, and B. H. Thomas. Proxsituated visualization: An extended model of situated visualization using proxies for physical referents. In Proc. of ACM CHI, pp. 1–20, 2023. 9

  65. [65]

    K. A. Satriadi, B. Tag, and T. Dwyer. Context-dependent memory in situated visualization.arXiv:2311.12288, 2023. 2

  66. [66]

    J. Shen, J. J. Dudley, and P. O. Kristensson. Encode-store-retrieve: Augmenting human memory through language-encoded egocentric perception. InProc. of IEEE ISMAR, pp. 923–931, 2024. 1, 2

  67. [67]

    E. Song, T. Ha, J. Park, H. Lee, and W. Woo. Holistic quantified- self for context-aware wearable augmented reality.IJHCS, p. 103568,

  68. [68]

    Stover and D

    D. Stover and D. Bowman. Taggar: General-purpose task guidance from natural language in augmented reality using vision-language models. InProc. of ACM SUI, pp. 1–12, 2024. 2

  69. [69]

    T. T. M. Tran, S. Brown, O. Weidlich, S. Yoo, and C. Parker. Wear- able ar in everyday contexts: Insights from a digital ethnography of youtube videos. InProc. of ACM CHI, 2025. 2

  70. [70]

    If My Apple Can Talk

    Y . Wang, Y . Lu, S. Yan, and X. Shen. “If My Apple Can Talk”: Ex- ploring the use of everyday objects as personalized ai agents in mixed reality. InProc. of ACM CHI EA, pp. 1–9, 2025. 2

  71. [71]

    X. Xu, A. Yu, T. R. Jonker, K. Todi, F. Lu, X. Qian, J. M. Evange- lista Belo, T. Wang, M. Li, A. Mun, et al. Xair: A framework of explainable ai in augmented reality. InProc. of ACM CHI, pp. 1–30,

  72. [72]

    B. Yang, L. Xu, L. Zeng, K. Liu, S. Jiang, W. Lu, H. Chen, X. Jiang, G. Xing, and Z. Yan. ContextAgent: Context-aware proactive llm agents with open-world sensory perceptions.NeurIPS, 2025. 1

  73. [73]

    J. Yang, S. Yang, A. W. Gupta, R. Han, L. Fei-Fei, and S. Xie. Think- ing in space: How multimodal large language models see, remember, and recall spaces. InProc. of IEEE/CVF CVPR, pp. 10632–10643,

  74. [74]

    Zhang, Y

    X. Zhang, Y . Deng, Z. Ren, S. K. Ng, and T.-S. Chua. Ask-before- plan: Proactive language agents for real-world planning. InProf. of ACL EMNLP, pp. 10836–10863, 2024. 2

  75. [75]

    Zheng, H

    J. Zheng, H. Weng, X. Wang, C. Cui, S. Mayer, C.-L. Tai, and L.-H. Lee. Persono: Personalised notification urgency classifier in mixed reality. InProc. of IEEE ISMAR, pp. 1053–1063, 2025. 9

  76. [76]

    Zhu, S.-K

    C. Zhu, S.-K. Hsia, X. Hu, Z. Liu, J. Shi, and K. Ramani. agentar: Cre- ating augmented reality applications with tool-augmented llm-based autonomous agents. InProc. of ACM UIST, pp. 1–23, 2025. 2

  77. [77]

    W. D. Zulfikar, S. Chan, and P. Maes. Memoro: Using large language models to realize a concise interface for real-time memory augmenta- tion. InProc. of ACM CHI, pp. 1–18, 2024. 2 11