pith. machine review for the scientific record.

arxiv: 2602.10154 · v1 · submitted 2026-02-09 · 💻 cs.CR · cs.AI · cs.MM

Recognition: 1 theorem link · Lean Theorem

PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 05:02 UTC · model grok-4.3

classification 💻 cs.CR cs.AI cs.MM
keywords privacy preservation · extended reality · multimodal LLMs · edge computing · XR collaboration · sensitive data filtering · spatial registration
0 comments

The pith

PRISM-XR uses edge-server preprocessing to filter sensitive data from XR frames before querying cloud MLLMs for collaborative content creation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PRISM-XR to let multiple users in extended reality environments collaborate through multimodal large language models without exposing private real-world information captured by headsets. It does this by running intelligent frame preprocessing on an edge server to remove sensitive objects and irrelevant context. A lightweight registration process and a customizable sharing mechanism handle synchronization efficiently. Numerical tests show nearly 90 percent accuracy in fulfilling user requests, registration in under 0.27 seconds, and spatial errors below 3.5 centimeters. A user study with 28 participants confirms filtering of sensitive objects in over 90 percent of scenarios, with good usability.
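The filtering step lends itself to a concrete sketch. Below is a minimal, hypothetical version of such an edge-side filter in Python, assuming an Ultralytics YOLO detector (the reference list cites Ultralytics YOLO [27]); the sensitive-label set, weights file, and blackout redaction policy are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical edge-side filter: detect sensitive objects, redact them,
# and only then let the frame leave the edge for the cloud MLLM.
# Label set, weights, and blackout policy are assumptions for illustration.
import numpy as np
from ultralytics import YOLO

SENSITIVE_LABELS = {"credit card", "person", "laptop"}  # assumed policy
detector = YOLO("yolov8n.pt")  # stand-in weights, not the authors' model

def filter_frame(frame: np.ndarray) -> np.ndarray:
    """Return a copy of `frame` with detected sensitive regions blacked out."""
    safe = frame.copy()
    result = detector(frame, verbose=False)[0]
    for box in result.boxes:
        if result.names[int(box.cls)] in SENSITIVE_LABELS:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            safe[y1:y2, x1:x2] = 0  # redact the region before any upload
    return safe
```

Whatever survives a function like this is the only pixel data the cloud MLLM ever sees, which is the property the paper's privacy claim rests on.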

Core claim

PRISM-XR provides privacy-aware integration of MLLMs into XR by preprocessing frames on the edge to filter sensitive data, using a lightweight registration process and fully customizable content-sharing to support efficient multi-user collaboration.

What carries the argument

Edge-server preprocessing that detects and removes sensitive content from XR frames prior to transmission to cloud MLLMs, paired with a lightweight registration process for spatial synchronization.
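The registration internals are not spelled out in this summary. As a hedged stand-in, the sketch below shows the classic rigid-alignment core such a lightweight registration could reduce to: estimating the rotation and translation between two users' coordinate frames from matched anchor points (Kabsch/Procrustes; the reference list includes coordinate-alignment work [37, 61]). It illustrates the technique, not the authors' algorithm.

```python
# Hypothetical registration core: given N matched 3D anchor points observed
# in user A's and user B's coordinate frames, recover the rigid transform
# (R, t) mapping A's frame onto B's. Classic Kabsch, not the authors' code.
import numpy as np

def rigid_align(pts_a: np.ndarray, pts_b: np.ndarray):
    """pts_a, pts_b: (N, 3) matched points. Returns (R, t) with pts_b ~ R @ p + t."""
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    h = (pts_a - ca).T @ (pts_b - cb)        # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflection solutions
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = cb - r @ ca
    return r, t
```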

If this is right

  • Multi-user XR sessions can incorporate natural language and visual inputs for object creation without privacy violations from background scenes.
  • Registration and synchronization occur in under 0.27 seconds with spatial inconsistencies below 3.5 cm.
  • The system automatically filters highly sensitive objects in over 90 percent of tested scenarios.
  • Nearly 90 percent accuracy is maintained in fulfilling user requests during collaboration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If edge preprocessing proves reliable, similar techniques could apply to other camera-based AI systems like smart glasses or autonomous vehicles.
  • Customizable sharing might allow fine-grained control over what collaborators see, reducing the need for full scene uploads.
  • Future work could test the framework with more diverse environments to confirm filtering effectiveness.

Load-bearing premise

Edge-server preprocessing can reliably detect sensitive content in XR frames and remove it without missing real privacy risks or removing context needed for correct MLLM responses.

What would settle it

Observe a scenario where the system misses filtering a credit card or user face in more than 10 percent of cases, or where filtered frames cause MLLM accuracy to drop below 80 percent on user tasks.
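Read operationally, that criterion is a pair of thresholds. A minimal sketch of the check, assuming per-scenario records with hypothetical contains_sensitive, was_filtered, and mllm_response_correct fields:

```python
# Hypothetical falsification check over labeled test scenarios. Field names
# are assumptions; the thresholds (10% miss rate, 80% task accuracy) come
# from the criterion stated above.
def claim_is_undermined(cases) -> bool:
    sensitive = [c for c in cases if c["contains_sensitive"]]
    misses = sum(1 for c in sensitive if not c["was_filtered"])
    correct = sum(1 for c in cases if c["mllm_response_correct"])

    miss_rate = misses / max(len(sensitive), 1)
    task_accuracy = correct / max(len(cases), 1)
    return miss_rate > 0.10 or task_accuracy < 0.80
```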

Figures

Figures reproduced from arXiv: 2602.10154 by Bin Li, Jiangong Chen, Mingyu Zhu.

Figure 1
Figure 1: An example of multi-user collaboration in PRISM-XR. In this example, two users collaborate with each other in the same… view at source ↗
Figure 2
Figure 2: System architecture of PRISM-XR. Unlike existing systems that rely on cumbersome manual inputs [13], gesture-based audio triggers [9], or controller-based audio activation [18], PRISM-XR employs a more flexible keyword-activated approach. Users initiate their requests by starting with a predefined keyword, which triggers automatic audio recording when the keyword is detected and stops when silence is… view at source ↗
Figure 3
Figure 3: Workflow of privacy-aware frame processing. view at source ↗
Figure 5
Figure 5: Successful user registration with a wire-frame cube. view at source ↗
Figure 6
Figure 6: Registration evaluation. view at source ↗
Figure 7
Figure 7: NASA-TLX scores for all tasks. view at source ↗
Figure 8
Figure 8: Fulfillment levels. view at source ↗
Figure 9
Figure 9: Results of questionnaire. view at source ↗
read the original abstract

Multimodal Large Language Models (MLLMs) enhance collaboration in Extended Reality (XR) environments by enabling flexible object and animation creation through the combination of natural language and visual inputs. However, visual data captured by XR headsets includes real-world backgrounds that may contain irrelevant or sensitive user information, such as credit cards left on the table or facial identities of other users. Uploading those frames to cloud-based MLLMs poses serious privacy risks, particularly when such data is processed without explicit user consent. Additionally, existing colocation and synchronization mechanisms in commercial XR APIs rely on time-consuming, privacy-invasive environment scanning and struggle to adapt to the highly dynamic nature of MLLM-integrated XR environments. In this paper, we propose PRISM-XR, a novel framework that facilitates multi-user collaboration in XR by providing privacy-aware MLLM integration. PRISM-XR employs intelligent frame preprocessing on the edge server to filter sensitive data and remove irrelevant context before communicating with cloud generative AI models. Additionally, we introduce a lightweight registration process and a fully customizable content-sharing mechanism to enable efficient, accurate, and privacy-preserving content synchronization among users. Our numerical evaluation results indicate that the proposed platform achieves nearly 90% accuracy in fulfilling user requests and less than 0.27 seconds registration time while maintaining spatial inconsistencies of less than 3.5 cm. Furthermore, we conducted an IRB-approved user study with 28 participants, demonstrating that our system could automatically filter highly sensitive objects in over 90% of scenarios while maintaining strong overall usability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes PRISM-XR, a framework for multi-user XR collaboration that integrates multimodal LLMs while addressing privacy risks. It uses edge-server frame preprocessing to filter sensitive content (e.g., credit cards, faces) before cloud MLLM queries, plus a lightweight registration process and customizable content-sharing for synchronization. The authors report ~90% request-fulfillment accuracy, <0.27 s registration time, <3.5 cm spatial inconsistency, and >90% sensitive-object filtering success in an IRB-approved study with 28 participants.

Significance. If the empirical claims hold under rigorous validation, the work would provide a concrete, deployable approach to privacy-preserving MLLM use in dynamic XR settings, filling a gap between commercial XR APIs and generative AI. The edge-preprocessing plus lightweight sync design is practically relevant for collaborative XR applications.

major comments (3)
  1. [§4] §4 (Numerical Evaluation): The abstract and evaluation claim nearly 90% accuracy in fulfilling user requests and <3.5 cm spatial inconsistency, yet no baselines, error bars, test-scenario definitions, or measurement protocol (e.g., how spatial error was computed across frames) are provided; without these the quantitative results cannot be interpreted or reproduced.
  2. [User study] User-study section: The >90% sensitive-object filtering rate is presented as a central result, but the manuscript supplies no description of the detection method, model architecture, training data, false-negative rates on occluded or novel items, or downstream effect on MLLM response correctness; this directly undermines the privacy guarantee that is load-bearing for the framework.
  3. [§3.2] §3.2 (Edge Preprocessing): The assumption that edge filtering reliably removes sensitive content without stripping necessary context for the MLLM is stated but never tested with controlled ablation (e.g., MLLM accuracy with vs. without filtering on the same queries); the 28-participant study does not isolate this variable.
minor comments (2)
  1. [Abstract] The abstract mixes system-level metrics with user-study outcomes without clear separation; a short table summarizing all reported numbers would improve readability.
  2. [Discussion] No discussion of failure modes (e.g., what happens when the edge filter misses a partially occluded card) or computational overhead of the preprocessing step on typical XR hardware.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important areas for improving the clarity and rigor of our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of our results and methods.

read point-by-point responses
  1. Referee: [§4] §4 (Numerical Evaluation): The abstract and evaluation claim nearly 90% accuracy in fulfilling user requests and <3.5 cm spatial inconsistency, yet no baselines, error bars, test-scenario definitions, or measurement protocol (e.g., how spatial error was computed across frames) are provided; without these the quantitative results cannot be interpreted or reproduced.

    Authors: We agree that the current §4 would benefit from additional detail to support interpretability and reproducibility. In the revised manuscript we will add relevant baselines (e.g., commercial XR synchronization APIs without our lightweight registration), include error bars on all reported metrics, explicitly define the test scenarios (including request types and environmental conditions), and provide a precise measurement protocol for spatial inconsistency that describes how ground-truth tracking data was used across frames. revision: yes

  2. Referee: [User study] User-study section: The >90% sensitive-object filtering rate is presented as a central result, but the manuscript supplies no description of the detection method, model architecture, training data, false-negative rates on occluded or novel items, or downstream effect on MLLM response correctness; this directly undermines the privacy guarantee that is load-bearing for the framework.

    Authors: We acknowledge that the user-study section currently lacks sufficient technical detail on the sensitive-object filtering component. In the revision we will expand this section to describe the detection method, model architecture, training data characteristics, false-negative rates (including performance on occluded and novel items), and an analysis of the downstream impact on MLLM response correctness. These additions will directly support the privacy claims. revision: yes

  3. Referee: [§3.2] §3.2 (Edge Preprocessing): The assumption that edge filtering reliably removes sensitive content without stripping necessary context for the MLLM is stated but never tested with controlled ablation (e.g., MLLM accuracy with vs. without filtering on the same queries); the 28-participant study does not isolate this variable.

    Authors: We agree that a controlled ablation would provide stronger evidence for the edge-preprocessing design choice. While the 28-participant study evaluates the integrated system, it does not isolate the filtering variable. In the revised version we will add a dedicated ablation experiment that measures MLLM response accuracy on identical queries with and without edge filtering, thereby directly testing the assumption. revision: yes
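A minimal sketch of the ablation the authors commit to here, with query_mllm, edge_filter, and fulfilled as hypothetical placeholders for the system's components and scoring; none of these are the authors' APIs:

```python
# Hypothetical ablation harness: same queries, same frames, with vs. without
# the edge filter, so the filtering variable is isolated. All callables are
# placeholders standing in for the system's components.
def ablate_filtering(pairs, query_mllm, edge_filter, fulfilled):
    raw_hits = filt_hits = 0
    for query, frame in pairs:
        raw_hits += fulfilled(query_mllm(query, frame))
        filt_hits += fulfilled(query_mllm(query, edge_filter(frame)))
    n = len(pairs)
    return raw_hits / n, filt_hits / n  # (accuracy_raw, accuracy_filtered)
```

Reporting both numbers side by side on identical (query, frame) pairs is what isolates the variable the referee flags.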

Circularity Check

0 steps flagged

No circularity: claims rest on direct empirical measurements

full rationale

The paper reports performance metrics (90% request accuracy, <0.27 s registration, <3.5 cm spatial error, >90% sensitive-object filtering) from numerical evaluations and an IRB-approved 28-participant user study. No equations, fitted parameters, derivations, or self-citations are invoked to support these outcomes. The central claims are presented as direct experimental results rather than any chain that reduces to its own inputs by construction. This is the expected non-finding for a systems paper whose load-bearing evidence is external measurement.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework depends on the unproven assumption that edge hardware can run accurate real-time sensitive-object detection and that MLLM responses remain useful after context removal; a rough way to test the first half is sketched after this ledger.

axioms (2)
  • domain assumption Edge servers have sufficient compute and low enough latency to perform real-time sensitive-data filtering before cloud MLLM calls.
    Required for the preprocessing step to preserve both privacy and task utility.
  • domain assumption MLLM performance after filtering is comparable to performance on raw frames for the targeted XR tasks.
    Implicit in the claim that the system fulfills user requests at 90% accuracy.
invented entities (1)
  • PRISM-XR framework · no independent evidence
    purpose: Privacy-aware MLLM integration for multi-user XR
    The paper introduces the named system and its components.
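As promised above, a rough test of the first axiom's compute-and-latency half, assuming a hypothetical target frame interval (1/72 s, a common XR refresh rate, not a figure from the paper) and any per-frame filter function:

```python
# Hypothetical real-time budget check for the edge filter. The 72 Hz frame
# rate is an assumed target, not a number taken from the paper.
import time

def meets_budget(filter_fn, frames, budget_s: float = 1 / 72):
    """Average the filter's per-frame wall time and compare to the budget."""
    start = time.perf_counter()
    for frame in frames:
        filter_fn(frame)
    per_frame = (time.perf_counter() - start) / len(frames)
    return per_frame <= budget_s, per_frame
```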

pith-pipeline@v0.9.0 · 5583 in / 1387 out tokens · 62304 ms · 2026-05-16T05:02:42.976872+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 3 internal anchors

  1. [1] M. Abraham, P. Saeghe, M. McGill, and M. Khamis. Implications of XR on privacy, security and behaviour: Insights from experts. In Nordic Human-Computer Interaction Conference, pp. 1–12, 2022.
  2. [2] M. Alkaeed, A. Qayyum, and J. Qadir. Privacy preservation in artificial intelligence and extended reality (AI-XR) metaverses: A survey. Journal of Network and Computer Applications, p. 103989, 2024.
  3. [3] N. Bevan, C. Barnum, G. Cockton, J. Nielsen, J. Spool, and D. Wixon. The "magic number 5": Is it enough for web testing? In CHI '03 Extended Abstracts on Human Factors in Computing Systems, pp. 698–699, 2003.
  4. [4] D. A. Boiko, R. MacKnight, and G. Gomes. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332, 2023.
  5. [5] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023.
  6. [6] J. Chen, T. Lan, and B. Li. GPT-VR Nexus: ChatGPT-powered immersive virtual reality experience. In 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 01–02. IEEE, 2024.
  7. [7] J. Chen, F. Qian, and B. Li. Enhancing quality of experience for collaborative virtual reality with commodity mobile devices. In 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), pp. 1018–1028. IEEE, 2022.
  8. [8] J. Chen, X. Qin, G. Zhu, B. Ji, and B. Li. Motion-prediction-based wireless scheduling for multi-user panoramic video streaming. In IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, pp. 1–10. IEEE, 2021.
  9. [9] J. Chen, X. Wu, T. Lan, and B. Li. LLMER: Crafting interactive extended reality worlds with JSON data generated by large language models. IEEE Transactions on Visualization and Computer Graphics,
  10. [10] V. Clarke and V. Braun. Thematic analysis. The Journal of Positive Psychology, 12(3):297–298, 2017. doi: 10.1080/17439760.2016.1262613
  11. [11] M. Corbett, B. David-John, J. Shang, Y. C. Hu, and B. Ji. BystandAR: Protecting bystander visual data in augmented reality systems. In Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, pp. 370–382, 2023.
  12. [12] H. Davies and L. Hjorth. Roblox in lockdown: Understanding young people's digital social play in the pandemic. Gaming and Gamers in Times of Pandemic, p. 15, 2024.
  13. [13] F. De La Torre, C. M. Fang, H. Huang, A. Banburski-Fahey, J. Amores Fernandez, and J. Lanier. LLMR: Real-time prompting of interactive worlds using large language models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1–22,
  14. [14] A. Dhakal, X. Ran, Y. Wang, J. Chen, and K. Ramakrishnan. SLAM-Share: Visual simultaneous localization and mapping for real-time multi-user augmented reality. In Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies, pp. 293–306, 2022.
  15. [15] Z. Dong, J. Chen, and B. Li. Collaborative mixed-reality-based firefighter training. In IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1–2. IEEE, 2023.
  16. [16] S. Earle, F. Kokkinos, Y. Nie, J. Togelius, and R. Raileanu. Dreamcraft: Text-guided generation of functional 3D environments in Minecraft. In Proceedings of the 19th International Conference on the Foundations of Digital Games, pp. 1–15, 2024.
  17. [17] S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, and M. J. Marín-Jiménez. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognition, 47(6):2280–2292, 2014.
  18. [18] D. Giunchi, N. Numan, E. Gatti, and A. Steed. DreamCodeVR: Towards democratizing behavior design in virtual reality with speech-driven programming. In 2024 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 579–589. IEEE, 2024.
  19. [19] Google. Cloud Anchors allow different users to share AR experiences | ARCore | Google for Developers.
  20. [20] H. Hadan, D. M. Wang, L. E. Nacke, and L. Zhang-Kennedy. Privacy in immersive extended reality: Exploring user perceptions, concerns, and coping strategies. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–24, 2024.
  21. [21] S. Hart. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Human Mental Workload / Elsevier, 1988.
  22. [22] T. Hirzle, F. Müller, F. Draxler, M. Schmitz, P. Knierim, and K. Hornbæk. When XR and AI meet - a scoping review on extended reality and artificial intelligence. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–45, 2023.
  23. [23] J. Hu, A. Iosifescu, and R. LiKamWa. LensCap: Split-process framework for fine-grained visual privacy control for augmented reality apps. In Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services, pp. 14–27, 2021.
  24. [24] T. Hu, F. Yang, T. Scargill, and M. Gorlatova. Apple vs Meta: A comparative study on spatial tracking in SOTA XR headsets. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pp. 2120–2127, 2024.
  25. [25] Y. Hu, M. Zhu, Q. Jin, F. Qian, and B. Li. MagicCloth: Protect user privacy in AR streaming. In Proceedings of the 1st ACM Workshop on Mobile Immersive Computing, Networking, and Systems, pp. 222–228, 2023.
  26. [26] Y. Huang, Y. Wang, Z. Xu, C. Gao, S. Wu, J. Ye, X. Chen, P.-Y. Chen, and X. Zhang. Breaking focus: Contextual distraction curse in large language models. arXiv preprint arXiv:2502.01609, 2025.
  27. [27] G. Jocher, J. Qiu, and A. Chaurasia. Ultralytics YOLO, Jan. 2023.
  28. [28] A. Kobenova, C. DeVeaux, S. Parajuli, A. Banburski-Fahey, J. A. Fernandez, and J. Lanier. Social conjuring: Multi-user runtime collaboration with AI in building virtual 3D worlds. arXiv preprint arXiv:2410.00274, 2024.
  29. [29] L. Lammerding, T. Hilken, D. Mahr, and J. Heller. Too real for comfort: Measuring consumers' augmented reality information privacy concerns. In Augmented Reality and Virtual Reality: New Trends in Immersive Technology, pp. 95–108. Springer, 2021.
  30. [30] S. M. Lehman, A. S. Alrumayh, K. Kolhe, H. Ling, and C. C. Tan. Hidden in plain sight: Exploring privacy risks of mobile augmented reality applications. ACM Transactions on Privacy and Security, 25(4):1–35, 2022.
  31. [31] J. R. Lewis. IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1):57–78, 1995.
  32. [32] F. Li, S. Yang, X. Yi, and X. Yang. CORB-SLAM: A collaborative visual SLAM system for multiple robots. In International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 480–490. Springer, 2017.
  33. [33] R. Li, T. Patel, Q. Wang, and X. Du. MLR-Copilot: Autonomous machine learning research based on large language models agents. arXiv preprint arXiv:2408.14033, 2024.
  34. [34] T. Li, N. S. Nguyen, X. Zhang, T. Wang, and B. Sheng. Promar: Practical reference object-based multi-user augmented reality. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, pp. 1359–1368. IEEE, 2020.
  35. [35] L. Liu and M. Gruteser. EdgeSharing: Edge assisted real-time localization and object sharing in urban streets. In IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, pp. 1–10. IEEE,
  36. [36] A. M. Lund. Measuring usability with the USE questionnaire. Usability Interface, 8(2):3–6, 2001.
  37. [37] H. Mecheri, X. Robert-Lachaine, C. Larue, and A. Plamondon. Evaluation of eight methods for aligning orientation of two coordinate systems. Journal of Biomechanical Engineering, 138(8):084501, 2016.
  38. [38] T. Merino, M. Charity, and J. Togelius. Interactive latent variable evolution for the generation of Minecraft structures. In Proceedings of the 18th International Conference on the Foundations of Digital Games, pp. 1–8, 2023.
  39. [39] Meta. Shared Spatial Anchors | Meta Horizon OS Developers.
  40. [40] Microsoft. Spatial Anchor Sharing.
  41. [41] D. L. Mills. Computer Network Time Synchronization: The Network Time Protocol. CRC Press, 2006.
  42. [42] J. O'Hagan, P. Saeghe, J. Gugenheimer, D. Medeiros, K. Marky, M. Khamis, and M. McGill. Privacy-enhancing technology and everyday augmented reality: Understanding bystanders' varying needs for awareness and consent. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(4):1–35, 2023.
  43. [43] E. Olson. AprilTag: A robust and flexible visual fiducial system. In 2011 IEEE International Conference on Robotics and Automation, pp. 3400–3407. IEEE, 2011.
  44. [44] J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pp. 1–22, 2023.
  45. [45] F. Qian and B. Li. Boosting remote multi-user AR privacy through a magic rope. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services, pp. 583–584,
  46. [46] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever. Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning, pp. 28492–28518. PMLR, 2023.
  47. [47] S. Rajaram, C. Chen, F. Roesner, and M. Nebeling. Eliciting security & privacy-informed sharing techniques for multi-user augmented reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–17, 2023.
  48. [48] X. Ran, C. Slocum, Y.-Z. Tsai, K. Apicharttrisorn, M. Gorlatova, and J. Chen. Multi-user augmented reality with communication efficient and spatially consistent virtual objects. In Proceedings of the 16th International Conference on Emerging Networking EXperiments and Technologies, pp. 386–398, 2020.
  49. [49] S. Schmidgall, Y. Su, Z. Wang, X. Sun, J. Wu, X. Yu, J. Liu, Z. Liu, and E. Barsoum. Agent Laboratory: Using LLM agents as research assistants. arXiv preprint arXiv:2501.04227, 2025.
  50. [50] F. Shi, X. Chen, K. Misra, N. Scales, D. Dohan, E. H. Chi, N. Schärli, and D. Zhou. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning, pp. 31210–31227. PMLR, 2023.
  51. [51] S. Srinidhi, E. Lu, and A. Rowe. XaiR: An XR platform that integrates large language models with the physical world. In 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 759–767. IEEE, 2024.
  52. [52] S. Srinidhi, E. Lu, A. Singh, S. Kartik, A. Lin, T. Laroia, and A. Rowe. An XR platform that integrates large language models with the physical world. In Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, pp. 700–701, 2025.
  53. [53] Y. Tang, J. Situ, A. Y. Cui, M. Wu, and Y. Huang. LLM integration in extended reality: A comprehensive review of current trends, challenges, and future perspectives. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–24, 2025.
  54. [54] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023.
  55. [55] Y.-J. Wang, B. Zhang, J. Chen, and K. Sreenath. Prompt a robot to walk with large language models. arXiv preprint arXiv:2309.09969,
  56. [56] Y. Xiu, T. Scargill, and M. Gorlatova. LOBSTAR: Language model-based obstruction detection for augmented reality. In 2024 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 335–336. IEEE, 2024.
  57. [57] Y. Xiu, T. Scargill, and M. Gorlatova. ViDDAR: Vision language model-based task-detrimental content detection for augmented reality. arXiv preprint arXiv:2501.12553, 2025.
  58. [58] T. Yamakami. A privacy threat model in XR applications. In Advances in Internet, Data and Web Technologies: The 8th International Conference on Emerging Internet, Data and Web Technologies (EIDWT-2020), pp. 384–394. Springer, 2020.
  59. [59] X. Yao, J. Chen, T. He, J. Yang, and B. Li. A scalable mixed reality platform for remote collaborative LEGO design. In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1–2. IEEE, 2022.
  60. [60] Y. Yao, J. Duan, K. Xu, Y. Cai, Z. Sun, and Y. Zhang. A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, p. 100211, 2024.
  61. [61] K. You, Q. Chen, P. Xie, and S. Song. Range-based coordinate alignment for cooperative mobile sensor network localization. IEEE Transactions on Control of Network Systems, 7(3):1379–1390, 2020.
  62. [62] W. Yu, N. Gileadi, C. Fu, S. Kirmani, K.-H. Lee, M. G. Arenas, H.-T. L. Chiang, T. Erez, L. Hasenclever, J. Humplik, et al. Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306.08647,
  63. [63] M. Zhu, J. Chen, and B. Li. When generative AI meets extended reality: Enabling scalable and natural interactions. IEEE Internet Computing, pp. 1–10, 2026. doi: 10.1109/MIC.2025.3619462