UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks
Pith reviewed 2026-05-08 08:09 UTC · model grok-4.3
The pith
UNSEEN combines an AR access control layer, LLM unlearning that suppresses sensitive profiles, and agent guardrails to defend against AR-LLM social engineering attacks; it is evaluated in a 60-person user study with 360 annotated conversations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.
Load-bearing premise
That F-RMU unlearning can reliably suppress sensitive profile information inside opaque LLM inference while preserving utility, and that runtime guardrails can adaptively block evolving social-engineering strategies without being bypassed by new prompts.
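The excerpt never defines the F-RMU objective, so this premise can only be judged against the generic RMU (representation-misdirection unlearning) recipe such methods typically extend: push the model's activations on forget-set inputs toward a fixed random control vector while anchoring retain-set activations to the frozen model. A minimal numerical sketch of that recipe on a toy linear layer, with all names, sizes, and constants chosen here for illustration (not the paper's notation):

```python
import numpy as np

# Toy RMU-style unlearning step (F-RMU itself is undefined in this excerpt;
# this follows the generic RMU objective, and every name below is an
# illustrative assumption):
#
#   L(W) = ||W x_f - c||^2                  forget term: steer activations of
#                                           a "forget" input toward a fixed
#                                           random control vector c
#        + alpha * ||W x_r - W0 x_r||^2     retain term: keep activations of
#                                           a "retain" input near the frozen
#                                           reference model W0

rng = np.random.default_rng(0)
d = 8
W0 = rng.normal(size=(d, d))       # frozen reference weights
W = W0.copy()                      # weights being unlearned
x_f = rng.normal(size=d)           # a "forget" input (e.g., profile text)
x_r = rng.normal(size=d)           # a "retain" input (general capability)
c = 10.0 * rng.normal(size=d)      # scaled random control vector
alpha, lr = 1.0, 1e-2

def rmu_loss(W):
    forget = np.sum((W @ x_f - c) ** 2)
    retain = alpha * np.sum((W @ x_r - W0 @ x_r) ** 2)
    return forget + retain

for _ in range(1000):
    # analytic gradient of the quadratic loss with respect to W
    grad = (2 * np.outer(W @ x_f - c, x_f)
            + 2 * alpha * np.outer(W @ x_r - W0 @ x_r, x_r))
    W -= lr * grad

forget_dist = float(np.linalg.norm(W @ x_f - c))      # near 0 after training
retain_drift = float(np.linalg.norm(W @ x_r - W0 @ x_r))  # also near 0
print(forget_dist, retain_drift)
```

The sketch makes the premise's tension visible: the forget and retain terms trade off through `alpha`, and nothing in the objective itself guarantees that suppressed information cannot be reconstructed through inputs outside the forget set, which is exactly the gap the referee report raises.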
Original abstract
Emerging AR-LLM-based social engineering attacks (e.g., SEAR) are on the verge of posing serious threats to real-world social life. In such an AR-LLM-SE attack, the attacker leverages AR (Augmented Reality) glasses to capture the target's image and voice, uses an LLM to identify the target and generate a social profile, and uses LLM agents to apply social engineering strategies that suggest conversation moves to win the target's trust and perform phishing afterwards. Current defensive approaches, such as role-based access control or data flow tracking, are not directly applicable to the convergent AR-LLM ecosystem (given embedded AR devices and opaque LLM inference), leaving an emerging and potent social engineering threat that existing privacy paradigms are ill-equipped to address. This necessitates a shift beyond solely human-centric measures like legislation and user education toward enforceable vendor policies and platform-level restrictions. Realizing this vision, however, faces significant technical challenges: securing resource-constrained AR-embedded devices, implementing fine-grained access control within opaque LLM inference, and governing adaptive interactive agents. To address these challenges, we present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes UNSEEN, a cross-stack defense against emerging AR-LLM social engineering attacks (e.g., SEAR). It integrates an AR Access Control Layer for identity-gated sensing on resource-constrained devices, F-RMU-based LLM unlearning to suppress sensitive user profiles within opaque inference, and runtime agent guardrails for adaptive interaction control. The central claim is that this coordinated approach addresses gaps in existing privacy paradigms and is evaluated via an IRB-approved user study involving 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.
Significance. If the evaluation demonstrates that the combined mechanisms reduce successful social engineering outcomes while preserving LLM utility and without introducing bypassable leakage, the work could offer a practical vendor-level response to a novel convergent threat in AR-LLM ecosystems. The cross-stack design and empirical user-study framing are strengths, but the absence of reported quantitative metrics, baselines, or component ablations in the current description prevents assessment of whether the result holds or advances the state of the art.
major comments (3)
- [Evaluation] The abstract and summary claim an IRB-approved study with 60 participants and 360 annotated conversations, yet no quantitative metrics (attack success rates, precision/recall on profile suppression, baseline comparisons against role-based access control or data-flow tracking, ablation results isolating AR ACL vs. F-RMU vs. guardrails, or statistical tests) are provided. This directly prevents verification of the central claim that UNSEEN constitutes an effective defense.
- [§3.2] F-RMU unlearning: The premise that F-RMU reliably suppresses target-specific profile knowledge inside black-box LLM inference (preventing reconstruction via multi-turn context or indirect elicitation) is load-bearing for the overall defense but is not supported by any leakage tests, residual-knowledge probes, or adversarial prompt evaluations. The user study measures only human-facing outcomes and does not isolate internal leakage or guardrail bypass.
- [§4] Runtime agent guardrails: The claim that adaptive guardrails can block evolving social-engineering strategies without being bypassed is not accompanied by any formalization of the guardrail policy, coverage analysis against prompt variants, or failure-case enumeration. This leaves the adaptive-interaction-control component unverified.
minor comments (2)
- [Abstract] The abstract states '360 annotated conversations' but does not specify annotation criteria, inter-annotator agreement, or how conversations were sampled across scenarios; this should be clarified for reproducibility.
- [§3.2] Notation for F-RMU is introduced without an explicit equation or pseudocode definition of the unlearning objective or forgetting set construction; a formal definition would improve clarity.
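The metrics the referee asks for are standard and cheap to compute once an annotation schema is fixed. A hedged sketch, assuming an invented per-conversation schema (the paper's actual labels for the 360 conversations are not given in this excerpt):

```python
# Hypothetical per-conversation annotations: whether UNSEEN was active,
# whether the social engineering attempt succeeded, whether the exchange
# contained truly sensitive profile content (annotator ground truth), and
# whether the system suppressed it. All field names are illustrative.
conversations = [
    {"defended": True,  "success": False, "sensitive": True,  "suppressed": True},
    {"defended": True,  "success": False, "sensitive": False, "suppressed": False},
    {"defended": True,  "success": True,  "sensitive": True,  "suppressed": False},
    {"defended": False, "success": True,  "sensitive": True,  "suppressed": False},
    {"defended": False, "success": True,  "sensitive": False, "suppressed": False},
    {"defended": False, "success": False, "sensitive": True,  "suppressed": False},
]

def attack_success_rate(convs, defended):
    """Fraction of successful attacks within the defended/undefended arm."""
    group = [c for c in convs if c["defended"] == defended]
    return sum(c["success"] for c in group) / len(group)

def suppression_pr(convs):
    """Precision/recall of profile suppression against annotator labels."""
    tp = sum(c["sensitive"] and c["suppressed"] for c in convs)
    fp = sum((not c["sensitive"]) and c["suppressed"] for c in convs)
    fn = sum(c["sensitive"] and (not c["suppressed"]) for c in convs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(attack_success_rate(conversations, defended=True))   # 1/3
print(attack_success_rate(conversations, defended=False))  # 2/3
print(suppression_pr(conversations))                       # (1.0, 0.25)
```

The point of the sketch is that the comparison the referee wants (defended vs. undefended success rates, plus suppression precision/recall) is a re-analysis of the existing dataset, not new data collection.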
Simulated Author's Rebuttal
We thank the referee for the constructive review and for noting the strengths of the cross-stack design and user-study framing. We agree that the evaluation requires more explicit quantitative reporting, baselines, ablations, and component-specific analyses to fully substantiate the claims. We address each major comment below and will incorporate the requested additions in the revised manuscript.
Point-by-point responses
- Referee: [Evaluation] The abstract and summary claim an IRB-approved study with 60 participants and 360 annotated conversations, yet no quantitative metrics (attack success rates, precision/recall on profile suppression, baseline comparisons against role-based access control or data-flow tracking, ablation results isolating AR ACL vs. F-RMU vs. guardrails, or statistical tests) are provided. This directly prevents verification of the central claim that UNSEEN constitutes an effective defense.
Authors: We acknowledge that the current Evaluation section would benefit from more explicit quantitative reporting to allow direct verification. The 60-participant IRB-approved study and 360 annotated conversations were conducted to assess human-facing outcomes in realistic social scenarios. In the revised manuscript we will expand this section to report attack success rates, precision/recall metrics for profile suppression, baseline comparisons (including role-based access control and data-flow tracking), component ablation results, and statistical tests (e.g., paired t-tests or ANOVA with p-values). These will be derived from re-analysis of the existing annotated dataset. revision: yes
- Referee: [§3.2] F-RMU unlearning: The premise that F-RMU reliably suppresses target-specific profile knowledge inside black-box LLM inference (preventing reconstruction via multi-turn context or indirect elicitation) is load-bearing for the overall defense but is not supported by any leakage tests, residual-knowledge probes, or adversarial prompt evaluations. The user study measures only human-facing outcomes and does not isolate internal leakage or guardrail bypass.
Authors: The referee correctly notes that direct evidence for the internal suppression properties of F-RMU is necessary. While the user study provides supporting evidence via reduced social-engineering success rates, it does not isolate leakage. We will add a dedicated subsection describing leakage tests, residual-knowledge probes using adversarial and multi-turn prompts, and evaluations of reconstruction attempts. These additions will quantify the unlearning component's contribution and address potential bypass vectors. revision: yes
- Referee: [§4] Runtime agent guardrails: The claim that adaptive guardrails can block evolving social-engineering strategies without being bypassed is not accompanied by any formalization of the guardrail policy, coverage analysis against prompt variants, or failure-case enumeration. This leaves the adaptive-interaction-control component unverified.
Authors: We agree that formalization and coverage analysis are required for verifiability of the guardrails. In the revised version we will include a formal policy description (as a decision procedure or rule set), coverage analysis over prompt variants and evolving strategies drawn from the conversation dataset, and an enumeration of failure cases with corresponding mitigations. This will be supported by additional analysis of the 360 conversations. revision: yes
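The "decision procedure or rule set" promised in this response can be made concrete in a few lines. A minimal sketch, assuming invented rule names, patterns, and thresholds rather than the paper's actual guardrail policy:

```python
import re

# Guardrail policy as an ordered rule set over conversation state.
# Every rule, pattern, and threshold here is an illustrative assumption;
# the paper's actual policy is not specified in this excerpt.
RULES = [
    # (rule name, predicate over the conversation state, action)
    ("identity_probe",
     lambda s: re.search(r"\b(full name|home address|employer)\b",
                         s["last_msg"], re.I),
     "block"),
    ("urgency_pressure",
     lambda s: re.search(r"\b(urgent|right now|immediately)\b",
                         s["last_msg"], re.I),
     "warn"),
    ("trust_escalation",
     lambda s: s["personal_questions"] >= 3,   # accumulated across turns
     "block"),
]

def guardrail_decision(state):
    """Return (rule name, action) for the first matching rule, else allow."""
    for name, pred, action in RULES:
        if pred(state):
            return name, action
    return None, "allow"

state = {"last_msg": "What is your home address? It's urgent!",
         "personal_questions": 1}
print(guardrail_decision(state))  # ('identity_probe', 'block')
```

A formalization at even this level of simplicity would let the authors enumerate coverage (which strategies each rule catches) and failure cases (prompt variants that slip past every predicate), which is what the referee's third comment asks for.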
Circularity Check
No significant circularity; empirical system design with no derivation chain
Full rationale
The paper presents UNSEEN as a coordinated system combining an AR ACL for identity-gated sensing, F-RMU-based LLM unlearning for profile suppression, and runtime agent guardrails, evaluated via an IRB-approved user study with 60 participants and 360 annotated conversations. No equations, parameter fitting, or mathematical derivation steps appear in the provided text. Claims rest on the empirical outcomes of the study rather than on reduction by construction, self-definition, or a load-bearing self-citation of a uniqueness theorem. F-RMU is referenced as a component but not invoked to derive the overall result tautologically; the claims are grounded in the described human-subject evaluation rather than in internal self-reference.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: F-RMU unlearning can suppress sensitive profile information in LLM inference without destroying general capabilities
- domain assumption: Runtime guardrails can detect and block adaptive social-engineering strategies in real time
Reference graph
Works this paper leans on
- [1] K. Afane, W. Wei, Y. Mao, J. Farooq, and J. Chen. Next-generation phishing: How LLM agents empower cyber attackers. In 2024 IEEE International Conference on Big Data (BigData), pages 2558–2567. IEEE, 2024.
- [2] V. Bazarevsky, Y. Kartynnik, A. Vakunov, K. Raveendran, and M. Grundmann. BlazeFace: Sub-millisecond neural face detection on mobile GPUs. arXiv preprint arXiv:1907.05047, 2019.
- [3] T. Bi, C. Ye, Z. Yang, Z. Zhou, C. Tang, Z. Tao, J. Zhang, K. Wang, L. Zhou, Y. Yang, and T. Yu. On the feasibility of using multimodal LLMs to execute AR social engineering attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 38252–38260, 2026.
- [4] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda. All your contacts are belong to us: Automated identity theft attacks on social networks. In Proceedings of the 18th International Conference on World Wide Web, pages 551–560, 2009.
- [5] P. Burda, L. Allodi, and N. Zannone. Cognition in social engineering empirical research: A systematic literature review. ACM Transactions on Computer-Human Interaction, 31(2):1–55, 2024.
- [6] S. Chen, Z. Li, F. Dangelo, C. Gao, and X. Fu. A case study of security and privacy threats from augmented reality (AR). In 2018 International Conference on Computing, Networking and Communications (ICNC), pages 442–446. IEEE, 2018.
- [7] S. Chen, Y. Liu, X. Gao, and Z. Han. MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. In Chinese Conference on Biometric Recognition, pages 428–438. Springer, 2018.
- [8] Z. Chen, Z. Zhao, W. Qu, Z. Wen, Z. Han, Z. Zhu, J. Zhang, and H. Yao. Pandora: Detailed LLM jailbreaking via collaborated phishing agents with decomposed reasoning. In ICLR 2024 Workshop on Secure and Trustworthy Large Language Models, 2024.
- [9] L. Choo. How 2 students used the Meta Ray-Bans to access personal information. https://www.forbes.com/sites/lindseychoo/2024/10/04/meta-ray-bans-ai-privacy-surveillance/, 2024.
- [10]
- [11] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner. Android permissions demystified. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 627–638, 2011.
- [12] E. Fernandes, J. Paupore, A. Rahmati, D. Simionato, M. Conti, and A. Prakash. FlowFence: Practical data protection for emerging IoT application frameworks. In USENIX Security Symposium, 2016.
- [13] A. Fuste and C. Schmandt. ARTextiles: Promoting social interactions around personal interests through augmented reality. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 470–470, 2017.
- [14] C. Geng, S.-J. Huang, and S. Chen. A survey on open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3614–3631, 2021.
- [15] W. He, M. Golla, R. Padhi, J. Ofek, M. Dürmuth, E. Fernandes, and B. Ur. Rethinking access control and authentication for the home Internet of Things (IoT). In 27th USENIX Security Symposium (USENIX Security 18), pages 255–272, 2018.
- [16] I. Hirskyj-Douglas, A. Kantosalo, A. Monroy-Hernández, J. Zimmermann, M. Nebeling, and M. Gonzalez-Franco. Social AR: Reimagining and interrogating the role of augmented reality in face-to-face social interactions. In Companion Publication of the 2020 Conference on Computer Supported Cooperative Work and Social Computing, pages 457–465, 2020.
- [17] G. Ho, A. Cidon, L. Gavish, M. Schweighauser, V. Paxson, S. Savage, G. M. Voelker, and D. Wagner. Detecting and characterizing lateral phishing at scale. In 28th USENIX Security Symposium (USENIX Security 19), pages 1273–1290, 2019.
- [18] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
- [19] G. Ilharco, M. T. Ribeiro, M. Wortsman, S. Gururangan, L. Schmidt, H. Hajishirzi, and A. Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022.
- [20] M. Z. Iqbal and A. G. Campbell. Adopting smart glasses responsibly: Potential benefits, ethical, and privacy concerns with Ray-Ban Stories. AI and Ethics, 3(1):325–327, 2023.
- [21] P. Jansen and F. Fischbach. The social engineer: An immersive virtual reality educational game to raise social engineering awareness. In Extended Abstracts of the 2020 Annual Symposium on Computer-Human Interaction in Play, pages 59–63, 2020.
- [22] Y. J. Jia, Q. A. Chen, S. Wang, A. Rahmati, E. Fernandes, Z. M. Mao, and A. Prakash. ContexIoT: Towards providing contextual integrity to appified IoT platforms. In Proceedings of the Network and Distributed System Security Symposium, 2017.
- [23] S. M. Lehman, A. S. Alrumayh, K. Kolhe, H. Ling, and C. C. Tan. Hidden in plain sight: Exploring privacy risks of mobile augmented reality applications. ACM Transactions on Privacy and Security, 25(4):1–35, 2022.
- [24] C. Li, G. Wu, G. Y.-Y. Chan, D. G. Turakhia, S. C. Quispe, D. Li, L. Welch, C. Silva, and J. Qian. Satori: Towards proactive AR assistant with belief-desire-intention user modeling. arXiv preprint arXiv:2410.16668, 2024.
- [25] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M. G. Yong, J. Lee, et al. MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, 2019.
- [26] K. Meng, D. Bau, A. Andonian, and Y. Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
- [27] F. Roesner and T. Kohno. Security and privacy for augmented reality: Our 10-year retrospective. In VR4Sec: 1st International Workshop on Security for XR and XR for Security, 2021.
- [28] S. S. Roy, P. Thota, K. V. Naragam, and S. Nilizadeh. From chatbots to phishbots?: Phishing scam generation in commercial large language models. In 2024 IEEE Symposium on Security and Privacy (SP), pages 36–54. IEEE, 2024.
- [29] Y. Tian, N. Zhang, Y.-H. Lin, X. Wang, B. Ur, X. Guo, and P. Tague. SmartAuth: User-centered authorization for the Internet of Things. In USENIX Security Symposium, pages 361–378, 2017.
- [30] D. Timko, D. H. Castillo, and M. L. Rahman. Understanding influences on SMS phishing detection: User behavior, demographics, and message attributes. 2025.
- [31] H.-R. Tsai, S.-K. Chiu, and B. Wang. GazeNoter: Co-piloted AR note-taking via gaze selection of LLM suggestions to match users' intentions. arXiv preprint arXiv:2407.01161, 2024.
- [32] E. Ulqinaku, H. Assal, A. Abdou, S. Chiasson, and S. Capkun. Is real-time phishing eliminated with FIDO? Social engineering downgrade attacks against FIDO protocols. In 30th USENIX Security Symposium (USENIX Security 21), pages 3811–3828, 2021.
- [33] P. Vadrevu and R. Perdisci. What you see is not what you get: Discovering and tracking social engineering attack campaigns. In Proceedings of the Internet Measurement Conference, pages 308–321, 2019.
- [34] I. Wang, J. Smith, and J. Ruiz. Exploring virtual agents for augmented reality. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–12, 2019.
- [35] X. Wu, J. Li, M. Xu, W. Dong, S. Wu, C. Bian, and D. Xiong. DEPN: Detecting and editing privacy neurons in pretrained language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2875–2886, 2023.
- [36] F. Xing, J. Liu, S. Chen, T. Yu, and Y. Yang. A continuous verification mechanism for ensuring client data forgetfulness in federated unlearning. Engineering Applications of Artificial Intelligence, 162:112553, 2025.
- [37] B. Yang, Y. Guo, L. Xu, Z. Yan, H. Chen, G. Xing, and X. Jiang. SocialMind: LLM-based proactive AR social assistive system with human-like perception for in-situ live interactions. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(1):1–30, 2025.
- [38] Z. Yang, J. Allen, M. Landen, R. Perdisci, and W. Lee. TRIDENT: Towards detecting and mitigating web-based social engineering attacks. In 32nd USENIX Security Symposium (USENIX Security 23), pages 6701–6718, 2023.
- [39] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023.
- [40] Y. Yoon, J. Nam, H. Yun, J. Lee, D. Kim, and J. Ok. Few-shot unlearning. In 2024 IEEE Symposium on Security and Privacy (SP), pages 3276–3292. IEEE, 2024.
- [41] T. Yu, C. Ye, Z. Yang, Z. Zhou, C. Tang, Z. Tao, J. Zhang, K. Wang, L. Zhou, Y. Yang, and T. Bi. SEAR: A multimodal dataset for analyzing AR-LLM-driven social engineering behaviors. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 12981–12987, 2025.
- [42] G. Zhang, K. Wang, X. Xu, Z. Wang, and H. Shi. Forget-me-not: Learning to forget in text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1755–1764, 2024.