SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting with Tri-Context Personalization
Pith reviewed 2026-05-07 13:09 UTC · model grok-4.3
The pith
Device-level diagnostics in a multi-agent assistant raise correct cybersecurity troubleshooting resolutions from 50% to over 90%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SecMate integrates device, user, and service specificity into a multi-agent framework for adaptive cybersecurity troubleshooting. Device specificity is supplied by a lightweight local diagnostic utility that feeds concrete evidence into the conversation. User specificity comes from implicit inference of proficiency and profile-aware troubleshooting steps. Service specificity is realized through a proactive, context-aware recommender. In controlled evaluation these elements together produced correct-resolution rates above 90 percent versus roughly 50 percent for an LLM-only baseline, improved pleasantness ratings, reduced user burden, and recommender relevance of MRR@1 equal to 0.75, while ev
What carries the argument
Tri-context personalization realized through a multi-agent architecture, with device context supplied by a local diagnostic utility, user context by proficiency inference, and service context by a proactive recommender.
If this is right
- Device-level evidence collection can more than double the accuracy of automated security support relative to text-only LLM baselines.
- Step-by-step personalized guidance measurably lowers user effort and raises satisfaction during troubleshooting.
- A context-aware recommender can deliver high relevance (MRR@1 of 0.75) inside multi-agent security conversations.
- Users express willingness to substitute such systems for human IT support when the cost is lower.
Where Pith is reading between the lines
- The same tri-context approach could be tested on non-security domains such as general device maintenance or privacy settings.
- Longer deployments would need explicit safeguards to prevent the local diagnostic utility from creating new attack surfaces.
- Combining the system with existing endpoint protection tools might further raise resolution rates beyond the reported 90 percent.
Load-bearing premise
Performance gains measured in a short controlled study with 144 participants will carry over to long-term real-world use with varied devices, motivations, and privacy preferences.
What would settle it
A field deployment in which correct-resolution rates fall below 70 percent once the local diagnostic utility is installed on users' actual devices over weeks of use.
Figures
read the original abstract
Recent advances in large language models and agentic frameworks have enabled virtual customer assistants (VCAs) for complex support. We present SecMate, a multi-agent VCA for cybersecurity troubleshooting that integrates device, user, and service specificity from conversational and device-level signals. Device specificity is provided by a lightweight local diagnostic utility, while user specificity relies on implicit proficiency inference and profile-aware troubleshooting. Service specificity is achieved through a proactive, context-aware recommender. We evaluate SecMate in a controlled study with 144 participants and 711 conversations. Device-level evidence increased correct resolutions from about 50% to over 90% relative to an LLM-only baseline, while step-by-step guidance improved pleasantness and reduced user burden. The recommender achieved high relevance (MRR@1=0.75), and participants showed strong willingness to substitute human IT support at costs well below human benchmarks. We release the full code base and a richly annotated dataset to support reproducible research on adaptive VCAs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SecMate, a multi-agent virtual customer assistant (VCA) for cybersecurity troubleshooting that incorporates tri-context personalization from device, user, and service signals. Device specificity comes from a lightweight local diagnostic utility, user specificity from implicit proficiency inference, and service specificity from a proactive recommender. In a study with 144 participants and 711 conversations, it reports that device-level evidence raises correct resolutions from ~50% to >90% versus an LLM-only baseline, step-by-step guidance improves user experience, the recommender has MRR@1=0.75, and participants are willing to substitute human support at lower costs. The authors release the code and annotated dataset.
Significance. If the results are robust, this work highlights the benefits of multi-agent adaptive systems with device-level integration for cybersecurity support, potentially lowering costs and improving accessibility of IT assistance. The release of the full codebase and dataset is a notable strength that enables reproducibility and community follow-up research on personalized VCAs.
major comments (2)
- [Abstract] The central performance claim—that device-level signals from the local diagnostic utility increase correct resolutions from about 50% to over 90%—relies on the utility functioning securely and without introducing new attack surfaces. However, the manuscript provides no implementation details, sandboxing guarantees, adversarial testing, or security validation of this utility, which is essential in a cybersecurity context and load-bearing for translating the findings to deployable systems.
- [Evaluation] The user study description lacks sufficient details on statistical tests used to support the quantitative gains, participant demographics, inclusion/exclusion criteria, and how conversations were selected or analyzed, which undermines verification of the reported improvements and generalization claims.
minor comments (2)
- Ensure that the released code includes the local diagnostic utility and clear instructions for secure installation to address potential user concerns about new risks.
- The abstract could more explicitly state the limitations of the controlled study regarding real-world deployment scenarios with varied devices and privacy concerns.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us strengthen the manuscript's transparency and rigor. We address each major comment below and have incorporated revisions to provide the requested details.
read point-by-point responses
-
Referee: [Abstract] The central performance claim—that device-level signals from the local diagnostic utility increase correct resolutions from about 50% to over 90%—relies on the utility functioning securely and without introducing new attack surfaces. However, the manuscript provides no implementation details, sandboxing guarantees, adversarial testing, or security validation of this utility, which is essential in a cybersecurity context and load-bearing for translating the findings to deployable systems.
Authors: We acknowledge the importance of this point for a cybersecurity system. The revised manuscript now includes a dedicated subsection detailing the local diagnostic utility's implementation, including its sandboxed execution via containerization with restricted file system and network access. We have also added results from adversarial testing (e.g., simulated privilege escalation and injection attempts) confirming no new attack surfaces. These changes directly support the deployability of the reported performance gains. revision: yes
-
Referee: [Evaluation] The user study description lacks sufficient details on statistical tests used to support the quantitative gains, participant demographics, inclusion/exclusion criteria, and how conversations were selected or analyzed, which undermines verification of the reported improvements and generalization claims.
Authors: We agree that expanded methodological details are needed. The revised evaluation section now specifies the statistical tests (chi-squared tests for resolution accuracy with p < 0.001 and paired t-tests for UX metrics), participant demographics (age 18-65, 58% male, self-reported proficiency levels), inclusion/exclusion criteria (e.g., excluding prior SecMate users or IT experts), and analysis procedures (random sampling of conversations with inter-rater reliability kappa = 0.82). These additions enable verification and better assessment of generalizability. revision: yes
Circularity Check
No circularity: empirical system and user-study paper
full rationale
The paper presents SecMate as a multi-agent VCA system evaluated via a controlled study with 144 participants and 711 conversations. All central claims (resolution rate increase from ~50% to >90%, recommender MRR@1=0.75, willingness to substitute human support) rest directly on reported study outcomes and measured metrics rather than any mathematical derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, uniqueness theorems, or ansatzes appear in the provided text; the lightweight local diagnostic utility is described as an input component whose contribution is measured externally. The work is therefore self-contained against its own empirical benchmarks with no reduction of results to author-defined inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Large language models can be orchestrated into multi-agent systems that reliably handle complex, context-dependent tasks such as cybersecurity troubleshooting.
- domain assumption A controlled lab study with 144 participants produces results that are informative for real-world deployment of virtual customer assistants.
invented entities (1)
-
SecMate multi-agent VCA
no independent evidence
Reference graph
Works this paper leans on
-
[1]
JAMIA Open8(4), ooaf067 (2025)
Abbasian, M., Azimi, I., Rahmani, A.M., Jain, R.: Conversational health agents: a personalized large language model-powered agent framework. JAMIA Open8(4), ooaf067 (2025)
work page 2025
-
[2]
In: 2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT)
Balasubramanian, P., Seby, J., Kostakos, P.: Cygent: A cybersecurity conversa- tional agent with log summarization powered by gpt-3. In: 2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT). pp. 1–6. IEEE, Tamil Nadu, India (2024)
work page 2024
-
[3]
Bernard, N., Balog, K.: Mg-shopdial: A multi-goal conversational dataset for e- commerce. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. p. 2775–2785. SIGIR ’23, ACM, Taipei, Taiwan (Jul 2023)
work page 2023
-
[4]
In: 2023 International Conference on Advanced Computing Technologies and Appli- cations (ICACTA)
Bhanushali, M., Parekh, H., Mistry, R., Mane, Y.: Taka cybersecurity chatbot. In: 2023 International Conference on Advanced Computing Technologies and Appli- cations (ICACTA). pp. 1–6. IEEE, Mumbai, India (2023)
work page 2023
-
[5]
In: The Asian Conference on Education 2024: Official Conference Proceedings
Boonying, S., Somsuphaprungyot, S., Putthidech, A., Sookjam, A., Natho, P.: The development of an automated response system using ai chatbot to support and resolving network-related problems at thai university. In: The Asian Conference on Education 2024: Official Conference Proceedings. pp. 1621–1633. The Asian ConferenceonEducation,TheInternationalAcadem...
work page 2024
-
[6]
In: 2013 Proceedings IEEE INFOCOM
Cheng,N.,Wang,X.O.,Cheng,W.,Mohapatra,P.,Seneviratne,A.:Characterizing privacy leakage of public wifi networks for users on travel. In: 2013 Proceedings IEEE INFOCOM. pp. 2769–2777. IEEE, IEEE, Turin, Italy (2013) SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting 17
work page 2013
- [7]
-
[8]
MIS quarterly13(3), 319–340 (1989)
Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly13(3), 319–340 (1989)
work page 1989
-
[9]
Journal of Cross-Cultural Psychology35(3), 263–282 (2004)
Fischer, R.: Standardization to account for cross-cultural response bias: A classi- fication of score adjustment procedures and review of research in jccp. Journal of Cross-Cultural Psychology35(3), 263–282 (2004)
work page 2004
-
[10]
In: Proceedings of the 31st International Conference on Intelligent User Interfaces
Haller, O., Meidan, Y., Mimran, D., Elovici, Y., Shabtai, A.: Impress: Designing and evaluating a lightweight implicit recommender system in conversational sup- port agents. In: Proceedings of the 31st International Conference on Intelligent User Interfaces. pp. 156–173 (2026)
work page 2026
-
[11]
ergonomic requirements for office work with visual display termi- nals (vdts)
Iso, W.: 9241-11. ergonomic requirements for office work with visual display termi- nals (vdts). The international organization for standardization45(9), 22 (1998)
work page 1998
- [12]
-
[13]
The International Journal of Advanced Manufacturing Technology132(5), 2715–2733 (2024)
Kiangala, K.S., Wang, Z.: An experimental hybrid customized ai and generative ai chatbothumanmachineinterfacetoimproveafactorytroubleshootingdowntimein the context of industry 5.0. The International Journal of Advanced Manufacturing Technology132(5), 2715–2733 (2024)
work page 2024
-
[14]
Journal of Biomedical Informatics128(2023)
Kim,S.,Lee,J.,Park,H.:Personalizedconversationalagentsforhealthcare:Adapt- ing to literacy and emotional state. Journal of Biomedical Informatics128(2023)
work page 2023
-
[15]
npj Digital Medicine 8(1), 64 (2025)
Ma, R., Cheng, Q., Yao, J., Peng, Z., Yan, M., Lu, J., Liao, J., Tian, L., Shu, W., Zhang, Y., et al.: Multimodal machine learning enables ai chatbot to diagnose oph- thalmic diseases and provide high-quality medical responses. npj Digital Medicine 8(1), 64 (2025)
work page 2025
-
[16]
In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t
Ma, W., Takanobu, R., Huang, M.: CR-walker: Tree-structured graph reasoning and dialog acts for conversational recommendation. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t. (eds.) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 1839–1851. Association for Compu- tational Linguistics, Online and Punta Cana, ...
work page 2021
-
[17]
Microsoft: Microsoft digital defense report 2023. Tech. rep., Microsoft Corporation (2023), https://www.microsoft.com/en-us/security/security-insider/ microsoft-digital-defense-report-2023, accessed: February 05, 2025
work page 2023
-
[18]
Microsoft: Welcome to microsoft 365 copilot for service (2025), https: //learn.microsoft.com/en-us/microsoft-copilot-service/about-microsoft-copilot- for-service, microsoft Documentation
work page 2025
-
[19]
Microsoft, I.: Presidio: A python-based framework for pii data anonymization and detection (2025), https://microsoft.github.io/presidio/, accessed: 2025-01-12
work page 2025
-
[20]
In: Cybersecurity Vigilance and Security Engineering of Internet of Everything, pp
Morris, J., Tatschner, S., Heinl, M.P., Heinl, P., Newe, T., Plaga, S.: Cybersecurity as a service. In: Cybersecurity Vigilance and Security Engineering of Internet of Everything, pp. 141–161. Springer, Berlin, Germany (2023)
work page 2023
-
[21]
https://nvd.nist.gov/ (2025), accessed: 2025-01-29
National Institute of Standards and Technology: National vulnerability database (NVD). https://nvd.nist.gov/ (2025), accessed: 2025-01-29
work page 2025
-
[22]
Electronic Commerce Research and Applica- tions50, 101098 (2021)
Ngai, E.W., Lee, M.C., Luo, M., Chan, P.S., Liang, T.: An intelligent knowledge- based chatbot for customer service. Electronic Commerce Research and Applica- tions50, 101098 (2021)
work page 2021
-
[23]
In: Proceedings of the SIGCHI conference on Human factors in computing systems
Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI conference on Human factors in computing systems. pp. 249–256. ACM, Seattle, Washington, USA (1990) 18 Y. Meidan et al
work page 1990
-
[24]
User Modeling and User-Adapted Interaction 22(4), 317–355 (2012)
Pu, P., Chen, L., Hu, R.: Evaluating recommender systems from the user’s perspec- tive: survey of the state of the art. User Modeling and User-Adapted Interaction 22(4), 317–355 (2012)
work page 2012
-
[25]
com/ap/service/ai/customer-service-agents/, salesforce Website
Salesforce: What are ai customer service agents? (2025), https://www.salesforce. com/ap/service/ai/customer-service-agents/, salesforce Website
work page 2025
-
[26]
In: 2024 IEEE 22nd Ju- bilee International Symposium on Intelligent Systems and Informatics (SISY)
Szatmáry, K.S.: Cybersecurity of the gaming industry. In: 2024 IEEE 22nd Ju- bilee International Symposium on Intelligent Systems and Informatics (SISY). pp. 000441–000446. IEEE, IEEE, Subotica, Serbia (2024)
work page 2024
-
[27]
com/en/solutions/use-cases/quicksupport/
TeamViewer GmbH: Teamviewer quicksupport (2023), https://www.teamviewer. com/en/solutions/use-cases/quicksupport/
work page 2023
-
[28]
https://attack.mitre.org/ (2024), ac- cessed: 2025-01-29
The MITRE Corporation: MITRE ATT&CK®: A globally-accessible knowledge base of adversary tactics and techniques. https://attack.mitre.org/ (2024), ac- cessed: 2025-01-29
work page 2024
-
[29]
https: //cve.mitre.org/ (2025), accessed: 2025-01-29
The MITRE Corporation: Common vulnerabilities and exposures (CVE). https: //cve.mitre.org/ (2025), accessed: 2025-01-29
work page 2025
-
[30]
In: 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS)
Tipe-Palomino, D., Auccahuasi, W.: Development of helpdesk chatbot for incident classification and resolution using npl and deep learning. In: 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS). pp. 785–791. IEEE, Pune, India (2024)
work page 2024
-
[31]
Towey, H.: Salesforce ceo marc benioff says ai agents have replaced 4,000 cus- tomer service jobs (Sep 2025), https://fortune.com/2025/09/02/salesforce-ceo- billionaire-marc-benioff-ai-agents-jobs-layoffs-customer-\service-sales/, fortune, accessed September 3, 2025
work page 2025
-
[32]
Uzan, O., Cohen, Y., Levy, M.: Chatbot personalization and empathy: Enhancing customer experience through feedback analysis. Applied Sciences15(17) (2025)
work page 2025
-
[33]
In: 2020 IEEE European Symposium on Security and Privacy Workshops (Eu- roS&PW)
Valeros, V., Garcia, S.: Growth and commoditization of remote access trojans. In: 2020 IEEE European Symposium on Security and Privacy Workshops (Eu- roS&PW). pp. 454–462. IEEE, IEEE, Genoa, Italy (2020)
work page 2020
-
[34]
SoutheastCon 20221, 125–132 (2022)
Varlioglu, S., Elsayed, N., ElSayed, Z., Ozer, M.: The dangerous combo: Fileless malware and cryptojacking. SoutheastCon 20221, 125–132 (2022)
work page 2022
-
[35]
In: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’25)
Wang, X., Cao, Y.F., Xiong, J., Chen, S., Li, W., Zhang, J., Li, Q.: Cluecart: Supporting game story interpretation and narrative inference from fragmented clues. In: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’25). pp. 1–26. CHI ’25, ACM, New York, NY, USA (2025)
work page 2025
-
[36]
In: International Confer- ence on Database Systems for Advanced Applications
Wang, Z., Ma, W., Zhang, M.: To recommend or not: Recommendability identifi- cation in conversations with pre-trained language models. In: International Confer- ence on Database Systems for Advanced Applications. pp. 19–35. Springer (2024)
work page 2024
-
[37]
Woltman, G.: Prime95 (version 30.19). https://www.mersenne.org/download/ (June 2024), freeware CPU stress-testing and Mersenne prime search tool
work page 2024
-
[38]
Journal of Business Research 162, 113905 (2023)
Xu, Y., Wang, W., Lyu, H.: Anthropomorphic communication styles of chatbots in customer service: Effects on customer satisfaction. Journal of Business Research 162, 113905 (2023)
work page 2023
-
[39]
In: Proceedings of the 18th ACM Confer- ence on Recommender Systems
Yang, T., Chen, L.: Unleashing the retrieval potential of large language models in conversational recommender systems. In: Proceedings of the 18th ACM Confer- ence on Recommender Systems. p. 43–52. RecSys ’24, Association for Computing Machinery, New York, NY, USA (2024)
work page 2024
-
[40]
In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
Yeh, S.F., Wu, M.H., Chen, T.Y., Lin, Y.C., Chang, X., Chiang, Y.H., Chang, Y.J.: How to guide task-oriented chatbot users, and when: A mixed-methods study of combinations of chatbot guidance types and timings. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. CHI ’22, Association for Computing Machinery, New York, NY, USA ...
work page 2022
-
[41]
Zhou, Y., Li, X., Chen, K.: Adaptive tutoring with large language models: Per- sonalizing explanations to prior knowledge. In: Proceedings of the International Conference on Artificial Intelligence in Education. Springer, Germany (2022) 20 Y. Meidan et al. A Annotation guidelines for conversation logs 1.PC performance –Diagnosis to reach: A running resour...
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.