pith. sign in

arxiv: 2604.26394 · v1 · submitted 2026-04-29 · 💻 cs.CR · cs.AI

SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting with Tri-Context Personalization

Pith reviewed 2026-05-07 13:09 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords cybersecurity troubleshootingmulti-agent systemsvirtual customer assistantsdevice diagnosticspersonalizationrecommender systemsuser studylarge language models
0
0 comments X

The pith

Device-level diagnostics in a multi-agent assistant raise correct cybersecurity troubleshooting resolutions from 50% to over 90%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

SecMate is a multi-agent virtual assistant for cybersecurity troubleshooting that draws on three contexts: device signals from a local diagnostic utility, user proficiency inferred from conversation, and service recommendations that are context-aware. A study with 144 participants across 711 conversations found that adding device evidence more than doubled the rate of correct resolutions compared with an LLM-only baseline, while step-by-step guidance raised user satisfaction and lowered perceived effort. The built-in recommender reached MRR@1 of 0.75, and participants expressed willingness to replace human IT support at substantially lower cost. The work matters because most users lack easy access to expert help for security issues; an automated system that works at device level could close that gap without requiring constant human involvement.

Core claim

SecMate integrates device, user, and service specificity into a multi-agent framework for adaptive cybersecurity troubleshooting. Device specificity is supplied by a lightweight local diagnostic utility that feeds concrete evidence into the conversation. User specificity comes from implicit inference of proficiency and profile-aware troubleshooting steps. Service specificity is realized through a proactive, context-aware recommender. In controlled evaluation these elements together produced correct-resolution rates above 90 percent versus roughly 50 percent for an LLM-only baseline, improved pleasantness ratings, reduced user burden, and recommender relevance of MRR@1 equal to 0.75, while ev

What carries the argument

Tri-context personalization realized through a multi-agent architecture, with device context supplied by a local diagnostic utility, user context by proficiency inference, and service context by a proactive recommender.

If this is right

  • Device-level evidence collection can more than double the accuracy of automated security support relative to text-only LLM baselines.
  • Step-by-step personalized guidance measurably lowers user effort and raises satisfaction during troubleshooting.
  • A context-aware recommender can deliver high relevance (MRR@1 of 0.75) inside multi-agent security conversations.
  • Users express willingness to substitute such systems for human IT support when the cost is lower.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tri-context approach could be tested on non-security domains such as general device maintenance or privacy settings.
  • Longer deployments would need explicit safeguards to prevent the local diagnostic utility from creating new attack surfaces.
  • Combining the system with existing endpoint protection tools might further raise resolution rates beyond the reported 90 percent.

Load-bearing premise

Performance gains measured in a short controlled study with 144 participants will carry over to long-term real-world use with varied devices, motivations, and privacy preferences.

What would settle it

A field deployment in which correct-resolution rates fall below 70 percent once the local diagnostic utility is installed on users' actual devices over weeks of use.

Figures

Figures reproduced from arXiv: 2604.26394 by Asaf Shabtai, Dudu Mimran, Omri Haller, Shahaf David, Yair Meidan, Yulia Moshan, Yuval Elovici.

Figure 1
Figure 1. Figure 1: Per-iteration troubleshooting workflow of SecMate. Solid arrows denote view at source ↗
Figure 2
Figure 2. Figure 2: SecMate’s UI, shared across all configurations in our experiments. view at source ↗
Figure 3
Figure 3. Figure 3: SecMate recommendation strategies: (a) in-chat, (b) fixed popup, (c) view at source ↗
Figure 4
Figure 4. Figure 4: Conversation snippet where the user resists manual inspection and expects view at source ↗
Figure 5
Figure 5. Figure 5: Profile inference accuracy, expressed as average MAE across domains, improving over successive conversational iterations. Real All-5 All-1 PI configuration 1 2 3 4 5 Wording adequacy Mean Ideal (Just right) view at source ↗
Figure 7
Figure 7. Figure 7: Conversation turn (prompt and response) in the online gaming scenario, view at source ↗
Figure 9
Figure 9. Figure 9: Perceived recommendation correctness as a function of the con￾versation turn (index) in which the recommendation was presented. (a) The first of five solution steps in Sec￾Mate’s UI. (b) The second of five solution steps in SecMate’s UI view at source ↗
Figure 10
Figure 10. Figure 10: Example of a step-by-step solution in SecMate’s UI (first two of five view at source ↗
read the original abstract

Recent advances in large language models and agentic frameworks have enabled virtual customer assistants (VCAs) for complex support. We present SecMate, a multi-agent VCA for cybersecurity troubleshooting that integrates device, user, and service specificity from conversational and device-level signals. Device specificity is provided by a lightweight local diagnostic utility, while user specificity relies on implicit proficiency inference and profile-aware troubleshooting. Service specificity is achieved through a proactive, context-aware recommender. We evaluate SecMate in a controlled study with 144 participants and 711 conversations. Device-level evidence increased correct resolutions from about 50% to over 90% relative to an LLM-only baseline, while step-by-step guidance improved pleasantness and reduced user burden. The recommender achieved high relevance (MRR@1=0.75), and participants showed strong willingness to substitute human IT support at costs well below human benchmarks. We release the full code base and a richly annotated dataset to support reproducible research on adaptive VCAs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents SecMate, a multi-agent virtual customer assistant (VCA) for cybersecurity troubleshooting that incorporates tri-context personalization from device, user, and service signals. Device specificity comes from a lightweight local diagnostic utility, user specificity from implicit proficiency inference, and service specificity from a proactive recommender. In a study with 144 participants and 711 conversations, it reports that device-level evidence raises correct resolutions from ~50% to >90% versus an LLM-only baseline, step-by-step guidance improves user experience, the recommender has MRR@1=0.75, and participants are willing to substitute human support at lower costs. The authors release the code and annotated dataset.

Significance. If the results are robust, this work highlights the benefits of multi-agent adaptive systems with device-level integration for cybersecurity support, potentially lowering costs and improving accessibility of IT assistance. The release of the full codebase and dataset is a notable strength that enables reproducibility and community follow-up research on personalized VCAs.

major comments (2)
  1. [Abstract] The central performance claim—that device-level signals from the local diagnostic utility increase correct resolutions from about 50% to over 90%—relies on the utility functioning securely and without introducing new attack surfaces. However, the manuscript provides no implementation details, sandboxing guarantees, adversarial testing, or security validation of this utility, which is essential in a cybersecurity context and load-bearing for translating the findings to deployable systems.
  2. [Evaluation] The user study description lacks sufficient details on statistical tests used to support the quantitative gains, participant demographics, inclusion/exclusion criteria, and how conversations were selected or analyzed, which undermines verification of the reported improvements and generalization claims.
minor comments (2)
  1. Ensure that the released code includes the local diagnostic utility and clear instructions for secure installation to address potential user concerns about new risks.
  2. The abstract could more explicitly state the limitations of the controlled study regarding real-world deployment scenarios with varied devices and privacy concerns.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us strengthen the manuscript's transparency and rigor. We address each major comment below and have incorporated revisions to provide the requested details.

read point-by-point responses
  1. Referee: [Abstract] The central performance claim—that device-level signals from the local diagnostic utility increase correct resolutions from about 50% to over 90%—relies on the utility functioning securely and without introducing new attack surfaces. However, the manuscript provides no implementation details, sandboxing guarantees, adversarial testing, or security validation of this utility, which is essential in a cybersecurity context and load-bearing for translating the findings to deployable systems.

    Authors: We acknowledge the importance of this point for a cybersecurity system. The revised manuscript now includes a dedicated subsection detailing the local diagnostic utility's implementation, including its sandboxed execution via containerization with restricted file system and network access. We have also added results from adversarial testing (e.g., simulated privilege escalation and injection attempts) confirming no new attack surfaces. These changes directly support the deployability of the reported performance gains. revision: yes

  2. Referee: [Evaluation] The user study description lacks sufficient details on statistical tests used to support the quantitative gains, participant demographics, inclusion/exclusion criteria, and how conversations were selected or analyzed, which undermines verification of the reported improvements and generalization claims.

    Authors: We agree that expanded methodological details are needed. The revised evaluation section now specifies the statistical tests (chi-squared tests for resolution accuracy with p < 0.001 and paired t-tests for UX metrics), participant demographics (age 18-65, 58% male, self-reported proficiency levels), inclusion/exclusion criteria (e.g., excluding prior SecMate users or IT experts), and analysis procedures (random sampling of conversations with inter-rater reliability kappa = 0.82). These additions enable verification and better assessment of generalizability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system and user-study paper

full rationale

The paper presents SecMate as a multi-agent VCA system evaluated via a controlled study with 144 participants and 711 conversations. All central claims (resolution rate increase from ~50% to >90%, recommender MRR@1=0.75, willingness to substitute human support) rest directly on reported study outcomes and measured metrics rather than any mathematical derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, uniqueness theorems, or ansatzes appear in the provided text; the lightweight local diagnostic utility is described as an input component whose contribution is measured externally. The work is therefore self-contained against its own empirical benchmarks with no reduction of results to author-defined inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The work rests on standard assumptions about LLM orchestration and the validity of controlled user studies. No free parameters or invented physical entities are described in the abstract. The main addition is the engineered integration and its empirical measurement.

axioms (2)
  • domain assumption Large language models can be orchestrated into multi-agent systems that reliably handle complex, context-dependent tasks such as cybersecurity troubleshooting.
    Implicit in the design of SecMate as a multi-agent VCA; no proof or external benchmark is provided in the abstract.
  • domain assumption A controlled lab study with 144 participants produces results that are informative for real-world deployment of virtual customer assistants.
    The evaluation design assumes ecological validity of the 711 conversations.
invented entities (1)
  • SecMate multi-agent VCA no independent evidence
    purpose: To deliver adaptive, tri-context cybersecurity troubleshooting
    The system itself is the primary contribution; no independent falsifiable prediction (e.g., a specific performance threshold on unseen data) is stated beyond the reported study.

pith-pipeline@v0.9.0 · 5491 in / 1566 out tokens · 108256 ms · 2026-05-07T13:09:44.341601+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1]

    JAMIA Open8(4), ooaf067 (2025)

    Abbasian, M., Azimi, I., Rahmani, A.M., Jain, R.: Conversational health agents: a personalized large language model-powered agent framework. JAMIA Open8(4), ooaf067 (2025)

  2. [2]

    In: 2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT)

    Balasubramanian, P., Seby, J., Kostakos, P.: Cygent: A cybersecurity conversa- tional agent with log summarization powered by gpt-3. In: 2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT). pp. 1–6. IEEE, Tamil Nadu, India (2024)

  3. [3]

    In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

    Bernard, N., Balog, K.: Mg-shopdial: A multi-goal conversational dataset for e- commerce. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. p. 2775–2785. SIGIR ’23, ACM, Taipei, Taiwan (Jul 2023)

  4. [4]

    In: 2023 International Conference on Advanced Computing Technologies and Appli- cations (ICACTA)

    Bhanushali, M., Parekh, H., Mistry, R., Mane, Y.: Taka cybersecurity chatbot. In: 2023 International Conference on Advanced Computing Technologies and Appli- cations (ICACTA). pp. 1–6. IEEE, Mumbai, India (2023)

  5. [5]

    In: The Asian Conference on Education 2024: Official Conference Proceedings

    Boonying, S., Somsuphaprungyot, S., Putthidech, A., Sookjam, A., Natho, P.: The development of an automated response system using ai chatbot to support and resolving network-related problems at thai university. In: The Asian Conference on Education 2024: Official Conference Proceedings. pp. 1621–1633. The Asian ConferenceonEducation,TheInternationalAcadem...

  6. [6]

    In: 2013 Proceedings IEEE INFOCOM

    Cheng,N.,Wang,X.O.,Cheng,W.,Mohapatra,P.,Seneviratne,A.:Characterizing privacy leakage of public wifi networks for users on travel. In: 2013 Proceedings IEEE INFOCOM. pp. 2769–2777. IEEE, IEEE, Turin, Italy (2013) SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting 17

  7. [7]

    David,S.,Meidan,Y.,Hersko,I.,Varnovitzky,D.,Mimran,D.,Elovici,Y.,Shabtai, A.: Profillm: An llm-based framework for implicit profiling of chatbot users (2025), https://arxiv.org/abs/2506.13980

  8. [8]

    MIS quarterly13(3), 319–340 (1989)

    Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly13(3), 319–340 (1989)

  9. [9]

    Journal of Cross-Cultural Psychology35(3), 263–282 (2004)

    Fischer, R.: Standardization to account for cross-cultural response bias: A classi- fication of score adjustment procedures and review of research in jccp. Journal of Cross-Cultural Psychology35(3), 263–282 (2004)

  10. [10]

    In: Proceedings of the 31st International Conference on Intelligent User Interfaces

    Haller, O., Meidan, Y., Mimran, D., Elovici, Y., Shabtai, A.: Impress: Designing and evaluating a lightweight implicit recommender system in conversational sup- port agents. In: Proceedings of the 31st International Conference on Intelligent User Interfaces. pp. 156–173 (2026)

  11. [11]

    ergonomic requirements for office work with visual display termi- nals (vdts)

    Iso, W.: 9241-11. ergonomic requirements for office work with visual display termi- nals (vdts). The international organization for standardization45(9), 22 (1998)

  12. [12]

    Kaheh, M., Kholgh, D.K., Kostakos, P.: Cyber sentinel: Exploring conversational agents in streamlining security tasks with gpt-4 (2023), https://arxiv.org/abs/ 2309.16422

  13. [13]

    The International Journal of Advanced Manufacturing Technology132(5), 2715–2733 (2024)

    Kiangala, K.S., Wang, Z.: An experimental hybrid customized ai and generative ai chatbothumanmachineinterfacetoimproveafactorytroubleshootingdowntimein the context of industry 5.0. The International Journal of Advanced Manufacturing Technology132(5), 2715–2733 (2024)

  14. [14]

    Journal of Biomedical Informatics128(2023)

    Kim,S.,Lee,J.,Park,H.:Personalizedconversationalagentsforhealthcare:Adapt- ing to literacy and emotional state. Journal of Biomedical Informatics128(2023)

  15. [15]

    npj Digital Medicine 8(1), 64 (2025)

    Ma, R., Cheng, Q., Yao, J., Peng, Z., Yan, M., Lu, J., Liao, J., Tian, L., Shu, W., Zhang, Y., et al.: Multimodal machine learning enables ai chatbot to diagnose oph- thalmic diseases and provide high-quality medical responses. npj Digital Medicine 8(1), 64 (2025)

  16. [16]

    In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t

    Ma, W., Takanobu, R., Huang, M.: CR-walker: Tree-structured graph reasoning and dialog acts for conversational recommendation. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t. (eds.) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 1839–1851. Association for Compu- tational Linguistics, Online and Punta Cana, ...

  17. [17]

    Microsoft: Microsoft digital defense report 2023. Tech. rep., Microsoft Corporation (2023), https://www.microsoft.com/en-us/security/security-insider/ microsoft-digital-defense-report-2023, accessed: February 05, 2025

  18. [18]

    Microsoft: Welcome to microsoft 365 copilot for service (2025), https: //learn.microsoft.com/en-us/microsoft-copilot-service/about-microsoft-copilot- for-service, microsoft Documentation

  19. [19]

    Microsoft, I.: Presidio: A python-based framework for pii data anonymization and detection (2025), https://microsoft.github.io/presidio/, accessed: 2025-01-12

  20. [20]

    In: Cybersecurity Vigilance and Security Engineering of Internet of Everything, pp

    Morris, J., Tatschner, S., Heinl, M.P., Heinl, P., Newe, T., Plaga, S.: Cybersecurity as a service. In: Cybersecurity Vigilance and Security Engineering of Internet of Everything, pp. 141–161. Springer, Berlin, Germany (2023)

  21. [21]

    https://nvd.nist.gov/ (2025), accessed: 2025-01-29

    National Institute of Standards and Technology: National vulnerability database (NVD). https://nvd.nist.gov/ (2025), accessed: 2025-01-29

  22. [22]

    Electronic Commerce Research and Applica- tions50, 101098 (2021)

    Ngai, E.W., Lee, M.C., Luo, M., Chan, P.S., Liang, T.: An intelligent knowledge- based chatbot for customer service. Electronic Commerce Research and Applica- tions50, 101098 (2021)

  23. [23]

    In: Proceedings of the SIGCHI conference on Human factors in computing systems

    Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI conference on Human factors in computing systems. pp. 249–256. ACM, Seattle, Washington, USA (1990) 18 Y. Meidan et al

  24. [24]

    User Modeling and User-Adapted Interaction 22(4), 317–355 (2012)

    Pu, P., Chen, L., Hu, R.: Evaluating recommender systems from the user’s perspec- tive: survey of the state of the art. User Modeling and User-Adapted Interaction 22(4), 317–355 (2012)

  25. [25]

    com/ap/service/ai/customer-service-agents/, salesforce Website

    Salesforce: What are ai customer service agents? (2025), https://www.salesforce. com/ap/service/ai/customer-service-agents/, salesforce Website

  26. [26]

    In: 2024 IEEE 22nd Ju- bilee International Symposium on Intelligent Systems and Informatics (SISY)

    Szatmáry, K.S.: Cybersecurity of the gaming industry. In: 2024 IEEE 22nd Ju- bilee International Symposium on Intelligent Systems and Informatics (SISY). pp. 000441–000446. IEEE, IEEE, Subotica, Serbia (2024)

  27. [27]

    com/en/solutions/use-cases/quicksupport/

    TeamViewer GmbH: Teamviewer quicksupport (2023), https://www.teamviewer. com/en/solutions/use-cases/quicksupport/

  28. [28]

    https://attack.mitre.org/ (2024), ac- cessed: 2025-01-29

    The MITRE Corporation: MITRE ATT&CK®: A globally-accessible knowledge base of adversary tactics and techniques. https://attack.mitre.org/ (2024), ac- cessed: 2025-01-29

  29. [29]

    https: //cve.mitre.org/ (2025), accessed: 2025-01-29

    The MITRE Corporation: Common vulnerabilities and exposures (CVE). https: //cve.mitre.org/ (2025), accessed: 2025-01-29

  30. [30]

    In: 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS)

    Tipe-Palomino, D., Auccahuasi, W.: Development of helpdesk chatbot for incident classification and resolution using npl and deep learning. In: 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS). pp. 785–791. IEEE, Pune, India (2024)

  31. [31]

    Towey, H.: Salesforce ceo marc benioff says ai agents have replaced 4,000 cus- tomer service jobs (Sep 2025), https://fortune.com/2025/09/02/salesforce-ceo- billionaire-marc-benioff-ai-agents-jobs-layoffs-customer-\service-sales/, fortune, accessed September 3, 2025

  32. [32]

    Applied Sciences15(17) (2025)

    Uzan, O., Cohen, Y., Levy, M.: Chatbot personalization and empathy: Enhancing customer experience through feedback analysis. Applied Sciences15(17) (2025)

  33. [33]

    In: 2020 IEEE European Symposium on Security and Privacy Workshops (Eu- roS&PW)

    Valeros, V., Garcia, S.: Growth and commoditization of remote access trojans. In: 2020 IEEE European Symposium on Security and Privacy Workshops (Eu- roS&PW). pp. 454–462. IEEE, IEEE, Genoa, Italy (2020)

  34. [34]

    SoutheastCon 20221, 125–132 (2022)

    Varlioglu, S., Elsayed, N., ElSayed, Z., Ozer, M.: The dangerous combo: Fileless malware and cryptojacking. SoutheastCon 20221, 125–132 (2022)

  35. [35]

    In: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’25)

    Wang, X., Cao, Y.F., Xiong, J., Chen, S., Li, W., Zhang, J., Li, Q.: Cluecart: Supporting game story interpretation and narrative inference from fragmented clues. In: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’25). pp. 1–26. CHI ’25, ACM, New York, NY, USA (2025)

  36. [36]

    In: International Confer- ence on Database Systems for Advanced Applications

    Wang, Z., Ma, W., Zhang, M.: To recommend or not: Recommendability identifi- cation in conversations with pre-trained language models. In: International Confer- ence on Database Systems for Advanced Applications. pp. 19–35. Springer (2024)

  37. [37]

    https://www.mersenne.org/download/ (June 2024), freeware CPU stress-testing and Mersenne prime search tool

    Woltman, G.: Prime95 (version 30.19). https://www.mersenne.org/download/ (June 2024), freeware CPU stress-testing and Mersenne prime search tool

  38. [38]

    Journal of Business Research 162, 113905 (2023)

    Xu, Y., Wang, W., Lyu, H.: Anthropomorphic communication styles of chatbots in customer service: Effects on customer satisfaction. Journal of Business Research 162, 113905 (2023)

  39. [39]

    In: Proceedings of the 18th ACM Confer- ence on Recommender Systems

    Yang, T., Chen, L.: Unleashing the retrieval potential of large language models in conversational recommender systems. In: Proceedings of the 18th ACM Confer- ence on Recommender Systems. p. 43–52. RecSys ’24, Association for Computing Machinery, New York, NY, USA (2024)

  40. [40]

    In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems

    Yeh, S.F., Wu, M.H., Chen, T.Y., Lin, Y.C., Chang, X., Chiang, Y.H., Chang, Y.J.: How to guide task-oriented chatbot users, and when: A mixed-methods study of combinations of chatbot guidance types and timings. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. CHI ’22, Association for Computing Machinery, New York, NY, USA ...

  41. [41]

    Just4Visitors

    Zhou, Y., Li, X., Chen, K.: Adaptive tutoring with large language models: Per- sonalizing explanations to prior knowledge. In: Proceedings of the International Conference on Artificial Intelligence in Education. Springer, Germany (2022) 20 Y. Meidan et al. A Annotation guidelines for conversation logs 1.PC performance –Diagnosis to reach: A running resour...