pith. sign in

arxiv: 2605.15569 · v1 · pith:IS74WVWBnew · submitted 2026-05-15 · 💻 cs.CR · cs.AI· cs.SE

Detecting Privilege Escalation in Polyglot Microservices via Agentic Program Analysis

Pith reviewed 2026-05-20 18:32 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.SE
keywords privilege escalationmicroservicesprogram analysisLLM agentspolyglot codebaseszero-day vulnerabilitiescloud security
0
0 comments X

The pith

Neo uses an LLM agent to detect privilege escalation vulnerabilities across polyglot microservices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Neo, a framework combining large language models with classic program analysis to detect privilege escalation in microservice architectures. Microservices introduce complexity in permission control due to cross-service interactions and polyglot codebases, making manual detection difficult. Neo's agent dynamically generates plans, adapts search strategies, and validates semantics to find these vulnerabilities. Evaluation on 25 applications showed it uncovering 24 zero-day issues with strong precision and recall, suggesting it can help secure modern cloud systems more effectively.

Core claim

We present Neo, an agentic program analysis framework that combines large language models (LLMs) with classic program analysis to address these challenges. Neo leverages an LLM-based agent that dynamically generates analysis plans, adapts code search strategies, and validates semantics. We develop code search primitives that enable Neo to perform scalable and flexible code exploration across services and languages. We evaluated Neo on 25 open-source microservice applications spanning 7 programming languages and 6.2 million lines of code. Neo uncovered 24 zero-day privilege escalation vulnerabilities and achieved 81.0% precision and 85.0% recall on a ground-truth dataset.

What carries the argument

Neo, the LLM-based agentic program analysis framework that dynamically generates analysis plans and applies code search primitives to explore permission checks across services and languages.

If this is right

  • Neo scales to applications spanning 6.2 million lines of code in seven languages.
  • It achieves 81.0 percent precision and 85.0 percent recall on ground-truth privilege escalation data.
  • It improves detection accuracy and scalability over prior program analysis and agentic methods.
  • The same framework extends to other domains and vulnerability types, as shown by discovering 18 additional zero-day issues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agentic planning could be adapted to trace other cross-service security properties such as data leaks or authentication bypasses.
  • Teams maintaining large microservice fleets might embed similar tools in continuous integration to catch permission errors before release.
  • The hybrid LLM-plus-classic-analysis pattern may transfer to static analysis tasks outside security, such as performance or compliance checks.

Load-bearing premise

The LLM-based agent can reliably generate correct analysis plans, adapt search strategies, and validate semantics across polyglot codebases without introducing critical errors or missing cross-service interactions.

What would settle it

Applying Neo to a microservice application where all privilege escalation paths have been exhaustively mapped by hand and checking whether the tool reports exactly those paths at the claimed precision and recall.

Figures

Figures reproduced from arXiv: 2605.15569 by Hong Yau Chong, Junfeng Yang, Penghui Li, Yinzhi Cao.

Figure 1
Figure 1. Figure 1: A user role update flows through three services. When the user Alice initiates a profile update to switch to the developer role, the request flows through three services. The Gateway service acts as the system’s entry point and performs authentication by verifying Alice’s credentials and routing her request to the appropriate backend. The UserPro￾file service handles user-specific updates and prepares the … view at source ↗
Figure 2
Figure 2. Figure 2: Code implementation of the cross-service privilege escalation vulnerability in [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Workflow of NEO. The LLM agent uses code search primitives to iteratively identify privileged operations, trace cross-service flows, and validate security checks. role being requested, e.g., dev or admin. This creates a severe privilege escalation vulnerability, allowing any authenticated user to assume higher-privileged roles. 3.2. Challenges We revisit the three challenges introduced in §1 using the code… view at source ↗
Figure 4
Figure 4. Figure 4: Illustration of inter-service analysis. Global Flow Qglobalf low(source, sink). This query traces data flow across multiple services from a source element to a sink element, connecting intra-service flows through inter￾service communication points. Unlike Qf low that operates within a single service, Qglobalf low constructs a global reachability graph. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p… view at source ↗
Figure 5
Figure 5. Figure 5: Impact of model selection and temperature on vulnerability detection. As shown in [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Impact of validation strategies. There are 135 flows initially. Summary: Application-specific privileged operations in￾crease recall from 33.3% to 92.9%, while code search primitives enable flexible search and avoid failures. The validation strategies together eliminate 62.2% of false positives. 6.4. Comparison In this section, we compare NEO with prior classic and agentic state-of-the-art program analysis… view at source ↗
Figure 7
Figure 7. Figure 7: Distribution of true positive detections by each approach. MScan and CodeQL are integrated with NEO’s validation and marked with a suffix *. MScan* is evaluated on Java-only vul￾nerabilities, while all other approaches are evaluated on the full polyglot dataset. MScan*+Sinks denotes MScan* enhanced with NEO’s identified privileged operations. interaction analysis. To enable a meaningful comparison, we inte… view at source ↗
Figure 8
Figure 8. Figure 8: A (simplified) privilege escalation in Newbee Mall that allows setting an arbitrary order as paid. 7. Discussion Limitations. We acknowledge a few limitations of NEO’s current implementation. First, NEO’s code search primitives are built on CodeQL and inherit its limitations in handling dynamic language features (e.g., dynamic function calls and reflection) and pointer aliases. While NEO supports multiple … view at source ↗
read the original abstract

Microservices are widely adopted in modern cloud systems due to their scalability and fault tolerance. However, microservice architectures introduce significant complexity in privilege and permission control, creating risks of privilege escalation where attackers can gain unauthorized access to resources or operations. Detecting such vulnerabilities is challenging due to complex cross-service interactions, polyglot codebases, and diverse privileged operations and permission checks. We present Neo, an agentic program analysis framework that combines large language models (LLMs) with classic program analysis to address these challenges. Neo leverages an LLM-based agent that dynamically generates analysis plans, adapts code search strategies, and validates semantics. We develop code search primitives that enable Neo to perform scalable and flexible code exploration across services and languages. We evaluated Neo on 25 open-source microservice applications spanning 7 programming languages and 6.2 million lines of code. Neo uncovered 24 zero-day privilege escalation vulnerabilities and achieved 81.0% precision and 85.0% recall on a ground-truth dataset. Compared to existing program analysis and agentic solutions, Neo demonstrated significant improvements in both detection accuracy and scalability. We further showcased Neo's extensibility by applying it to other application domains and vulnerability types, uncovering 18 additional zero-day vulnerabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Neo, an agentic program analysis framework that integrates LLMs with classic program analysis to detect privilege escalation vulnerabilities in polyglot microservices. An LLM-based agent dynamically generates analysis plans, adapts code search strategies, and validates semantics, supported by new code search primitives for cross-service and cross-language exploration. Evaluation on 25 open-source microservice applications (7 languages, 6.2 million LOC) reports discovery of 24 zero-day vulnerabilities along with 81.0% precision and 85.0% recall on a ground-truth dataset, plus outperformance versus existing tools and extensibility to other vulnerability types with 18 additional zero-days found.

Significance. Privilege escalation in microservices is a practically important security problem given the prevalence of polyglot, distributed architectures. If the reported metrics and zero-day findings hold under independent scrutiny, Neo would represent a meaningful advance by showing how agentic LLM planning can scale program analysis across languages and services where purely static or manual methods fall short. The scale of the evaluation (25 applications) and the concrete count of zero-days are positive indicators of real-world relevance; the extensibility demonstration is also a strength.

major comments (2)
  1. [Evaluation] Evaluation section: The construction of the ground-truth dataset used for the 81.0% precision and 85.0% recall figures is not described in adequate detail. The manuscript provides no information on the labeling protocol for privilege-escalation paths (including cross-service and cross-language flows), whether labels were produced independently of Neo's own plans and semantic validations, inter-rater agreement, or post-hoc expert audit of the 24 reported zero-days. This information is load-bearing for the central empirical claims.
  2. [Evaluation] Evaluation section: Full baseline comparisons are referenced in the abstract but lack quantitative detail (e.g., per-baseline precision/recall tables or breakdowns by language or application). Without these, the claim of 'significant improvements in both detection accuracy and scalability' cannot be verified at the level required to support the headline results.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'significant improvements' would benefit from a brief parenthetical reference to the specific baseline families being compared.
  2. The manuscript would be clearer if it included a short table or paragraph summarizing the distribution of the 25 applications across languages and sizes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the evaluation methodology. We address each major comment below and will revise the manuscript to provide the requested details on ground-truth construction and baseline comparisons.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The construction of the ground-truth dataset used for the 81.0% precision and 85.0% recall figures is not described in adequate detail. The manuscript provides no information on the labeling protocol for privilege-escalation paths (including cross-service and cross-language flows), whether labels were produced independently of Neo's own plans and semantic validations, inter-rater agreement, or post-hoc expert audit of the 24 reported zero-days. This information is load-bearing for the central empirical claims.

    Authors: We agree that the ground-truth dataset construction requires more explicit description to support the reported precision and recall. In the revised manuscript we will add a dedicated subsection detailing the labeling protocol, including the process for identifying cross-service and cross-language privilege-escalation paths. We will state that labels were assigned by independent security researchers who had no involvement in designing or running Neo's agent plans or semantic validations, report inter-rater agreement (Cohen's kappa), and describe the post-hoc expert audit performed on the 24 zero-day findings, including audit criteria and resolution of any discrepancies. revision: yes

  2. Referee: [Evaluation] Evaluation section: Full baseline comparisons are referenced in the abstract but lack quantitative detail (e.g., per-baseline precision/recall tables or breakdowns by language or application). Without these, the claim of 'significant improvements in both detection accuracy and scalability' cannot be verified at the level required to support the headline results.

    Authors: We acknowledge the need for greater quantitative transparency in the baseline comparisons. The revised Evaluation section will include expanded tables reporting precision and recall for every baseline tool, with additional breakdowns by programming language and by application. We will also add explicit scalability metrics (analysis runtime and memory consumption per application) to allow direct verification of the claimed improvements in accuracy and scalability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external open-source benchmarks

full rationale

The paper describes an LLM-augmented program analysis system evaluated on 25 independent open-source microservice applications (6.2M LOC across 7 languages). Reported metrics (81% precision, 85% recall, 24 zero-days) are standard information-retrieval quantities computed against a ground-truth dataset drawn from those external codebases. No equations, fitted parameters, self-definitional constructs, or load-bearing self-citations appear in the abstract or described methodology that would reduce the central claims to the inputs by construction. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that current LLMs can perform reliable semantic validation and plan adaptation on real codebases; no free parameters or invented entities are introduced in the abstract description.

axioms (1)
  • domain assumption Large language models can understand and reason about code semantics and permission logic across different programming languages with sufficient accuracy for vulnerability detection.
    This assumption underpins the agentic planning and validation steps described in the abstract.

pith-pipeline@v0.9.0 · 5758 in / 1258 out tokens · 58599 ms · 2026-05-20T18:32:19.058466+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

  1. [1]

    Microservice architecture,

    C. Richardson, “Microservice architecture,” 2025, https://microservices. io/

  2. [2]

    Newman,Monolith to microservices: evolutionary patterns to transform your monolith

    S. Newman,Monolith to microservices: evolutionary patterns to transform your monolith. O’Reilly Media, 2019

  3. [3]

    Decomposition of monolith applications into microservices architectures: A systematic review,

    Y . Abgaz, A. McCarren, P. Elger, D. Solan, N. Lapuz, M. Bivol, G. Jackson, M. Yilmaz, J. Buckley, and P. Clarke, “Decomposition of monolith applications into microservices architectures: A systematic review,”IEEE Transactions on Software Engineering, vol. 49, no. 8, pp. 4213–4242, 2023

  4. [4]

    What led amazon to its own microservices architecture,

    R. Brigham, “What led amazon to its own microservices architecture,” 2015, https://thenewstack.io/led-amazon-microservices-architecture/

  5. [5]

    Microservice ecosystems of google and ebay,

    R. Shoup, “Microservice ecosystems of google and ebay,” 2015, https://highscalability.com/ deep-lessons-from-google-and-ebay-on-building-ecosystems-of/

  6. [6]

    What are microservices?

    A. W. Services, “What are microservices?” 2025, https://aws.amazon. com/microservices/

  7. [7]

    Microservices architecture on google cloud,

    G. Cloud, “Microservices architecture on google cloud,” 2025, https://cloud.google.com/blog/topics/developers-practitioners/ microservices-architecture-google-cloud

  8. [8]

    Mall4Cloud: A microservices-based e-commerce platform,

    Gz-yami, “Mall4Cloud: A microservices-based e-commerce platform,” 2025, accessed: October 2025. [Online]. Available: https://github.com/ gz-yami/mall4cloud

  9. [9]

    Crayfish homarus microservice remote code execution vulnerability,

    “Crayfish homarus microservice remote code execution vulnerability,” 2024, https://nvd.nist.gov/vuln/detail/CVE-2025-25286

  10. [10]

    Cisco ultra cloud core - subscriber microservices infrastructure privilege escalation vulnerability,

    “Cisco ultra cloud core - subscriber microservices infrastructure privilege escalation vulnerability,” 2022, https://www.cisco.com/c/en/ us/support/docs/csa/cisco-sa-uccsmi-prvesc-BQHGe4cm.html

  11. [11]

    Automatic policy generation for inter−service access control of microservices,

    X. Li, Y . Chen, Z. Lin, X. Wang, and J. H. Chen, “Automatic policy generation for inter−service access control of microservices,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 3971–3988

  12. [12]

    grpc: A high performance, open source universal rpc framework,

    “grpc: A high performance, open source universal rpc framework,” 2025, https://grpc.io/

  13. [13]

    Representational state transfer (rest),

    R. T. Fielding, “Representational state transfer (rest),” 2000, https: //www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm

  14. [14]

    Apache kafka: A distributed streaming platform,

    “Apache kafka: A distributed streaming platform,” 2025, https://kafka. apache.org/

  15. [15]

    Mace: De- tecting privilege escalation vulnerabilities in web applications,

    M. Monshizadeh, P. Naldurg, and V . Venkatakrishnan, “Mace: De- tecting privilege escalation vulnerabilities in web applications,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014, pp. 690–701

  16. [16]

    Rolecast: finding missing security checks when you do not know what checks are,

    S. Son, K. S. McKinley, and V . Shmatikov, “Rolecast: finding missing security checks when you do not know what checks are,” inProceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, 2011, pp. 1069–1084

  17. [17]

    Facilitating access control vulnerability detection in modern java web applications with accurate permission check identification,

    Y . Shi, F. Liu, G. Yang, Y . Zhang, Y . Cao, E. Li, X. Tan, X. Luo, M. Yang, and S. Chen, “Facilitating access control vulnerability detection in modern java web applications with accurate permission check identification,”IEEE Transactions on Information Forensics and Security, 2025

  18. [18]

    Mocguard: Automatically detecting missing-owner-check vulnerabilities in java web applications,

    F. Liu, Y . Shi, Y . Zhang, G. Yang, E. Li, and M. Yang, “Mocguard: Automatically detecting missing-owner-check vulnerabilities in java web applications,” in2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2025, pp. 903–919

  19. [19]

    Detecting missing- permission-check vulnerabilities in distributed cloud systems,

    J. Lu, H. Li, C. Liu, L. Li, and K. Cheng, “Detecting missing- permission-check vulnerabilities in distributed cloud systems,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 2145–2158

  20. [20]

    Detecting taint-style vulnerabilities in microservice-structured web applications,

    F. Liu, Y . Zhang, T. Chen, Y . Shi, G. Yang, Z. Lin, M. Yang, J. He, and Q. Li, “Detecting taint-style vulnerabilities in microservice-structured web applications,” in2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2025, pp. 972–990

  21. [21]

    Datalog- based language-agnostic change impact analysis for microservices,

    Q. Shi, X. Xie, X. Fu, P. Di, H. Li, A. Zhou, and G. Fan, “Datalog- based language-agnostic change impact analysis for microservices,” in2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 2025, pp. 652–652

  22. [22]

    Privi- lege escalation attacks on android,

    L. Davi, A. Dmitrienko, A.-R. Sadeghi, and M. Winandy, “Privi- lege escalation attacks on android,” ininternational conference on Information security. Springer, 2010, pp. 346–360

  23. [23]

    CodeQL: Discover vulnerabilities across a codebase with queries,

    GitHub, Inc., “CodeQL: Discover vulnerabilities across a codebase with queries,” 2025, accessed: October 2025. [Online]. Available: https://codeql.github.com/

  24. [24]

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

    J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. R. Narasimhan, and O. Press, “SWE-agent: Agent-computer interfaces enable automated software engineering,” inThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Online]. Available: https://arxiv.org/abs/2405.15793

  25. [25]

    Openhands: An open platform for AI software developers as generalist agents,

    X. Wang, B. Li, Y . Song, F. F. Xu, X. Tang, M. Zhuge, J. Pan, Y . Song, B. Li, J. Singh, H. H. Tran, F. Li, R. Ma, M. Zheng, B. Qian, Y . Shao, N. Muennighoff, Y . Zhang, B. Hui, J. Lin, R. Brennan, H. Peng, H. Ji, and G. Neubig, “Openhands: An open platform for AI software developers as generalist agents,” inThe Thirteenth International Conference on Le...

  26. [26]

    EnIGMA: Interactive tools substantially assist LM agents in finding security vulnerabilities,

    T. Abramovich, M. Udeshi, M. Shao, K. Lieret, H. Xi, K. Milner, S. Jancheska, J. Yang, C. E. Jimenez, F. Khorrami, P. Krishnamurthy, B. Dolan-Gavitt, M. Shafique, K. R. Narasimhan, R. Karri, and O. Press, “EnIGMA: Interactive tools substantially assist LM agents in finding security vulnerabilities,” inForty-second International Conference on Machine Learn...

  27. [27]

    GraphQL specification,

    GraphQL Foundation, “GraphQL specification,” 2021, accessed: Octo- ber 2025. [Online]. Available: https://spec.graphql.org/September2025/

  28. [28]

    RabbitMQ: Messaging that just works,

    VMware, Inc., “RabbitMQ: Messaging that just works,” 2024, accessed: October 2025. [Online]. Available: https://www.rabbitmq. com/

  29. [29]

    Demystifying llm-based software engineering agents,

    C. S. Xia, Y . Deng, S. Dunn, and L. Zhang, “Demystifying llm-based software engineering agents,”Proc. ACM Softw. Eng., vol. 2, no. FSE, Jun. 2025. [Online]. Available: https://doi.org/10.1145/3715754

  30. [30]

    Enhancing static analysis for practical bug detection: An llm-integrated approach

    H. Li, Y . Hao, Y . Zhai, and Z. Qian, “Enhancing static analysis for practical bug detection: An llm-integrated approach,”Proc. ACM Program. Lang., vol. 8, no. OOPSLA1, Apr. 2024. [Online]. Available: https://doi.org/10.1145/3649828

  31. [31]

    Modeling and discovering vulnerabilities with code property graphs,

    F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and discovering vulnerabilities with code property graphs,” in2014 IEEE symposium on security and privacy. IEEE, 2014, pp. 590–604

  32. [32]

    CodeQL queries,

    GitHub, Inc., “CodeQL queries,” 2025, accessed: October 2025. [On- line]. Available: https://codeql.github.com/docs/writing-codeql-queries/ codeql-queries/

  33. [33]

    Z3: An efficient smt solver,

    L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2008, pp. 337–340

  34. [34]

    Llmsa: A compositional neuro-symbolic approach to compilation-free and customizable static analysis,

    C. Wang, Y . Gao, W. Zhang, X. Liu, Q. Shi, and X. Zhang, “Llmsa: A compositional neuro-symbolic approach to compilation-free and customizable static analysis,” inProceedings of the 2024 Empirical Methods in Natural Language Processing (EMNLP), Miami, FL, USA, Nov. 2024

  35. [35]

    Agentic concolic execution,

    Z. Luo, H. Zhao, D. Wolff, C. Cadar, and A. Roychoudhury, “Agentic concolic execution,” inProceedings of the IEEE Symposium on Security and Privacy (S&P), 2026

  36. [36]

    Light Reading Cloud: A microservices-based reading application,

    Zealon159, “Light Reading Cloud: A microservices-based reading application,” 2025, accessed: October 2025. [Online]. Available: https://github.com/Zealon159/light-reading-cloud

  37. [37]

    PiggyMetrics: Microservice architecture with spring boot, spring cloud and docker,

    sqshq, “PiggyMetrics: Microservice architecture with spring boot, spring cloud and docker,” 2025, accessed: October 2025. [Online]. Available: https://github.com/sqshq/piggymetrics

  38. [38]

    Pitstop: A sample application based on a garage management system for pitstop - a fictitious garage,

    EdwinVW, “Pitstop: A sample application based on a garage management system for pitstop - a fictitious garage,” 2025, accessed: October 2025. [Online]. Available: https://github.com/EdwinVW/ pitstop

  39. [39]

    SiteWhere: An industrial strength open-source application enablement platform for the internet of things (iot),

    sitewhere, “SiteWhere: An industrial strength open-source application enablement platform for the internet of things (iot),” 2025, accessed: October 2025. [Online]. Available: https://github.com/sitewhere/ sitewhere

  40. [40]

    SuperMarket: A project that simulates a real-world supermarket or retail store billing experience,

    ZongXR, “SuperMarket: A project that simulates a real-world supermarket or retail store billing experience,” 2025, accessed: October

  41. [41]

    Available: https://github.com/ZongXR/SuperMarket

    [Online]. Available: https://github.com/ZongXR/SuperMarket

  42. [42]

    Food Delivery (.NET): A practical food delivery microservices, built with .net 8, masstransit, domain-driven design, cqrs, and more,

    mehdihadeli, “Food Delivery (.NET): A practical food delivery microservices, built with .net 8, masstransit, domain-driven design, cqrs, and more,” 2025, accessed: October 2025. [Online]. Available: https://github.com/mehdihadeli/food-delivery-microservices

  43. [43]

    Online Boutique: Sample cloud-first application with 10 microservices showcasing kubernetes, istio, and grpc,

    GoogleCloudPlatform, “Online Boutique: Sample cloud-first application with 10 microservices showcasing kubernetes, istio, and grpc,” 2025, accessed: October 2025. [Online]. Available: https://github.com/GoogleCloudPlatform/microservices-demo

  44. [44]

    Booking Microservices: A practical microservices with the latest technologies and architectures like vertical slice architecture, event sourcing, cqrs, ddd, grpc, and .net 9,

    meysamhadeli, “Booking Microservices: A practical microservices with the latest technologies and architectures like vertical slice architecture, event sourcing, cqrs, ddd, grpc, and .net 9,” 2025, accessed: October 2025. [Online]. Available: https: //github.com/meysamhadeli/booking-microservices

  45. [45]

    Go Micro Services: An example of microservices in go using grpc,

    harlow, “Go Micro Services: An example of microservices in go using grpc,” 2025, accessed: October 2025. [Online]. Available: https://github.com/harlow/go-micro-services

  46. [46]

    JBone: A microservice platform based on spring cloud,

    417511458, “JBone: A microservice platform based on spring cloud,” 2025, accessed: October 2025. [Online]. Available: https: //github.com/417511458/jbone

  47. [47]

    Train Ticket: A benchmark microservice system,

    FudanSELab, “Train Ticket: A benchmark microservice system,” 2025, accessed: October 2025. [Online]. Available: https://github.com/ FudanSELab/train-ticket

  48. [48]

    AspnetRun E-Shop: Microservices on .net platforms used asp.net web api, docker, rabbitmq, masstransit, grpc, yarp api gateway, and more,

    aspnetrun, “AspnetRun E-Shop: Microservices on .net platforms used asp.net web api, docker, rabbitmq, masstransit, grpc, yarp api gateway, and more,” 2025, accessed: October 2025. [Online]. Available: https://github.com/aspnetrun/run-aspnetcore-microservices

  49. [49]

    Cinema Microservice: A nodejs microservice for a cinema booking system,

    crizstian, “Cinema Microservice: A nodejs microservice for a cinema booking system,” 2025, accessed: October 2025. [Online]. Available: https://github.com/crizstian/cinema-microservice

  50. [50]

    TODO App: An example microservice app written in different languages (go, java, nodejs, python, vuejs),

    elgris, “TODO App: An example microservice app written in different languages (go, java, nodejs, python, vuejs),” 2025, accessed: October 2025. [Online]. Available: https://github.com/elgris/ microservice-app-example

  51. [51]

    eShopOnAbp: Reference microservice solution built with the abp framework and .net,

    abpframework, “eShopOnAbp: Reference microservice solution built with the abp framework and .net,” 2025, accessed: October 2025. [Online]. Available: https://github.com/abpframework/eShopOnAbp

  52. [52]

    Spring Boot Basics Demo: Basic architecture framework to create complete microservices using spring boot and spring cloud,

    anilallewar, “Spring Boot Basics Demo: Basic architecture framework to create complete microservices using spring boot and spring cloud,” 2025, accessed: October 2025. [Online]. Available: https://github.com/anilallewar/microservices-basics-spring-boot

  53. [53]

    Magda: A federated, open-source data catalog for all your big data and small data,

    magda-io, “Magda: A federated, open-source data catalog for all your big data and small data,” 2025, accessed: October 2025. [Online]. Available: https://github.com/magda-io/magda

  54. [54]

    Genie: Distributed big data orchestration service,

    Netflix, “Genie: Distributed big data orchestration service,” 2025, accessed: October 2025. [Online]. Available: https://github.com/ Netflix/genie

  55. [55]

    DeathStarBench: An open-source benchmark suite for cloud microservices,

    C. Delimitrouet al., “DeathStarBench: An open-source benchmark suite for cloud microservices,” 2025, accessed: October 2025. [Online]. Available: https://github.com/delimitrou/DeathStarBench

  56. [56]

    Mall-Swarm: Microservices e-commerce system based on spring cloud,

    Macrozheng, “Mall-Swarm: Microservices e-commerce system based on spring cloud,” 2025, accessed: October 2025. [Online]. Available: https://github.com/macrozheng/mall-swarm

  57. [57]

    Common vulnerabilities and exposures (cve),

    MITRE, “Common vulnerabilities and exposures (cve),” 2025, accessed: October 2025. [Online]. Available: https://www.cve.org/

  58. [58]

    National vulnerability database,

    “National vulnerability database,” 2025, https://nvd.nist.gov/vuln/

  59. [59]

    newbee-mall: A distributed e-commerce system developed with Spring Boot and Vue,

    newbee-ltd, “newbee-mall: A distributed e-commerce system developed with Spring Boot and Vue,” 2025, accessed: October 2025. [Online]. Available: https://github.com/newbee-ltd/newbee-mall-cloud

  60. [60]

    ZLT Platform: A microservices platform based on spring cloud,

    zlt2000, “ZLT Platform: A microservices platform based on spring cloud,” 2025, accessed: October 2025. [Online]. Available: https://github.com/zlt2000/microservices-platform

  61. [61]

    Armeria: Your go-to microservice framework for any situation, from the creator of Netty et al,

    LINE Corporation, “Armeria: Your go-to microservice framework for any situation, from the creator of Netty et al,” 2025, accessed: October 2025. [Online]. Available: https://github.com/line/armeria

  62. [62]

    Spring Cloud Data Flow: A cloud-native orchestration service for composable microservice applications on modern runtimes,

    Spring Attic, “Spring Cloud Data Flow: A cloud-native orchestration service for composable microservice applications on modern runtimes,” 2025, accessed: October 2025. [Online]. Available: https://github.com/spring-attic/spring-cloud-dataflow

  63. [63]

    Cve-2023-38493 detail,

    NVD, “Cve-2023-38493 detail,” 2025, accessed: October 2025. [Online]. Available: https://nvd.nist.gov/vuln/detail/CVE-2023-38493

  64. [64]

    Where does it go? refining indirect-call targets with multi-layer type analysis,

    K. Lu and H. Hu, “Where does it go? refining indirect-call targets with multi-layer type analysis,” inProceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 1867–1881

  65. [65]

    Human-in-the-loop software development agents,

    W. Takerngsaksiri, J. Pasuksmit, P. Thongtanunam, C. Tantithamtha- vorn, R. Zhang, F. Jiang, J. Li, E. Cook, K. Chen, and M. Wu, “Human-in-the-loop software development agents,” in2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2025, pp. 342–352

  66. [66]

    Toss a fault to your witcher: Applying grey-box coverage-guided mutational fuzzing to detect sql and command injection vulnerabilities,

    E. Trickel, F. Pagani, C. Zhu, L. Dresel, G. Vigna, C. Kruegel, R. Wang, T. Bao, Y . Shoshitaishvili, and A. Doupé, “Toss a fault to your witcher: Applying grey-box coverage-guided mutational fuzzing to detect sql and command injection vulnerabilities,” inProceedings of the 44th IEEE Symposium on Security and Privacy (S&P), San Francisco, CA, USA, May 2023

  67. [67]

    newbee-mall: A distributed e-commerce system developed with Spring Boot and Vue,

    newbee-ltd, “newbee-mall: A distributed e-commerce system developed with Spring Boot and Vue,” 2025, accessed: October 2025. [Online]. Available: https://github.com/newbee-ltd/newbee-mall

  68. [68]

    Static detection of silent misconfigurations with deep interaction analysis,

    J. Zhang, R. Piskac, E. Zhai, and T. Xu, “Static detection of silent misconfigurations with deep interaction analysis,”Proceedings of the ACM on Programming Languages, vol. 5, no. OOPSLA, pp. 1–30, 2021

  69. [69]

    Scaling static taint analysis to industrial soa applications: A case study at alibaba,

    J. Wang, Y . Wu, G. Zhou, Y . Yu, Z. Guo, and Y . Xiong, “Scaling static taint analysis to industrial soa applications: A case study at alibaba,” inProceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 1477–1486

  70. [70]

    Repoaudit: An au- tonomous llm-agent for repository-level code auditing,

    J. Guo, C. Wang, X. Xu, Z. Su, and X. Zhang, “Repoaudit: An au- tonomous llm-agent for repository-level code auditing,” inProceedings of the 42nd International Conference on Machine Learning, 2025

  71. [71]

    Llm-assisted static analysis for detecting security vulnerabilities,

    Z. Li, S. Dutta, and M. Naik, “Llm-assisted static analysis for detecting security vulnerabilities,” inProceedings of the 13th International Conference on Learning Representations (ICLR), Singapore, Apr. 2025

  72. [72]

    LLMxCPG: Context-Aware vulnerability detection through code property Graph- Guided large language models,

    A. Lekssays, H. Mouhcine, K. Tran, T. Yu, and I. Khalil, “LLMxCPG: Context-Aware vulnerability detection through code property Graph- Guided large language models,” in34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 489–507. Appendix A. Meta-Review The following meta-review was prepared by the program committee for the 2026 IEEE Symposium on...

  73. [73]

    NEO’s use of structural code -search primitives avoids the limitations of large -context ingestion and allows the agent to reason over heterogeneous, cross -service codebases

    The paper presents a well -designed system that effec- tively integrates LLMs with static analysis, enabling scalable and targeted detection of privilege -escalation vulnerabilities in complex microservice architectures. NEO’s use of structural code -search primitives avoids the limitations of large -context ingestion and allows the agent to reason over h...

  74. [74]

    The evaluation is strong and comprehensive. This pa- per analyzed 25 real -world microservice applications, uncovering previously unknown vulnerabilities—many confirmed and patched by developers—demonstrating clear practical impact. The system also generalizes be- yond its primary task, detecting other bug classes such as command injection and SQL injecti...

  75. [75]

    The inclusion of runtime and API cost measurements adds transparency into the system’s op- erational overhead

    Architecturally, NEO is well -motivated and logically structured, and its design is validated by robust ex- periments, diverse baselines, and well -posed research questions. The inclusion of runtime and API cost measurements adds transparency into the system’s op- erational overhead. A.4. Noteworthy Concerns

  76. [76]

    Model and Prompt Sensitivity and Reproducibility Con- cerns: NEO relies heavily on LLM reasoning and prompt design, as a result, using different LLM models may impact the results and require additional prompt tuning

  77. [77]

    Dependence on CodeQL and Its Limitations: NEO heavily depends on CodeQL, inheriting its constraints and reducing portability to other analysis backends