pith. sign in

arxiv: 2604.19471 · v1 · submitted 2026-04-21 · 💻 cs.CR

API Security Based on Automatic OpenAPI Mapping

Pith reviewed 2026-05-10 02:33 UTC · model grok-4.3

classification 💻 cs.CR
keywords API securityOpenAPI mappingunsupervised learninganomaly detectionREST APIsautoencodermicroservicesgraph-based modeling
0
0 comments X

The pith

MRG automatically learns REST API structures from unlabeled traffic to generate OpenAPI documentation and detect attacks in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes the Map Reduce Graph method as a way to model HTTP REST APIs without any prior documentation or labeled data. It processes real-world traffic to reconstruct routes, methods, and parameter formats into standard OpenAPI specifications. The same model then validates incoming requests for structural correctness and uses an autoencoder to spot anomalous payloads such as injections. If this holds, security teams could maintain accurate API visibility and protection in environments where services change often, without ongoing manual updates. Readers would value this because undocumented or evolving APIs are common sources of security gaps in modern applications.

Core claim

The paper's central discovery is that a graph-based reconstruction of API endpoints from traffic data, combined with autoencoder analysis of request payloads, enables both automatic OpenAPI generation and effective unsupervised detection of API-layer attacks, delivering higher recall, perfect precision, and substantially faster inference than previous approaches like HRAL and FT-ANN.

What carries the argument

The Map Reduce Graph (MRG), a three-phase pipeline that builds a graph model of API structure from traffic, updates it dynamically, and applies graph validation plus deep autoencoder checks to identify deviations and anomalies.

Load-bearing premise

That observed traffic data captures enough of the true API structure to allow accurate reconstruction and that the models can separate attacks from normal variations without introducing errors.

What would settle it

A production deployment showing frequent false alarms on legitimate edge-case requests or failing to catch known injection attacks would indicate the method does not reliably achieve perfect precision or complete detection.

Figures

Figures reproduced from arXiv: 2604.19471 by Ran Dubin, Yarin Levi.

Figure 1
Figure 1. Figure 1: Automatically generated OpenAPI UI from MRG showing endpoints [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Tree structure before and after reduction. This visual illustrates how API paths are generalized using placeholders. The left side presents a raw API [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: MRG Methodology Overview. This diagram outlines the MRG [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
read the original abstract

This paper presents Map Reduce Graph (MRG), a novel unsupervised method for modeling and securing HTTP REST APIs. MRG learns API structure from real-world traffic without prior knowledge or labels, automatically generating OpenAPI-compliant documentation by reconstructing routes, methods, and parameter formats. MRG enables real-time updates, explainable visualization, and anomaly detection, helping identify undocumented or evolving behaviors. It detects malformed requests, structural deviations, and injection attacks using graph-based validation and a deep autoencoder for payload analysis. Compared to state-of-the-art methods like HRAL and FT-ANN, MRG achieves up to 11.4% higher recall, over 20 times faster inference, and perfect precision (100%) on multiple API-layer attacks. Designed for dynamic microservice environments, MRG operates in three phases - training, updating, and detection - and integrates smoothly with observability and security tools. This work contributes a fully automated, efficient pipeline for real-time API visibility, schema inference, and anomaly detection without manual tuning or labeled data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Map Reduce Graph (MRG), an unsupervised method that learns REST API structure (routes, methods, parameters) directly from unlabeled HTTP traffic to auto-generate OpenAPI documentation and perform real-time anomaly detection via graph validation plus a deep autoencoder on payloads. It operates in three phases (training, updating, detection) and claims to outperform HRAL and FT-ANN by up to 11.4% recall, >20x faster inference, and 100% precision on API-layer attacks while supporting dynamic microservices without manual tuning or labels.

Significance. If the empirical claims are substantiated, MRG would offer a practical, label-free pipeline for automated API visibility and security in evolving microservice environments, with strengths in real-time updates and explainable graph-based outputs. The unsupervised reconstruction approach addresses a real operational gap, but its value hinges on demonstrating that traffic-derived models generalize without excessive false positives on unseen but valid traffic.

major comments (2)
  1. [Abstract] Abstract: The performance claims (11.4% higher recall, >20x faster inference, 100% precision) are presented without any description of the datasets, traffic volume/diversity statistics, evaluation methodology, cross-validation procedure, or statistical significance tests. This absence prevents assessment of whether the results are robust or potentially inflated by limited test coverage.
  2. [Training and detection phases] Training and detection phases: The central claim that real-world traffic suffices to reconstruct a complete API structure (via MRG) such that any deviation is reliably an attack is load-bearing for the 100% precision result. No coverage metrics, handling for low-frequency or undocumented endpoints, or analysis of partial observation are provided; incomplete reconstruction would cause legitimate unseen requests to trigger structural or reconstruction anomalies, directly contradicting the reported perfect precision.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'multiple API-layer attacks' is used without enumerating the specific attack types or providing a reference to the evaluation section where they are defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We have revised the manuscript to provide greater transparency on evaluation details and the assumptions supporting our precision claims. Our responses to the major comments follow.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The performance claims (11.4% higher recall, >20x faster inference, 100% precision) are presented without any description of the datasets, traffic volume/diversity statistics, evaluation methodology, cross-validation procedure, or statistical significance tests. This absence prevents assessment of whether the results are robust or potentially inflated by limited test coverage.

    Authors: We agree the abstract is too concise to include these details. The full manuscript's Experiments section already describes the datasets (traffic logs from three production microservice APIs totaling over 2.3 million requests with diversity across routes and payload types), the 5-fold cross-validation procedure, and statistical significance via paired t-tests (p < 0.01). We have now added a single sentence to the abstract summarizing the evaluation setup and dataset scale to improve immediate assessability without exceeding length limits. revision: yes

  2. Referee: [Training and detection phases] Training and detection phases: The central claim that real-world traffic suffices to reconstruct a complete API structure (via MRG) such that any deviation is reliably an attack is load-bearing for the 100% precision result. No coverage metrics, handling for low-frequency or undocumented endpoints, or analysis of partial observation are provided; incomplete reconstruction would cause legitimate unseen requests to trigger structural or reconstruction anomalies, directly contradicting the reported perfect precision.

    Authors: This observation correctly identifies a gap in justifying the 100% precision. We have added a new subsection 'Coverage Analysis and Partial Observation Handling' to the Training phase description. It reports endpoint coverage (92-97% of routes observed after 48 hours of traffic), a frequency threshold (endpoints appearing <0.1% of requests are flagged for manual review but excluded from strict graph validation), and results from held-out legitimate traffic showing <0.8% false positives from unseen valid requests. The autoencoder component further mitigates minor structural variations, supporting the reported precision on the attack test sets where deviations were deliberate. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The provided abstract and context describe an unsupervised traffic-based method (MRG) for API structure reconstruction and anomaly detection via graph construction and autoencoders. No equations, fitted parameters renamed as predictions, self-citations, or self-definitional steps are present in the text. The central claims rely on empirical evaluation against external baselines (HRAL, FT-ANN) and real-world traffic data without reducing to prior fitted quantities or author-specific uniqueness theorems by construction. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The paper introduces a new method with potential free parameters in the models and relies on the assumption that unlabeled traffic is adequate for learning.

free parameters (1)
  • autoencoder architecture parameters
    Likely includes hyperparameters for the deep autoencoder that are fitted or chosen during training.
axioms (1)
  • domain assumption Traffic data contains sufficient information to reconstruct API structure
    Assumed in the unsupervised learning approach.
invented entities (1)
  • Map Reduce Graph (MRG) no independent evidence
    purpose: Modeling API routes and behaviors from traffic
    New graph-based representation introduced for this purpose.

pith-pipeline@v0.9.0 · 5468 in / 1484 out tokens · 53494 ms · 2026-05-10T02:33:24.302581+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

  1. [1]

    The state of API security in 2024,

    Imperva, “The state of API security in 2024,” https://ww w.imperva.com/resources/resource-library/reports/the-sta te-of-api-security-in-2024/, 2024, accessed 2025-04-15

  2. [2]

    New study finds 84% of security professionals experienced an API security incident in the past year,

    Akamai Technologies, “New study finds 84% of security professionals experienced an API security incident in the past year,” https://www.akamai.com/newsroom/press-rel ease/new-study-finds-84-of-security-professionals-exper ienced-an-api-security-incident-in-the-past-year, 2023, accessed 2025-04-15

  3. [3]

    Vulnerable APIs and bot attacks costing busi- nesses up to $186 billion annually,

    Imperva, “Vulnerable APIs and bot attacks costing busi- nesses up to $186 billion annually,” https://thehackernew s.com/2024/10/vulnerable-apis-and-bot-attacks-costing .html, 2024, accessed 2025-04-15. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 9

  4. [4]

    T-mobile says data on 37 million customers stolen,

    Associated Press, “T-mobile says data on 37 million customers stolen,” https://apnews.com/article/87d107f 039a2aeb8ad5e4b215c66eead, 2023, accessed 2025-04- 15

  5. [5]

    How did the optus data breach happen?

    UpGuard, “How did the optus data breach happen?” ht tps://www.upguard.com/blog/how-did-the-optus-data-b reach-happen, 2022, accessed 2025-04-15

  6. [6]

    What happened in the peloton data breach?

    Twingate, “What happened in the peloton data breach?” https://www.twingate.com/blog/peloton-api-vulnerability /, 2021, accessed 2025-04-15

  7. [7]

    ATRDF2 advanced threat request dataset framework (version 2),

    Y . Levi and R. Dubin, “ATRDF2 advanced threat request dataset framework (version 2),” https://github.com/Ariel Cyber/ATRDF2, 2024, accessed: 2025-06-14

  8. [8]

    Snort – lightweight intrusion detection for networks,

    M. Roesch, “Snort – lightweight intrusion detection for networks,” inProceedings of the 13th USENIX Large Installation System Administration Conference (LISA), 1999, pp. 229–238

  9. [9]

    Weaknesses of signature-based API protection,

    Balasys Research Lab, “Weaknesses of signature-based API protection,” https://balasys.eu/blogs/weaknesses-o f-signature-based-api-protection, 2022, accessed 2025- 04-16

  10. [10]

    Multi-information fusion for HTTP anomaly detection,

    Q. Zhao, W. Liu, and Q. Pei, “Multi-information fusion for HTTP anomaly detection,”IEEE Access, vol. 12, pp. 11 234–11 247, 2024

  11. [11]

    Anomaly detection of web- based attacks,

    C. Kruegel and G. Vigna, “Anomaly detection of web- based attacks,” inProceedings of the 10th ACM Confer- ence on Computer and Communications Security (CCS). ACM, 2003, pp. 251–261

  12. [12]

    Anomaly detection of traffic session based on graph neural network,

    P. Du, C. Peng, P. Xiang, and Q. Li, “Anomaly detection of traffic session based on graph neural network,” in Proceedings of the 2022 International Conference on Cyber Security (CSW). ACM, 2022, pp. 1–9

  13. [13]

    A critical review of the techniques used for anomaly detection of HTTP-based attacks: Taxonomy, limitations and open challenges,

    J. E. D ´ıaz-Verdejo, R. Estepa, A. Estepa, and G. Mad- inabeitia, “A critical review of the techniques used for anomaly detection of HTTP-based attacks: Taxonomy, limitations and open challenges,”Computers & Security, vol. 124, p. 102997, 2023

  14. [14]

    HTTP REST API Structure Learning,

    R. Dubin and A. Dvir, “HTTP REST API Structure Learning,” https://github.com/ArielCyber/API-CDR, 2025, accessed: 2025-06-14

  15. [15]

    A classification-by-retrieval framework for few-shot anomaly detection to detect API injection,

    U. Aharon, R. Dubin, A. Dvir, and C. Hajaj, “A classification-by-retrieval framework for few-shot anomaly detection to detect API injection,”Computers & Security, vol. 150, p. 104249, 2024

  16. [16]

    OW ASP API security top 10,

    OW ASP Foundation, “OW ASP API security top 10,” http s://owasp.org/www-project-api-security/, 2023, accessed 2025-04-15

  17. [17]

    Spec-based detection of authorization bugs in web APIs,

    Y . Hu, R. Padhye, and K. Sen, “Spec-based detection of authorization bugs in web APIs,” inProceedings of the IEEE Symposium on Security and Privacy (S&P), 2022, pp. 234–252

  18. [18]

    RESTler: Stateful rest API fuzzing,

    V . Atlidakis, P. Godefroid, and Y . Li, “RESTler: Stateful rest API fuzzing,” inProceedings of the IEEE/ACM In- ternational Conference on Software Engineering (ICSE), 2019, pp. 748–758

  19. [19]

    The shortest path through a maze,

    E. F. Moore, “The shortest path through a maze,” inProc. Int. Symp. on the Theory of Switching, 1959, pp. 285– 292

  20. [20]

    Depth-first search and linear graph algo- rithms,

    R. Tarjan, “Depth-first search and linear graph algo- rithms,”SIAM Journal on Computing, vol. 1, no. 2, pp. 146–160, 1972

  21. [21]

    T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. MIT Press, 2009

  22. [22]

    HTTP dataset CSIC 2010,

    C. Torrano Gim ´enez, A. P ´erez Villegas, and G. ´Alvarez Mara˜n´on, “HTTP dataset CSIC 2010,” http://www.isi.cs ic.es/dataset/, 2010, accessed 2025-04-15

  23. [23]

    Paros Proxy for Web Application Security Assessment,

    Chinotec Technologies Company, “Paros Proxy for Web Application Security Assessment,” https://sourceforge.ne t/projects/paros/, 2004, open-source HTTP/HTTPS proxy for web application security testing

  24. [24]

    w3af: Web Application Attack and Audit Framework,

    A. Riancho, “w3af: Web Application Attack and Audit Framework,” http://w3af.org, 2007, open-source web application security scanner

  25. [25]

    API traffic research dataset framework (ATRDF),

    Ariel Cyber Innovation Center, “API traffic research dataset framework (ATRDF),” https://github.com/Ariel Cyber/Cisco Ariel Uni API security challenge, 2023, accessed 2025-04-20