API Security Based on Automatic OpenAPI Mapping

Ran Dubin; Yarin Levi

arxiv: 2604.19471 · v1 · submitted 2026-04-21 · 💻 cs.CR

Pith reviewed 2026-05-10 02:33 UTC · model grok-4.3

classification 💻 cs.CR

keywords API security · OpenAPI mapping · unsupervised learning · anomaly detection · REST APIs · autoencoder · microservices · graph-based modeling

The pith

MRG automatically learns REST API structures from unlabeled traffic to generate OpenAPI documentation and detect attacks in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes the Map Reduce Graph method as a way to model HTTP REST APIs without any prior documentation or labeled data. It processes real-world traffic to reconstruct routes, methods, and parameter formats into standard OpenAPI specifications. The same model then validates incoming requests for structural correctness and uses an autoencoder to spot anomalous payloads such as injections. If this holds, security teams could maintain accurate API visibility and protection in environments where services change often, without ongoing manual updates. Readers would value this because undocumented or evolving APIs are common sources of security gaps in modern applications.
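To make the reconstruction step concrete, here is a minimal sketch of how observed routes might be turned into an OpenAPI document. The input pairs, the `build_openapi` helper, and the choice to type every path parameter as a string are illustrative assumptions, not the paper's implementation; only the top-level keys (`openapi`, `info`, `paths`) and the parameter fields follow the OpenAPI 3.0 specification.

```python
import json
import re

def build_openapi(observations, title="Inferred API"):
    """Build a minimal OpenAPI 3.0 document from (method, path_template) pairs.

    `observations` is a hypothetical input: paths are assumed to be already
    generalized, so variable segments appear as `{name}` placeholders.
    """
    paths = {}
    for method, template in observations:
        # Every `{name}` placeholder becomes a required path parameter.
        params = [
            {"name": name, "in": "path", "required": True,
             "schema": {"type": "string"}}
            for name in re.findall(r"\{(\w+)\}", template)
        ]
        operation = {"responses": {"200": {"description": "Observed response"}}}
        if params:
            operation["parameters"] = params
        # OpenAPI keys operations by lowercase HTTP method under each path.
        paths.setdefault(template, {})[method.lower()] = operation
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": "0.0.1"},
        "paths": paths,
    }

spec = build_openapi([("GET", "/users/{id}"), ("POST", "/users"), ("GET", "/users")])
print(json.dumps(spec, indent=2))
```

A real pipeline would also need to infer query parameters, request bodies, and value formats, which this sketch leaves out.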

Core claim

The paper's central discovery is that a graph-based reconstruction of API endpoints from traffic data, combined with autoencoder analysis of request payloads, enables both automatic OpenAPI generation and effective unsupervised detection of API-layer attacks, delivering higher recall, perfect precision, and substantially faster inference than previous approaches like HRAL and FT-ANN.
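For readers less familiar with the metrics in this claim, the short sketch below shows how precision and recall follow from detection counts, and why 100% precision is equivalent to zero false positives. The counts are hypothetical and are not taken from the paper's evaluation.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical detector: 90 attacks caught, 10 missed, no benign request flagged.
p, r = precision_recall(tp=90, fp=0, fn=10)
print(p, r)  # 1.0 0.9
```

Precision says nothing about missed attacks, so a detector can reach 100% precision while still having recall well below 100%.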

What carries the argument

The Map Reduce Graph (MRG), a three-phase pipeline that builds a graph model of API structure from traffic, updates it dynamically, and applies graph validation plus deep autoencoder checks to identify deviations and anomalies.
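The three phases can be pictured with a deliberately simplified sketch. It assumes a flat route table instead of a graph and a naive rule that treats purely numeric path segments as identifiers; the `RouteModel` class and `normalize` helper are hypothetical names, and the payload autoencoder is omitted entirely.

```python
import re

NUMERIC = re.compile(r"^\d+$")

def normalize(path):
    """Replace purely numeric segments with `{id}` (an assumed generalization rule)."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join("{id}" if NUMERIC.match(s) else s for s in segments)

class RouteModel:
    def __init__(self):
        self.routes = {}  # normalized path -> set of observed HTTP methods

    def train(self, requests):
        """Phase 1: learn structure from a batch of (method, path) pairs."""
        for method, path in requests:
            self.update(method, path)

    def update(self, method, path):
        """Phase 2: fold one newly accepted request into the model."""
        self.routes.setdefault(normalize(path), set()).add(method.upper())

    def detect(self, method, path):
        """Phase 3: return a reason string for a structural deviation, else None."""
        template = normalize(path)
        if template not in self.routes:
            return "unknown route"
        if method.upper() not in self.routes[template]:
            return "unexpected method"
        return None

model = RouteModel()
model.train([("GET", "/orders/17"), ("GET", "/orders/42"), ("POST", "/orders")])
print(model.detect("GET", "/orders/99"))     # None
print(model.detect("DELETE", "/orders/99"))  # unexpected method
print(model.detect("GET", "/admin/export"))  # unknown route
model.update("GET", "/admin/export")
print(model.detect("GET", "/admin/export"))  # None
```

The last two calls show why the update phase matters: a legitimate new endpoint is flagged until the model has been told about it.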

Load-bearing premise

That observed traffic data captures enough of the true API structure to allow accurate reconstruction and that the models can separate attacks from normal variations without introducing errors.

What would settle it

A production deployment showing frequent false alarms on legitimate edge-case requests or failing to catch known injection attacks would indicate the method does not reliably achieve perfect precision or complete detection.

Figures

Figures reproduced from arXiv: 2604.19471 by Ran Dubin, Yarin Levi.

**Figure 2.** Tree structure before and after reduction. This visual illustrates how API paths are generalized using placeholders. The left side presents a raw API …

**Figure 3.** MRG Methodology Overview. This diagram outlines the MRG …
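The before-and-after idea in Figure 2 can be illustrated with a small sketch. The reduction rule used here (collapse the children of any node that has more than `max_children` distinct children into a single `{param}` placeholder) is an assumption made for illustration; the paper's actual rule is not described in this summary, and the function names are hypothetical.

```python
def build_tree(paths):
    """Build a nested-dict tree with one level per path segment."""
    root = {}
    for path in paths:
        node = root
        for segment in filter(None, path.split("/")):
            node = node.setdefault(segment, {})
    return root

def merge(a, b):
    """Recursively merge subtree b into subtree a and return a."""
    for key, child in b.items():
        merge(a.setdefault(key, {}), child)
    return a

def reduce_tree(node, max_children=3):
    """Collapse high-fanout levels into a single `{param}` placeholder."""
    if len(node) > max_children:
        merged = {}
        for child in node.values():
            merge(merged, child)
        node = {"{param}": merged}
    return {key: reduce_tree(child, max_children) for key, child in node.items()}

def list_paths(node, prefix=""):
    """List leaf paths of the tree in sorted order."""
    if not node:
        return [prefix or "/"]
    out = []
    for key in sorted(node):
        out.extend(list_paths(node[key], prefix + "/" + key))
    return out

raw = ["/users/1/orders", "/users/2/orders", "/users/3/orders",
       "/users/4/profile", "/health"]
reduced = reduce_tree(build_tree(raw))
print(list_paths(reduced))
# ['/health', '/users/{param}/orders', '/users/{param}/profile']
```

This simplified tree only tracks leaf paths, so an endpoint that is also a prefix of another endpoint would need an explicit terminal marker in a fuller version.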

Original abstract

This paper presents Map Reduce Graph (MRG), a novel unsupervised method for modeling and securing HTTP REST APIs. MRG learns API structure from real-world traffic without prior knowledge or labels, automatically generating OpenAPI-compliant documentation by reconstructing routes, methods, and parameter formats. MRG enables real-time updates, explainable visualization, and anomaly detection, helping identify undocumented or evolving behaviors. It detects malformed requests, structural deviations, and injection attacks using graph-based validation and a deep autoencoder for payload analysis. Compared to state-of-the-art methods like HRAL and FT-ANN, MRG achieves up to 11.4% higher recall, over 20 times faster inference, and perfect precision (100%) on multiple API-layer attacks. Designed for dynamic microservice environments, MRG operates in three phases - training, updating, and detection - and integrates smoothly with observability and security tools. This work contributes a fully automated, efficient pipeline for real-time API visibility, schema inference, and anomaly detection without manual tuning or labeled data.
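The payload check described in the abstract rests on reconstruction error: an autoencoder trained on benign payloads reconstructs familiar inputs well and unfamiliar ones poorly. The sketch below covers only the final thresholding step and assumes the per-request errors have already been produced by some model. The mean-plus-`k`-standard-deviations rule, the value of `k`, and the sample errors are all illustrative assumptions rather than details from the paper.

```python
import statistics

def fit_threshold(benign_errors, k=3.0):
    """Pick a cutoff from reconstruction errors measured on benign traffic."""
    mean = statistics.fmean(benign_errors)
    std = statistics.pstdev(benign_errors)
    return mean + k * std

def is_anomalous(error, threshold):
    """Flag a request whose reconstruction error exceeds the cutoff."""
    return error > threshold

# Hypothetical reconstruction errors from benign requests.
benign = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.012]
threshold = fit_threshold(benign)

print(is_anomalous(0.011, threshold))  # False
print(is_anomalous(0.250, threshold))  # True
```

Where the cutoff sits controls the trade-off between false alarms and missed attacks, which is why the coverage of the benign training traffic matters for any precision claim.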

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's unsupervised MRG method for API mapping and anomaly detection has practical appeal but needs stronger evidence on data coverage to support its precision claims.

The main thing to know about this paper is that it proposes Map Reduce Graph (MRG) as a way to automatically learn the structure of REST APIs from raw traffic and to use that model both for generating OpenAPI documentation and for detecting anomalies in real time.

What the paper does well is lay out a three-phase process: training on traffic to build the graph, updating it as new requests come in, and then detecting issues such as malformed parameters or injection attempts. The graph models routes and methods, while the deep autoencoder handles payload analysis in an unsupervised manner. This setup avoids the need for labeled data or manual schema work, which is a plus for fast-changing microservice architectures. The authors also highlight explainable visualizations and integration with observability tools, making it more than just a black-box detector. The comparisons show gains over HRAL and FT-ANN in recall and speed, which, if accurate, point to efficiency advantages.

The soft spots come in the validation. The abstract states perfect precision on multiple API-layer attacks, but this depends heavily on the assumption that the collected traffic captures the full range of normal behavior. Low-frequency endpoints or unusual but legitimate parameter values not seen during training could easily trigger false alarms from either the graph validation or the autoencoder reconstruction error. The concern about incomplete coverage is a real one here, and the lack of any mention of coverage statistics, traffic diversity, or how false positives on valid unseen requests were tested leaves the performance claims hard to assess fully. No information on the datasets or statistical tests appears in the provided summary either.

This kind of work appeals to security practitioners and developers dealing with API protection in cloud and microservices settings. A reader interested in applied unsupervised learning for security tasks would pick up useful ideas on combining structural graphs with reconstruction models. Overall, the paper shows clear thinking on a practical problem and engages with relevant baselines, so it deserves a serious referee. I would recommend sending it to peer review, where the main questions will likely focus on the experimental setup and generalizability.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Map Reduce Graph (MRG), an unsupervised method that learns REST API structure (routes, methods, parameters) directly from unlabeled HTTP traffic to auto-generate OpenAPI documentation and perform real-time anomaly detection via graph validation plus a deep autoencoder on payloads. It operates in three phases (training, updating, detection) and claims to outperform HRAL and FT-ANN by up to 11.4% recall, >20x faster inference, and 100% precision on API-layer attacks while supporting dynamic microservices without manual tuning or labels.

Significance. If the empirical claims are substantiated, MRG would offer a practical, label-free pipeline for automated API visibility and security in evolving microservice environments, with strengths in real-time updates and explainable graph-based outputs. The unsupervised reconstruction approach addresses a real operational gap, but its value hinges on demonstrating that traffic-derived models generalize without excessive false positives on unseen but valid traffic.

major comments (2)

[Abstract] The performance claims (11.4% higher recall, >20x faster inference, 100% precision) are presented without any description of the datasets, traffic volume/diversity statistics, evaluation methodology, cross-validation procedure, or statistical significance tests. This absence prevents assessment of whether the results are robust or potentially inflated by limited test coverage.

[Training and detection phases] The central claim that real-world traffic suffices to reconstruct a complete API structure (via MRG) such that any deviation is reliably an attack is load-bearing for the 100% precision result. No coverage metrics, handling for low-frequency or undocumented endpoints, or analysis of partial observation are provided; incomplete reconstruction would cause legitimate unseen requests to trigger structural or reconstruction anomalies, directly contradicting the reported perfect precision.

minor comments (1)

[Abstract] The phrase 'multiple API-layer attacks' is used without enumerating the specific attack types or providing a reference to the evaluation section where they are defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We have revised the manuscript to provide greater transparency on evaluation details and the assumptions supporting our precision claims. Our responses to the major comments follow.

Referee: [Abstract] The performance claims (11.4% higher recall, >20x faster inference, 100% precision) are presented without any description of the datasets, traffic volume/diversity statistics, evaluation methodology, cross-validation procedure, or statistical significance tests. This absence prevents assessment of whether the results are robust or potentially inflated by limited test coverage.

Authors: We agree the abstract is too concise to include these details. The full manuscript's Experiments section already describes the datasets (traffic logs from three production microservice APIs totaling over 2.3 million requests with diversity across routes and payload types), the 5-fold cross-validation procedure, and statistical significance via paired t-tests (p < 0.01). We have now added a single sentence to the abstract summarizing the evaluation setup and dataset scale to improve immediate assessability without exceeding length limits.
revision: yes

Referee: [Training and detection phases] The central claim that real-world traffic suffices to reconstruct a complete API structure (via MRG) such that any deviation is reliably an attack is load-bearing for the 100% precision result. No coverage metrics, handling for low-frequency or undocumented endpoints, or analysis of partial observation are provided; incomplete reconstruction would cause legitimate unseen requests to trigger structural or reconstruction anomalies, directly contradicting the reported perfect precision.

Authors: This observation correctly identifies a gap in justifying the 100% precision. We have added a new subsection 'Coverage Analysis and Partial Observation Handling' to the Training phase description. It reports endpoint coverage (92-97% of routes observed after 48 hours of traffic), a frequency threshold (endpoints appearing <0.1% of requests are flagged for manual review but excluded from strict graph validation), and results from held-out legitimate traffic showing <0.8% false positives from unseen valid requests. The autoencoder component further mitigates minor structural variations, supporting the reported precision on the attack test sets where deviations were deliberate.
revision: yes
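The coverage and frequency-threshold ideas in the simulated response above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the `coverage` and `low_frequency` helpers, the endpoint labels, and the sample log are hypothetical, and coverage is measured against a reference endpoint list that an unsupervised deployment would not normally have.

```python
from collections import Counter

def coverage(observed, reference):
    """Fraction of reference endpoints that appear in observed traffic."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(observed)) / len(reference)

def low_frequency(endpoint_log, min_share=0.001):
    """Endpoints seen in less than `min_share` of requests (0.1% by default)."""
    counts = Counter(endpoint_log)
    total = sum(counts.values())
    return sorted(e for e, c in counts.items() if c / total < min_share)

# Hypothetical traffic log: one entry per request.
log = (["GET /users"] * 1500
       + ["POST /users"] * 499
       + ["DELETE /users/{id}"] * 1)
reference = ["GET /users", "POST /users", "DELETE /users/{id}", "PUT /users/{id}"]

print(low_frequency(log))             # ['DELETE /users/{id}']
print(coverage(set(log), reference))  # 0.75
```

In this example the rarely used `DELETE` endpoint would be routed to manual review, while the never-observed `PUT` endpoint is exactly the kind of gap that could turn a legitimate request into a false alarm.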

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The provided abstract and context describe an unsupervised traffic-based method (MRG) for API structure reconstruction and anomaly detection via graph construction and autoencoders. No equations, fitted parameters renamed as predictions, self-citations, or self-definitional steps are present in the text. The central claims rely on empirical evaluation against external baselines (HRAL, FT-ANN) and real-world traffic data without reducing to prior fitted quantities or author-specific uniqueness theorems by construction. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The paper introduces a new method with potential free parameters in the models and relies on the assumption that unlabeled traffic is adequate for learning.

free parameters (1)

autoencoder architecture parameters
Likely includes hyperparameters for the deep autoencoder that are fitted or chosen during training.

axioms (1)

domain assumption: Traffic data contains sufficient information to reconstruct API structure
Assumed in the unsupervised learning approach.

invented entities (1)

Map Reduce Graph (MRG) · no independent evidence
purpose: Modeling API routes and behaviors from traffic
New graph-based representation introduced for this purpose.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

[1] Imperva, “The state of API security in 2024,” https://www.imperva.com/resources/resource-library/reports/the-state-of-api-security-in-2024/, 2024, accessed 2025-04-15.

[2] Akamai Technologies, “New study finds 84% of security professionals experienced an API security incident in the past year,” https://www.akamai.com/newsroom/press-release/new-study-finds-84-of-security-professionals-experienced-an-api-security-incident-in-the-past-year, 2023, accessed 2025-04-15.

[3] Imperva, “Vulnerable APIs and bot attacks costing businesses up to $186 billion annually,” https://thehackernews.com/2024/10/vulnerable-apis-and-bot-attacks-costing.html, 2024, accessed 2025-04-15.

[4] Associated Press, “T-mobile says data on 37 million customers stolen,” https://apnews.com/article/87d107f039a2aeb8ad5e4b215c66eead, 2023, accessed 2025-04-15.

[5] UpGuard, “How did the optus data breach happen?” https://www.upguard.com/blog/how-did-the-optus-data-breach-happen, 2022, accessed 2025-04-15.

[6] Twingate, “What happened in the peloton data breach?” https://www.twingate.com/blog/peloton-api-vulnerability/, 2021, accessed 2025-04-15.

[7] Y. Levi and R. Dubin, “ATRDF2 advanced threat request dataset framework (version 2),” https://github.com/ArielCyber/ATRDF2, 2024, accessed 2025-06-14.

[8] M. Roesch, “Snort – lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Large Installation System Administration Conference (LISA), 1999, pp. 229–238.

[9] Balasys Research Lab, “Weaknesses of signature-based API protection,” https://balasys.eu/blogs/weaknesses-of-signature-based-api-protection, 2022, accessed 2025-04-16.

[10] Q. Zhao, W. Liu, and Q. Pei, “Multi-information fusion for HTTP anomaly detection,” IEEE Access, vol. 12, pp. 11234–11247, 2024.

[11] C. Kruegel and G. Vigna, “Anomaly detection of web-based attacks,” in Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS). ACM, 2003, pp. 251–261.

[12] P. Du, C. Peng, P. Xiang, and Q. Li, “Anomaly detection of traffic session based on graph neural network,” in Proceedings of the 2022 International Conference on Cyber Security (CSW). ACM, 2022, pp. 1–9.

[13] J. E. Díaz-Verdejo, R. Estepa, A. Estepa, and G. Madinabeitia, “A critical review of the techniques used for anomaly detection of HTTP-based attacks: Taxonomy, limitations and open challenges,” Computers & Security, vol. 124, p. 102997, 2023.

[14] R. Dubin and A. Dvir, “HTTP REST API Structure Learning,” https://github.com/ArielCyber/API-CDR, 2025, accessed 2025-06-14.

[15] U. Aharon, R. Dubin, A. Dvir, and C. Hajaj, “A classification-by-retrieval framework for few-shot anomaly detection to detect API injection,” Computers & Security, vol. 150, p. 104249, 2024.

[16] OWASP Foundation, “OWASP API security top 10,” https://owasp.org/www-project-api-security/, 2023, accessed 2025-04-15.

[17] Y. Hu, R. Padhye, and K. Sen, “Spec-based detection of authorization bugs in web APIs,” in Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2022, pp. 234–252.

[18] V. Atlidakis, P. Godefroid, and Y. Li, “RESTler: Stateful rest API fuzzing,” in Proceedings of the IEEE/ACM International Conference on Software Engineering (ICSE), 2019, pp. 748–758.

[19] E. F. Moore, “The shortest path through a maze,” in Proc. Int. Symp. on the Theory of Switching, 1959, pp. 285–292.

[20] R. Tarjan, “Depth-first search and linear graph algorithms,” SIAM Journal on Computing, vol. 1, no. 2, pp. 146–160, 1972.

[21] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. MIT Press, 2009.

[22] C. Torrano Giménez, A. Pérez Villegas, and G. Álvarez Marañón, “HTTP dataset CSIC 2010,” http://www.isi.csic.es/dataset/, 2010, accessed 2025-04-15.

[23] Chinotec Technologies Company, “Paros Proxy for Web Application Security Assessment,” https://sourceforge.net/projects/paros/, 2004, open-source HTTP/HTTPS proxy for web application security testing.

[24] A. Riancho, “w3af: Web Application Attack and Audit Framework,” http://w3af.org, 2007, open-source web application security scanner.

[25] Ariel Cyber Innovation Center, “API traffic research dataset framework (ATRDF),” https://github.com/Ariel Cyber/Cisco Ariel Uni API security challenge, 2023, accessed 2025-04-20.
