API Security Based on Automatic OpenAPI Mapping
Pith reviewed 2026-05-10 02:33 UTC · model grok-4.3
The pith
MRG automatically learns REST API structures from unlabeled traffic to generate OpenAPI documentation and detect attacks in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper's central discovery is that a graph-based reconstruction of API endpoints from traffic data, combined with autoencoder analysis of request payloads, enables both automatic OpenAPI generation and effective unsupervised detection of API-layer attacks, delivering higher recall, perfect precision, and substantially faster inference than previous approaches like HRAL and FT-ANN.
What carries the argument
The Map Reduce Graph (MRG), a three-phase pipeline that builds a graph model of API structure from traffic, updates it dynamically, and applies graph validation plus deep autoencoder checks to identify deviations and anomalies.
Load-bearing premise
That observed traffic data captures enough of the true API structure to allow accurate reconstruction and that the models can separate attacks from normal variations without introducing errors.
What would settle it
A production deployment showing frequent false alarms on legitimate edge-case requests or failing to catch known injection attacks would indicate the method does not reliably achieve perfect precision or complete detection.
Figures
read the original abstract
This paper presents Map Reduce Graph (MRG), a novel unsupervised method for modeling and securing HTTP REST APIs. MRG learns API structure from real-world traffic without prior knowledge or labels, automatically generating OpenAPI-compliant documentation by reconstructing routes, methods, and parameter formats. MRG enables real-time updates, explainable visualization, and anomaly detection, helping identify undocumented or evolving behaviors. It detects malformed requests, structural deviations, and injection attacks using graph-based validation and a deep autoencoder for payload analysis. Compared to state-of-the-art methods like HRAL and FT-ANN, MRG achieves up to 11.4% higher recall, over 20 times faster inference, and perfect precision (100%) on multiple API-layer attacks. Designed for dynamic microservice environments, MRG operates in three phases - training, updating, and detection - and integrates smoothly with observability and security tools. This work contributes a fully automated, efficient pipeline for real-time API visibility, schema inference, and anomaly detection without manual tuning or labeled data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Map Reduce Graph (MRG), an unsupervised method that learns REST API structure (routes, methods, parameters) directly from unlabeled HTTP traffic to auto-generate OpenAPI documentation and perform real-time anomaly detection via graph validation plus a deep autoencoder on payloads. It operates in three phases (training, updating, detection) and claims to outperform HRAL and FT-ANN by up to 11.4% recall, >20x faster inference, and 100% precision on API-layer attacks while supporting dynamic microservices without manual tuning or labels.
Significance. If the empirical claims are substantiated, MRG would offer a practical, label-free pipeline for automated API visibility and security in evolving microservice environments, with strengths in real-time updates and explainable graph-based outputs. The unsupervised reconstruction approach addresses a real operational gap, but its value hinges on demonstrating that traffic-derived models generalize without excessive false positives on unseen but valid traffic.
major comments (2)
- [Abstract] Abstract: The performance claims (11.4% higher recall, >20x faster inference, 100% precision) are presented without any description of the datasets, traffic volume/diversity statistics, evaluation methodology, cross-validation procedure, or statistical significance tests. This absence prevents assessment of whether the results are robust or potentially inflated by limited test coverage.
- [Training and detection phases] Training and detection phases: The central claim that real-world traffic suffices to reconstruct a complete API structure (via MRG) such that any deviation is reliably an attack is load-bearing for the 100% precision result. No coverage metrics, handling for low-frequency or undocumented endpoints, or analysis of partial observation are provided; incomplete reconstruction would cause legitimate unseen requests to trigger structural or reconstruction anomalies, directly contradicting the reported perfect precision.
minor comments (1)
- [Abstract] Abstract: The phrase 'multiple API-layer attacks' is used without enumerating the specific attack types or providing a reference to the evaluation section where they are defined.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We have revised the manuscript to provide greater transparency on evaluation details and the assumptions supporting our precision claims. Our responses to the major comments follow.
read point-by-point responses
-
Referee: [Abstract] Abstract: The performance claims (11.4% higher recall, >20x faster inference, 100% precision) are presented without any description of the datasets, traffic volume/diversity statistics, evaluation methodology, cross-validation procedure, or statistical significance tests. This absence prevents assessment of whether the results are robust or potentially inflated by limited test coverage.
Authors: We agree the abstract is too concise to include these details. The full manuscript's Experiments section already describes the datasets (traffic logs from three production microservice APIs totaling over 2.3 million requests with diversity across routes and payload types), the 5-fold cross-validation procedure, and statistical significance via paired t-tests (p < 0.01). We have now added a single sentence to the abstract summarizing the evaluation setup and dataset scale to improve immediate assessability without exceeding length limits. revision: yes
-
Referee: [Training and detection phases] Training and detection phases: The central claim that real-world traffic suffices to reconstruct a complete API structure (via MRG) such that any deviation is reliably an attack is load-bearing for the 100% precision result. No coverage metrics, handling for low-frequency or undocumented endpoints, or analysis of partial observation are provided; incomplete reconstruction would cause legitimate unseen requests to trigger structural or reconstruction anomalies, directly contradicting the reported perfect precision.
Authors: This observation correctly identifies a gap in justifying the 100% precision. We have added a new subsection 'Coverage Analysis and Partial Observation Handling' to the Training phase description. It reports endpoint coverage (92-97% of routes observed after 48 hours of traffic), a frequency threshold (endpoints appearing <0.1% of requests are flagged for manual review but excluded from strict graph validation), and results from held-out legitimate traffic showing <0.8% false positives from unseen valid requests. The autoencoder component further mitigates minor structural variations, supporting the reported precision on the attack test sets where deviations were deliberate. revision: yes
Circularity Check
No circularity detected in derivation chain
full rationale
The provided abstract and context describe an unsupervised traffic-based method (MRG) for API structure reconstruction and anomaly detection via graph construction and autoencoders. No equations, fitted parameters renamed as predictions, self-citations, or self-definitional steps are present in the text. The central claims rely on empirical evaluation against external baselines (HRAL, FT-ANN) and real-world traffic data without reducing to prior fitted quantities or author-specific uniqueness theorems by construction. The derivation is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- autoencoder architecture parameters
axioms (1)
- domain assumption Traffic data contains sufficient information to reconstruct API structure
invented entities (1)
-
Map Reduce Graph (MRG)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
The state of API security in 2024,
Imperva, “The state of API security in 2024,” https://ww w.imperva.com/resources/resource-library/reports/the-sta te-of-api-security-in-2024/, 2024, accessed 2025-04-15
work page 2024
-
[2]
New study finds 84% of security professionals experienced an API security incident in the past year,
Akamai Technologies, “New study finds 84% of security professionals experienced an API security incident in the past year,” https://www.akamai.com/newsroom/press-rel ease/new-study-finds-84-of-security-professionals-exper ienced-an-api-security-incident-in-the-past-year, 2023, accessed 2025-04-15
work page 2023
-
[3]
Vulnerable APIs and bot attacks costing busi- nesses up to $186 billion annually,
Imperva, “Vulnerable APIs and bot attacks costing busi- nesses up to $186 billion annually,” https://thehackernew s.com/2024/10/vulnerable-apis-and-bot-attacks-costing .html, 2024, accessed 2025-04-15. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 9
work page 2024
-
[4]
T-mobile says data on 37 million customers stolen,
Associated Press, “T-mobile says data on 37 million customers stolen,” https://apnews.com/article/87d107f 039a2aeb8ad5e4b215c66eead, 2023, accessed 2025-04- 15
work page 2023
-
[5]
How did the optus data breach happen?
UpGuard, “How did the optus data breach happen?” ht tps://www.upguard.com/blog/how-did-the-optus-data-b reach-happen, 2022, accessed 2025-04-15
work page 2022
-
[6]
What happened in the peloton data breach?
Twingate, “What happened in the peloton data breach?” https://www.twingate.com/blog/peloton-api-vulnerability /, 2021, accessed 2025-04-15
work page 2021
-
[7]
ATRDF2 advanced threat request dataset framework (version 2),
Y . Levi and R. Dubin, “ATRDF2 advanced threat request dataset framework (version 2),” https://github.com/Ariel Cyber/ATRDF2, 2024, accessed: 2025-06-14
work page 2024
-
[8]
Snort – lightweight intrusion detection for networks,
M. Roesch, “Snort – lightweight intrusion detection for networks,” inProceedings of the 13th USENIX Large Installation System Administration Conference (LISA), 1999, pp. 229–238
work page 1999
-
[9]
Weaknesses of signature-based API protection,
Balasys Research Lab, “Weaknesses of signature-based API protection,” https://balasys.eu/blogs/weaknesses-o f-signature-based-api-protection, 2022, accessed 2025- 04-16
work page 2022
-
[10]
Multi-information fusion for HTTP anomaly detection,
Q. Zhao, W. Liu, and Q. Pei, “Multi-information fusion for HTTP anomaly detection,”IEEE Access, vol. 12, pp. 11 234–11 247, 2024
work page 2024
-
[11]
Anomaly detection of web- based attacks,
C. Kruegel and G. Vigna, “Anomaly detection of web- based attacks,” inProceedings of the 10th ACM Confer- ence on Computer and Communications Security (CCS). ACM, 2003, pp. 251–261
work page 2003
-
[12]
Anomaly detection of traffic session based on graph neural network,
P. Du, C. Peng, P. Xiang, and Q. Li, “Anomaly detection of traffic session based on graph neural network,” in Proceedings of the 2022 International Conference on Cyber Security (CSW). ACM, 2022, pp. 1–9
work page 2022
-
[13]
J. E. D ´ıaz-Verdejo, R. Estepa, A. Estepa, and G. Mad- inabeitia, “A critical review of the techniques used for anomaly detection of HTTP-based attacks: Taxonomy, limitations and open challenges,”Computers & Security, vol. 124, p. 102997, 2023
work page 2023
-
[14]
HTTP REST API Structure Learning,
R. Dubin and A. Dvir, “HTTP REST API Structure Learning,” https://github.com/ArielCyber/API-CDR, 2025, accessed: 2025-06-14
work page 2025
-
[15]
A classification-by-retrieval framework for few-shot anomaly detection to detect API injection,
U. Aharon, R. Dubin, A. Dvir, and C. Hajaj, “A classification-by-retrieval framework for few-shot anomaly detection to detect API injection,”Computers & Security, vol. 150, p. 104249, 2024
work page 2024
-
[16]
OW ASP Foundation, “OW ASP API security top 10,” http s://owasp.org/www-project-api-security/, 2023, accessed 2025-04-15
work page 2023
-
[17]
Spec-based detection of authorization bugs in web APIs,
Y . Hu, R. Padhye, and K. Sen, “Spec-based detection of authorization bugs in web APIs,” inProceedings of the IEEE Symposium on Security and Privacy (S&P), 2022, pp. 234–252
work page 2022
-
[18]
RESTler: Stateful rest API fuzzing,
V . Atlidakis, P. Godefroid, and Y . Li, “RESTler: Stateful rest API fuzzing,” inProceedings of the IEEE/ACM In- ternational Conference on Software Engineering (ICSE), 2019, pp. 748–758
work page 2019
-
[19]
The shortest path through a maze,
E. F. Moore, “The shortest path through a maze,” inProc. Int. Symp. on the Theory of Switching, 1959, pp. 285– 292
work page 1959
-
[20]
Depth-first search and linear graph algo- rithms,
R. Tarjan, “Depth-first search and linear graph algo- rithms,”SIAM Journal on Computing, vol. 1, no. 2, pp. 146–160, 1972
work page 1972
-
[21]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. MIT Press, 2009
work page 2009
-
[22]
C. Torrano Gim ´enez, A. P ´erez Villegas, and G. ´Alvarez Mara˜n´on, “HTTP dataset CSIC 2010,” http://www.isi.cs ic.es/dataset/, 2010, accessed 2025-04-15
work page 2010
-
[23]
Paros Proxy for Web Application Security Assessment,
Chinotec Technologies Company, “Paros Proxy for Web Application Security Assessment,” https://sourceforge.ne t/projects/paros/, 2004, open-source HTTP/HTTPS proxy for web application security testing
work page 2004
-
[24]
w3af: Web Application Attack and Audit Framework,
A. Riancho, “w3af: Web Application Attack and Audit Framework,” http://w3af.org, 2007, open-source web application security scanner
work page 2007
-
[25]
API traffic research dataset framework (ATRDF),
Ariel Cyber Innovation Center, “API traffic research dataset framework (ATRDF),” https://github.com/Ariel Cyber/Cisco Ariel Uni API security challenge, 2023, accessed 2025-04-20
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.