CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection
Pith reviewed 2026-05-15 00:24 UTC · model grok-4.3
The pith
The Canonical Security Telemetry Substrate (CSTS) unifies heterogeneous cybersecurity data into a common structure over entities, relations, events, state, and provenance for portable AI analytics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CSTS is a canonical, AI-ready telemetry foundation designed to harmonize heterogeneous cyber data into a common representation over persistent entities, typed relations, events, temporal state, and provenance, while preserving source-specific nuance through explicit mappings and extensible metadata so that the same models can run across environments.
What carries the argument
The CSTS representational model of persistent entities, typed relations, events, temporal state, and provenance, with explicit mappings and extensible metadata that retain source detail for downstream inference.
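The five components can be pictured as plain record types. The sketch below is illustrative only: the abstract does not publish a schema, so every class name, field name, and value here is an assumption made for this example, not the paper's definition.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Provenance:
    source: str           # hypothetical label for the producing tool or feed
    raw_id: str           # identifier of the original record at the source
    mapping_version: str  # which mapping produced the canonical record


@dataclass(frozen=True)
class Entity:
    entity_id: str        # persistent identifier, stable across events
    entity_type: str      # e.g. "host", "user", "process"


@dataclass(frozen=True)
class Relation:
    relation_type: str    # typed edge, e.g. "logged_into"
    src: str              # entity_id of the source entity
    dst: str              # entity_id of the destination entity
    valid_from: datetime  # temporal state: when the relation starts to hold
    valid_to: Optional[datetime] = None  # None means it still holds


@dataclass(frozen=True)
class Event:
    event_type: str
    timestamp: datetime
    actor: str            # entity_id
    target: str           # entity_id
    provenance: Provenance
    # Source-specific detail that has no canonical field lands here.
    extensions: Dict[str, Any] = field(default_factory=dict)


user = Entity("user:alice", "user")
host = Entity("host:web-01", "host")
when = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)

login = Event(
    event_type="authentication",
    timestamp=when,
    actor=user.entity_id,
    target=host.entity_id,
    provenance=Provenance("example-edr", "rec-123", "v0"),
    extensions={"vendor.logon_type": 3},
)
session = Relation("logged_into", user.entity_id, host.entity_id, valid_from=when)
```

The `extensions` dictionary stands in for the "extensible metadata" the abstract mentions; whether downstream models can use such fields portably is exactly what the referee report questions.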
If this is right
- The same AI models for anomaly detection, graph learning, forecasting, behavior modeling, and agentic response can run on data from any mapped source.
- Analytics programs no longer need separate ingestion pipelines for each vendor or deployment environment.
- A single substrate supports both on-prem and multi-cloud operation without re-engineering the data layer.
- Downstream tasks become model-agnostic because the input representation is fixed and portable.
Where Pith is reading between the lines
- Vendors might begin emitting data directly in CSTS-compatible form to reduce customer integration work.
- A shared substrate could enable cross-organization sharing of detection models without exposing raw logs.
- Empirical tests could measure whether models trained on CSTS-mapped data retain detection accuracy compared with native formats.
Load-bearing premise
Heterogeneous cyber data from arbitrary vendors can be mapped into the CSTS structure while keeping every detail required for accurate AI inference without unacceptable loss or added complexity.
What would settle it
A concrete example of a vendor log or alert whose critical attributes cannot be expressed in CSTS without omitting information that changes the outcome of an anomaly-detection or graph-learning model.
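One way to hunt for such a counterexample is a round-trip check: map a raw record into the canonical form, map it back, and list the fields that did not survive. The sketch below assumes a hypothetical flat mapping (`FIELD_MAP`) and invented field names; it is not derived from the paper.

```python
from typing import Any, Dict, Set

# Hypothetical correspondence between one vendor's field names and canonical ones.
FIELD_MAP = {"src_ip": "actor", "dst_ip": "target", "ts": "timestamp"}


def to_canonical(raw: Dict[str, Any], keep_extensions: bool) -> Dict[str, Any]:
    """Map known fields; optionally carry unknown fields as extensions."""
    canonical = {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}
    if keep_extensions:
        canonical["extensions"] = {
            k: v for k, v in raw.items() if k not in FIELD_MAP
        }
    return canonical


def from_canonical(canonical: Dict[str, Any]) -> Dict[str, Any]:
    """Invert the mapping to rebuild as much of the raw record as possible."""
    inverse = {v: k for k, v in FIELD_MAP.items()}
    raw = {inverse[k]: v for k, v in canonical.items() if k in inverse}
    raw.update(canonical.get("extensions", {}))
    return raw


def lost_fields(raw: Dict[str, Any], keep_extensions: bool) -> Set[str]:
    """Return the raw field names that do not survive the round trip."""
    restored = from_canonical(to_canonical(raw, keep_extensions))
    return {k for k in raw if k not in restored or restored[k] != raw[k]}


record = {
    "src_ip": "10.0.0.5",
    "dst_ip": "10.0.0.9",
    "ts": "2026-01-01T12:00:00Z",
    "tcp_flags": "SYN",
}

dropped_without_extensions = lost_fields(record, keep_extensions=False)
dropped_with_extensions = lost_fields(record, keep_extensions=True)
```

Passing the round trip is necessary but not sufficient. A field parked in `extensions` is retained, yet a model that reads only canonical fields will still ignore it, so the settling example could equally be a field that survives mapping but is invisible to portable inference.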
Original abstract
Cybersecurity data remains fragmented across vendors, formats, schemas, and deployment environments, forcing AI and analytics programs to spend disproportionate effort on ingestion, normalization, and brittle source-specific engineering. This paper introduces the Canonical Security Telemetry Substrate (CSTS), a canonical, AI-ready telemetry foundation designed to harmonize heterogeneous cyber data into a common representation over persistent entities, typed relations, events, temporal state, and provenance. CSTS is intended to move cybersecurity analytics beyond ad hoc record normalization toward a reusable substrate that supports anomaly detection, graph learning, forecasting, behavior-based modeling, and agentic cyber AI. We formalize the core design principles of CSTS, define its representational components, and explain how it preserves source-specific nuance through explicit mappings and extensible metadata while still enabling portable downstream inference. We further position CSTS as a cloud-agnostic and deployment-agnostic substrate suitable for on-prem, hybrid, and multi-cloud environments. The result is a unifying telemetry model that reduces the blue-collar burden of cyber data engineering and creates a clearer path to scalable, interoperable, and model-agnostic cyber AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Canonical Security Telemetry Substrate (CSTS), a conceptual design for a canonical, AI-ready telemetry foundation that harmonizes heterogeneous cybersecurity data into a common representation built on persistent entities, typed relations, events, temporal state, and provenance. It formalizes design principles, defines representational components, explains preservation of source-specific nuance via explicit mappings and extensible metadata, and positions CSTS as cloud- and deployment-agnostic to support anomaly detection, graph learning, forecasting, and agentic cyber AI while reducing data engineering effort.
Significance. If the proposed mappings can be shown to preserve necessary nuance with acceptable complexity, CSTS could meaningfully address data fragmentation in cybersecurity and enable more portable, model-agnostic AI applications. The conceptual framing targets a genuine practical bottleneck, but the absence of any concrete mappings, loss quantification, or validation means the significance is prospective rather than demonstrated.
major comments (2)
- [Abstract] Abstract and introduction: the central claim that heterogeneous vendor data can be mapped to CSTS components while preserving all necessary source-specific nuance for downstream AI inference (anomaly detection, graph learning, forecasting) rests entirely on stated design principles with no worked examples, no explicit mapping definitions, and no analysis of information loss or added complexity.
- [Representational components] Representational components section: the definitions of persistent entities, typed relations, events, temporal state, and provenance are presented at a high level without formal syntax, axioms, or completeness arguments, leaving open whether the substrate is sufficiently expressive for arbitrary cyber telemetry without reintroducing source-specific engineering downstream.
minor comments (2)
- [Abstract] The abstract could include a short statement on the intended scope of the formalization and any acknowledged limitations of the conceptual approach.
- [Introduction] Terminology such as 'blue-collar burden' is informal for a journal submission and could be replaced with more precise language about engineering overhead.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to strengthen the presentation while preserving the conceptual focus of the work.
- Referee: [Abstract] Abstract and introduction: the central claim that heterogeneous vendor data can be mapped to CSTS components while preserving all necessary source-specific nuance for downstream AI inference (anomaly detection, graph learning, forecasting) rests entirely on stated design principles with no worked examples, no explicit mapping definitions, and no analysis of information loss or added complexity.
  Authors: We agree that the manuscript would be strengthened by concrete illustrations of the mapping process. Although the paper is primarily a conceptual contribution focused on design principles, we will add a new subsection with worked examples mapping representative heterogeneous sources (e.g., network flow records, endpoint telemetry, and SIEM alerts) to CSTS persistent entities, typed relations, events, and provenance. The revision will include explicit mapping rules, discussion of how source-specific nuance is retained via extensible metadata, and a qualitative analysis of information loss and added complexity. This addresses the concern directly.
  Revision: yes
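For orientation, the kind of explicit mapping rule the authors promise might look like a per-source table that names which source field feeds each canonical field. This is a sketch under stated assumptions: the source labels, field names, and event types below are hypothetical, and the rebuttal does not commit to any particular rule format.

```python
from typing import Any, Dict

# Hypothetical per-source rules: canonical field -> source field.
MAPPING_RULES: Dict[str, Dict[str, str]] = {
    "netflow": {"actor": "src_addr", "target": "dst_addr", "timestamp": "first_seen"},
    "siem_alert": {"actor": "user", "target": "hostname", "timestamp": "alert_time"},
}

# Hypothetical canonical event type assigned to each source.
EVENT_TYPES: Dict[str, str] = {
    "netflow": "network_connection",
    "siem_alert": "alert",
}


def map_record(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply one source's rule table to a raw record.

    Raises KeyError if the source is unknown or a required field is absent.
    """
    rules = MAPPING_RULES[source]
    used = set(rules.values())
    event: Dict[str, Any] = {canon: raw[src] for canon, src in rules.items()}
    event["event_type"] = EVENT_TYPES[source]
    # Provenance records where the event came from and which rules built it.
    event["provenance"] = {"source": source, "mapping": f"{source}/v0"}
    # Fields with no canonical home are kept, namespaced by source.
    event["extensions"] = {
        f"{source}.{k}": v for k, v in raw.items() if k not in used
    }
    return event


flow = map_record(
    "netflow",
    {"src_addr": "10.0.0.5", "dst_addr": "10.0.0.9",
     "first_seen": "2026-01-01T12:00:00Z", "bytes": 4096},
)
alert = map_record(
    "siem_alert",
    {"user": "alice", "hostname": "web-01",
     "alert_time": "2026-01-01T12:00:05Z", "severity": "high"},
)
```

Both outputs share the same top-level keys, which is the property a portable model would depend on. Note that the sketch leaves `actor` as an IP address in one case and a username in the other; resolving both to persistent entities is the harder part that the promised worked examples would need to show.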
- Referee: [Representational components] Representational components section: the definitions of persistent entities, typed relations, events, temporal state, and provenance are presented at a high level without formal syntax, axioms, or completeness arguments, leaving open whether the substrate is sufficiently expressive for arbitrary cyber telemetry without reintroducing source-specific engineering downstream.
  Authors: The high-level presentation was chosen to emphasize reusability across deployments. We will revise the section to include a lightweight formal syntax (using tuple-based notation for entities, relations, and events with type constraints), a small set of consistency axioms (e.g., temporal ordering and provenance chaining), and a completeness argument demonstrating coverage of standard cyber telemetry categories. The revision will also clarify that the explicit mapping layer and metadata extensibility are designed to avoid reintroducing source-specific engineering in downstream AI tasks.
  Revision: yes
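The rebuttal names two axioms without stating them. A minimal sketch of what checking them could involve, assuming one plausible reading: temporal ordering means a state interval never ends before it starts, and provenance chaining means every parent reference resolves and the chain terminates at a raw record. The tuple layout and record identifiers are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tuple form: (valid_from, valid_to, parent_id).
# Times are epoch seconds; parent_id is None for a raw source record.
Record = Tuple[int, Optional[int], Optional[str]]


def check_axioms(records: Dict[str, Record]) -> List[str]:
    """Return a human-readable list of axiom violations (empty if none)."""
    violations: List[str] = []
    for rid, (start, end, parent) in records.items():
        # Temporal ordering: a closed interval must not end before it starts.
        if end is not None and end < start:
            violations.append(f"{rid}: interval ends before it starts")
        # Provenance chaining: walk parents until a raw record is reached.
        seen = {rid}
        while parent is not None:
            if parent not in records:
                violations.append(f"{rid}: provenance parent {parent} is missing")
                break
            if parent in seen:
                violations.append(f"{rid}: provenance chain contains a cycle")
                break
            seen.add(parent)
            parent = records[parent][2]
    return violations


records: Dict[str, Record] = {
    "raw-1": (100, None, None),       # raw source record, still valid
    "state-1": (100, 200, "raw-1"),   # well-formed derived state
    "state-2": (300, 250, "raw-9"),   # violates both axioms
}

problems = check_axioms(records)
```

Checks of this kind establish internal consistency only. They say nothing about the completeness argument the referee also requests, which concerns whether every telemetry category has a canonical home at all.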
Circularity Check
No significant circularity: definitional framework without derivations or self-referential reductions
The paper introduces CSTS by defining its representational components (persistent entities, typed relations, events, temporal state, provenance) and design principles for harmonizing heterogeneous data. No equations, fitted parameters, or predictions appear in the provided text. No self-citations are invoked as load-bearing justifications for uniqueness or ansatzes. The central claim is a proposal for a canonical substrate that preserves nuance via explicit mappings and extensible metadata; this does not reduce to its own inputs by construction, as the framework is presented as an organizing definition rather than a derived result from prior fitted data or author-specific theorems. The lack of concrete mappings is a validation gap but does not create circularity in any derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Heterogeneous cyber data sources can be mapped to a common set of entities, relations, events, temporal states, and provenance without unacceptable loss of information.
invented entities (1)
- CSTS substrate: no independent evidence