arxiv: 2601.04722 · v2 · pith:66IEJF3Snew · submitted 2026-01-08 · 💻 cs.DB

Toward Temporal Attribution Analytics in Dataflows

Chrysanthi Kosyfaki, Ruiyuan Zhang, Nikos Mamoulis, Xiaofang Zhou

Pith reviewed 2026-05-16 16:39 UTC · model grok-4.3

classification 💻 cs.DB

keywords: temporal attribution · data provenance · dataflows · streaming systems · temporal interaction networks · provenance queries · state-based indexing


The pith

Temporal attribution provides a lightweight provenance method to quantitatively track data dependencies between components in streaming dataflows over time without storing fine-grained metadata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines temporal attribution as a new lightweight form of data provenance for quantitatively monitoring how data moves between system components over time. Fine-grained provenance approaches store detailed dependency graphs that, in streaming systems such as Apache Flink, can grow super-linearly with data volume, making storage and computation prohibitively expensive. By adapting volume-based tracking from temporal interaction networks (TINs) to model exchanges between operators, the work classifies data as discrete or liquid, specifies five temporal query types, and proposes a state-based index intended to answer those queries efficiently. A reader would care because this approach could make ongoing dependency analysis practical in large-scale dataflows where full provenance tracing remains too costly. The paper presents this as a vision for scalable, time-focused analytics rather than a complete implementation.

Core claim

Temporal attribution is introduced as a lightweight provenance technique that models quantified data exchanges between dataflow operators using temporal interaction networks to support time-focused analysis without requiring fine-grained tuple-level dependency metadata. The method classifies data into discrete and liquid types, defines five temporal provenance query types, and proposes a state-based indexing approach to enable efficient processing of these queries in streaming systems and workflows.

What carries the argument

A state-based indexing approach, built on temporal interaction networks, that succinctly records quantified data exchanges between operators over time intervals.
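The paper leaves the index schema unspecified (the referee report below notes this), so the following is only a minimal sketch under assumptions made here for illustration: exchanges for each ordered operator pair arrive in timestamp order, and the index keeps one cumulative-volume entry per non-empty fixed-width time bucket. A volume-over-interval query is then the difference of two prefix sums located by binary search. The class name, method names, and bucket width are hypothetical.

```python
from bisect import bisect_right


class CumulativeVolumeIndex:
    """Hypothetical state-based index over operator-to-operator exchanges.

    For each ordered (src, dst) pair it keeps, per non-empty time bucket,
    the cumulative volume transferred up to the end of that bucket.
    No tuple identifiers are stored.
    """

    def __init__(self, bucket_width: float):
        self.bucket_width = bucket_width
        self._buckets = {}  # (src, dst) -> sorted list of bucket ids
        self._cumsum = {}   # (src, dst) -> cumulative volume through each bucket

    def record(self, src: str, dst: str, t: float, quantity: float) -> None:
        """Fold one exchange into the index (assumes non-decreasing t per pair)."""
        b = int(t // self.bucket_width)
        ids = self._buckets.setdefault((src, dst), [])
        sums = self._cumsum.setdefault((src, dst), [])
        if ids and ids[-1] == b:
            sums[-1] += quantity
        elif ids and ids[-1] > b:
            raise ValueError("out-of-order exchange")
        else:
            ids.append(b)
            sums.append((sums[-1] if sums else 0.0) + quantity)

    def _prefix(self, key, bucket: int) -> float:
        """Cumulative volume over all buckets <= bucket (binary search)."""
        ids = self._buckets.get(key, [])
        i = bisect_right(ids, bucket)
        return self._cumsum[key][i - 1] if i else 0.0

    def volume(self, src: str, dst: str, t1: float, t2: float) -> float:
        """Volume sent src -> dst in the buckets covering [t1, t2]."""
        b1 = int(t1 // self.bucket_width)
        b2 = int(t2 // self.bucket_width)
        return self._prefix((src, dst), b2) - self._prefix((src, dst), b1 - 1)

    def size(self) -> int:
        """Number of stored (pair, bucket) entries."""
        return sum(len(ids) for ids in self._buckets.values())


idx = CumulativeVolumeIndex(bucket_width=10.0)
for t, q in [(1, 5.0), (4, 3.0), (12, 2.0), (27, 4.0)]:
    idx.record("source", "map", t, q)
print(idx.volume("source", "map", 0, 19))   # → 10.0
print(idx.volume("source", "map", 10, 29))  # → 6.0
print(idx.size())                           # → 3
```

In this design, storage grows with the number of non-empty buckets per operator pair rather than with the number of tuples, and a query costs O(log b) for b buckets; the price is that answers are exact only at bucket granularity.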

If this is right

Quantitative monitoring of dependencies between dataflow components becomes feasible over time without storing full provenance graphs.
Five specific temporal query types can be answered using only summarized state information from the interaction networks.
The technique applies to both streaming processors and general processing workflows by treating data exchanges as discrete or liquid flows.
Storage and computation costs remain lower than traditional fine-grained provenance methods as data volumes increase.
Research directions are outlined for turning temporal attribution into a practical tool for large-scale dataflow analytics.
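For the liquid case, one possible formalization (an assumption for illustration; this summary does not state which attribution rule the paper adopts) gives each operator $v$ a held quantity $B_v(t)$ and an origin vector $\mathbf{p}_v(t)$, and lets an interaction $(u, v, t, q)$ move $q$ units from $u$ to $v$ under a proportional-split rule:

```latex
\begin{aligned}
\mathbf{p}_v(t^{+}) &= \mathbf{p}_v(t) + \frac{q}{B_u(t)}\,\mathbf{p}_u(t), \\
\mathbf{p}_u(t^{+}) &= \Bigl(1 - \frac{q}{B_u(t)}\Bigr)\,\mathbf{p}_u(t), \\
B_v(t^{+}) &= B_v(t) + q, \qquad B_u(t^{+}) = B_u(t) - q, \qquad 0 < q \le B_u(t).
\end{aligned}
```

Summing the components of each update shows that $\sum_o p_v[o](t) = B_v(t)$ holds for every operator after every interaction, so under this rule one vector per operator, with one entry per origin, suffices to attribute held quantity to origins without tuple-level lineage.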

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach might integrate into existing stream engines by adding compact indexes rather than retrofitting full dependency tracking.
Similar modeling could apply to time-based auditing in other distributed systems where only aggregate flows matter.
A concrete test would measure index size and query latency on real streaming traces with varying operator counts.
If effective, it could reduce the barrier to provenance use in production monitoring dashboards.

Load-bearing premise

A state-based indexing approach can efficiently support the five temporal provenance query types for large-scale dataflows without requiring fine-grained tuple-level dependency metadata.

What would settle it

Implementing the proposed state-based index on a large streaming workload and measuring that query times or storage costs grow super-linearly with data volume would show the efficiency assumption does not hold.
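To see why the growth question matters, consider a toy model (not taken from the paper) of a single time-windowed aggregate: n events are evenly spaced over a fixed horizon, and each output depends on every input in the preceding window. Fine-grained lineage stores one (input, output) pair per dependency, whereas bucketed state stores one volume cell per time bucket for the operator pair. The horizon, window, and bucket values are hypothetical.

```python
def lineage_pairs(n: int, horizon: float, window: float) -> int:
    """(input, output) pairs kept by fine-grained lineage for a sliding time window.

    n events are evenly spaced over [0, horizon); each event triggers one output
    that depends on every input with timestamp in (t - window, t].
    """
    times = [i * horizon / n for i in range(n)]
    pairs, left = 0, 0
    for right, t in enumerate(times):
        while times[left] <= t - window:  # drop inputs that fell out of the window
            left += 1
        pairs += right - left + 1
    return pairs


def state_cells(n: int, horizon: float, bucket: float) -> int:
    """Per-bucket volume cells kept by aggregated state for the same events."""
    return len({int((i * horizon / n) // bucket) for i in range(n)})


for n in (1_000, 2_000, 4_000):
    print(n, lineage_pairs(n, 100.0, 10.0), state_cells(n, 100.0, 1.0))
```

With the horizon and window fixed, doubling n roughly quadruples the lineage pairs, because each output's window holds proportionally more inputs, while the state stays at 100 cells (horizon divided by bucket width). Whether a real state-based index stays this flat for realistic operators is what the proposed experiment would have to establish.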

Figures

Figures reproduced from arXiv: 2601.04722 by Chrysanthi Kosyfaki, Nikos Mamoulis, Ruiyuan Zhang, Xiaofang Zhou.

**Figure 1.** TIN-based provenance framework.

… Liquid data introduces extra complexity because the origin of a quantity becomes ambiguous and not unique after multiple transformations. For instance, an amount of money, originating from one account, can be split across several transactions, merged with other funds, and eventually appear in multiple destinations. Similarly, in streaming systems like Ap…
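To make the discrete/liquid contrast concrete, here is a minimal sketch with hypothetical names, not the paper's code. A discrete tuple carries exactly one origin; a liquid quantity carries an origin mix, and when part of it is forwarded the mix is split proportionally. Proportional splitting is an assumed policy chosen for illustration; other attribution rules are possible. The example mirrors the money scenario above: account A pays 10 into B, which already holds 10 from C, and B then forwards 5 to D.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteTuple:
    """A discrete item keeps a single, unambiguous origin as it moves."""
    origin: str
    payload: object


def discrete_origins(buffer):
    """Count items per origin; exact because each item has one origin."""
    return Counter(item.origin for item in buffer)


class LiquidNode:
    """Holds a divisible quantity as a mix of origins (origin -> amount)."""

    def __init__(self):
        self.mix = Counter()

    def total(self):
        return sum(self.mix.values())

    def inject(self, origin, amount):
        self.mix[origin] += amount

    def send(self, other, amount):
        """Forward `amount`, splitting it across origins in proportion to the mix."""
        total = self.total()
        if amount < 0 or amount > total:
            raise ValueError("invalid amount")
        if amount == 0:
            return
        share = amount / total
        for origin, held in list(self.mix.items()):
            moved = held * share
            self.mix[origin] = held - moved
            other.mix[origin] += moved


print(discrete_origins([DiscreteTuple("A", 1), DiscreteTuple("C", 2), DiscreteTuple("A", 3)]))

a, b, d = LiquidNode(), LiquidNode(), LiquidNode()
a.inject("A", 10.0)
b.inject("C", 10.0)
a.send(b, 10.0)  # B now holds 10 from A and 10 from C
b.send(d, 5.0)   # D receives a quarter of B's mix
print(dict(d.mix))  # → {'C': 2.5, 'A': 2.5}
```

The discrete count stays exact no matter how items are routed, whereas D's liquid holdings can only be attributed to A and C through the chosen splitting policy.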

Original abstract

Data provenance (the process of determining the origin and derivation of data outputs) has applications across multiple domains including explaining database query results and auditing scientific workflows. Despite decades of research, provenance tracing remains challenging due to its high computational cost and storage requirements. In streaming systems such as Apache Flink, fine-grained provenance graphs can grow super-linearly with data volume, posing significant scalability challenges. We define temporal attribution, a new lightweight form of provenance, appropriate for certain tasks, such as monitoring dependencies between system components over time quantitatively. Temporal attribution enables time-focused analysis that does not require fine-grained, tuple-level dependency meta-data. Inspired by volume-based provenance tracking in Temporal Interaction Networks (TINs), we demonstrate TINs' applicability in succinctly modeling quantified data exchanges between dataflow operators in stream data processing systems and in processing workflows, in general, over time. We classify data into discrete and liquid types, define five temporal provenance query types, and propose a state-based indexing approach. Our vision outlines research directions toward making this new form of temporal attribution a practical tool for large-scale dataflow analytics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a clean vision paper sketching temporal attribution as a lightweight volume-based alternative to full provenance in dataflows, but the scalability claims sit on unproven ground.


The paper's main contribution is defining temporal attribution as a stripped-down provenance that tracks quantified data exchanges between operators over time instead of full tuple dependencies. It borrows the volume-tracking approach from Temporal Interaction Networks, splits data into discrete and liquid categories, lists five query types, and points to state-based indexing as the implementation route. That framing is new enough to stand apart from standard provenance work on streaming systems like Flink.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes temporal attribution as a lightweight provenance mechanism for dataflow systems (e.g., Apache Flink streams). Inspired by volume-based tracking in Temporal Interaction Networks (TINs), it classifies data into discrete and liquid types, defines five temporal provenance query types for quantitative dependency monitoring over time, and sketches a state-based indexing approach that avoids fine-grained tuple-level metadata.

Significance. If the indexing approach can be made concrete and efficient, the work could enable scalable temporal analysis of operator exchanges in streaming and workflow systems, offering a lower-overhead alternative to traditional provenance graphs whose size grows super-linearly with data volume.

major comments (3)

[Abstract and §3] Proposal: the central claim that state-based indexing supports the five temporal queries (volume, dependency strength, etc.) scalably and correctly without tuple-level metadata is unsupported; no index schema, query algorithms, storage/time complexity bounds, or worked example are supplied.
[§4] Data classification: the discrete/liquid distinction is introduced without formal definitions or invariants showing that aggregated state suffices to answer the queries while preserving the quantified-exchange semantics from the TIN inspiration.
[§5] Vision: no reduction or mapping to the TIN model is given that would allow verification that the proposed queries remain well-defined or sub-linear in stream volume once the discrete/liquid classification is applied.

minor comments (1)

A small concrete example (one query type, one operator pair, one time window) would clarify how state-based indexing answers a query without tuple metadata.
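Along the lines the minor comment requests, here is a hedged worked example with one query type, one operator pair, and one time window. "Dependency strength" is taken from the referee's wording and defined here, as an assumption, as the fraction of a downstream operator's inflow in a window that came from a given upstream operator; the operator names, bucket width, and volumes are invented for illustration. Only per-bucket aggregates are stored, never tuple identifiers.

```python
from collections import defaultdict

BUCKET = 10.0  # hypothetical bucket width (time units)

# Aggregated state only: (src, dst, bucket) -> volume. No tuple ids are kept.
state = defaultdict(float)


def record(src, dst, t, quantity):
    state[(src, dst, int(t // BUCKET))] += quantity


def dependency_strength(u, v, t1, t2):
    """Fraction of v's inflow during the buckets covering [t1, t2] that came from u.

    'Dependency strength' is defined this way purely for illustration.
    """
    b1, b2 = int(t1 // BUCKET), int(t2 // BUCKET)
    inflow = {}
    for (src, dst, b), vol in state.items():
        if dst == v and b1 <= b <= b2:
            inflow[src] = inflow.get(src, 0.0) + vol
    total = sum(inflow.values())
    return inflow.get(u, 0.0) / total if total else 0.0


record("orders", "join", 2, 30.0)
record("payments", "join", 5, 10.0)
record("orders", "join", 15, 20.0)
record("payments", "join", 25, 40.0)
print(dependency_strength("orders", "join", 0, 9))   # → 0.75
print(dependency_strength("orders", "join", 0, 29))  # → 0.5
```

A linear scan over the state is enough for a sketch; keeping a per-pair prefix-sum structure instead would make each window lookup logarithmic in the number of buckets.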

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the scope of our vision paper. As the manuscript introduces the concept of temporal attribution and sketches future research directions rather than presenting a fully implemented system, we address each point by indicating how we will strengthen the presentation while remaining faithful to the paper's vision-oriented nature.

Point-by-point responses

Referee: [Abstract and §3] Proposal: the central claim that state-based indexing supports the five temporal queries (volume, dependency strength, etc.) scalably and correctly without tuple-level metadata is unsupported; no index schema, query algorithms, storage/time complexity bounds, or worked example are supplied.

Authors: We agree that the manuscript, as a vision paper, does not supply a concrete index schema, algorithms, complexity bounds, or worked example; the state-based indexing is proposed at a conceptual level to motivate future implementation. We will revise the abstract and §3 to include a high-level index structure sketch, pseudocode outlines for the five query types, and asymptotic arguments showing sub-linear scaling via aggregation. A worked example for one query will also be added to illustrate correctness. (Revision: yes.)
Referee: [§4] Data classification: the discrete/liquid distinction is introduced without formal definitions or invariants showing that aggregated state suffices to answer the queries while preserving the quantified-exchange semantics from the TIN inspiration.

Authors: The discrete/liquid classification is introduced intuitively to guide aggregation strategies drawn from TIN volume tracking. We acknowledge the absence of formal definitions and invariants in the current draft. In revision we will add precise definitions for the two data types together with invariants demonstrating that aggregated state suffices to answer the queries while preserving TIN-style quantified-exchange semantics. (Revision: yes.)
Referee: [§5] Vision: no reduction or mapping to the TIN model is given that would allow verification that the proposed queries remain well-defined or sub-linear in stream volume once the discrete/liquid classification is applied.

Authors: §5 is explicitly a forward-looking vision section. A full formal reduction lies outside the scope of this initial proposal. We will add a high-level mapping subsection in the revised §5 that relates the five queries to TIN concepts and sketches an argument for sub-linearity based on state aggregation; a complete verification is left for subsequent technical papers. (Revision: partial.)

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The manuscript is a vision paper that introduces temporal attribution as a new lightweight provenance concept, classifies data as discrete or liquid, defines five query types, and sketches a state-based indexing approach inspired by prior work on TINs. No equations, fitted parameters, or self-citations appear in the provided text that reduce any claim to its own inputs by construction. The proposal consists of independent definitions and research directions rather than a closed derivation that presupposes its conclusions, so it satisfies the self-contained criterion with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that volume-based tracking in TINs can be adapted to dataflow operators; new concepts are introduced, but there are no free parameters or mathematical derivations.

axioms (1)

Domain assumption: volume-based provenance tracking in Temporal Interaction Networks can be applied to model quantified data exchanges between dataflow operators.
Explicitly stated as inspiration for the temporal attribution model in streaming systems.

invented entities (2)

temporal attribution (no independent evidence)
purpose: Lightweight provenance for quantitative time-focused dependency monitoring
Newly defined form of provenance appropriate for specific tasks without fine-grained metadata.
discrete and liquid data types (no independent evidence)
purpose: Classification to support temporal analysis of different data behaviors
Introduced to enable the five query types in the proposed model.

pith-pipeline@v0.9.0 · 5493 in / 1350 out tokens · 73579 ms · 2026-05-16T16:39:51.392259+00:00


Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 3 internal anchors

[1] Umut Acar, Peter Buneman, James Cheney, Jan Van den Bussche, Natalia Kwasnikowska, and Stijn Vansummeren. 2010. A graph model of data and workflow provenance.
[2] Daniel Alabi, Sainyam Galhotra, Shagufta Mehnaz, Zeyu Song, and Eugene Wu. 2025. Privacy and Security in Distributed Data Markets. In Companion of the International Conference on Management of Data. 775–787.
[3] Abdullah Hamed Almuntashiri, Luis-Daniel Ibáñez, and Adriane Chapman. LLMs for the post-hoc creation of provenance. In 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE, 562–566.
[4] Abdullah Hamed Almuntashiri, Luis-Daniel Ibáñez, and Adriane Chapman. Using LLMs to infer provenance information. In Proceedings of the ProvenanceWeek 2025. 1–10.
[5] Mohamed Jehad Baeth and Mehmet S Aktas. 2019. Detecting misinformation in social networks using provenance data. Concurrency and Computation: Practice and Experience 31, 3 (2019), e4793.
[6] Geoffrey Barbier, Zhuo Feng, and Pritam Gundecha. 2013. Provenance data in social media. Morgan & Claypool Publishers.
[7] Seyed-Mehdi-Reza Beheshti, Hamid Reza Motahari-Nezhad, and Boualem Benatallah. 2012. Temporal provenance model (TPM): model and query language. arXiv preprint arXiv:1211.5009 (2012).
[8] Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. In Database Theory - ICDT, 8th International Conference, London, UK, January 4-6, Proceedings (Lecture Notes in Computer Science), Vol. 1973. Springer, 316–330.
[9] Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. 2002. On Propagation of Deletions and Annotations Through Views. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5, Madison, Wisconsin, USA. ACM, 150–158.
[10] Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. 2002. On propagation of deletions and annotations through views. In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 150–158.
[11] Peter Buneman and Wang-Chiew Tan. 2007. Provenance in databases. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data. 1171–1173.
[12] Adriane Chapman, Luca Lauro, Paolo Missier, and Riccardo Torlone. 2024. Supporting better insights of data science pipelines with fine-grained provenance. ACM Transactions on Database Systems 49, 2 (2024), 1–42.
[13] Adriane Chapman, Paolo Missier, Giulia Simonelli, and Riccardo Torlone. Capturing and querying fine-grained provenance of preprocessing pipelines in data science. Proceedings of the VLDB Endowment 14, 4 (2020), 507–520.
[14] Adriane P Chapman, Hosagrahar V Jagadish, and Prakash Ramanan. 2008. Efficient provenance storage. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 993–1006.
[15] Peng Chen, Beth Plale, and Mehmet S Aktas. 2012. Temporal representation for scientific data provenance. In 2012 IEEE 8th International Conference on E-Science. IEEE, 1–8.
[16] Susan B Davidson, Tova Milo, and Sudeepa Roy. 2013. A propagation model for provenance views of public/private workflows. In Proceedings of the 16th International Conference on Database Theory. 165–176.
[17] Daniel de Oliveira, Flavio Costa, Vítor Silva, Kary ACS Ocaña, and Marta Mattoso. 2014. Debugging Scientific Workflows with Provenance: Achievements and Lessons Learned. In SBBD. 67–76.
[18] Boris Glavic et al. 2021. Data provenance. Foundations and Trends in Databases 9, 3-4 (2021), 209–441.
[19] Boris Glavic, Kyumars Sheykh Esmaili, Peter Michael Fischer, and Nesime Tatbul. 2013. Ariadne: Managing fine-grained provenance on data streams. In Proceedings of the 7th ACM international conference on Distributed event-based systems. 39–50.
[20] Todd J Green, Zachary G Ives, Grigoris Karvounarakis, and Val Tannen. Provenance in ORCHESTRA. (2010).
[21] Todd J Green, Grigoris Karvounarakis, and Val Tannen. 2007. Provenance semirings. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 31–40.
[22] Todd J Green and Val Tannen. 2017. The semiring framework for database provenance. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 93–99.
[23] Pritam Gundecha, Zhuo Feng, and Huan Liu. 2013. Seeking provenance of information using social media. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 1691–1696.
[24] Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, and Tyson Condie. 2015. Titian: Data provenance support in spark. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 9. 216.
[25] Marco Johns, Lena Baum, and Fabian Prasser. 2025. Tracking provenance in clinical data warehouses for quality management. International Journal of Medical Informatics 193 (2025), 105690.
[26] Grigoris Karvounarakis, Zachary G Ives, and Val Tannen. 2010. Querying data provenance. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 951–962.
[27] Anastasios Kementsietsidis and Min Wang. 2009. Provenance query evaluation: what’s so special about it? In Proceedings of the 18th ACM conference on Information and knowledge management. 681–690.
[28] Chrysanthi Kosyfaki and Nikos Mamoulis. 2022. Provenance in Temporal Interaction Networks. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2277–2290.
[29] Chrysanthi Kosyfaki and Nikos Mamoulis. 2022. Provenance in Temporal Interaction Networks. In 38th IEEE International Conference on Data Engineering, ICDE, Kuala Lumpur, Malaysia, May 9-12. IEEE, 2277–2290.
[30] Chrysanthi Kosyfaki, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2018. Flow motifs in interaction networks. arXiv preprint arXiv:1810.08408 (2018).
[31] Chrysanthi Kosyfaki, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2019. Flow Motifs in Interaction Networks. In Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT, Lisbon, Portugal, March 26-29. OpenProceedings.org, 241–252.
[32] Chrysanthi Kosyfaki, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2021. Flow computation in temporal interaction networks. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 660–671.
[33] Chrysanthi Kosyfaki, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2021. Flow Computation in Temporal Interaction Networks. In 37th IEEE International Conference on Data Engineering, ICDE, Chania, Greece, April 19-22. IEEE, 660–671.
[34] Rohit Kumar and Toon Calders. 2017. Information propagation in interaction networks. In Advances in Database Technology, EDBT 2017: Proceedings of the 20th International Conference on Extending Database Technology, Venice, Italy, March 21-24. 270–281.
[35] Samuele Langhi, Angela Bonifati, and Riccardo Tommasini. 2025. Evaluating continuous queries with inconsistency annotations. Proceedings of the VLDB Endowment 18, 5 (2025), 1321–1334.
[36] Kisung Lee, Raghu Ganti, Mudhakar Srivatsa, and Prasant Mohapatra. Spatio-temporal provenance: Identifying location information from unstructured text. In International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops). IEEE, 499–504.
[37] Brandon Lucia and Luis Ceze. 2015. Data provenance tracking for concurrent programs. In 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 146–156.
[38] Haneen Mohammed and Eugene Wu. 2025. Lineage Capture Trade-offs: A Case Study in DuckDB. In Proceedings of the ProvenanceWeek 2025. 32–36.
[39] Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, et al. 2011. The open provenance model core specification (v1.1). Future generation computer systems 27, 6 (2011), 743–756.
[40] Tobias Müller and Pascal Engel. 2022. How, Where, and Why Data Provenance Improves Query Debugging: A Visual Demonstration of Fine-Grained Provenance Analysis for SQL. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 3178–3181.
[41] Xing Niu, Bahareh Sadat Arab, Seokki Lee, Su Feng, Xun Zou, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu, and Boris Glavic. 2017. Debugging transactions and tracking their provenance with reenactment. arXiv preprint arXiv:1707.09930 (2017).
[42] Dimitris Palyvos-Giannas, Vincenzo Gulisano, and Marina Papatriantafilou. 2018. Genealog: Fine-grained data streaming provenance at the edge. In Proceedings of the 19th International Middleware Conference. 227–238.
[43] Dimitris Palyvos-Giannas, Bastian Havers, Marina Papatriantafilou, and Vincenzo Gulisano. 2020. Ananke: a streaming framework for live forward provenance. Proceedings of the VLDB Endowment 14, 3 (2020), 391–403.
[44] Vicky Papavasileiou, Ken Yocum, and Alin Deutsch. 2019. Ariadne: Online provenance for big graph analytics. In Proceedings of the 2019 International Conference on Management of Data. 521–536.
[45] Beatriz Pérez, Julio Rubio, and Carlos Sáenz-Adán. 2018. A systematic review of provenance systems. Knowledge and Information Systems 57, 3 (2018), 495–543.
[46] Jakub Reha, Giulio Lovisotto, Michele Russo, Alessio Gravina, and Claas Grohnfeldt. 2023. Anomaly detection in continuous-time temporal provenance graphs. In Temporal Graph Learning Workshop @ NeurIPS 2023.
[47] Aryak Sen, Silviu Maniu, and Pierre Senellart. 2025. ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data. arXiv preprint arXiv:2504.12058 (2025).
[48] Pierre Senellart. 2019. Provenance in databases: Principles and applications. In Reasoning Web. Explainable Artificial Intelligence: 15th International Summer School 2019, Bolzano, Italy, September 20–24, 2019, Tutorial Lectures. Springer, 104–109.
[49] Pierre Senellart, Louis Jachiet, Silviu Maniu, and Yann Ramusat. 2018. ProvSQL: Provenance and probability management in PostgreSQL. Proceedings of the VLDB Endowment (PVLDB) 11, 12 (2018), 2034–2037.
[50] Wang Chiew Tan et al. 2007. Provenance in databases: Past, current, and future. IEEE Data Eng. Bull. 30, 4 (2007), 3–12.
[51] Io Taxidou. 2018. Information diffusion and provenance in social media. Ph.D. Dissertation. Universität Freiburg.
[52] Io Taxidou, Tom De Nies, Ruben Verborgh, Peter M Fischer, Erik Mannens, and Rik Van de Walle. 2015. Modeling information diffusion in social media as provenance with W3C PROV. In Proceedings of the 24th international conference on world wide web. 819–824.
[53] Xiaolan Wang, Alexandra Meliou, and Eugene Wu. 2017. QFix: Diagnosing errors through query histories. In Proceedings of the ACM International Conference on Management of Data. 1369–1384.
[54] Michael Whittaker, Cristina Teodoropol, Peter Alvaro, and Joseph M Hellerstein. 2018. Debugging distributed systems with why-across-time provenance. In Proceedings of the ACM symposium on cloud computing. 333–346.
[55] Albert Ariel Widiaatmaja, Belkis Djeffal, Ashish Dandekar, and Pierre Senellart. 2025. Demonstration of ProvSQL Update Provenance through Temporal Databases. In Proceedings of the ProvenanceWeek 2025. 71–76.
[56] Yinjun Wu, Abdussalam Alawini, Daniel Deutch, Tova Milo, and Susan Davidson. 2019. ProvCite: provenance-based data citation. Proceedings of the VLDB Endowment 12, 7 (2019), 738–751.
[57] Yang Wu, Ang Chen, and Linh Thi Xuan Phan. 2019. Zeno: Diagnosing performance problems with temporal provenance. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). 395–420.
[58] Yang Wu, Mingchen Zhao, Andreas Haeberlen, Wenchao Zhou, and Boon Thau Loo. 2014. Diagnosing missing events in distributed systems with negative provenance. ACM SIGCOMM Computer Communication Review 44, 4 (2014), 383–394.
[59] Masaya Yamada, Hiroyuki Kitagawa, Salman Ahmed Shaikh, Toshiyuki Amagasa, and Akiyoshi Matono. 2025. LPStream: Fine-grained Lazy Provenance for Stream Processing. Proceedings of the ACM on Management of Data 3, 4 (2025), 1–25.
[60] Yuankai Zhang, Adam O’Neill, Micah Sherr, and Wenchao Zhou. 2017. Privacy-preserving network provenance. Proceedings of the VLDB Endowment 10, 11 (2017), 1550–1561.
[61] David Zhao, Pavle Subotić, and Bernhard Scholz. 2020. Debugging large-scale datalog: A scalable provenance evaluation strategy. ACM Transactions on Programming Languages and Systems (TOPLAS) 42, 2 (2020), 1–35.
[62] Wenchao Zhou, Ling Ding, Andreas Haeberlen, Zachary Ives, and Boon Thau Loo. 2011. TAP: Time-aware Provenance for Distributed Systems. In 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP 11).
[63] Wenchao Zhou, Suyog Mapara, Yiqing Ren, Yang Li, Andreas Haeberlen, Zachary Ives, Boon Thau Loo, and Micah Sherr. 2012. Distributed time-aware provenance. Proceedings of the VLDB Endowment 6, 2 (2012), 49–60.
[64] Michael Zipperle, Florian Gottwalt, Elizabeth Chang, and Tharam Dillon. Provenance-based intrusion detection systems: A survey. Comput. Surveys 55, 7 (2022), 1–36.