pith. sign in

arxiv: 1907.05045 · v1 · pith:MCEIYLXSnew · submitted 2019-07-11 · 💻 cs.PL · cs.LO

Provenance for Large-scale Datalog

Pith reviewed 2026-05-24 22:59 UTC · model grok-4.3

classification 💻 cs.PL cs.LO
keywords Datalogprovenancedebuggingsemi-naive evaluationstatic program analysisproof treesfixed-point semanticsSouffle
0
0 comments X

The pith

A new provenance lattice with proof annotations and adjusted semi-naive semantics lets Datalog programs answer arbitrary debugging queries on large analyses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a bottom-up evaluation strategy for Datalog that records provenance during computation so that developers can debug large recursive programs used in static analysis and network analysis. It augments the standard lattice with proof annotations and changes the fixed-point iteration in semi-naive evaluation to keep track of how each tuple was derived. A separate query mechanism then extracts minimal-height partial proof trees for any requested tuple without recomputing the entire program. Experiments on DOOP analyses with tens of millions of tuples show the approach runs with only 1.27 times the normal cost while supporting more flexible queries than prior methods. The work targets the practical problem that existing database provenance techniques fail to scale when Datalog rules number in the hundreds and data volume reaches millions of facts.

Core claim

By introducing a provenance lattice that carries proof annotations alongside ordinary Datalog facts and by redefining the fixed-point semantics of semi-naive evaluation to propagate those annotations, the engine can answer arbitrary provenance queries that construct partial proof trees of minimal height for any output tuple, all while preserving the original program semantics and incurring only modest runtime overhead on large-scale inputs such as DOOP analyses.

What carries the argument

The provenance lattice augmented with proof annotations, together with the modified fixed-point operator for semi-naive evaluation that records derivation information during iteration.

If this is right

  • Debugging queries can request provenance for any tuple and receive a minimal-height partial proof tree on demand.
  • The same engine can produce provenance for tens of millions of tuples produced by hundreds of recursive rules.
  • Runtime cost stays within a factor of roughly 1.27 compared with ordinary evaluation.
  • The technique supports more flexible queries than previous state-of-the-art provenance methods for Datalog.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same lattice construction could be applied to other bottom-up logic engines that already use semi-naive evaluation.
  • Minimal-height proof trees might serve as concise explanations when an analysis reports an unexpected result.
  • Interactive tools could cache the annotated facts and answer successive debugging queries without restarting evaluation.

Load-bearing premise

The new lattice and modified semi-naive fixed-point semantics preserve the original Datalog semantics while correctly recording all proof information needed for later debugging queries.

What would settle it

Execute the instrumented engine on a small, manually verifiable Datalog program whose complete set of minimal derivations is already known, then compare the proof trees returned by debugging queries against the expected derivations and check that no extra or missing tuples appear in the output.

Figures

Figures reproduced from arXiv: 1907.05045 by Bernhard Scholz, David Zhao, Pavle Subotic.

Figure 1
Figure 1. Figure 1: Program Analysis Datalog Setup a b c e d l1 l3 l4 new assign assign load[f] store[f] new new load[f] [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Points-to Input Diagram • large-scale experiments using the Doop program analysis framework with DaCapo benchmarks with tens of millions of tuples, measuring on average 1.27× overheads for runtime and 1.45× overheads for memory. The paper is organized as follows: In Section 2 we motivate our provenance method and describe its use in a real-world program analysis use case. In Section 3 we detail the theoret… view at source ↗
Figure 3
Figure 3. Figure 3: Full proof tree for alias(a, b) The importance of minimality of proof tree height is shown in [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Infinitely many derivations for vpt(b, l1), resulting from the circular assignment in lines l2 and l8 in the input program provide explanations for the existence of a tuple, the proof tree of an unexpected tuple will help identify the fault in the logic specification. An example fault could be if rule r3 was altered as follows, r3: vpt(Var, Obj) :- load(Var, Y, F), store(P, F, Q), vpt(Q, Obj), vpt(P, Obj1)… view at source ↗
Figure 5
Figure 5. Figure 5: Interactive exploration of fragments of a proof tree fo [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Exploring the proof of alias(a, e) to find the erroneous rule r3 minimize the number of user interactions (i.e., proof tree fragments) required to discover the root cause for an anomaly. Note that provenance queries for databases (cf. [13, 26, 27]) have in general shorter evaluation times, and, hence, Database approaches may re-evaluate for each provenance query the whole specification. How￾ever, for large… view at source ↗
Figure 7
Figure 7. Figure 7: One level of a proof tree of minimal height for [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Synthesized Proof Tree Generator system 3.1 Standard Bottom-Up Evaluation The basis of our approach is the standard bottom-up evaluation strategy for Datalog specification [2]. The computational domain of standard bottom-up evaluation is the subset lattice consisting of sets of tuples, denoted instances I. The na¨ıve algorithm for evaluation is based on the immediate consequence operator, ΓP , which genera… view at source ↗
Figure 9
Figure 9. Figure 9: Connecting a tuple to a proof tree via a height annotation [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: IDB relation vpt in each iteration of the fixpoint computation for the example Datalog specification It remains to be shown that the provenance evaluation strategy is correct, i.e., that TP terminates and results in the same set of tuples as ΓP . Additionally, we must show that the height annotations resulting from provenance evaluation strategy is minimal. Lemma 2. TP computes the same tuples as ΓP at fi… view at source ↗
Figure 11
Figure 11. Figure 11: Example Datalog specification demonstrating the upper b [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: depicts a snippet of a Souffl´e RAM program, part of the semi-na¨ıve evaluation of the rule r2 : vpt(V ar, Obj) :- assign(V ar, V ar2), vpt(V ar2, Obj). The join is performed via a loop nest iterating over tuples of relations efficiently via indexes [38]. Line 2 computes the new tuples to be added to the ∆i+1 vpt relation, where the NOT IN operation is an existence check to ensure the generated tuple does… view at source ↗
Figure 13
Figure 13. Figure 13: Provenance version of RAM loop nest However, the specializations in the data structures still remain to be discussed. Souffl´e employs a highly specialized parallel B-Tree data structure [25], with index orderings for the attributes generated automatically via an optimization problem [38]. The B-Tree employs a special optimistic read/write lock for each node, allowing high throughput for parallel insertio… view at source ↗
Figure 14
Figure 14. Figure 14: Explaining the tuple alias(a, b) It is critical that the proof tree construction procedures are highly performant since the constructed IDB may be very large, and we may need to search through many tuples to construct a proof tree fragment. Therefore, the proof tree construction procedures must be tightly integrated into the Souffl´e system to enable a high-performance, parallel search. We integrate these… view at source ↗
Figure 15
Figure 15. Figure 15: Explaining the non-existence of the tuple [PITH_FULL_IMAGE:figures/full_fig_p021_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Subroutine for example program Lines 2-5 represent a search through a database that is already constructed by the initial bottom-up evaluation, to find tuples which satisfy the constraints required for the construction of a proof tree fragment. The values of argument(0) and argument(1) are the values in the head tuple, and argument(2) is the height annotation of the head tuple. Therefore, this subproof se… view at source ↗
Figure 18
Figure 18. Figure 18: Proof Tree Construction and Statistics the computation graphs presented in [10], and are equally effective for debugging. They can also be seen as an extension of how-provenance [9]. However, our hybrid method for generating proof trees is novel, i.e., it permits multiple debugging queries in a single debugging cycle. Hence, our method is especially useful for debugging large Datalog specifications. Prove… view at source ↗
read the original abstract

Logic programming languages such as Datalog have become popular as Domain Specific Languages (DSLs) for solving large-scale, real-world problems, in particular, static program analysis and network analysis. The logic specifications which model analysis problems, process millions of tuples of data and contain hundreds of highly recursive rules. As a result, they are notoriously difficult to debug. While the database community has proposed several data-provenance techniques that address the Declarative Debugging Challenge for Databases, in the cases of analysis problems, these state-of-the-art techniques do not scale. In this paper, we introduce a novel bottom-up Datalog evaluation strategy for debugging: our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations, and a new fixed-point semantics for semi-naive evaluation. A debugging query mechanism allows arbitrary provenance queries, constructing partial proof trees of tuples with minimal height. We integrate our technique into Souffle, a Datalog engine that synthesizes C++ code, and achieve high performance by using specialized parallel data structures. Experiments are conducted with DOOP/DaCapo, producing proof annotations for tens of millions of output tuples. We show that our method has a runtime overhead of 1.27x on average while being more flexible than existing state-of-the-art techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a novel bottom-up Datalog evaluation strategy for debugging large-scale programs (e.g., static analyses with millions of tuples and hundreds of recursive rules). It relies on a new provenance lattice that includes proof annotations together with a modified fixed-point semantics for semi-naive evaluation; this enables a debugging query mechanism that constructs partial proof trees of minimal height. The approach is integrated into Soufflé via specialized parallel data structures and is evaluated on DOOP/DaCapo, reporting an average runtime overhead of 1.27× while producing annotations for tens of millions of output tuples and offering greater flexibility than prior provenance techniques.

Significance. If the lattice definition and semantic preservation claims hold, the work supplies a practical, scalable provenance mechanism for Datalog DSLs used in program and network analysis. The combination of proof annotations, minimal-height trees, and low-overhead parallel implementation in Soufflé addresses a recognized debugging bottleneck; the reported experiments on real-world benchmarks constitute a concrete strength.

major comments (2)
  1. [§3 (Semantics)] The central claim that the new provenance lattice and modified semi-naive fixed-point semantics preserve the original Datalog least-fixed-point semantics while correctly recording all necessary proof information is load-bearing; the manuscript must supply the lattice definition, the revised immediate-consequence operator, and a proof of equivalence (or a machine-checked statement) to allow verification that no tuples are added or lost.
  2. [§5 (Evaluation)] Table 1 (or equivalent experimental summary) reports 1.27× average overhead; the paper should clarify whether this figure includes the cost of materializing the proof annotations for all output tuples or only the base evaluation, because the former is required to support the scalability claim for tens of millions of tuples.
minor comments (2)
  1. [§2–3] Notation for the provenance lattice elements (e.g., how proof annotations are encoded inside the lattice values) should be introduced once and used consistently; the abstract uses “proof annotations” while later sections appear to switch between “annotations” and “labels.”
  2. [§4] The description of the debugging query mechanism would benefit from a small worked example showing a concrete provenance query, the resulting partial proof tree, and how minimality of height is enforced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address each major comment below.

read point-by-point responses
  1. Referee: [§3 (Semantics)] The central claim that the new provenance lattice and modified semi-naive fixed-point semantics preserve the original Datalog least-fixed-point semantics while correctly recording all necessary proof information is load-bearing; the manuscript must supply the lattice definition, the revised immediate-consequence operator, and a proof of equivalence (or a machine-checked statement) to allow verification that no tuples are added or lost.

    Authors: We agree that explicit verification of semantic preservation is essential. The revised manuscript will expand §3 with the complete definition of the provenance lattice, the revised immediate-consequence operator used in the modified semi-naive evaluation, and a formal proof (or machine-checkable statement) establishing equivalence to the standard least fixed-point semantics, confirming that no tuples are added or lost. revision: yes

  2. Referee: [§5 (Evaluation)] Table 1 (or equivalent experimental summary) reports 1.27× average overhead; the paper should clarify whether this figure includes the cost of materializing the proof annotations for all output tuples or only the base evaluation, because the former is required to support the scalability claim for tens of millions of tuples.

    Authors: The 1.27× figure in Table 1 measures the full provenance-enabled evaluation, which includes materialization of proof annotations for all output tuples. We will add an explicit statement to this effect in the evaluation section of the revised manuscript to remove any ambiguity. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper claims a novel provenance lattice with proof annotations and modified fixed-point semantics for semi-naive Datalog evaluation that preserves original semantics while supporting minimal-height proof trees. No equations, lattice definitions, or semantic derivations are supplied in the abstract or visible text that reduce any result to its inputs by construction. No self-citations are invoked as load-bearing premises, no parameters are fitted then renamed as predictions, and no uniqueness theorems or ansatzes are smuggled via prior work. The derivation chain is presented as building on standard least-fixed-point Datalog evaluation with independent new structures, making the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper relies on standard Datalog evaluation machinery but introduces new structures whose correctness is not independently evidenced in the abstract.

axioms (1)
  • standard math Fixed-point semantics for semi-naive evaluation of Datalog
    Invoked as the base for the new provenance semantics.
invented entities (1)
  • provenance lattice with proof annotations no independent evidence
    purpose: Track derivation information during bottom-up evaluation
    New data structure introduced for provenance queries

pith-pipeline@v0.9.0 · 5752 in / 1222 out tokens · 22386 ms · 2026-05-24T22:59:54.998546+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

  1. [1]

    souffl´ e synthesizes a native parallel c++ program from a logic specifi cation

    souffle-lang/souffle: Souffl´ e is a variant of datalog for tool desig ners crafting analyses in horn clauses. souffl´ e synthesizes a native parallel c++ program from a logic specifi cation. more information can be found here: http://souffle-lang.github.io/, 2017. Accessed: 19-1 0-2017

  2. [2]

    Abiteboul, R

    S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases . Addison-Wesley Publishing Company, 1995

  3. [3]

    Allen, B

    N. Allen, B. Scholz, and P. Krishnan. Staged Points-to Analysis for Large Code Bases , pages 131–150. Springer Berlin Heidelberg, 2015

  4. [4]

    B. S. Arab, D. Gawlick, V. Krishnaswamy, V. Radhakrishnan, and B. Glavic. Using reenactment to retroactively capture provenance for transactions. IEEE Transactions on Knowledge and Data Engi- neering, 30(3):599–612, March 2018

  5. [5]

    M. Aref, B. ten Cate, T. J. Green, B. Kimelfeld, D. Olteanu, E. Pa salic, T. L. Veldhuizen, and G. Wash- burn. Design and implementation of the logicblox system. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data , SIGMOD ’15, pages 1371–1382, New York, NY, USA, 2015. ACM

  6. [6]

    Arora, R

    T. Arora, R. Ramakrishnan, W. G. Roth, P. Seshadri, and D. Sriv astava”. Explaining program execution in deductive systems. Proceedings of Deductive and Object-Oriented Databases: T hird International Conference, pages 101–119, 1993

  7. [7]

    Benjelloun, A

    O. Benjelloun, A. D. Sarma, C. Hayworth, and J. Widom. An introd uction to uldbs and the trio system. IEEE Data Engineering Bulletin , 2006

  8. [8]

    Bravenboer and Y

    M. Bravenboer and Y. Smaragdakis. Strictly declarative specific ation of sophisticated points-to analyses. SIGPLAN Not. , 44(10):243–262, 2009

  9. [9]

    Buneman, S

    P. Buneman, S. Khanna, and W.-C. Tan. Why and where: A charac terization of data provenance. Proceedings of the International Conference on Database Th eory, 1973:316–330, 2001

  10. [10]

    Caballero, Y

    R. Caballero, Y. Garc ´ ıa-Ruiz, and F. S´ aenz-P´ erez. A theoretical framework for the declarative debugging of datalog programs. In K.-D. Schewe and B. Thalheim, editors, Semantics in Data and Knowledge Bases, pages 143–159, Berlin, Heidelberg, 2008. Springer Berlin Heidelber g. 26

  11. [11]

    Caballero, Y

    R. Caballero, Y. Garca-Ruiz, and F. Senz-Prez. Debugging of w rong and missing answers for datalog programs with constraint handling rules. 07 2015

  12. [12]

    Cheney, L

    J. Cheney, L. Chiticariu, and W.-C. Tan. Provenance in databas es: Why, how, and where. Foundations and Trends in Databases , 1:379–474, 2009

  13. [13]

    Deutch, A

    D. Deutch, A. Gilad, and Y. Moskovitch. Selective provenance f or datalog programs using top-k queries. Proceedings of the VLDB Endowment , 8:1394–1405, 2015

  14. [14]

    Deutch, A

    D. Deutch, A. Gilad, and Y. Moskovitch. Efficient provenance tr acking for datalog using top-k queries. The VLDB Journal , 27(2):245–269, Apr 2018

  15. [15]

    Deutch, T

    D. Deutch, T. Milo, S. Roy, and V. Tannen. Circuits for datalog p rovenance. Conference on Database Theory, 17:201–212, 2014

  16. [16]

    Drabent, S

    W. Drabent, S. Nadjm-Tehrani, and J. Maluszy´ nski. Meta-pr ogramming in logic programming. pages 501–521, 1989

  17. [17]

    Futamura

    Y. Futamura. Partial evaluation of computation process – an a pproach to a compiler-compiler. Higher Order Symbol. Comput. , 12(4):381–391, Dec. 1999

  18. [18]

    Glavic and G

    B. Glavic and G. Alonso. Perm: Processing provenance and data on the same data model through query rewriting. IEEE International Conference on Data Engineering , 25:174–185, 2009

  19. [19]

    Glavic, R

    B. Glavic, R. J. Miller, and G. Alonso3. Using sql for efficient gener ation and querying of provenance information. Lecture Notes in Computer Science , 8000:291–320, 2013

  20. [20]

    Grech, L

    N. Grech, L. Brent, B. Scholz, and Y. Smaragdakis. Gigahorse : Thorough, declarative decompilation of smart contracts. In Proceedings of the 41th International Conference on Softwa re Engineering, ICSE 2019, page (to appear), Montreal, QC, Canada, May 2019. ACM

  21. [21]

    Grech, M

    N. Grech, M. Kong, A. Jurisevic, L. Brent, B. Scholz, and Y. Sm aragdakis. Madmax: Surviving out- of-gas conditions in ethereum smart contracts. In SPLASH 2018 OOPSLA , 2018

  22. [22]

    Hoder, N

    K. Hoder, N. Bjørner, and L. de Moura. µz– an efficient engine for fixed points with constraints. In G. Gopalakrishnan and S. Qadeer, editors, Computer Aided Verification , pages 457–462, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg

  23. [23]

    S. S. Huang, T. J. Green, and B. T. Loo. Datalog and emerging a pplications: An interactive tutorial. In Proceedings of the 2011 ACM SIGMOD International Conferenc e on Management of Data , SIGMOD ’11, pages 1213–1216. ACM, 2011

  24. [24]

    Jordan, B

    H. Jordan, B. Scholz, and P. Subotic. Souffl´ e: On synthesis of program analyzers. Proceedings of Computer Aided Verification , 28:422–430, 2016

  25. [25]

    Jordan, P

    H. Jordan, P. Suboti´ c, D. Zhao, and B. Scholz. A specialized b -tree for concurrent datalog evaluation. In Proceedings of the 24th Symposium on Principles and Practic e of Parallel Programming , PPoPP ’19, pages 327–339, New York, NY, USA, 2019. ACM

  26. [26]

    K¨ ohler, B

    S. K¨ ohler, B. Lud¨ ascher, and Y. Smaragdakis. Declarative datalog debugging for mere mortals. Lecture Notes in Computer Science , 7494:111–122, 2012

  27. [27]

    S. Lee, S. K¨ ohler, B. Lud¨ ascher, and B. Glavic. Efficiently com puting provenance graphs for queries with negation. CoRR, abs/1701.05699, 2017

  28. [28]

    S. Lee, B. Lud¨ ascher, and B. Glavic. Provenance summaries f or answers and non-answers. Proc. VLDB Endow., 11(12):1954–1957, Aug. 2018

  29. [29]

    Liang, P

    S. Liang, P. Fodor, H. Wan, and M. Kifer. Openrulebench: An an alysis of the performance of rule engines. In Proceedings of the 18th International Conference on World W ide Web , WWW ’09, pages 601–610, 2009. 27

  30. [30]

    Madsen, M.-H

    M. Madsen, M.-H. Yee, and O. Lhot´ ak. From datalog to flix: A de clarative language for fixed points on lattices. In Proceedings of the 37th ACM SIGPLAN Conference on Programmi ng Language Design and Implementation , PLDI ’16, pages 194–208, New York, NY, USA, 2016. ACM

  31. [31]

    Mangal, X

    R. Mangal, X. Zhang, A. V. Nori, and M. Naik. A user-guided appr oach to program analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 462–473, New York, NY, USA, 2015. ACM

  32. [32]

    X. Ou, S. Govindavajhala, and A. W. Appel. Mulval: A logic-based n etwork security analyzer. In Proceedings of the 14th Conference on USENIX Security Sympo sium - Volume 14 , SSYM’05, pages 8–8, Berkeley, CA, USA, 2005. USENIX Association

  33. [33]

    Raghothaman, S

    M. Raghothaman, S. Kulkarni, K. Heo, and M. Naik. User-guided program reasoning using bayesian inference. pages 722–735, 06 2018

  34. [34]

    Ramakrishnan and S

    R. Ramakrishnan and S. Sudarshan. Top-down vs. bottom-up revisited. In In Proceedings of the International Logic Programming Symposium , pages 321–336. MIT Press, 1991

  35. [35]

    E. Y. Shapiro. Algorithmic Program DeBugging . MIT Press, Cambridge, MA, USA, 1983

  36. [36]

    Sridharan, D

    M. Sridharan, D. Gopan, L. Shan, and R. Bod ´ ık. Demand-drive n points-to analysis for java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Obj ect-oriented Programming, Systems, Languages, and Applications , OOPSLA ’05, pages 59–76, New York, NY, USA, 2005. ACM

  37. [37]

    Stamatogiannakis, P

    M. Stamatogiannakis, P. Groth, and H. Bos. Decoupling proven ance capture and analysis from exe- cution. In 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15) , Edinburgh, Scotland, 2015. USENIX Association

  38. [38]

    Subotic, H

    P. Subotic, H. Jordan, L. Chang, A. Fekete, and B. Scholz. Au tomatic index selection for large-scale datalog computation. PVLDB, 12(2):141–153, 2018

  39. [39]

    J. D. Ullman. Bottom-up beats top-down for datalog. In Proceedings of the Eighth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems , PODS ’89, pages 140–149, New York, NY, USA, 1989. ACM

  40. [40]

    Whaley, D

    J. Whaley, D. Avots, M. Carbin, and M. S. Lam. Using Datalog with Binary Decision Diagrams for Program Analysis, pages 97–118. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005

  41. [41]

    J. Widom. Trio: A system for integrated management of data, a ccuracy, and lineage. In 2nd Biennial Conference on Innovative Data Systems Research, CIDR 2005 , pages 262–276, 01 2005

  42. [42]

    Zhang, R

    X. Zhang, R. Grigore, X. Si, and M. Naik. Effective interactive re solution of static analysis alarms. Proc. ACM Program. Lang. , 1(OOPSLA):57:1–57:30, Oct. 2017

  43. [43]

    Zhang, R

    X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstr action refinement for program analyses in datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programmi ng Language Design and Implementation , PLDI ’14, pages 239–248, New York, NY, USA, 2014. ACM

  44. [44]

    D. Zhao. Honours thesis: Large-scale provenance for souffl´ e. 2017

  45. [45]

    W. Zhou, M. Sherr, T. Tao, X. Li, B. T. Loo, and Y. Mao. Efficient querying and maintenance of network provenance at internet-scale. Proceedings of the 2010 ACM SIGMOD International Conferenc e on Management of Data , pages 615–626, 2010. 28