Provenance for Large-scale Datalog

Bernhard Scholz; David Zhao; Pavle Subotic

arxiv: 1907.05045 · v1 · pith:MCEIYLXSnew · submitted 2019-07-11 · 💻 cs.PL · cs.LO

Provenance for Large-scale Datalog

David Zhao , Pavle Subotic , Bernhard Scholz This is my paper

Pith reviewed 2026-05-24 22:59 UTC · model grok-4.3

classification 💻 cs.PL cs.LO

keywords Datalogprovenancedebuggingsemi-naive evaluationstatic program analysisproof treesfixed-point semanticsSouffle

0 comments

The pith

A new provenance lattice with proof annotations and adjusted semi-naive semantics lets Datalog programs answer arbitrary debugging queries on large analyses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a bottom-up evaluation strategy for Datalog that records provenance during computation so that developers can debug large recursive programs used in static analysis and network analysis. It augments the standard lattice with proof annotations and changes the fixed-point iteration in semi-naive evaluation to keep track of how each tuple was derived. A separate query mechanism then extracts minimal-height partial proof trees for any requested tuple without recomputing the entire program. Experiments on DOOP analyses with tens of millions of tuples show the approach runs with only 1.27 times the normal cost while supporting more flexible queries than prior methods. The work targets the practical problem that existing database provenance techniques fail to scale when Datalog rules number in the hundreds and data volume reaches millions of facts.

Core claim

By introducing a provenance lattice that carries proof annotations alongside ordinary Datalog facts and by redefining the fixed-point semantics of semi-naive evaluation to propagate those annotations, the engine can answer arbitrary provenance queries that construct partial proof trees of minimal height for any output tuple, all while preserving the original program semantics and incurring only modest runtime overhead on large-scale inputs such as DOOP analyses.

What carries the argument

The provenance lattice augmented with proof annotations, together with the modified fixed-point operator for semi-naive evaluation that records derivation information during iteration.

If this is right

Debugging queries can request provenance for any tuple and receive a minimal-height partial proof tree on demand.
The same engine can produce provenance for tens of millions of tuples produced by hundreds of recursive rules.
Runtime cost stays within a factor of roughly 1.27 compared with ordinary evaluation.
The technique supports more flexible queries than previous state-of-the-art provenance methods for Datalog.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lattice construction could be applied to other bottom-up logic engines that already use semi-naive evaluation.
Minimal-height proof trees might serve as concise explanations when an analysis reports an unexpected result.
Interactive tools could cache the annotated facts and answer successive debugging queries without restarting evaluation.

Load-bearing premise

The new lattice and modified semi-naive fixed-point semantics preserve the original Datalog semantics while correctly recording all proof information needed for later debugging queries.

What would settle it

Execute the instrumented engine on a small, manually verifiable Datalog program whose complete set of minimal derivations is already known, then compare the proof trees returned by debugging queries against the expected derivations and check that no extra or missing tuples appear in the output.

Figures

Figures reproduced from arXiv: 1907.05045 by Bernhard Scholz, David Zhao, Pavle Subotic.

**Figure 2.** Figure 2: Points-to Input Diagram • large-scale experiments using the Doop program analysis framework with DaCapo benchmarks with tens of millions of tuples, measuring on average 1.27× overheads for runtime and 1.45× overheads for memory. The paper is organized as follows: In Section 2 we motivate our provenance method and describe its use in a real-world program analysis use case. In Section 3 we detail the theoret… view at source ↗

**Figure 3.** Figure 3: Full proof tree for alias(a, b) The importance of minimality of proof tree height is shown in [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Infinitely many derivations for vpt(b, l1), resulting from the circular assignment in lines l2 and l8 in the input program provide explanations for the existence of a tuple, the proof tree of an unexpected tuple will help identify the fault in the logic specification. An example fault could be if rule r3 was altered as follows, r3: vpt(Var, Obj) :- load(Var, Y, F), store(P, F, Q), vpt(Q, Obj), vpt(P, Obj1)… view at source ↗

**Figure 5.** Figure 5: Interactive exploration of fragments of a proof tree fo [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Exploring the proof of alias(a, e) to find the erroneous rule r3 minimize the number of user interactions (i.e., proof tree fragments) required to discover the root cause for an anomaly. Note that provenance queries for databases (cf. [13, 26, 27]) have in general shorter evaluation times, and, hence, Database approaches may re-evaluate for each provenance query the whole specification. However, for large… view at source ↗

**Figure 7.** Figure 7: One level of a proof tree of minimal height for [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Synthesized Proof Tree Generator system 3.1 Standard Bottom-Up Evaluation The basis of our approach is the standard bottom-up evaluation strategy for Datalog specification [2]. The computational domain of standard bottom-up evaluation is the subset lattice consisting of sets of tuples, denoted instances I. The na¨ıve algorithm for evaluation is based on the immediate consequence operator, ΓP , which genera… view at source ↗

**Figure 9.** Figure 9: Connecting a tuple to a proof tree via a height annotation [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

**Figure 10.** Figure 10: IDB relation vpt in each iteration of the fixpoint computation for the example Datalog specification It remains to be shown that the provenance evaluation strategy is correct, i.e., that TP terminates and results in the same set of tuples as ΓP . Additionally, we must show that the height annotations resulting from provenance evaluation strategy is minimal. Lemma 2. TP computes the same tuples as ΓP at fi… view at source ↗

**Figure 11.** Figure 11: Example Datalog specification demonstrating the upper b [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗

**Figure 12.** Figure 12: depicts a snippet of a Souffl´e RAM program, part of the semi-na¨ıve evaluation of the rule r2 : vpt(V ar, Obj) :- assign(V ar, V ar2), vpt(V ar2, Obj). The join is performed via a loop nest iterating over tuples of relations efficiently via indexes [38]. Line 2 computes the new tuples to be added to the ∆i+1 vpt relation, where the NOT IN operation is an existence check to ensure the generated tuple does… view at source ↗

**Figure 13.** Figure 13: Provenance version of RAM loop nest However, the specializations in the data structures still remain to be discussed. Souffl´e employs a highly specialized parallel B-Tree data structure [25], with index orderings for the attributes generated automatically via an optimization problem [38]. The B-Tree employs a special optimistic read/write lock for each node, allowing high throughput for parallel insertio… view at source ↗

**Figure 14.** Figure 14: Explaining the tuple alias(a, b) It is critical that the proof tree construction procedures are highly performant since the constructed IDB may be very large, and we may need to search through many tuples to construct a proof tree fragment. Therefore, the proof tree construction procedures must be tightly integrated into the Souffl´e system to enable a high-performance, parallel search. We integrate these… view at source ↗

**Figure 15.** Figure 15: Explaining the non-existence of the tuple [PITH_FULL_IMAGE:figures/full_fig_p021_15.png] view at source ↗

**Figure 16.** Figure 16: Subroutine for example program Lines 2-5 represent a search through a database that is already constructed by the initial bottom-up evaluation, to find tuples which satisfy the constraints required for the construction of a proof tree fragment. The values of argument(0) and argument(1) are the values in the head tuple, and argument(2) is the height annotation of the head tuple. Therefore, this subproof se… view at source ↗

**Figure 18.** Figure 18: Proof Tree Construction and Statistics the computation graphs presented in [10], and are equally effective for debugging. They can also be seen as an extension of how-provenance [9]. However, our hybrid method for generating proof trees is novel, i.e., it permits multiple debugging queries in a single debugging cycle. Hence, our method is especially useful for debugging large Datalog specifications. Prove… view at source ↗

read the original abstract

Logic programming languages such as Datalog have become popular as Domain Specific Languages (DSLs) for solving large-scale, real-world problems, in particular, static program analysis and network analysis. The logic specifications which model analysis problems, process millions of tuples of data and contain hundreds of highly recursive rules. As a result, they are notoriously difficult to debug. While the database community has proposed several data-provenance techniques that address the Declarative Debugging Challenge for Databases, in the cases of analysis problems, these state-of-the-art techniques do not scale. In this paper, we introduce a novel bottom-up Datalog evaluation strategy for debugging: our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations, and a new fixed-point semantics for semi-naive evaluation. A debugging query mechanism allows arbitrary provenance queries, constructing partial proof trees of tuples with minimal height. We integrate our technique into Souffle, a Datalog engine that synthesizes C++ code, and achieve high performance by using specialized parallel data structures. Experiments are conducted with DOOP/DaCapo, producing proof annotations for tens of millions of output tuples. We show that our method has a runtime overhead of 1.27x on average while being more flexible than existing state-of-the-art techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds proof annotations to a provenance lattice and tweaks semi-naive evaluation so Datalog debugging queries can build minimal proof trees at scale.

read the letter

The main point is that this work makes provenance practical for the kind of Datalog programs used in static analysis. It introduces a lattice that carries proof annotations and a modified fixed-point semantics for semi-naive evaluation. A query mechanism then extracts partial proof trees of minimal height for any tuple. The authors integrate the scheme into Souffle and test it on DOOP runs over DaCapo, producing annotations for tens of millions of tuples at 1.27x average overhead. That combination of scale and measured cost is the concrete contribution worth noting first.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a novel bottom-up Datalog evaluation strategy for debugging large-scale programs (e.g., static analyses with millions of tuples and hundreds of recursive rules). It relies on a new provenance lattice that includes proof annotations together with a modified fixed-point semantics for semi-naive evaluation; this enables a debugging query mechanism that constructs partial proof trees of minimal height. The approach is integrated into Soufflé via specialized parallel data structures and is evaluated on DOOP/DaCapo, reporting an average runtime overhead of 1.27× while producing annotations for tens of millions of output tuples and offering greater flexibility than prior provenance techniques.

Significance. If the lattice definition and semantic preservation claims hold, the work supplies a practical, scalable provenance mechanism for Datalog DSLs used in program and network analysis. The combination of proof annotations, minimal-height trees, and low-overhead parallel implementation in Soufflé addresses a recognized debugging bottleneck; the reported experiments on real-world benchmarks constitute a concrete strength.

major comments (2)

[§3 (Semantics)] The central claim that the new provenance lattice and modified semi-naive fixed-point semantics preserve the original Datalog least-fixed-point semantics while correctly recording all necessary proof information is load-bearing; the manuscript must supply the lattice definition, the revised immediate-consequence operator, and a proof of equivalence (or a machine-checked statement) to allow verification that no tuples are added or lost.
[§5 (Evaluation)] Table 1 (or equivalent experimental summary) reports 1.27× average overhead; the paper should clarify whether this figure includes the cost of materializing the proof annotations for all output tuples or only the base evaluation, because the former is required to support the scalability claim for tens of millions of tuples.

minor comments (2)

[§2–3] Notation for the provenance lattice elements (e.g., how proof annotations are encoded inside the lattice values) should be introduced once and used consistently; the abstract uses “proof annotations” while later sections appear to switch between “annotations” and “labels.”
[§4] The description of the debugging query mechanism would benefit from a small worked example showing a concrete provenance query, the resulting partial proof tree, and how minimality of height is enforced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address each major comment below.

read point-by-point responses

Referee: [§3 (Semantics)] The central claim that the new provenance lattice and modified semi-naive fixed-point semantics preserve the original Datalog least-fixed-point semantics while correctly recording all necessary proof information is load-bearing; the manuscript must supply the lattice definition, the revised immediate-consequence operator, and a proof of equivalence (or a machine-checked statement) to allow verification that no tuples are added or lost.

Authors: We agree that explicit verification of semantic preservation is essential. The revised manuscript will expand §3 with the complete definition of the provenance lattice, the revised immediate-consequence operator used in the modified semi-naive evaluation, and a formal proof (or machine-checkable statement) establishing equivalence to the standard least fixed-point semantics, confirming that no tuples are added or lost. revision: yes
Referee: [§5 (Evaluation)] Table 1 (or equivalent experimental summary) reports 1.27× average overhead; the paper should clarify whether this figure includes the cost of materializing the proof annotations for all output tuples or only the base evaluation, because the former is required to support the scalability claim for tens of millions of tuples.

Authors: The 1.27× figure in Table 1 measures the full provenance-enabled evaluation, which includes materialization of proof annotations for all output tuples. We will add an explicit statement to this effect in the evaluation section of the revised manuscript to remove any ambiguity. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper claims a novel provenance lattice with proof annotations and modified fixed-point semantics for semi-naive Datalog evaluation that preserves original semantics while supporting minimal-height proof trees. No equations, lattice definitions, or semantic derivations are supplied in the abstract or visible text that reduce any result to its inputs by construction. No self-citations are invoked as load-bearing premises, no parameters are fitted then renamed as predictions, and no uniqueness theorems or ansatzes are smuggled via prior work. The derivation chain is presented as building on standard least-fixed-point Datalog evaluation with independent new structures, making the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper relies on standard Datalog evaluation machinery but introduces new structures whose correctness is not independently evidenced in the abstract.

axioms (1)

standard math Fixed-point semantics for semi-naive evaluation of Datalog
Invoked as the base for the new provenance semantics.

invented entities (1)

provenance lattice with proof annotations no independent evidence
purpose: Track derivation information during bottom-up evaluation
New data structure introduced for provenance queries

pith-pipeline@v0.9.0 · 5752 in / 1222 out tokens · 22386 ms · 2026-05-24T22:59:54.998546+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery; no lattice-height minimization unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations, and a new fixed-point semantics for semi-naive evaluation... TP((I,h)) = (ΓP(I), h′) where h′(t) = min_g∈Gt {max ti∈g h(ti)} + 1
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Definition 3 (Provenance Instance)... Lemma 1. ⊑ is a partial order... Theorem 1. ... height annotations are minimal

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1]

souﬄ´ e synthesizes a native parallel c++ program from a logic speciﬁ cation

souﬄe-lang/souﬄe: Souﬄ´ e is a variant of datalog for tool desig ners crafting analyses in horn clauses. souﬄ´ e synthesizes a native parallel c++ program from a logic speciﬁ cation. more information can be found here: http://souﬄe-lang.github.io/, 2017. Accessed: 19-1 0-2017

work page 2017
[2]

Abiteboul, R

S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases . Addison-Wesley Publishing Company, 1995

work page 1995
[3]

Allen, B

N. Allen, B. Scholz, and P. Krishnan. Staged Points-to Analysis for Large Code Bases , pages 131–150. Springer Berlin Heidelberg, 2015

work page 2015
[4]

B. S. Arab, D. Gawlick, V. Krishnaswamy, V. Radhakrishnan, and B. Glavic. Using reenactment to retroactively capture provenance for transactions. IEEE Transactions on Knowledge and Data Engi- neering, 30(3):599–612, March 2018

work page 2018
[5]

M. Aref, B. ten Cate, T. J. Green, B. Kimelfeld, D. Olteanu, E. Pa salic, T. L. Veldhuizen, and G. Wash- burn. Design and implementation of the logicblox system. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data , SIGMOD ’15, pages 1371–1382, New York, NY, USA, 2015. ACM

work page 2015
[6]

Arora, R

T. Arora, R. Ramakrishnan, W. G. Roth, P. Seshadri, and D. Sriv astava”. Explaining program execution in deductive systems. Proceedings of Deductive and Object-Oriented Databases: T hird International Conference, pages 101–119, 1993

work page 1993
[7]

Benjelloun, A

O. Benjelloun, A. D. Sarma, C. Hayworth, and J. Widom. An introd uction to uldbs and the trio system. IEEE Data Engineering Bulletin , 2006

work page 2006
[8]

Bravenboer and Y

M. Bravenboer and Y. Smaragdakis. Strictly declarative speciﬁc ation of sophisticated points-to analyses. SIGPLAN Not. , 44(10):243–262, 2009

work page 2009
[9]

Buneman, S

P. Buneman, S. Khanna, and W.-C. Tan. Why and where: A charac terization of data provenance. Proceedings of the International Conference on Database Th eory, 1973:316–330, 2001

work page 1973
[10]

Caballero, Y

R. Caballero, Y. Garc ´ ıa-Ruiz, and F. S´ aenz-P´ erez. A theoretical framework for the declarative debugging of datalog programs. In K.-D. Schewe and B. Thalheim, editors, Semantics in Data and Knowledge Bases, pages 143–159, Berlin, Heidelberg, 2008. Springer Berlin Heidelber g. 26

work page 2008
[11]

Caballero, Y

R. Caballero, Y. Garca-Ruiz, and F. Senz-Prez. Debugging of w rong and missing answers for datalog programs with constraint handling rules. 07 2015

work page 2015
[12]

Cheney, L

J. Cheney, L. Chiticariu, and W.-C. Tan. Provenance in databas es: Why, how, and where. Foundations and Trends in Databases , 1:379–474, 2009

work page 2009
[13]

Deutch, A

D. Deutch, A. Gilad, and Y. Moskovitch. Selective provenance f or datalog programs using top-k queries. Proceedings of the VLDB Endowment , 8:1394–1405, 2015

work page 2015
[14]

Deutch, A

D. Deutch, A. Gilad, and Y. Moskovitch. Eﬃcient provenance tr acking for datalog using top-k queries. The VLDB Journal , 27(2):245–269, Apr 2018

work page 2018
[15]

Deutch, T

D. Deutch, T. Milo, S. Roy, and V. Tannen. Circuits for datalog p rovenance. Conference on Database Theory, 17:201–212, 2014

work page 2014
[16]

Drabent, S

W. Drabent, S. Nadjm-Tehrani, and J. Maluszy´ nski. Meta-pr ogramming in logic programming. pages 501–521, 1989

work page 1989
[17]

Futamura

Y. Futamura. Partial evaluation of computation process – an a pproach to a compiler-compiler. Higher Order Symbol. Comput. , 12(4):381–391, Dec. 1999

work page 1999
[18]

Glavic and G

B. Glavic and G. Alonso. Perm: Processing provenance and data on the same data model through query rewriting. IEEE International Conference on Data Engineering , 25:174–185, 2009

work page 2009
[19]

Glavic, R

B. Glavic, R. J. Miller, and G. Alonso3. Using sql for eﬃcient gener ation and querying of provenance information. Lecture Notes in Computer Science , 8000:291–320, 2013

work page 2013
[20]

Grech, L

N. Grech, L. Brent, B. Scholz, and Y. Smaragdakis. Gigahorse : Thorough, declarative decompilation of smart contracts. In Proceedings of the 41th International Conference on Softwa re Engineering, ICSE 2019, page (to appear), Montreal, QC, Canada, May 2019. ACM

work page 2019
[21]

Grech, M

N. Grech, M. Kong, A. Jurisevic, L. Brent, B. Scholz, and Y. Sm aragdakis. Madmax: Surviving out- of-gas conditions in ethereum smart contracts. In SPLASH 2018 OOPSLA , 2018

work page 2018
[22]

Hoder, N

K. Hoder, N. Bjørner, and L. de Moura. µz– an eﬃcient engine for ﬁxed points with constraints. In G. Gopalakrishnan and S. Qadeer, editors, Computer Aided Veriﬁcation , pages 457–462, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg

work page 2011
[23]

S. S. Huang, T. J. Green, and B. T. Loo. Datalog and emerging a pplications: An interactive tutorial. In Proceedings of the 2011 ACM SIGMOD International Conferenc e on Management of Data , SIGMOD ’11, pages 1213–1216. ACM, 2011

work page 2011
[24]

Jordan, B

H. Jordan, B. Scholz, and P. Subotic. Souﬄ´ e: On synthesis of program analyzers. Proceedings of Computer Aided Veriﬁcation , 28:422–430, 2016

work page 2016
[25]

Jordan, P

H. Jordan, P. Suboti´ c, D. Zhao, and B. Scholz. A specialized b -tree for concurrent datalog evaluation. In Proceedings of the 24th Symposium on Principles and Practic e of Parallel Programming , PPoPP ’19, pages 327–339, New York, NY, USA, 2019. ACM

work page 2019
[26]

K¨ ohler, B

S. K¨ ohler, B. Lud¨ ascher, and Y. Smaragdakis. Declarative datalog debugging for mere mortals. Lecture Notes in Computer Science , 7494:111–122, 2012

work page 2012
[27]

S. Lee, S. K¨ ohler, B. Lud¨ ascher, and B. Glavic. Eﬃciently com puting provenance graphs for queries with negation. CoRR, abs/1701.05699, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[28]

S. Lee, B. Lud¨ ascher, and B. Glavic. Provenance summaries f or answers and non-answers. Proc. VLDB Endow., 11(12):1954–1957, Aug. 2018

work page 1954
[29]

Liang, P

S. Liang, P. Fodor, H. Wan, and M. Kifer. Openrulebench: An an alysis of the performance of rule engines. In Proceedings of the 18th International Conference on World W ide Web , WWW ’09, pages 601–610, 2009. 27

work page 2009
[30]

Madsen, M.-H

M. Madsen, M.-H. Yee, and O. Lhot´ ak. From datalog to ﬂix: A de clarative language for ﬁxed points on lattices. In Proceedings of the 37th ACM SIGPLAN Conference on Programmi ng Language Design and Implementation , PLDI ’16, pages 194–208, New York, NY, USA, 2016. ACM

work page 2016
[31]

Mangal, X

R. Mangal, X. Zhang, A. V. Nori, and M. Naik. A user-guided appr oach to program analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 462–473, New York, NY, USA, 2015. ACM

work page 2015
[32]

X. Ou, S. Govindavajhala, and A. W. Appel. Mulval: A logic-based n etwork security analyzer. In Proceedings of the 14th Conference on USENIX Security Sympo sium - Volume 14 , SSYM’05, pages 8–8, Berkeley, CA, USA, 2005. USENIX Association

work page 2005
[33]

Raghothaman, S

M. Raghothaman, S. Kulkarni, K. Heo, and M. Naik. User-guided program reasoning using bayesian inference. pages 722–735, 06 2018

work page 2018
[34]

Ramakrishnan and S

R. Ramakrishnan and S. Sudarshan. Top-down vs. bottom-up revisited. In In Proceedings of the International Logic Programming Symposium , pages 321–336. MIT Press, 1991

work page 1991
[35]

E. Y. Shapiro. Algorithmic Program DeBugging . MIT Press, Cambridge, MA, USA, 1983

work page 1983
[36]

Sridharan, D

M. Sridharan, D. Gopan, L. Shan, and R. Bod ´ ık. Demand-drive n points-to analysis for java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Obj ect-oriented Programming, Systems, Languages, and Applications , OOPSLA ’05, pages 59–76, New York, NY, USA, 2005. ACM

work page 2005
[37]

Stamatogiannakis, P

M. Stamatogiannakis, P. Groth, and H. Bos. Decoupling proven ance capture and analysis from exe- cution. In 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15) , Edinburgh, Scotland, 2015. USENIX Association

work page 2015
[38]

Subotic, H

P. Subotic, H. Jordan, L. Chang, A. Fekete, and B. Scholz. Au tomatic index selection for large-scale datalog computation. PVLDB, 12(2):141–153, 2018

work page 2018
[39]

J. D. Ullman. Bottom-up beats top-down for datalog. In Proceedings of the Eighth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems , PODS ’89, pages 140–149, New York, NY, USA, 1989. ACM

work page 1989
[40]

Whaley, D

J. Whaley, D. Avots, M. Carbin, and M. S. Lam. Using Datalog with Binary Decision Diagrams for Program Analysis, pages 97–118. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005

work page 2005
[41]

J. Widom. Trio: A system for integrated management of data, a ccuracy, and lineage. In 2nd Biennial Conference on Innovative Data Systems Research, CIDR 2005 , pages 262–276, 01 2005

work page 2005
[42]

Zhang, R

X. Zhang, R. Grigore, X. Si, and M. Naik. Eﬀective interactive re solution of static analysis alarms. Proc. ACM Program. Lang. , 1(OOPSLA):57:1–57:30, Oct. 2017

work page 2017
[43]

Zhang, R

X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstr action reﬁnement for program analyses in datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programmi ng Language Design and Implementation , PLDI ’14, pages 239–248, New York, NY, USA, 2014. ACM

work page 2014
[44]

D. Zhao. Honours thesis: Large-scale provenance for souﬄ´ e. 2017

work page 2017
[45]

W. Zhou, M. Sherr, T. Tao, X. Li, B. T. Loo, and Y. Mao. Eﬃcient querying and maintenance of network provenance at internet-scale. Proceedings of the 2010 ACM SIGMOD International Conferenc e on Management of Data , pages 615–626, 2010. 28

work page 2010

[1] [1]

souﬄ´ e synthesizes a native parallel c++ program from a logic speciﬁ cation

souﬄe-lang/souﬄe: Souﬄ´ e is a variant of datalog for tool desig ners crafting analyses in horn clauses. souﬄ´ e synthesizes a native parallel c++ program from a logic speciﬁ cation. more information can be found here: http://souﬄe-lang.github.io/, 2017. Accessed: 19-1 0-2017

work page 2017

[2] [2]

Abiteboul, R

S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases . Addison-Wesley Publishing Company, 1995

work page 1995

[3] [3]

Allen, B

N. Allen, B. Scholz, and P. Krishnan. Staged Points-to Analysis for Large Code Bases , pages 131–150. Springer Berlin Heidelberg, 2015

work page 2015

[4] [4]

B. S. Arab, D. Gawlick, V. Krishnaswamy, V. Radhakrishnan, and B. Glavic. Using reenactment to retroactively capture provenance for transactions. IEEE Transactions on Knowledge and Data Engi- neering, 30(3):599–612, March 2018

work page 2018

[5] [5]

M. Aref, B. ten Cate, T. J. Green, B. Kimelfeld, D. Olteanu, E. Pa salic, T. L. Veldhuizen, and G. Wash- burn. Design and implementation of the logicblox system. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data , SIGMOD ’15, pages 1371–1382, New York, NY, USA, 2015. ACM

work page 2015

[6] [6]

Arora, R

T. Arora, R. Ramakrishnan, W. G. Roth, P. Seshadri, and D. Sriv astava”. Explaining program execution in deductive systems. Proceedings of Deductive and Object-Oriented Databases: T hird International Conference, pages 101–119, 1993

work page 1993

[7] [7]

Benjelloun, A

O. Benjelloun, A. D. Sarma, C. Hayworth, and J. Widom. An introd uction to uldbs and the trio system. IEEE Data Engineering Bulletin , 2006

work page 2006

[8] [8]

Bravenboer and Y

M. Bravenboer and Y. Smaragdakis. Strictly declarative speciﬁc ation of sophisticated points-to analyses. SIGPLAN Not. , 44(10):243–262, 2009

work page 2009

[9] [9]

Buneman, S

P. Buneman, S. Khanna, and W.-C. Tan. Why and where: A charac terization of data provenance. Proceedings of the International Conference on Database Th eory, 1973:316–330, 2001

work page 1973

[10] [10]

Caballero, Y

R. Caballero, Y. Garc ´ ıa-Ruiz, and F. S´ aenz-P´ erez. A theoretical framework for the declarative debugging of datalog programs. In K.-D. Schewe and B. Thalheim, editors, Semantics in Data and Knowledge Bases, pages 143–159, Berlin, Heidelberg, 2008. Springer Berlin Heidelber g. 26

work page 2008

[11] [11]

Caballero, Y

R. Caballero, Y. Garca-Ruiz, and F. Senz-Prez. Debugging of w rong and missing answers for datalog programs with constraint handling rules. 07 2015

work page 2015

[12] [12]

Cheney, L

J. Cheney, L. Chiticariu, and W.-C. Tan. Provenance in databas es: Why, how, and where. Foundations and Trends in Databases , 1:379–474, 2009

work page 2009

[13] [13]

Deutch, A

D. Deutch, A. Gilad, and Y. Moskovitch. Selective provenance f or datalog programs using top-k queries. Proceedings of the VLDB Endowment , 8:1394–1405, 2015

work page 2015

[14] [14]

Deutch, A

D. Deutch, A. Gilad, and Y. Moskovitch. Eﬃcient provenance tr acking for datalog using top-k queries. The VLDB Journal , 27(2):245–269, Apr 2018

work page 2018

[15] [15]

Deutch, T

D. Deutch, T. Milo, S. Roy, and V. Tannen. Circuits for datalog p rovenance. Conference on Database Theory, 17:201–212, 2014

work page 2014

[16] [16]

Drabent, S

W. Drabent, S. Nadjm-Tehrani, and J. Maluszy´ nski. Meta-pr ogramming in logic programming. pages 501–521, 1989

work page 1989

[17] [17]

Futamura

Y. Futamura. Partial evaluation of computation process – an a pproach to a compiler-compiler. Higher Order Symbol. Comput. , 12(4):381–391, Dec. 1999

work page 1999

[18] [18]

Glavic and G

B. Glavic and G. Alonso. Perm: Processing provenance and data on the same data model through query rewriting. IEEE International Conference on Data Engineering , 25:174–185, 2009

work page 2009

[19] [19]

Glavic, R

B. Glavic, R. J. Miller, and G. Alonso3. Using sql for eﬃcient gener ation and querying of provenance information. Lecture Notes in Computer Science , 8000:291–320, 2013

work page 2013

[20] [20]

Grech, L

N. Grech, L. Brent, B. Scholz, and Y. Smaragdakis. Gigahorse : Thorough, declarative decompilation of smart contracts. In Proceedings of the 41th International Conference on Softwa re Engineering, ICSE 2019, page (to appear), Montreal, QC, Canada, May 2019. ACM

work page 2019

[21] [21]

Grech, M

N. Grech, M. Kong, A. Jurisevic, L. Brent, B. Scholz, and Y. Sm aragdakis. Madmax: Surviving out- of-gas conditions in ethereum smart contracts. In SPLASH 2018 OOPSLA , 2018

work page 2018

[22] [22]

Hoder, N

K. Hoder, N. Bjørner, and L. de Moura. µz– an eﬃcient engine for ﬁxed points with constraints. In G. Gopalakrishnan and S. Qadeer, editors, Computer Aided Veriﬁcation , pages 457–462, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg

work page 2011

[23] [23]

S. S. Huang, T. J. Green, and B. T. Loo. Datalog and emerging a pplications: An interactive tutorial. In Proceedings of the 2011 ACM SIGMOD International Conferenc e on Management of Data , SIGMOD ’11, pages 1213–1216. ACM, 2011

work page 2011

[24] [24]

Jordan, B

H. Jordan, B. Scholz, and P. Subotic. Souﬄ´ e: On synthesis of program analyzers. Proceedings of Computer Aided Veriﬁcation , 28:422–430, 2016

work page 2016

[25] [25]

Jordan, P

H. Jordan, P. Suboti´ c, D. Zhao, and B. Scholz. A specialized b -tree for concurrent datalog evaluation. In Proceedings of the 24th Symposium on Principles and Practic e of Parallel Programming , PPoPP ’19, pages 327–339, New York, NY, USA, 2019. ACM

work page 2019

[26] [26]

K¨ ohler, B

S. K¨ ohler, B. Lud¨ ascher, and Y. Smaragdakis. Declarative datalog debugging for mere mortals. Lecture Notes in Computer Science , 7494:111–122, 2012

work page 2012

[27] [27]

S. Lee, S. K¨ ohler, B. Lud¨ ascher, and B. Glavic. Eﬃciently com puting provenance graphs for queries with negation. CoRR, abs/1701.05699, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[28] [28]

S. Lee, B. Lud¨ ascher, and B. Glavic. Provenance summaries f or answers and non-answers. Proc. VLDB Endow., 11(12):1954–1957, Aug. 2018

work page 1954

[29] [29]

Liang, P

S. Liang, P. Fodor, H. Wan, and M. Kifer. Openrulebench: An an alysis of the performance of rule engines. In Proceedings of the 18th International Conference on World W ide Web , WWW ’09, pages 601–610, 2009. 27

work page 2009

[30] [30]

Madsen, M.-H

M. Madsen, M.-H. Yee, and O. Lhot´ ak. From datalog to ﬂix: A de clarative language for ﬁxed points on lattices. In Proceedings of the 37th ACM SIGPLAN Conference on Programmi ng Language Design and Implementation , PLDI ’16, pages 194–208, New York, NY, USA, 2016. ACM

work page 2016

[31] [31]

Mangal, X

R. Mangal, X. Zhang, A. V. Nori, and M. Naik. A user-guided appr oach to program analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 462–473, New York, NY, USA, 2015. ACM

work page 2015

[32] [32]

X. Ou, S. Govindavajhala, and A. W. Appel. Mulval: A logic-based n etwork security analyzer. In Proceedings of the 14th Conference on USENIX Security Sympo sium - Volume 14 , SSYM’05, pages 8–8, Berkeley, CA, USA, 2005. USENIX Association

work page 2005

[33] [33]

Raghothaman, S

M. Raghothaman, S. Kulkarni, K. Heo, and M. Naik. User-guided program reasoning using bayesian inference. pages 722–735, 06 2018

work page 2018

[34] [34]

Ramakrishnan and S

R. Ramakrishnan and S. Sudarshan. Top-down vs. bottom-up revisited. In In Proceedings of the International Logic Programming Symposium , pages 321–336. MIT Press, 1991

work page 1991

[35] [35]

E. Y. Shapiro. Algorithmic Program DeBugging . MIT Press, Cambridge, MA, USA, 1983

work page 1983

[36] [36]

Sridharan, D

M. Sridharan, D. Gopan, L. Shan, and R. Bod ´ ık. Demand-drive n points-to analysis for java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Obj ect-oriented Programming, Systems, Languages, and Applications , OOPSLA ’05, pages 59–76, New York, NY, USA, 2005. ACM

work page 2005

[37] [37]

Stamatogiannakis, P

M. Stamatogiannakis, P. Groth, and H. Bos. Decoupling proven ance capture and analysis from exe- cution. In 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15) , Edinburgh, Scotland, 2015. USENIX Association

work page 2015

[38] [38]

Subotic, H

P. Subotic, H. Jordan, L. Chang, A. Fekete, and B. Scholz. Au tomatic index selection for large-scale datalog computation. PVLDB, 12(2):141–153, 2018

work page 2018

[39] [39]

J. D. Ullman. Bottom-up beats top-down for datalog. In Proceedings of the Eighth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems , PODS ’89, pages 140–149, New York, NY, USA, 1989. ACM

work page 1989

[40] [40]

Whaley, D

J. Whaley, D. Avots, M. Carbin, and M. S. Lam. Using Datalog with Binary Decision Diagrams for Program Analysis, pages 97–118. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005

work page 2005

[41] [41]

J. Widom. Trio: A system for integrated management of data, a ccuracy, and lineage. In 2nd Biennial Conference on Innovative Data Systems Research, CIDR 2005 , pages 262–276, 01 2005

work page 2005

[42] [42]

Zhang, R

X. Zhang, R. Grigore, X. Si, and M. Naik. Eﬀective interactive re solution of static analysis alarms. Proc. ACM Program. Lang. , 1(OOPSLA):57:1–57:30, Oct. 2017

work page 2017

[43] [43]

Zhang, R

X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstr action reﬁnement for program analyses in datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programmi ng Language Design and Implementation , PLDI ’14, pages 239–248, New York, NY, USA, 2014. ACM

work page 2014

[44] [44]

D. Zhao. Honours thesis: Large-scale provenance for souﬄ´ e. 2017

work page 2017

[45] [45]

W. Zhou, M. Sherr, T. Tao, X. Li, B. T. Loo, and Y. Mao. Eﬃcient querying and maintenance of network provenance at internet-scale. Proceedings of the 2010 ACM SIGMOD International Conferenc e on Management of Data , pages 615–626, 2010. 28

work page 2010