Language-Integrated Recursive Queries (Full Version)
Pith reviewed 2026-05-22 21:44 UTC · model grok-4.3
The pith
A Scala embedding derives properties from recursive queries to reject errors at the type level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TyQL automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors: database errors, incorrect results, and non-termination. The implementation uses Named-Tuples and type-level pattern matching in Scala to ensure query portability and safety, with no performance penalty compared to raw SQL strings and a three-orders-of-magnitude speedup over non-recursive SQL queries.
What carries the argument
A calculus that automatically derives mathematical properties from embedded recursive queries, realized through Named-Tuples and type-level pattern matching.
If this is right
- Recursive queries become portable across database backends because safety checks are independent of vendor-specific semantics.
- The three classes of errors (database errors, incorrect results, non-termination) are caught before execution rather than at runtime.
- Performance of safe recursive queries matches raw SQL strings while exceeding non-recursive alternatives by three orders of magnitude.
- Fixed-point computations used in program, network, and distributed-system analysis can be expressed inside the host language without separate SQL scripting.
Where Pith is reading between the lines
- The same property-derivation approach could be applied to other host languages that support sufficiently expressive static type systems.
- Derived properties might be used to guide query optimizers or to prove equivalence between different recursive formulations.
- Extending the calculus to additional SQL features such as window functions or common table expressions beyond recursion would be a direct next step.
Load-bearing premise
Type-level pattern matching and Named-Tuples in Scala are expressive enough to capture every mathematical property needed to detect the three error classes for arbitrary recursive queries.
What would settle it
An executable recursive query that the type checker accepts yet produces non-termination or an incorrect result on at least one supported database backend.
Figures
read the original abstract
Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE keyword in SQL:1999 extended the ability of relational database systems to handle fixed-point computations, unlocking significant performance advantages by allowing computation to move closer to the data. Yet with recursion, SQL becomes a Turing-complete programming language and, with that, unrecoverable safety and correctness risks. SQL itself lacks a fixed semantics, as the SQL specification is written in natural language, full of ambiguities that database vendors resolve in divergent ways. As a result, reasoning about the correctness of recursive SQL programs must rely on isolated mathematical properties of queries rather than wrestling a unified formal model out of a language with notoriously inconsistent semantics. To address these challenges, we propose a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors - database errors, incorrect results, and non-termination. We introduce TyQL, a practical implementation in Scala for safe, recursive language-integrated query. Using Named-Tuples and type-level pattern matching, TyQL ensures query portability and safety, showing no performance penalty compared to raw SQL strings while unlocking a three-orders-of-magnitude speedup over non-recursive SQL queries.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TyQL, a language-integrated query system embedded in Scala that uses Named-Tuples for row typing and type-level pattern matching to derive mathematical properties (monotonicity, stratification, dialect constraints) from recursive CTEs. It claims to automatically reject queries leading to the three error classes (DB errors, incorrect results, non-termination), ensure portability across backends, incur no runtime overhead versus raw SQL strings, and deliver three-orders-of-magnitude speedups over non-recursive SQL.
Significance. If the type-level encoding is shown to be both sound and sufficiently expressive for arbitrary recursive queries (including mutual recursion and deep nesting), the work would provide a practical, compile-time safety layer for fixed-point computations in industrial analyses, combining the performance of recursive SQL with the guarantees absent from the SQL standard itself.
major comments (2)
- [Type-level encoding / calculus definition (inferred from abstract and implementation description)] The central safety claim rests on the assertion that Scala 3 type-level pattern matching and Named-Tuples can encode and decide all required properties for arbitrary queries. No section provides a formal metatheory, termination argument for the type-level reduction, or complexity bound; the skeptic concern about recursion limits and lack of full dependent types therefore directly threatens the claim that all three error classes are rejected for every well-formed recursive CTE.
- [Portability and backend-specific constraints] The portability guarantee requires that the derived properties are backend-independent yet still correctly reject dialect-specific errors. Without an explicit mapping from the type-level derivations to the concrete SQL:1999 semantics (or vendor-specific extensions) for each supported backend, it is unclear whether the rejection rules are conservative enough to avoid both false negatives and false positives across vendors.
minor comments (2)
- [Evaluation] The abstract states performance results but the manuscript should include the exact benchmark queries, data sizes, and hardware configuration used to measure the three-orders-of-magnitude speedup and the zero-overhead claim versus raw SQL strings.
- [Introduction / calculus] Notation for the three error classes and the derived properties (monotonicity, stratification, etc.) should be introduced with explicit definitions before the type-level encoding is presented.
Simulated Author's Rebuttal
We thank the referee for the valuable feedback. Below we respond to each major comment, offering clarifications and committing to revisions where appropriate to strengthen the presentation of TyQL's type-level safety guarantees.
read point-by-point responses
-
Referee: [Type-level encoding / calculus definition (inferred from abstract and implementation description)] The central safety claim rests on the assertion that Scala 3 type-level pattern matching and Named-Tuples can encode and decide all required properties for arbitrary queries. No section provides a formal metatheory, termination argument for the type-level reduction, or complexity bound; the skeptic concern about recursion limits and lack of full dependent types therefore directly threatens the claim that all three error classes are rejected for every well-formed recursive CTE.
Authors: The implementation of TyQL provides an executable encoding of the calculus via Scala's type system, where the pattern matching rules serve as the decision procedure for the properties. Termination is ensured by the finite nature of type-level recursion in Scala (bounded by compiler limits, which we respect in our design). We demonstrate coverage of arbitrary queries including mutual recursion through examples in the paper. We acknowledge the value of a more explicit metatheory and will add a new section outlining the type-level reduction rules, their termination argument, and a complexity bound (polynomial in query size for the derivations). This addresses the skeptic concern by making the soundness explicit. revision: partial
-
Referee: [Portability and backend-specific constraints] The portability guarantee requires that the derived properties are backend-independent yet still correctly reject dialect-specific errors. Without an explicit mapping from the type-level derivations to the concrete SQL:1999 semantics (or vendor-specific extensions) for each supported backend, it is unclear whether the rejection rules are conservative enough to avoid both false negatives and false positives across vendors.
Authors: TyQL derives core properties independently of the backend and then applies backend-specific constraints encoded in the type rules. The paper discusses support for multiple dialects and shows how rules are adjusted (e.g., for recursion syntax variations). To clarify the mapping, we will include an explicit table in the revised manuscript that links each type-level derivation step to the relevant SQL:1999 semantics and vendor-specific behaviors, ensuring the rules are shown to be conservative. revision: yes
Circularity Check
No circularity; derivation is self-contained via external Scala type system
full rationale
The paper presents TyQL as a new calculus and Scala implementation that uses Named-Tuples and type-level pattern matching to automatically derive mathematical properties of recursive queries and reject error classes. No equations, definitions, or claims in the provided abstract reduce by construction to fitted parameters, self-referential inputs, or load-bearing self-citations. The safety and portability claims rest on the expressiveness of an external language feature (Scala 3 type-level computation) rather than any internal derivation that loops back to the paper's own results. This matches the default case of a self-contained implementation paper with no detectable circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Recursive SQL queries can be classified into three error classes (database errors, incorrect results, non-termination) by static analysis of their structure.
- ad hoc to paper Type-level pattern matching in Scala is sufficient to derive the required mathematical properties for arbitrary recursive queries.
invented entities (1)
-
TyQL calculus
no independent evidence
Reference graph
Works this paper leans on
-
[1]
15 Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu
Association for Computing Machinery.doi:10.1145/2588555.2612186. 15 Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. Hottsql: proving query rewrites with univalent sql semantics. InProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, page 510–524, New York, NY, USA, 2017. Association for Com...
-
[2]
22 Zhiwei Fan, Jianqiao Zhu, Zuyu Zhang, Aws Albarghouthi, Paraschos Koutris, and Jignesh M
URL:https://api.semanticscholar.org/CorpusID:202540968. 22 Zhiwei Fan, Jianqiao Zhu, Zuyu Zhang, Aws Albarghouthi, Paraschos Koutris, and Jignesh M. Patel. Scaling-up in-memory datalog processing: observations and techniques.Proc. VLDB Endow., 12(6):695–708, February 2019.doi:10.14778/3311880.3311886. 23 Tim Fischer, Denis Hirn, and Torsten Grust. Snakes ...
-
[3]
URL: https://www.cidrdb.org/cidr2023/papers/p14-hirn
www.cidrdb.org, 2023. URL: https://www.cidrdb.org/cidr2023/papers/p14-hirn. pdf. 31 Recursion example: bill of materials, 2021. URL: https://www.ibm.com/docs/en/ db2-event-store/2.0.0?topic=expression-recursion-example-bill-materials. 32 International Organization for Standardization. ISO/IEC 9075-1:1999, Information technology — Database languages — SQL ...
-
[4]
By the IR typing rules, for each term that contains an aliasα, the premise gives that α∈Π
Step 1 appliesto-NRLSDΨ to eachqbasej, producing non-recursiveDatalog¬sprograms. By the IR typing rules, for each term that contains an aliasα, the premise gives that α∈Π. By the syntax of normalizedλIR,rec-queryterms can only be on the right-hand side ofletrecbindings therefore each qbasej contains no nestedrec-queryterms. By distinct-ir, each input toto...
-
[5]
Step 2 introduces rulesSj of the formαj(Ψ(pbasej)) :–pbasej(Ψ(pbasej)). Byfix-ir, Π(αj) = Aj and Qj = Aj, therefore the schemas ofpbasej and αj are equivalent and contain only a single positive body atom, thereforeSj are safe.Sj introduces the non- recursive dependencypbasej ⇝αj; thus, the combination of the rules ofPbasej and Sj for j∈1..mj produces a sa...
-
[6]
Step 3 appliesto-NRLSD to eachqrecj, producing non-recursiveLSD-Datalog¬. We can use Lemma 2 in the same way as in Step 1, except we apply thefix-irrule instead 38 Language-Integrated Recursive Queries (ra-combinator-q) combinator∈{map,flatMap,filter}p1⇓RA q1 f⇓RA (x)→p2 p2[x↦→t-ref(α)]⇓RA q2 combinator ( p1 :Query[A,C], f:...) :...⇓RA query ( α, q1, q2,c...
-
[7]
Step 4 introduces rulesRj of the formαj(Ψ(precurj)) :–precurj(Ψ(precurj)). The resulting program will be safe because the schemasΨ(pbasej)andΨ( precurj)will be the same: by fix-ir,⊢qbasej :Query [Aj,...]and ⊢qrecj :RQuery [Aj,...]. Byra-combinator-q/r, Π(αj) =A j, thereforeΨ(p recurj)≡Ψ(pbasej). This step “closes the loop”, e.g., introduces a cycle in the...
-
[8]
Step 5 combines the rules produced from Steps 1-4 into a single program. Byto- NRLSDproperty P4 (namespacing), newly introduced head predicates are disjoint across invocations, so the union preserves safety and acyclicity. Conclusion.Therefore, eachletrecbinding ri contributes a well-formed fragment. The body q is translated to non-recursiveDatalog¬susing...
-
[9]
Translate base-cases to non-recursive Datalog¬s (Pbasej,pbasej,Ψbasej ) mi j=1 =to-NRLSD Ψxi (distinct(q basej ))
-
[10]
For each aliasj, add a rule assigning it to the corresponding base-case Sj =αj(Ψ(p basej )):-p basej (Ψ(p basej ))
-
[11]
Translate recursive-case to non-recursive Datalog¬s (P recurj,precurj,Ψrecurj ) mi j=1 =to-NRLSD Ψxi (qrecj )
-
[12]
Add recursive rules Rj =αj(Ψ(p recurj )):-p recurj (Ψ(p recurj ))
-
[13]
B"), (p)→ p. direct==False) (a)λRQL termt filter( fix( map( filter( table(Edges), (e)→e.src==
Assemble LSD-Datalog¬program Pi = ⋃ j∈1..mi (Pbasej∪Precurj∪{Rj}∪{Sj}) Ψ i = ⋃ j∈1..mi (Ψ basej∪Ψrecurj ) pi =αj wherej=w B. Translate the body of the let-rec into non-recursive Datalog Ψs = ⋃ i∈1..nΨi (Pbody,pbody,Ψbody ) =to-NRLSD Ψs (q) C. Combine final program P′= (⋃ i∈1..nPi ) ∪Pbody p′=p body Ψ′= Ψ s∪Ψbody Figure 26Translation Function to LSD-Datalo...
-
[14]
if p is the head predicate symbol and q is the predicate symbol of a positive body literal, then q belongs to a stratum lower than or equal to the stratum of p
-
[15]
A Datalog program P is stratified if it has a stratification
if p is the head predicate symbol and q is the predicate symbol of a negative body literal, then q belongs to a stratum lower than the stratum of p. A Datalog program P is stratified if it has a stratification
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.