pith. sign in

arxiv: 2504.02443 · v4 · submitted 2025-04-03 · 💻 cs.PL

Language-Integrated Recursive Queries (Full Version)

Pith reviewed 2026-05-22 21:44 UTC · model grok-4.3

classification 💻 cs.PL
keywords recursive queriesSQLlanguage-integrated queriestype safetyScalafixed-point computationquery portabilitytermination checking
0
0 comments X

The pith

A Scala embedding derives properties from recursive queries to reject errors at the type level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TyQL as a way to embed recursive SQL queries inside Scala so that a calculus can automatically check for safety properties. It targets the risks that arise once SQL recursion makes the language Turing-complete, including divergent vendor behaviors and the absence of a fixed formal semantics. By deriving mathematical properties through type-level pattern matching and Named-Tuples, the system rejects queries that would produce database errors, incorrect results, or non-termination. A sympathetic reader would care because this moves the safety guarantee from runtime testing or vendor-specific reasoning into the host language's type system while keeping execution speed identical to raw SQL strings.

Core claim

TyQL automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors: database errors, incorrect results, and non-termination. The implementation uses Named-Tuples and type-level pattern matching in Scala to ensure query portability and safety, with no performance penalty compared to raw SQL strings and a three-orders-of-magnitude speedup over non-recursive SQL queries.

What carries the argument

A calculus that automatically derives mathematical properties from embedded recursive queries, realized through Named-Tuples and type-level pattern matching.

If this is right

  • Recursive queries become portable across database backends because safety checks are independent of vendor-specific semantics.
  • The three classes of errors (database errors, incorrect results, non-termination) are caught before execution rather than at runtime.
  • Performance of safe recursive queries matches raw SQL strings while exceeding non-recursive alternatives by three orders of magnitude.
  • Fixed-point computations used in program, network, and distributed-system analysis can be expressed inside the host language without separate SQL scripting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same property-derivation approach could be applied to other host languages that support sufficiently expressive static type systems.
  • Derived properties might be used to guide query optimizers or to prove equivalence between different recursive formulations.
  • Extending the calculus to additional SQL features such as window functions or common table expressions beyond recursion would be a direct next step.

Load-bearing premise

Type-level pattern matching and Named-Tuples in Scala are expressive enough to capture every mathematical property needed to detect the three error classes for arbitrary recursive queries.

What would settle it

An executable recursive query that the type checker accepts yet produces non-termination or an incorrect result on at least one supported database backend.

Figures

Figures reproduced from arXiv: 2504.02443 by Amir Shaikhha, Anastasia Ailamaki, Anna Herlihy, Martin Odersky.

Figure 1
Figure 1. Figure 1: The reason why RDBMS may return incorrect results is an internal optimization [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
read the original abstract

Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE keyword in SQL:1999 extended the ability of relational database systems to handle fixed-point computations, unlocking significant performance advantages by allowing computation to move closer to the data. Yet with recursion, SQL becomes a Turing-complete programming language and, with that, unrecoverable safety and correctness risks. SQL itself lacks a fixed semantics, as the SQL specification is written in natural language, full of ambiguities that database vendors resolve in divergent ways. As a result, reasoning about the correctness of recursive SQL programs must rely on isolated mathematical properties of queries rather than wrestling a unified formal model out of a language with notoriously inconsistent semantics. To address these challenges, we propose a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors - database errors, incorrect results, and non-termination. We introduce TyQL, a practical implementation in Scala for safe, recursive language-integrated query. Using Named-Tuples and type-level pattern matching, TyQL ensures query portability and safety, showing no performance penalty compared to raw SQL strings while unlocking a three-orders-of-magnitude speedup over non-recursive SQL queries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces TyQL, a language-integrated query system embedded in Scala that uses Named-Tuples for row typing and type-level pattern matching to derive mathematical properties (monotonicity, stratification, dialect constraints) from recursive CTEs. It claims to automatically reject queries leading to the three error classes (DB errors, incorrect results, non-termination), ensure portability across backends, incur no runtime overhead versus raw SQL strings, and deliver three-orders-of-magnitude speedups over non-recursive SQL.

Significance. If the type-level encoding is shown to be both sound and sufficiently expressive for arbitrary recursive queries (including mutual recursion and deep nesting), the work would provide a practical, compile-time safety layer for fixed-point computations in industrial analyses, combining the performance of recursive SQL with the guarantees absent from the SQL standard itself.

major comments (2)
  1. [Type-level encoding / calculus definition (inferred from abstract and implementation description)] The central safety claim rests on the assertion that Scala 3 type-level pattern matching and Named-Tuples can encode and decide all required properties for arbitrary queries. No section provides a formal metatheory, termination argument for the type-level reduction, or complexity bound; the skeptic concern about recursion limits and lack of full dependent types therefore directly threatens the claim that all three error classes are rejected for every well-formed recursive CTE.
  2. [Portability and backend-specific constraints] The portability guarantee requires that the derived properties are backend-independent yet still correctly reject dialect-specific errors. Without an explicit mapping from the type-level derivations to the concrete SQL:1999 semantics (or vendor-specific extensions) for each supported backend, it is unclear whether the rejection rules are conservative enough to avoid both false negatives and false positives across vendors.
minor comments (2)
  1. [Evaluation] The abstract states performance results but the manuscript should include the exact benchmark queries, data sizes, and hardware configuration used to measure the three-orders-of-magnitude speedup and the zero-overhead claim versus raw SQL strings.
  2. [Introduction / calculus] Notation for the three error classes and the derived properties (monotonicity, stratification, etc.) should be introduced with explicit definitions before the type-level encoding is presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the valuable feedback. Below we respond to each major comment, offering clarifications and committing to revisions where appropriate to strengthen the presentation of TyQL's type-level safety guarantees.

read point-by-point responses
  1. Referee: [Type-level encoding / calculus definition (inferred from abstract and implementation description)] The central safety claim rests on the assertion that Scala 3 type-level pattern matching and Named-Tuples can encode and decide all required properties for arbitrary queries. No section provides a formal metatheory, termination argument for the type-level reduction, or complexity bound; the skeptic concern about recursion limits and lack of full dependent types therefore directly threatens the claim that all three error classes are rejected for every well-formed recursive CTE.

    Authors: The implementation of TyQL provides an executable encoding of the calculus via Scala's type system, where the pattern matching rules serve as the decision procedure for the properties. Termination is ensured by the finite nature of type-level recursion in Scala (bounded by compiler limits, which we respect in our design). We demonstrate coverage of arbitrary queries including mutual recursion through examples in the paper. We acknowledge the value of a more explicit metatheory and will add a new section outlining the type-level reduction rules, their termination argument, and a complexity bound (polynomial in query size for the derivations). This addresses the skeptic concern by making the soundness explicit. revision: partial

  2. Referee: [Portability and backend-specific constraints] The portability guarantee requires that the derived properties are backend-independent yet still correctly reject dialect-specific errors. Without an explicit mapping from the type-level derivations to the concrete SQL:1999 semantics (or vendor-specific extensions) for each supported backend, it is unclear whether the rejection rules are conservative enough to avoid both false negatives and false positives across vendors.

    Authors: TyQL derives core properties independently of the backend and then applies backend-specific constraints encoded in the type rules. The paper discusses support for multiple dialects and shows how rules are adjusted (e.g., for recursion syntax variations). To clarify the mapping, we will include an explicit table in the revised manuscript that links each type-level derivation step to the relevant SQL:1999 semantics and vendor-specific behaviors, ensuring the rules are shown to be conservative. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained via external Scala type system

full rationale

The paper presents TyQL as a new calculus and Scala implementation that uses Named-Tuples and type-level pattern matching to automatically derive mathematical properties of recursive queries and reject error classes. No equations, definitions, or claims in the provided abstract reduce by construction to fitted parameters, self-referential inputs, or load-bearing self-citations. The safety and portability claims rest on the expressiveness of an external language feature (Scala 3 type-level computation) rather than any internal derivation that loops back to the paper's own results. This matches the default case of a self-contained implementation paper with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The abstract does not enumerate free parameters or invented entities. The central claim rests on the existence of a sound calculus that classifies all three error classes and on the assumption that Scala's type system can encode the necessary checks without runtime overhead.

axioms (2)
  • domain assumption Recursive SQL queries can be classified into three error classes (database errors, incorrect results, non-termination) by static analysis of their structure.
    Invoked when the paper states the calculus rejects queries that may lead to these errors.
  • ad hoc to paper Type-level pattern matching in Scala is sufficient to derive the required mathematical properties for arbitrary recursive queries.
    The implementation relies on Named-Tuples and type-level pattern matching to ensure safety.
invented entities (1)
  • TyQL calculus no independent evidence
    purpose: Automatically derives mathematical properties from embedded recursive queries and rejects unsafe ones.
    New construct introduced to address the lack of fixed semantics in SQL.

pith-pipeline@v0.9.0 · 5779 in / 1441 out tokens · 51208 ms · 2026-05-22T21:44:42.249179+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    15 Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu

    Association for Computing Machinery.doi:10.1145/2588555.2612186. 15 Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. Hottsql: proving query rewrites with univalent sql semantics. InProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, page 510–524, New York, NY, USA, 2017. Association for Com...

  2. [2]

    22 Zhiwei Fan, Jianqiao Zhu, Zuyu Zhang, Aws Albarghouthi, Paraschos Koutris, and Jignesh M

    URL:https://api.semanticscholar.org/CorpusID:202540968. 22 Zhiwei Fan, Jianqiao Zhu, Zuyu Zhang, Aws Albarghouthi, Paraschos Koutris, and Jignesh M. Patel. Scaling-up in-memory datalog processing: observations and techniques.Proc. VLDB Endow., 12(6):695–708, February 2019.doi:10.14778/3311880.3311886. 23 Tim Fischer, Denis Hirn, and Torsten Grust. Snakes ...

  3. [3]

    URL: https://www.cidrdb.org/cidr2023/papers/p14-hirn

    www.cidrdb.org, 2023. URL: https://www.cidrdb.org/cidr2023/papers/p14-hirn. pdf. 31 Recursion example: bill of materials, 2021. URL: https://www.ibm.com/docs/en/ db2-event-store/2.0.0?topic=expression-recursion-example-bill-materials. 32 International Organization for Standardization. ISO/IEC 9075-1:1999, Information technology — Database languages — SQL ...

  4. [4]

    By the IR typing rules, for each term that contains an aliasα, the premise gives that α∈Π

    Step 1 appliesto-NRLSDΨ to eachqbasej, producing non-recursiveDatalog¬sprograms. By the IR typing rules, for each term that contains an aliasα, the premise gives that α∈Π. By the syntax of normalizedλIR,rec-queryterms can only be on the right-hand side ofletrecbindings therefore each qbasej contains no nestedrec-queryterms. By distinct-ir, each input toto...

  5. [5]

    Step 2 introduces rulesSj of the formαj(Ψ(pbasej)) :–pbasej(Ψ(pbasej)). Byfix-ir, Π(αj) = Aj and Qj = Aj, therefore the schemas ofpbasej and αj are equivalent and contain only a single positive body atom, thereforeSj are safe.Sj introduces the non- recursive dependencypbasej ⇝αj; thus, the combination of the rules ofPbasej and Sj for j∈1..mj produces a sa...

  6. [6]

    incoming edges

    Step 3 appliesto-NRLSD to eachqrecj, producing non-recursiveLSD-Datalog¬. We can use Lemma 2 in the same way as in Step 1, except we apply thefix-irrule instead 38 Language-Integrated Recursive Queries (ra-combinator-q) combinator∈{map,flatMap,filter}p1⇓RA q1 f⇓RA (x)→p2 p2[x↦→t-ref(α)]⇓RA q2 combinator ( p1 :Query[A,C], f:...) :...⇓RA query ( α, q1, q2,c...

  7. [7]

    closes the loop

    Step 4 introduces rulesRj of the formαj(Ψ(precurj)) :–precurj(Ψ(precurj)). The resulting program will be safe because the schemasΨ(pbasej)andΨ( precurj)will be the same: by fix-ir,⊢qbasej :Query [Aj,...]and ⊢qrecj :RQuery [Aj,...]. Byra-combinator-q/r, Π(αj) =A j, thereforeΨ(p recurj)≡Ψ(pbasej). This step “closes the loop”, e.g., introduces a cycle in the...

  8. [8]

    Byto- NRLSDproperty P4 (namespacing), newly introduced head predicates are disjoint across invocations, so the union preserves safety and acyclicity

    Step 5 combines the rules produced from Steps 1-4 into a single program. Byto- NRLSDproperty P4 (namespacing), newly introduced head predicates are disjoint across invocations, so the union preserves safety and acyclicity. Conclusion.Therefore, eachletrecbinding ri contributes a well-formed fragment. The body q is translated to non-recursiveDatalog¬susing...

  9. [9]

    Translate base-cases to non-recursive Datalog¬s (Pbasej,pbasej,Ψbasej ) mi j=1 =to-NRLSD Ψxi (distinct(q basej ))

  10. [10]

    For each aliasj, add a rule assigning it to the corresponding base-case Sj =αj(Ψ(p basej )):-p basej (Ψ(p basej ))

  11. [11]

    Translate recursive-case to non-recursive Datalog¬s (P recurj,precurj,Ψrecurj ) mi j=1 =to-NRLSD Ψxi (qrecj )

  12. [12]

    Add recursive rules Rj =αj(Ψ(p recurj )):-p recurj (Ψ(p recurj ))

  13. [13]

    B"), (p)→ p. direct==False) (a)λRQL termt filter( fix( map( filter( table(Edges), (e)→e.src==

    Assemble LSD-Datalog¬program Pi = ⋃ j∈1..mi (Pbasej∪Precurj∪{Rj}∪{Sj}) Ψ i = ⋃ j∈1..mi (Ψ basej∪Ψrecurj ) pi =αj wherej=w B. Translate the body of the let-rec into non-recursive Datalog Ψs = ⋃ i∈1..nΨi (Pbody,pbody,Ψbody ) =to-NRLSD Ψs (q) C. Combine final program P′= (⋃ i∈1..nPi ) ∪Pbody p′=p body Ψ′= Ψ s∪Ψbody Figure 26Translation Function to LSD-Datalo...

  14. [14]

    if p is the head predicate symbol and q is the predicate symbol of a positive body literal, then q belongs to a stratum lower than or equal to the stratum of p

  15. [15]

    A Datalog program P is stratified if it has a stratification

    if p is the head predicate symbol and q is the predicate symbol of a negative body literal, then q belongs to a stratum lower than the stratum of p. A Datalog program P is stratified if it has a stratification