arxiv: 2504.01157 · v1 · submitted 2025-04-01 · 💻 cs.DB · cs.IR

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Pith reviewed 2026-05-22 22:29 UTC · model grok-4.3

classification 💻 cs.DB cs.IR
keywords FlockMTL · DuckDB · LLM integration · RAG · model-driven functions · PROMPT and MODEL DDL · knowledge-intensive analytics · database extensions

The pith

FlockMTL embeds LLM calls and RAG directly into DuckDB through model-driven SQL functions and new PROMPT and MODEL schema objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FlockMTL as a DBMS extension to simplify building applications that combine structured tabular data with unstructured documents for reasoning. It adds model-driven scalar and aggregate functions that perform LLM-based predictions at the tuple level and support chaining. The work also defines PROMPT and MODEL as first-class DDL objects alongside tables to enable cost-based optimizations such as batching and caching. This integration keeps LLM operations inside the database engine rather than requiring separate systems. The central goal is to lower the effort of orchestration and data movement in knowledge-intensive analytics.

Core claim

FlockMTL extends a DBMS with model-driven scalar and aggregate functions for chained LLM predictions on tuples, together with PROMPT and MODEL as first-class schema objects that sit alongside TABLE. These abstractions permit cost-based optimizations, including batching and caching, and provide resource independence, so SQL queries can handle both structured data and retrieval-augmented generation without external orchestration.
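
To make the mapping and reduction vocabulary concrete, here is a minimal sketch of a tuple-level scalar function chained into an aggregate function inside one SQL query. It is not FlockMTL's API: it uses Python's standard sqlite3 module in place of DuckDB, a deterministic stub (fake_llm) in place of a real model, and the SQL function names llm_map and llm_reduce are hypothetical.

```python
import sqlite3

def fake_llm(prompt: str, text: str) -> str:
    """Deterministic stand-in for a model call (assumption: no real LLM)."""
    if prompt == "sentiment":
        return "negative" if "slow" in text.lower() else "positive"
    raise ValueError(f"unknown prompt: {prompt}")

class LlmReduce:
    """Aggregate: collects per-tuple labels, then produces one result per group."""
    def __init__(self):
        self.values = []

    def step(self, value):
        self.values.append(value)

    def finalize(self):
        # A real reduction would send the collected values to a model;
        # this stub only counts labels so the output is predictable.
        positives = self.values.count("positive")
        return f"{positives} of {len(self.values)} positive"

conn = sqlite3.connect(":memory:")
conn.create_function("llm_map", 2, fake_llm)       # hypothetical scalar function
conn.create_aggregate("llm_reduce", 1, LlmReduce)  # hypothetical aggregate function
conn.execute("CREATE TABLE reviews (product TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [
        ("duck", "Fast and friendly"),
        ("duck", "Too slow on joins"),
        ("goose", "Works well"),
    ],
)

# Chaining: the inner query maps each tuple, the outer query reduces per group.
rows = conn.execute(
    """
    SELECT product, llm_reduce(label)
    FROM (SELECT product, llm_map('sentiment', body) AS label FROM reviews)
    GROUP BY product
    ORDER BY product
    """
).fetchall()
print(rows)
```

The point of the sketch is only the shape of the query: the model call appears as an ordinary function in the SELECT list, so the engine, not application code, decides when and how often it runs.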

What carries the argument

Model-driven scalar and aggregate functions plus PROMPT and MODEL DDL objects treated as first-class schema elements, which carry LLM integration into query execution and optimization.

If this is right

  • SQL queries can express tuple-level LLM mappings and reductions without leaving the DBMS.
  • Batching and caching of LLM calls occur automatically through existing cost-based mechanisms.
  • Explicit, application-managed data movement between the database and language-model services is no longer needed for these workloads.
  • Development of applications that mix tabular retrieval with document-based reasoning becomes a single SQL task.
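
The batching and caching mentioned in the list above can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's optimizer: batch_size is a fixed parameter rather than a cost-based decision, and batched_cached_map and stub_model are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

def batched_cached_map(
    texts: List[str],
    call_model: Callable[[List[str]], List[str]],
    cache: Dict[str, str],
    batch_size: int = 2,
) -> List[str]:
    """Apply call_model to every text, skipping cached and duplicate inputs."""
    # Distinct inputs that are not cached yet, in first-seen order.
    misses = [t for t in dict.fromkeys(texts) if t not in cache]
    # Send the misses to the model in fixed-size batches.
    for start in range(0, len(misses), batch_size):
        batch = misses[start:start + batch_size]
        for text, answer in zip(batch, call_model(batch)):
            cache[text] = answer
    # Answer every input, including duplicates, from the cache.
    return [cache[t] for t in texts]

calls: List[List[str]] = []  # records each batch sent to the stub model

def stub_model(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a batched model endpoint."""
    calls.append(list(batch))
    return [t.upper() for t in batch]

cache: Dict[str, str] = {}
first = batched_cached_map(["a", "b", "a", "c"], stub_model, cache)
second = batched_cached_map(["c", "a"], stub_model, cache)
print(first, second, calls)
```

Four input tuples cost two model calls here, and the second query costs none because every input is already cached. A cost-based system would presumably choose the batch size from the model's context limits and pricing instead of hard-coding it.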

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same DDL abstractions could be adopted by other relational engines to achieve similar integration.
  • Workload-specific cost models for LLM latency and token usage might emerge as natural extensions of the optimizer.
  • Applications that already run inside DuckDB could gain new capabilities for unstructured data without pipeline changes.

Load-bearing premise

The assumption that embedding LLM calls as model-driven functions plus new PROMPT and MODEL DDL objects will meaningfully reduce orchestration effort and data movement compared with existing heterogeneous pipelines.

What would settle it

A side-by-side comparison of lines of code, development time, and end-to-end latency for the same knowledge-intensive analytical task written once with FlockMTL and once with separate database and LLM systems.

Original abstract

Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines. However, implementing these pipelines efficiently still demands significant effort and has several challenges. This often involves orchestrating heterogeneous data systems, managing data movement, and handling low-level implementation details, e.g., LLM context management. To address these challenges, we introduce FlockMTL: an extension for DBMSs that deeply integrates LLM capabilities and retrieval-augmented generation (RAG). FlockMTL includes model-driven scalar and aggregate functions, enabling chained predictions through tuple-level mappings and reductions. Drawing inspiration from the relational model, FlockMTL incorporates: (i) cost-based optimizations, which seamlessly apply techniques such as batching and caching; and (ii) resource independence, enabled through novel SQL DDL abstractions: PROMPT and MODEL, introduced as first-class schema objects alongside TABLE. FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces FlockMTL as a DuckDB extension for deep integration of LLMs and RAG into a DBMS. It proposes model-driven scalar and aggregate functions to enable tuple-level mappings and reductions for chained LLM predictions, along with first-class PROMPT and MODEL DDL objects (modeled after TABLE) to support resource independence. The system incorporates cost-based optimizations such as batching and caching, with the central claim that this approach reduces orchestration effort, data movement, and low-level implementation details relative to heterogeneous pipelines for knowledge-intensive analytical applications.

Significance. If substantiated, the approach could lower barriers for building applications that combine structured tabular data with unstructured documents via LLMs inside a single DBMS. The conceptual framing around relational-model-inspired DDL abstractions and optimizer integration is a plausible direction for reducing pipeline complexity. However, the manuscript supplies no implementation details, benchmarks, user studies, or controlled comparisons, so the claimed reductions in burden cannot be evaluated and the significance remains speculative.

major comments (2)
  1. [Abstract] The claim that 'FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden' is presented as the outcome of the work, yet the manuscript contains no evaluation results, metrics on orchestration effort, data-movement measurements, or comparisons against heterogeneous pipelines to support this assertion.
  2. [Abstract] The description of 'model-driven scalar and aggregate functions' and 'novel SQL DDL abstractions: PROMPT and MODEL' is given at a high level only, with no specification of their semantics, how they interact with the query optimizer, or how cost-based decisions for batching/caching are realized; this absence prevents assessment of whether the proposed deep integration is technically feasible or load-bearing for the central claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting issues with the abstract's claims and level of technical detail. We agree that both points require attention and will revise the manuscript accordingly to better align the presentation with the content provided.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden' is presented as the outcome of the work, yet the manuscript contains no evaluation results, metrics on orchestration effort, data-movement measurements, or comparisons against heterogeneous pipelines to support this assertion.

    Authors: We agree that the abstract overstates these benefits as demonstrated outcomes. The current manuscript is a system design and architecture paper without empirical evaluations. We will revise the abstract to present the streamlining and burden reduction as the intended benefits of the proposed design and optimizations, rather than as measured results. We will also add a dedicated discussion section outlining qualitative arguments for these benefits based on the architecture.
    revision: yes

  2. Referee: [Abstract] Abstract: the description of 'model-driven scalar and aggregate functions' and 'novel SQL DDL abstractions: PROMPT and MODEL' is given at a high level only, with no specification of their semantics, how they interact with the query optimizer, or how cost-based decisions for batching/caching are realized; this absence prevents assessment of whether the proposed deep integration is technically feasible or load-bearing for the central claim.

    Authors: The body of the manuscript provides usage examples and high-level descriptions of the functions and DDL objects, along with mentions of optimizer extensions for batching and caching. However, we acknowledge that the abstract is overly terse and that more precise semantics and interaction details would aid assessment. We will revise the abstract to include brief semantic highlights and ensure the relevant sections (on function definitions, DDL modeling, and cost-based planning) are expanded with additional specification of semantics and optimizer integration in the revised version.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper proposes a system design (FlockMTL) for embedding LLM/RAG capabilities into DuckDB via scalar/aggregate functions and new PROMPT/MODEL DDL objects. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided material. Claims about reduced orchestration effort are design assertions rather than results derived from inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are load-bearing. This matches the default expectation for non-circular system papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The design rests on the domain assumption that LLM calls can be treated as relational operations without prohibitive latency or correctness issues; the new PROMPT and MODEL objects are invented entities introduced to achieve resource independence.

axioms (1)
  • domain assumption: LLM inference can be exposed as scalar and aggregate functions that support tuple-level mappings and reductions without breaking relational semantics.
    Invoked when defining model-driven functions as the core integration mechanism.
invented entities (1)
  • PROMPT and MODEL DDL objects (no independent evidence)
    purpose: First-class schema objects that enable cost-based optimizations and resource independence for LLM usage.
    New abstractions introduced in the paper to treat prompts and models like tables.
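
One way to picture the resource independence attributed to these objects is a catalog that queries reference by name only. The sketch below is a toy under stated assumptions: Catalog, create_or_replace_model, create_or_replace_prompt, and complete are hypothetical names, the backends are plain Python callables rather than real model providers, and none of this reflects the paper's actual DDL syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Model:
    name: str
    backend: Callable[[str], str]  # hypothetical: any text-to-text callable

@dataclass
class Prompt:
    name: str
    template: str  # uses {text} as the placeholder for tuple content

class Catalog:
    """Toy catalog that stores prompts and models by name, like schema objects."""

    def __init__(self) -> None:
        self.models: Dict[str, Model] = {}
        self.prompts: Dict[str, Prompt] = {}

    def create_or_replace_model(self, name: str, backend: Callable[[str], str]) -> None:
        self.models[name] = Model(name, backend)

    def create_or_replace_prompt(self, name: str, template: str) -> None:
        self.prompts[name] = Prompt(name, template)

    def complete(self, model_name: str, prompt_name: str, text: str) -> str:
        # Resolve both names at call time, so redefinitions take effect immediately.
        rendered = self.prompts[prompt_name].template.format(text=text)
        return self.models[model_name].backend(rendered)

catalog = Catalog()
catalog.create_or_replace_prompt("summarize", "Summarize: {text}")
catalog.create_or_replace_model("default", lambda p: f"[v1] {p}")

def run_query(cat: Catalog) -> str:
    """Stands in for a stored query; it names resources but never defines them."""
    return cat.complete("default", "summarize", "ducks quack")

before = run_query(catalog)
# Swap the model definition; the query text above does not change.
catalog.create_or_replace_model("default", lambda p: f"[v2] {p}")
after = run_query(catalog)
print(before, after)
```

The query is written once against the names "default" and "summarize", and an administrator can repoint either one without touching it, which is the same decoupling a table name gives a query from its physical storage.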

pith-pipeline@v0.9.0 · 5734 in / 1179 out tokens · 31135 ms · 2026-05-22T22:29:20.602177+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

    cs.DB · 2026-04 · conditional · novelty 7.0

    PLOP is a cost-based optimizer that finds optimal placements for semantic LLM operators in hybrid query plans via dynamic programming, delivering up to 1.5x speedup and 4.29x cost reduction on 44 benchmark queries whi...

  2. LLM+Graph@VLDB'2025 Workshop Summary

    cs.DB · 2026-04 · unverdicted · novelty 1.0

    The report summarizes key research directions, challenges, and solutions from the LLM+Graph workshop at VLDB 2025.