Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Amine Mhedhbi; Anas Dorbani; Jimmy Lin; Sunny Yasser

arxiv: 2504.01157 · v1 · submitted 2025-04-01 · 💻 cs.DB · cs.IR

Pith reviewed 2026-05-22 22:29 UTC · model grok-4.3

keywords FlockMTL · DuckDB · LLM integration · RAG · model-driven functions · PROMPT and MODEL DDL · knowledge-intensive analytics · database extensions

The pith

FlockMTL embeds LLM calls and RAG directly into DuckDB through model-driven SQL functions and new PROMPT and MODEL schema objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FlockMTL as a DBMS extension to simplify building applications that combine structured tabular data with unstructured documents for reasoning. It adds model-driven scalar and aggregate functions that perform LLM-based predictions at the tuple level and support chaining. The work also defines PROMPT and MODEL as first-class DDL objects alongside tables to enable cost-based optimizations such as batching and caching. This integration keeps LLM operations inside the database engine rather than requiring separate systems. The central goal is to lower the effort of orchestration and data movement in knowledge-intensive analytics.
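The tuple-level mapping and reduction described above can be pictured with a small sketch. It is not FlockMTL's API: SQLite from the Python standard library stands in for DuckDB, `stub_model` is a deterministic placeholder for an LLM call, and the function names `llm_map` and `llm_reduce` are hypothetical, chosen only for illustration.

```python
import sqlite3

def stub_model(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call: deterministic, no network."""
    if prompt == "sentiment":
        return "negative" if ("slow" in text or "broken" in text) else "positive"
    raise ValueError(f"unknown prompt: {prompt}")

class StubReduce:
    """Aggregate: gathers per-tuple values, then summarizes the whole group once."""
    def __init__(self):
        self.items = []

    def step(self, prompt, value):
        # Called once per tuple in the group; the prompt is unused by this stub.
        self.items.append(value)

    def finalize(self):
        # Stand-in for a summarizing model call: report the majority label.
        counts = {}
        for item in self.items:
            counts[item] = counts.get(item, 0) + 1
        majority = max(counts, key=counts.get)
        return f"{majority} ({counts[majority]}/{len(self.items)})"

con = sqlite3.connect(":memory:")
con.create_function("llm_map", 2, stub_model)      # scalar: one call per tuple
con.create_aggregate("llm_reduce", 2, StubReduce)  # aggregate: one result per group

con.execute("CREATE TABLE reviews (product TEXT, body TEXT)")
con.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [
        ("kettle", "Boils fast, love it"),
        ("kettle", "Lid is broken"),
        ("kettle", "Great value"),
        ("router", "Very slow wifi"),
        ("router", "Setup was broken"),
        ("router", "Works fine"),
    ],
)

# Chaining: the aggregate consumes the output of the scalar function.
rows = con.execute(
    """
    SELECT product, llm_reduce('majority', llm_map('sentiment', body))
    FROM reviews
    GROUP BY product
    ORDER BY product
    """
).fetchall()
print(rows)
```

The point of the sketch is only the shape of the query: the per-tuple prediction and the per-group reduction are both ordinary SQL expressions, so the chain stays inside a single statement.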

Core claim

FlockMTL extends a DBMS with model-driven scalar and aggregate functions for chained LLM predictions on tuples, together with PROMPT and MODEL as first-class schema objects that sit alongside TABLE; these abstractions permit cost-based optimizations including batching and caching while providing resource independence, allowing SQL queries to handle both structured data and retrieval-augmented generation without external orchestration.

What carries the argument

Model-driven scalar and aggregate functions plus PROMPT and MODEL DDL objects treated as first-class schema elements, which carry LLM integration into query execution and optimization.
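One way to read "resource independence" is that queries refer to prompts and models by name, so the underlying resource can change without touching the query. The sketch below illustrates that reading only; the `Catalog` class, its methods, and the names `triage-model` and `triage-prompt` are hypothetical and do not reflect FlockMTL's actual DDL syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Catalog:
    """Toy schema catalog holding named models and prompts (illustrative only)."""
    models: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    prompts: Dict[str, str] = field(default_factory=dict)

    def create_model(self, name: str, backend: Callable[[str], str]) -> None:
        # Registering under an existing name replaces the backend.
        self.models[name] = backend

    def create_prompt(self, name: str, template: str) -> None:
        self.prompts[name] = template

    def complete(self, model_name: str, prompt_name: str, **columns: str) -> str:
        # Resolve both names at call time, then fill the template from tuple columns.
        text = self.prompts[prompt_name].format(**columns)
        return self.models[model_name](text)

def triage(catalog: Catalog, tickets: List[str]) -> List[str]:
    """The 'query': it mentions only names, never a concrete model or prompt text."""
    return [
        catalog.complete("triage-model", "triage-prompt", subject=t)
        for t in tickets
    ]

catalog = Catalog()
catalog.create_model("triage-model", lambda text: "small:" + text)  # stub backend
catalog.create_prompt("triage-prompt", "Classify urgency of: {subject}")

before = triage(catalog, ["disk full"])

# Swap the backend behind the same name; the query function is unchanged.
catalog.create_model("triage-model", lambda text: "large:" + text)
after = triage(catalog, ["disk full"])

print(before, after)
```

The indirection through the catalog is what lets an administrator change a model or refine a prompt centrally, which is the property the DDL objects are meant to provide.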

If this is right

SQL queries can express tuple-level LLM mappings and reductions without leaving the DBMS.
Batching and caching of LLM calls occur automatically through existing cost-based mechanisms.
Explicit data movement between the database and language-model services no longer has to be managed in application code for these workloads.
Development of applications that mix tabular retrieval with document-based reasoning becomes a single SQL task.
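The batching and caching implication can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's optimizer: the batch size is a fixed parameter rather than a cost-based choice, the cache is a plain dictionary, and `run_batched` and `stub_batch` are hypothetical names.

```python
from typing import Callable, Dict, List

def run_batched(
    inputs: List[str],
    call_batch: Callable[[List[str]], List[str]],
    batch_size: int,
    cache: Dict[str, str],
) -> List[str]:
    """Answer every input while sending each distinct uncached value only once."""
    pending: List[str] = []
    seen = set()
    for value in inputs:
        if value not in cache and value not in seen:
            seen.add(value)
            pending.append(value)

    # One model call per chunk instead of one call per tuple.
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        outputs = call_batch(chunk)
        for key, result in zip(chunk, outputs):
            cache[key] = result

    # Preserve the original tuple order, duplicates included.
    return [cache[value] for value in inputs]

calls: List[List[str]] = []

def stub_batch(chunk: List[str]) -> List[str]:
    """Hypothetical stand-in for a batched LLM request; records each call."""
    calls.append(list(chunk))
    return [text.upper() for text in chunk]

cache: Dict[str, str] = {}
first = run_batched(["a", "b", "a", "c", "b"], stub_batch, batch_size=2, cache=cache)
calls_after_first = len(calls)

# A repeated query is served entirely from the cache.
second = run_batched(["a", "c"], stub_batch, batch_size=2, cache=cache)
calls_after_second = len(calls)

print(first, second, calls)
```

Five input tuples result in two model calls here because duplicates are collapsed before batching. A real engine would presumably pick the batch size from context-window and cost estimates, which this sketch does not attempt.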

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same DDL abstractions could be adopted by other relational engines to achieve similar integration.
Workload-specific cost models for LLM latency and token usage might emerge as natural extensions of the optimizer.
Applications that already run inside DuckDB could gain new capabilities for unstructured data without pipeline changes.

Load-bearing premise

The assumption that embedding LLM calls as model-driven functions plus new PROMPT and MODEL DDL objects will meaningfully reduce orchestration effort and data movement compared with existing heterogeneous pipelines.

What would settle it

A side-by-side comparison of lines of code, development time, and end-to-end latency for the same knowledge-intensive analytical task written once with FlockMTL and once with separate database and LLM systems.

Original abstract

Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines. However, implementing these pipelines efficiently still demands significant effort and has several challenges. This often involves orchestrating heterogeneous data systems, managing data movement, and handling low-level implementation details, e.g., LLM context management. To address these challenges, we introduce FlockMTL: an extension for DBMSs that deeply integrates LLM capabilities and retrieval-augmented generation (RAG). FlockMTL includes model-driven scalar and aggregate functions, enabling chained predictions through tuple-level mappings and reductions. Drawing inspiration from the relational model, FlockMTL incorporates: (i) cost-based optimizations, which seamlessly apply techniques such as batching and caching; and (ii) resource independence, enabled through novel SQL DDL abstractions: PROMPT and MODEL, introduced as first-class schema objects alongside TABLE. FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a design sketch for adding PROMPT and MODEL as first-class DDL objects in DuckDB to embed LLM calls and RAG, but the abstract gives no implementation or measurements to support the claims.

The core idea is to treat prompts and models like tables in the schema so the DBMS can manage LLM invocations through scalar and aggregate functions that support chaining. This setup is meant to let cost-based optimizations such as batching and caching happen inside the engine rather than in application code.

The authors position it as a way to cut down on juggling separate systems and moving data around for apps that combine tabular queries with unstructured retrieval and reasoning. That motivation matches real pain points people hit when wiring DuckDB to external LLM services. The relational framing for resource independence is a reasonable starting point and not something the cited prior work appears to have done in exactly this form. What the paper does well is keep the proposal grounded in SQL abstractions that feel native to a relational engine.

The soft spot is the complete absence of any concrete evidence. The abstract states that the new objects and functions will reduce orchestration effort and data movement, yet supplies no prototype description, no query examples that show the claimed chaining, and no numbers on latency, cost, or developer time. Without those, the central assumption stays untested.

The work is aimed at database researchers and practitioners who build extensions for DuckDB or similar systems and want to explore tighter LLM integration. A reader already working on model serving inside query engines could extract useful design points from the DDL proposal. I would send the full version to peer review if it includes a working implementation and at least basic measurements; based on the abstract alone it is too early for that step.

Referee Report

2 major / 0 minor

Summary. The paper introduces FlockMTL as a DuckDB extension for deep integration of LLMs and RAG into a DBMS. It proposes model-driven scalar and aggregate functions to enable tuple-level mappings and reductions for chained LLM predictions, along with first-class PROMPT and MODEL DDL objects (modeled after TABLE) to support resource independence. The system incorporates cost-based optimizations such as batching and caching, with the central claim that this approach reduces orchestration effort, data movement, and low-level implementation details relative to heterogeneous pipelines for knowledge-intensive analytical applications.

Significance. If substantiated, the approach could lower barriers for building applications that combine structured tabular data with unstructured documents via LLMs inside a single DBMS. The conceptual framing around relational-model-inspired DDL abstractions and optimizer integration is a plausible direction for reducing pipeline complexity. However, the manuscript supplies no implementation details, benchmarks, user studies, or controlled comparisons, so the claimed reductions in burden cannot be evaluated and the significance remains speculative.

major comments (2)

[Abstract] The claim that 'FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden' is presented as the outcome of the work, yet the manuscript contains no evaluation results, metrics on orchestration effort, data-movement measurements, or comparisons against heterogeneous pipelines to support this assertion.
[Abstract] The description of 'model-driven scalar and aggregate functions' and 'novel SQL DDL abstractions: PROMPT and MODEL' is given at a high level only, with no specification of their semantics, how they interact with the query optimizer, or how cost-based decisions for batching/caching are realized; this absence prevents assessment of whether the proposed deep integration is technically feasible or load-bearing for the central claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting issues with the abstract's claims and level of technical detail. We agree that both points require attention and will revise the manuscript accordingly to better align the presentation with the content provided.

Referee: [Abstract] The claim that 'FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden' is presented as the outcome of the work, yet the manuscript contains no evaluation results, metrics on orchestration effort, data-movement measurements, or comparisons against heterogeneous pipelines to support this assertion.

Authors: We agree that the abstract overstates these benefits as demonstrated outcomes. The current manuscript is a system design and architecture paper without empirical evaluations. We will revise the abstract to present the streamlining and burden reduction as the intended benefits of the proposed design and optimizations, rather than as measured results. We will also add a dedicated discussion section outlining qualitative arguments for these benefits based on the architecture.
Revision: yes

Referee: [Abstract] The description of 'model-driven scalar and aggregate functions' and 'novel SQL DDL abstractions: PROMPT and MODEL' is given at a high level only, with no specification of their semantics, how they interact with the query optimizer, or how cost-based decisions for batching/caching are realized; this absence prevents assessment of whether the proposed deep integration is technically feasible or load-bearing for the central claim.

Authors: The body of the manuscript provides usage examples and high-level descriptions of the functions and DDL objects, along with mentions of optimizer extensions for batching and caching. However, we acknowledge that the abstract is overly terse and that more precise semantics and interaction details would aid assessment. We will revise the abstract to include brief semantic highlights and ensure the relevant sections (on function definitions, DDL modeling, and cost-based planning) are expanded with additional specification of semantics and optimizer integration in the revised version.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper proposes a system design (FlockMTL) for embedding LLM/RAG capabilities into DuckDB via scalar/aggregate functions and new PROMPT/MODEL DDL objects. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided material. Claims about reduced orchestration effort are design assertions rather than results derived from inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are load-bearing. This matches the default expectation for non-circular system papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The design rests on the domain assumption that LLM calls can be treated as relational operations without prohibitive latency or correctness issues; the new PROMPT and MODEL objects are invented entities introduced to achieve resource independence.

axioms (1)

Domain assumption: LLM inference can be exposed as scalar and aggregate functions that support tuple-level mappings and reductions without breaking relational semantics.
Invoked when defining model-driven functions as the core integration mechanism.

invented entities (1)

PROMPT and MODEL DDL objects · no independent evidence
purpose: First-class schema objects that enable cost-based optimizations and resource independence for LLM usage.
New abstractions introduced in the paper to treat prompts and models like tables.

pith-pipeline@v0.9.0 · 5734 in / 1179 out tokens · 31135 ms · 2026-05-22T22:29:20.602177+00:00

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans
cs.DB · 2026-04 · conditional · novelty 7.0

PLOP is a cost-based optimizer that finds optimal placements for semantic LLM operators in hybrid query plans via dynamic programming, delivering up to 1.5x speedup and 4.29x cost reduction on 44 benchmark queries whi...

LLM+Graph@VLDB'2025 Workshop Summary
cs.DB · 2026-04 · unverdicted · novelty 1.0

The report summarizes key research directions, challenges, and solutions from the LLM+Graph workshop at VLDB 2025.