pith. machine review for the scientific record.

arxiv: 2602.18775 · v2 · submitted 2026-02-21 · 💻 cs.DB

Recognition: 2 theorem links · Lean Theorem

Should I Hide My Duck in the Lake?

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 20:46 UTC · model grok-4.3

classification 💻 cs.DB
keywords SmartNIC · data lakes · Parquet · query offloading · decoding · DuckDB · network datapath · disaggregated storage

The pith

A SmartNIC on the network path can offload Parquet decoding and filtering to raise data lake query speeds while allowing cheaper CPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Data lakes spend much of their query time scanning remote storage; decoding Parquet files alone accounts for 46 percent of TPC-H runtime. The paper proposes a SmartNIC placed directly on the compute node's network datapath to decode files and apply pushed-down operators before data reaches the CPU. Experimental estimates with DuckDB suggest that queries operating on this pre-filtered stream run significantly faster, and that smaller, less expensive CPUs can still match the throughput of conventional setups.
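
To make the division of labor concrete, here is a minimal stdlib-Python sketch (the names and the toy record format are illustrative, not from the paper): a `smartnic_scan` stage decodes a packed binary page and applies a pushed-down predicate, so the host-side aggregation only ever sees surviving, already-decoded rows.

```python
import struct

# Toy stand-in for an encoded Parquet page: fixed-size records of
# (order_key: uint32, price_cents: uint32). Names are illustrative.
RECORD = struct.Struct("<II")

def encode_page(rows):
    """Pack (key, price) tuples into one binary 'page'."""
    return b"".join(RECORD.pack(k, p) for k, p in rows)

def smartnic_scan(page, min_price):
    """Datapath stage: decode records and apply the pushed-down
    predicate, yielding only matching, already-decoded rows."""
    for off in range(0, len(page), RECORD.size):
        key, price = RECORD.unpack_from(page, off)
        if price >= min_price:  # pushed-down filter
            yield key, price

def host_query(prefiltered):
    """Host side: aggregate a pre-filtered stream; no decoding here."""
    return sum(price for _, price in prefiltered)

page = encode_page([(1, 500), (2, 2500), (3, 9900)])
total = host_query(smartnic_scan(page, min_price=1000))
print(total)  # 12400 -- the host never decoded or inspected row 1
```

In the paper's design the `smartnic_scan` role sits in NIC hardware on the network path rather than in host software; the point of the sketch is only the interface: the host consumes decoded, filtered rows instead of raw file bytes.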

Core claim

By positioning a data processing SmartNIC on the network datapath, decoding and operator pushdown happen before data arrives at the host, so queries operate directly on pre-filtered results. This hides the cost of parsing raw files and allows the same query throughput with smaller, less expensive CPUs.

What carries the argument

A data processing SmartNIC that performs decoding and pushed-down operators on the network datapath to deliver pre-filtered data to the host.

If this is right

  • Query processing nodes can use smaller CPUs while keeping the same throughput.
  • System cost drops because less expensive hardware suffices for the same workload.
  • The scanning and decoding bottleneck in disaggregated storage is reduced.
  • Queries spend less time waiting on remote file access.
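
The first two bullets can be made concrete with a back-of-envelope model (my arithmetic, not the paper's): if decoding is 46 percent of runtime and moves off the host at no cost, a proportionally smaller CPU sustains the same throughput.

```python
def min_cpu_capacity(decode_fraction, host_overhead=0.0):
    """Fraction of the original CPU capacity needed to sustain the same
    query throughput once decoding is offloaded. host_overhead is any
    work the offload pushes back onto the host, as a fraction of the
    original runtime (0 = free offload)."""
    return 1.0 - decode_fraction + host_overhead

# With the paper's 46% decode share and a free offload, a CPU with
# 54% of the original capacity matches the baseline's throughput.
print(f"{min_cpu_capacity(0.46):.2f}")        # prints 0.54
print(f"{min_cpu_capacity(0.46, 0.10):.2f}")  # prints 0.64 if 10 points bounce back
```

This is a linear idealization; it ignores latency, memory bandwidth, and any host work the offload itself creates, which is exactly where the referee's objections below apply.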

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The design could be added to existing cloud networks with minimal changes to query engines like DuckDB.
  • Extending the same offload logic to other file formats would broaden the approach beyond Parquet.
  • The work points to a tighter coupling between network hardware and data processing that future systems may adopt.

Load-bearing premise

A practical SmartNIC can decode files and push down operators at full line rate without adding latency, power draw, or integration problems that erase the gains.
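
One way to frame this premise is as a break-even condition, sketched here under simplifying assumptions of my own (uniform fractions of runtime, no queueing effects): the offload wins only while the overhead it re-introduces stays below the 46 percent decode share it removes.

```python
def speedup_with_overhead(decode_fraction, overhead_fraction):
    """End-to-end speedup when decoding is offloaded to the NIC but the
    offload re-adds overhead on the critical path (PCIe handoff, memory
    coherence, added latency), both as fractions of original runtime."""
    return 1.0 / (1.0 - decode_fraction + overhead_fraction)

# Sweep overhead against the paper's 46% decode share: the benefit
# vanishes exactly where overhead equals the share removed (0.46).
for ov in (0.0, 0.20, 0.46, 0.60):
    print(f"overhead {ov:.2f} -> speedup {speedup_with_overhead(0.46, ov):.2f}x")
```

At an overhead of 0.46 the speedup returns to 1.00x, and beyond it the SmartNIC path is a net loss; that is the failure mode the premise rules out.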

What would settle it

A working SmartNIC prototype that decodes Parquet at network line rate but causes end-to-end query latency to rise above current CPU-only baselines would disprove the performance benefit.

Figures

Figures reproduced from arXiv: 2602.18775 by Gustavo Alonso, Jonas Dann.

Figure 1. DuckDB TPC-H throughput (scale factor 30, 4 streams).
Figure 2. Per-query breakdown of Parquet decoding, filtering, and the remaining query runtime for the TPC-H benchmarks.
Figure 3. DuckDB scan rewriter optimizer extension.
Figure 6. Data processing SmartNIC architecture.
Figure 5. TPC-H CSV and JSON throughput (scale factor 10).
read the original abstract

Data lakes spend a significant fraction of query execution time on scanning data from remote, disaggregated storage. Decoding alone accounts for 46% of runtime when running TPC-H directly on Parquet files. To address this bottleneck, we propose a vision for a data processing SmartNIC for the cloud that sits on the network datapath of compute nodes to offload decoding and pushed-down operators, effectively hiding the cost of parsing raw files. Our experimental estimations with DuckDB suggest that by operating directly on pre-filtered data, as delivered by a SmartNIC, we can significantly increase query processing performance and can still match query throughput of traditional setups with smaller, less expensive CPUs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a vision for a cloud data-processing SmartNIC placed on the network datapath of compute nodes. It claims that offloading Parquet decoding and pushed-down operators hides scanning costs in disaggregated data lakes; DuckDB estimations are cited to show that pre-filtered data delivery yields significantly higher query performance while still matching traditional throughput on smaller, less expensive CPUs. Decoding is asserted to consume 46% of TPC-H runtime on Parquet.

Significance. If a SmartNIC meeting the stated line-rate and integration assumptions can be built, the approach would materially lower CPU provisioning costs for cloud analytics workloads that currently spend substantial time on remote file parsing.

major comments (2)
  1. [Abstract] Abstract: the central performance claim rests on DuckDB estimations whose methodology, workload details (queries, scale factors, Parquet configurations), measurement method, and error margins are not described, preventing assessment of the reported 46% decoding overhead or the projected net gains.
  2. [Vision Proposal] Vision section: the assumption that decoding plus operator pushdown can be performed at line rate on the network datapath without offsetting PCIe handoff, memory-coherence, or sustained-parsing latency/power costs is stated without hardware model, prototype data, or sensitivity analysis; if any of these costs exceed the modeled savings, the headline claim does not hold.
minor comments (1)
  1. The manuscript would benefit from an explicit limitations subsection that enumerates the hardware assumptions required for the SmartNIC to deliver the claimed benefits.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback on our vision paper. We address each major comment below and will revise the manuscript to improve clarity and completeness while preserving its visionary nature.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claim rests on DuckDB estimations whose methodology, workload details (queries, scale factors, Parquet configurations), measurement method, and error margins are not described, preventing assessment of the reported 46% decoding overhead or the projected net gains.

    Authors: We agree that the DuckDB estimation methodology requires more detail to allow proper evaluation. In the revised manuscript we will expand both the abstract and the main text with a dedicated subsection (or appendix) that specifies the TPC-H queries and scale factors used, Parquet file configurations and compression settings, the exact measurement procedure within DuckDB, and any error margins or simplifying assumptions applied to the 46% decoding overhead figure. revision: yes

  2. Referee: [Vision Proposal] Vision section: the assumption that decoding plus operator pushdown can be performed at line rate on the network datapath without offsetting PCIe handoff, memory-coherence, or sustained-parsing latency/power costs is stated without hardware model, prototype data, or sensitivity analysis; if any of these costs exceed the modeled savings, the headline claim does not hold.

    Authors: As this is a vision paper, we do not possess a hardware prototype or detailed RTL-level model. We will nevertheless revise the Vision section to explicitly list the key hardware assumptions (line-rate parsing, PCIe transfer costs, memory coherence overheads, and power budgets), discuss how these costs could offset savings, and include a simple sensitivity analysis that varies the relative cost of offload versus host processing to show the conditions under which the proposed benefits remain valid. revision: partial

standing simulated objections not resolved
  • Empirical prototype data or a concrete hardware implementation of the proposed SmartNIC, which does not yet exist because the work is a forward-looking vision rather than an implementation study.

Circularity Check

0 steps flagged

No circularity: vision paper relies on external DuckDB estimations without self-referential derivation

full rationale

The manuscript presents a forward-looking vision for SmartNIC offload of decoding and operators, supported by DuckDB-based experimental estimations on TPC-H Parquet workloads. No equations, derivations, fitted parameters, or first-principles results are claimed. The performance suggestions (e.g., matching throughput on smaller CPUs via pre-filtered data) are presented as empirical observations from external tooling rather than quantities that reduce to the paper's own inputs by construction. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing steps. The argument therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the unverified premise that SmartNIC hardware can execute decoding and filters at network speed; no free parameters or new entities are introduced beyond the hardware concept itself.

axioms (1)
  • domain assumption Decoding accounts for 46% of TPC-H runtime on Parquet files
    Stated directly in the abstract without citation or measurement details.

pith-pipeline@v0.9.0 · 5393 in / 1094 out tokens · 78243 ms · 2026-05-15T20:46:05.284891+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SCENIC: Stream Computation-Enhanced SmartNIC

cs.AR · 2026-04 · unverdicted · novelty 7.0

    SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU...

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Azim Afroozeh and Peter Boncz. 2025. The FastLanes File Format. Proc. VLDB Endow. 18, 11 (2025), 4629–4643. doi:10.14778/3749646.3749718

  2. [2]

    Apache Software Foundation. 2025. Apache Parquet Format Specification. https://parquet.apache.org/. Accessed: 2026-04-30

  3. [3]

    Nikos Armenatzoglou, Sanuj Basu, Naga Bhanoori, Mengchu Cai, Naresh Chainani, Kiran Chinta, Venkatraman Govindaraju, Todd J. Green, Monish Gupta, Sebastian Hillig, Eric Hotinger, Yan Leshinksy, Jintian Liang, Michael McCreedy, Fabian Nagel, Ippokratis Pandis, Panos Parchas, Rahul Pathak, Orestis Polychroniou, Foyzur Rahman, Gaurav Saxena, Gokul Soundara...

  4. [4]

    Mengchu Cai, Martin Grund, Anurag Gupta, Fabian Nagel, Ippokratis Pandis, Yannis Papakonstantinou, and Michalis Petropoulos. 2018. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 41, 2 (2018), 82–90

  5. [5]

    Jonas Dann, Daniel Ritter, and Holger Fröning. 2023. Non-relational Databases on FPGAs: Survey, Design Decisions, Challenges. ACM Comput. Surv. 55, 11 (2023), 225:1–225:37. doi:10.1145/3568990

  6. [6]

    Jonas Dann, Royden Wagner, Daniel Ritter, Christian Faerber, and Holger Fröning. PipeJSON: Parsing JSON at Line Speed on FPGAs. In DaMoN. ACM, 3:1–3:7. doi:10.1145/3533737.3535094

  8. [8]

    Daniel Firestone, Andrew Putnam, Sambrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian M. Caulfield, Eric S. Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Sil...

  9. [9]

    Mateusz Gienieczko, Maximilian Kuschewski, Thomas Neumann, Viktor Leis, and Jana Giceva. 2025. AnyBlox: A Framework for Self-Decoding Datasets. Proc. VLDB Endow. 18, 11 (2025), 4017–4031. doi:10.14778/3749646.3749672

  10. [10]

    Dimitrios Giouroukis, Dwi P. A. Nugroho, Varun Pandey, Steffen Zeuch, and Volker Markl. 2025. Analyzing Near-Network Hardware Acceleration with Co-Processing on DPUs. Proc. VLDB Endow. 18, 13 (2025), 5689–5702. doi:10.14778/3773731.377374

  11. [11]

    Maximilian Jakob Heer, Benjamin Ramhorst, Yu Zhu, Luhao Liu, Zhiyi Hu, Jonas Dann, and Gustavo Alonso. 2025. RoCE BALBOA: Service-enhanced Data Center RDMA for SmartNICs. CoRR abs/2507.20412 (2025). doi:10.48550/ARXIV.2507.20412

  12. [12]

    Jason Hu, Philip A. Bernstein, Jialin Li, and Qizhen Zhang. 2025. DPDPU: Data Processing with DPUs. In CIDR. www.cidrdb.org

  13. [13]

    Insoon Jo, Duck-Ho Bae, Andre S. Yoon, Jeong-Uk Kang, Sangyeun Cho, Daniel D. G. Lee, and Jaeheon Jeong. 2016. YourSQL: A High-Performance Database System Leveraging In-Storage Computing. Proc. VLDB Endow. 9, 12 (2016), 924–935. doi:10.14778/2994509.2994512

  15. [15]

    Marko Kabic, Bowen Wu, Jonas Dann, and Gustavo Alonso. 2025. Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs. Proc. VLDB Endow. 18, 11 (2025), 4350–4363. doi:10.14778/3749646.3749698

  16. [16]

    Elie F. Kfoury, Samia Choueiri, Ali Mazloum, Ali AlSabeh, Jose Gomez, and Jorge Crichigno. 2024. A Comprehensive Survey on SmartNICs: Architectures, Development Models, Applications, and Research Directions. IEEE Access 12 (2024), 107297–107336. doi:10.1109/ACCESS.2024.3437203

  17. [17]

    Dario Korolija, Dimitrios Koutsoukos, Kimberly Keeton, Konstantin Taranov, Dejan S. Milojicic, and Gustavo Alonso. 2022. Farview: Disaggregated Memory with Operator Off-loading for Database Engines. In CIDR. www.cidrdb.org

  18. [18]

    Maximilian Kuschewski, Jana Giceva, Thomas Neumann, and Viktor Leis. 2024. High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance. Proc. ACM Manag. Data 2, 6 (2024), 238:1–238:27. doi:10.1145/3698813

  19. [19]

    Maximilian Kuschewski, David Sauerwein, Adnan Alhomssi, and Viktor Leis. 2023. BtrBlocks: Efficient Columnar Compression for Data Lakes. Proc. ACM Manag. Data 1, 2 (2023), 118:1–118:26. doi:10.1145/3589263

  21. [21]

    Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Hossein Ahmadi, Dan Delorey, Slava Min, Mosha Pasumansky, and Jeff Shute. 2020. Dremel: A Decade of Interactive SQL Analysis at Web Scale. Proc. VLDB Endow. 13, 12 (2020), 3461–3472. doi:10.14778/3415478.3415568

  22. [22]

    Oracle Corporation. 2012. A Technical Overview of the Oracle Exadata Database Machine and Exadata Storage Server. White Paper. Oracle Corporation. https://www.oracle.com/technetwork/server-storage/engineered-systems/exadata/dbmachine-x3-twp-1867467.pdf

  23. [23]

    Muhsen Owaida, David Sidler, Kaan Kara, and Gustavo Alonso. 2017. Centaur: A Framework for Hybrid CPU-FPGA Databases. In FCCM. IEEE Computer Society, 211–218. doi:10.1109/FCCM.2017.37

  24. [24]

    Jong-Hyeok Park, Soyee Choi, Gihwan Oh, and Sang Won Lee. 2021. SaS: SSD as SQL Database System. Proc. VLDB Endow. 14, 9 (2021), 1481–1488. doi:10.14778/3461535.3461538

  25. [25]

    Johan Peltenburg, Ákos Hadnagy, Matthijs Brobbel, Robert Morrow, and Zaid Al-Ars. 2021. Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators. In FPT. IEEE, 1–9. doi:10.1109/ICFPT52863.2021.9609833

  26. [26]

    Johan Peltenburg, Lars T. J. van Leeuwen, Joost Hoozemans, Jian Fang, Zaid Al-Ars, and H. Peter Hofstee. 2020. Battling the CPU Bottleneck in Apache Parquet to Arrow Conversion Using FPGA. In FPT. IEEE, 281–286. doi:10.1109/ICFPT51103.2020.00048

  27. [27]

    Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In SIGMOD. ACM, 1981–1984. doi:10.1145/3299869.3320212

  28. [28]

    Benjamin Ramhorst, Maximilian Jakob Heer, Luhao Liu, Heejae Kim, Jonas Dann, Jin-Soo Kim, and Gustavo Alonso. 2026. SCENIC: Stream Computation-Enhanced SmartNIC. CoRR abs/2604.15128 (2026). doi:10.48550/arXiv.2604.15128

  29. [29]

    Benjamin Ramhorst, Dario Korolija, Maximilian Jakob Heer, Jonas Dann, Luhao Liu, and Gustavo Alonso. 2025. Coyote v2: Raising the Level of Abstraction for Data Center FPGAs. In SOSP. ACM, 639–654. doi:10.1145/3731569.3764845

  30. [30]

    Jan Vincent Szlang, Sebastian Breß, Sebastian Cattes, Jonathan Dees, Florian Funke, Max Heimel, Michel Oleynik, Ismail Oukid, and Tobias Maltenberger. 2025. Workload Insights From the Snowflake Data Cloud: What Do Production Analytic Queries Really Look Like? Proc. VLDB Endow. 18, 12 (2025), 5126–5138. doi:10.14778/3750601.3750632

  32. [32]

    Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. 2024. Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet. Proc. VLDB Endow. 17, 11 (2024), 3694–3706. doi:10.14778/3681954.3682031

  33. [33]

    Alexander van Renen and Viktor Leis. 2023. Cloud Analytics Benchmark. Proc. VLDB Endow. 16, 6 (2023), 1413–1425. doi:10.14778/3583140.3583156

  34. [34]

    Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. 2020. Building An Elastic Query Engine on Disaggregated Storage. In NSDI. USENIX Association, 449–462

  35. [35]

    Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajin Tang, Gang Pan, Fei Wu, Bingsheng He, and Gustavo Alonso. 2025. FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics. CoRR abs/2503.09318 (2025). doi:10.48550/ARXIV.2503.09318

  37. [37]

    Louis Woods, Zsolt István, and Gustavo Alonso. 2014. Ibex - An Intelligent Storage Engine with Support for Advanced SQL Off-loading. Proc. VLDB Endow. 7, 11 (2014), 963–974. doi:10.14778/2732967.2732972

  38. [38]

    Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2024. FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs. VLDB J. 33, 5 (2024), 1643–1670. doi:10.1007/s00778-024-00867-8

  39. [39]

    Xiangyao Yu, Matt Youill, Matthew E. Woicik, Abdurrahman Ghanem, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2020. PushdownDB: Accelerating a DBMS Using S3 Computation. In ICDE. IEEE, 1802–1805. doi:10.1109/ICDE48307.2020.00174

  40. [40]

    Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, and Huanchen Zhang. 2023. An Empirical Evaluation of Columnar Storage Formats. Proc. VLDB Endow. 17, 2 (2023), 148–161. doi:10.14778/3626292.3626298

  41. [41]

    Andreas Zimmerer, Damien Dam, Jan Kossmann, Juliane Waack, Ismail Oukid, and Andreas Kipf. 2025. Pruning in Snowflake: Working Smarter, Not Harder. In SIGMOD. ACM, 757–770. doi:10.1145/3722212.3724447