pith. machine review for the scientific record.

arxiv: 2602.18775 · v2 · submitted 2026-02-21 · 💻 cs.DB

Recognition: 2 theorem links · Lean Theorem

Should I Hide My Duck in the Lake?

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 20:46 UTC · model grok-4.3

classification 💻 cs.DB
keywords SmartNIC · data lakes · Parquet · query offloading · decoding · DuckDB · network datapath · disaggregated storage

The pith

A SmartNIC on the network path can offload Parquet decoding and filtering to raise data lake query speeds while allowing cheaper CPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Data lakes spend much of their query time scanning remote storage; decoding Parquet files alone accounts for 46 percent of TPC-H runtime. The paper proposes a SmartNIC placed directly on the compute node's network datapath to decode files and apply pushed-down operators before data reaches the CPU. Experimental estimates with DuckDB suggest that queries operating on this pre-filtered stream run significantly faster, and that smaller, less expensive CPUs can still match the throughput of conventional setups.
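
To make the division of labor concrete, here is a minimal stdlib-Python sketch (the names and the toy record format are illustrative, not from the paper): a `smartnic_scan` stage decodes a packed binary page and applies a pushed-down predicate, so the host-side aggregation only ever sees surviving, already-decoded rows.

```python
import struct

# Toy stand-in for an encoded Parquet page: fixed-size records of
# (order_key: uint32, price_cents: uint32). Names are illustrative.
RECORD = struct.Struct("<II")

def encode_page(rows):
    """Pack (key, price) tuples into one binary 'page'."""
    return b"".join(RECORD.pack(k, p) for k, p in rows)

def smartnic_scan(page, min_price):
    """Datapath stage: decode records and apply the pushed-down
    predicate, yielding only matching, already-decoded rows."""
    for off in range(0, len(page), RECORD.size):
        key, price = RECORD.unpack_from(page, off)
        if price >= min_price:  # pushed-down filter
            yield key, price

def host_query(prefiltered):
    """Host side: aggregate a pre-filtered stream; no decoding here."""
    return sum(price for _, price in prefiltered)

page = encode_page([(1, 500), (2, 2500), (3, 9900)])
total = host_query(smartnic_scan(page, min_price=1000))
print(total)  # 12400 -- the host never decoded or inspected row 1
```

In the paper's design the `smartnic_scan` role sits in NIC hardware on the network path rather than in host software; the point of the sketch is only the interface: the host consumes decoded, filtered rows instead of raw file bytes.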

Core claim

By positioning a data processing SmartNIC on the network datapath, decoding and operator pushdown happen before data arrives at the host, so queries operate directly on pre-filtered results. This hides the cost of parsing raw files and allows the same query throughput with smaller, less expensive CPUs.

What carries the argument

A data processing SmartNIC that performs decoding and pushed-down operators on the network datapath to deliver pre-filtered data to the host.

If this is right

  • Query processing nodes can use smaller CPUs while keeping the same throughput.
  • System cost drops because less expensive hardware suffices for the same workload.
  • The scanning and decoding bottleneck in disaggregated storage is reduced.
  • Queries spend less time waiting on remote file access.
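
The first two bullets can be made concrete with a back-of-envelope model (my arithmetic, not the paper's): if decoding is 46 percent of runtime and moves off the host at no cost, a proportionally smaller CPU sustains the same throughput.

```python
def min_cpu_capacity(decode_fraction, host_overhead=0.0):
    """Fraction of the original CPU capacity needed to sustain the same
    query throughput once decoding is offloaded. host_overhead is any
    work the offload pushes back onto the host, as a fraction of the
    original runtime (0 = free offload)."""
    return 1.0 - decode_fraction + host_overhead

# With the paper's 46% decode share and a free offload, a CPU with
# 54% of the original capacity matches the baseline's throughput.
print(f"{min_cpu_capacity(0.46):.2f}")        # prints 0.54
print(f"{min_cpu_capacity(0.46, 0.10):.2f}")  # prints 0.64 if 10 points bounce back
```

This is a linear idealization; it ignores latency, memory bandwidth, and any host work the offload itself creates, which is exactly where the referee's objections below apply.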

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The design could be added to existing cloud networks with minimal changes to query engines like DuckDB.
  • Extending the same offload logic to other file formats would broaden the approach beyond Parquet.
  • The work points to a tighter coupling between network hardware and data processing that future systems may adopt.

Load-bearing premise

A practical SmartNIC can decode files and push down operators at full line rate without adding latency, power draw, or integration problems that erase the gains.
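
One way to frame this premise is as a break-even condition, sketched here under simplifying assumptions of my own (uniform fractions of runtime, no queueing effects): the offload wins only while the overhead it re-introduces stays below the 46 percent decode share it removes.

```python
def speedup_with_overhead(decode_fraction, overhead_fraction):
    """End-to-end speedup when decoding is offloaded to the NIC but the
    offload re-adds overhead on the critical path (PCIe handoff, memory
    coherence, added latency), both as fractions of original runtime."""
    return 1.0 / (1.0 - decode_fraction + overhead_fraction)

# Sweep overhead against the paper's 46% decode share: the benefit
# vanishes exactly where overhead equals the share removed (0.46).
for ov in (0.0, 0.20, 0.46, 0.60):
    print(f"overhead {ov:.2f} -> speedup {speedup_with_overhead(0.46, ov):.2f}x")
```

At an overhead of 0.46 the speedup returns to 1.00x, and beyond it the SmartNIC path is a net loss; that is the failure mode the premise rules out.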

What would settle it

A working SmartNIC prototype that decodes Parquet at network line rate but causes end-to-end query latency to rise above current CPU-only baselines would disprove the performance benefit.

Figures

Figures reproduced from arXiv: 2602.18775 by Gustavo Alonso, Jonas Dann.

Figure 1. DuckDB TPC-H throughput (scale factor 30, 4 streams).
Figure 2. Per-query breakdown of Parquet decoding, filtering, and the remaining query runtime for the TPC-H benchmarks.
Figure 3. DuckDB scan rewriter optimizer extension.
Figure 6. Data processing SmartNIC architecture.
Figure 5. TPC-H CSV and JSON throughput (scale factor 10).
read the original abstract

Data lakes spend a significant fraction of query execution time on scanning data from remote, disaggregated storage. Decoding alone accounts for 46% of runtime when running TPC-H directly on Parquet files. To address this bottleneck, we propose a vision for a data processing SmartNIC for the cloud that sits on the network datapath of compute nodes to offload decoding and pushed-down operators, effectively hiding the cost of parsing raw files. Our experimental estimations with DuckDB suggest that by operating directly on pre-filtered data, as delivered by a SmartNIC, we can significantly increase query processing performance and can still match query throughput of traditional setups with smaller, less expensive CPUs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a vision for a cloud data-processing SmartNIC placed on the network datapath of compute nodes. It claims that offloading Parquet decoding and pushed-down operators hides scanning costs in disaggregated data lakes; DuckDB estimations are cited to show that pre-filtered data delivery yields significantly higher query performance while still matching traditional throughput on smaller, less expensive CPUs. Decoding is asserted to consume 46% of TPC-H runtime on Parquet.

Significance. If a SmartNIC meeting the stated line-rate and integration assumptions can be built, the approach would materially lower CPU provisioning costs for cloud analytics workloads that currently spend substantial time on remote file parsing.

major comments (2)
  1. [Abstract] Abstract: the central performance claim rests on DuckDB estimations whose methodology, workload details (queries, scale factors, Parquet configurations), measurement method, and error margins are not described, preventing assessment of the reported 46% decoding overhead or the projected net gains.
  2. [Vision Proposal] Vision section: the assumption that decoding plus operator pushdown can be performed at line rate on the network datapath without offsetting PCIe handoff, memory-coherence, or sustained-parsing latency/power costs is stated without hardware model, prototype data, or sensitivity analysis; if any of these costs exceed the modeled savings, the headline claim does not hold.
minor comments (1)
  1. The manuscript would benefit from an explicit limitations subsection that enumerates the hardware assumptions required for the SmartNIC to deliver the claimed benefits.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback on our vision paper. We address each major comment below and will revise the manuscript to improve clarity and completeness while preserving its visionary nature.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claim rests on DuckDB estimations whose methodology, workload details (queries, scale factors, Parquet configurations), measurement method, and error margins are not described, preventing assessment of the reported 46% decoding overhead or the projected net gains.

    Authors: We agree that the DuckDB estimation methodology requires more detail to allow proper evaluation. In the revised manuscript we will expand both the abstract and the main text with a dedicated subsection (or appendix) that specifies the TPC-H queries and scale factors used, Parquet file configurations and compression settings, the exact measurement procedure within DuckDB, and any error margins or simplifying assumptions applied to the 46% decoding overhead figure. revision: yes

  2. Referee: [Vision Proposal] Vision section: the assumption that decoding plus operator pushdown can be performed at line rate on the network datapath without offsetting PCIe handoff, memory-coherence, or sustained-parsing latency/power costs is stated without hardware model, prototype data, or sensitivity analysis; if any of these costs exceed the modeled savings, the headline claim does not hold.

    Authors: As this is a vision paper, we do not possess a hardware prototype or detailed RTL-level model. We will nevertheless revise the Vision section to explicitly list the key hardware assumptions (line-rate parsing, PCIe transfer costs, memory coherence overheads, and power budgets), discuss how these costs could offset savings, and include a simple sensitivity analysis that varies the relative cost of offload versus host processing to show the conditions under which the proposed benefits remain valid. revision: partial

standing simulated objections not resolved
  • Empirical prototype data or a concrete hardware implementation of the proposed SmartNIC, which does not yet exist because the work is a forward-looking vision rather than an implementation study.

Circularity Check

0 steps flagged

No circularity: vision paper relies on external DuckDB estimations without self-referential derivation

full rationale

The manuscript presents a forward-looking vision for SmartNIC offload of decoding and operators, supported by DuckDB-based experimental estimations on TPC-H Parquet workloads. No equations, derivations, fitted parameters, or first-principles results are claimed. The performance suggestions (e.g., matching throughput on smaller CPUs via pre-filtered data) are presented as empirical observations from external tooling rather than quantities that reduce to the paper's own inputs by construction. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing steps. The argument therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the unverified premise that SmartNIC hardware can execute decoding and filters at network speed; no free parameters or new entities are introduced beyond the hardware concept itself.

axioms (1)
  • domain assumption Decoding accounts for 46% of TPC-H runtime on Parquet files
    Stated directly in the abstract without citation or measurement details.

pith-pipeline@v0.9.0 · 5393 in / 1094 out tokens · 78243 ms · 2026-05-15T20:46:05.284891+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SCENIC: Stream Computation-Enhanced SmartNIC

cs.AR · 2026-04 · unverdicted · novelty 7.0

    SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU...

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Azim Afroozeh and Peter Boncz. 2025. The FastLanes File Format. Proc. VLDB Endow. 18, 11 (2025), 4629–4643. doi:10.14778/3749646.3749718

  2. [2]

    Apache Software Foundation. 2025. Apache Parquet Format Specification. https://parquet.apache.org/. Accessed: 2026-04-30

  3. [3]

    Nikos Armenatzoglou, Sanuj Basu, Naga Bhanoori, Mengchu Cai, Naresh Chainani, Kiran Chinta, Venkatraman Govindaraju, Todd J. Green, Monish Gupta, Sebastian Hillig, Eric Hotinger, Yan Leshinksy, Jintian Liang, Michael McCreedy, Fabian Nagel, Ippokratis Pandis, Panos Parchas, Rahul Pathak, Orestis Polychroniou, Foyzur Rahman, Gaurav Saxena, Gokul Soundara...

  4. [4]

    Mengchu Cai, Martin Grund, Anurag Gupta, Fabian Nagel, Ippokratis Pandis, Yannis Papakonstantinou, and Michalis Petropoulos. 2018. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 41, 2 (2018), 82–90

  5. [5]

    Jonas Dann, Daniel Ritter, and Holger Fröning. 2023. Non-relational Databases on FPGAs: Survey, Design Decisions, Challenges. ACM Comput. Surv. 55, 11 (2023), 225:1–225:37. doi:10.1145/3568990

  6. [6]

    Jonas Dann, Royden Wagner, Daniel Ritter, Christian Faerber, and Holger Fröning. PipeJSON: Parsing JSON at Line Speed on FPGAs. In DaMoN. ACM, 3:1–3:7. doi:10.1145/3533737.3535094

  8. [8]

    Daniel Firestone, Andrew Putnam, Sambrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian M. Caulfield, Eric S. Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Sil...

  9. [9]

    Mateusz Gienieczko, Maximilian Kuschewski, Thomas Neumann, Viktor Leis, and Jana Giceva. 2025. AnyBlox: A Framework for Self-Decoding Datasets. Proc. VLDB Endow. 18, 11 (2025), 4017–4031. doi:10.14778/3749646.3749672

  10. [10]

    Dimitrios Giouroukis, Dwi P. A. Nugroho, Varun Pandey, Steffen Zeuch, and Volker Markl. 2025. Analyzing Near-Network Hardware Acceleration with Co-Processing on DPUs. Proc. VLDB Endow. 18, 13 (2025), 5689–5702. doi:10.14778/3773731.377374

  11. [11]

    Maximilian Jakob Heer, Benjamin Ramhorst, Yu Zhu, Luhao Liu, Zhiyi Hu, Jonas Dann, and Gustavo Alonso. 2025. RoCE BALBOA: Service-enhanced Data Center RDMA for SmartNICs. CoRR abs/2507.20412 (2025). doi:10.48550/ARXIV.2507.20412

  12. [12]

    Jason Hu, Philip A. Bernstein, Jialin Li, and Qizhen Zhang. 2025. DPDPU: Data Processing with DPUs. In CIDR. www.cidrdb.org

  13. [13]

    Insoon Jo, Duck-Ho Bae, Andre S. Yoon, Jeong-Uk Kang, Sangyeun Cho, Daniel D. G. Lee, and Jaeheon Jeong. 2016. YourSQL: A High-Performance Database System Leveraging In-Storage Computing. Proc. VLDB Endow. 9, 12 (2016), 924–935. doi:10.14778/2994509.2994512

  15. [15]

    Marko Kabic, Bowen Wu, Jonas Dann, and Gustavo Alonso. 2025. Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs. Proc. VLDB Endow. 18, 11 (2025), 4350–4363. doi:10.14778/3749646.3749698

  16. [16]

    Elie F. Kfoury, Samia Choueiri, Ali Mazloum, Ali AlSabeh, Jose Gomez, and Jorge Crichigno. 2024. A Comprehensive Survey on SmartNICs: Architectures, Development Models, Applications, and Research Directions. IEEE Access 12 (2024), 107297–107336. doi:10.1109/ACCESS.2024.3437203

  17. [17]

    Dario Korolija, Dimitrios Koutsoukos, Kimberly Keeton, Konstantin Taranov, Dejan S. Milojicic, and Gustavo Alonso. 2022. Farview: Disaggregated Memory with Operator Off-loading for Database Engines. In CIDR. www.cidrdb.org

  18. [18]

    Maximilian Kuschewski, Jana Giceva, Thomas Neumann, and Viktor Leis. 2024. High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance. Proc. ACM Manag. Data 2, 6 (2024), 238:1–238:27. doi:10.1145/3698813

  19. [19]

    Maximilian Kuschewski, David Sauerwein, Adnan Alhomssi, and Viktor Leis. 2023. BtrBlocks: Efficient Columnar Compression for Data Lakes. Proc. ACM Manag. Data 1, 2 (2023), 118:1–118:26. doi:10.1145/3589263

  21. [21]

    Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Hossein Ahmadi, Dan Delorey, Slava Min, Mosha Pasumansky, and Jeff Shute. 2020. Dremel: A Decade of Interactive SQL Analysis at Web Scale. Proc. VLDB Endow. 13, 12 (2020), 3461–3472. doi:10.14778/3415478.3415568

  22. [22]

    Oracle Corporation. 2012. A Technical Overview of the Oracle Exadata Database Machine and Exadata Storage Server. White Paper. Oracle Corporation. https://www.oracle.com/technetwork/server-storage/engineered-systems/exadata/dbmachine-x3-twp-1867467.pdf

  23. [23]

    Muhsen Owaida, David Sidler, Kaan Kara, and Gustavo Alonso. 2017. Centaur: A Framework for Hybrid CPU-FPGA Databases. In FCCM. IEEE Computer Society, 211–218. doi:10.1109/FCCM.2017.37

  24. [24]

    Jong-Hyeok Park, Soyee Choi, Gihwan Oh, and Sang Won Lee. 2021. SaS: SSD as SQL Database System. Proc. VLDB Endow. 14, 9 (2021), 1481–1488. doi:10.14778/3461535.3461538

  25. [25]

    Johan Peltenburg, Ákos Hadnagy, Matthijs Brobbel, Robert Morrow, and Zaid Al-Ars. 2021. Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators. In FPT. IEEE, 1–9. doi:10.1109/ICFPT52863.2021.9609833

  26. [26]

    Johan Peltenburg, Lars T. J. van Leeuwen, Joost Hoozemans, Jian Fang, Zaid Al-Ars, and H. Peter Hofstee. 2020. Battling the CPU Bottleneck in Apache Parquet to Arrow Conversion Using FPGA. In FPT. IEEE, 281–286. doi:10.1109/ICFPT51103.2020.00048

  27. [27]

    Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In SIGMOD. ACM, 1981–1984. doi:10.1145/3299869.3320212

  28. [28]

    Benjamin Ramhorst, Maximilian Jakob Heer, Luhao Liu, Heejae Kim, Jonas Dann, Jin-Soo Kim, and Gustavo Alonso. 2026. SCENIC: Stream Computation-Enhanced SmartNIC. CoRR abs/2604.15128 (2026). doi:10.48550/arXiv.2604.15128

  29. [29]

    Benjamin Ramhorst, Dario Korolija, Maximilian Jakob Heer, Jonas Dann, Luhao Liu, and Gustavo Alonso. 2025. Coyote v2: Raising the Level of Abstraction for Data Center FPGAs. In SOSP. ACM, 639–654. doi:10.1145/3731569.3764845

  30. [30]

    Jan Vincent Szlang, Sebastian Breß, Sebastian Cattes, Jonathan Dees, Florian Funke, Max Heimel, Michel Oleynik, Ismail Oukid, and Tobias Maltenberger. 2025. Workload Insights From the Snowflake Data Cloud: What Do Production Analytic Queries Really Look Like? Proc. VLDB Endow. 18, 12 (2025), 5126–5138. doi:10.14778/3750601.3750632

  32. [32]

    Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. 2024. Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet. Proc. VLDB Endow. 17, 11 (2024), 3694–3706. doi:10.14778/3681954.3682031

  33. [33]

    Alexander van Renen and Viktor Leis. 2023. Cloud Analytics Benchmark. Proc. VLDB Endow. 16, 6 (2023), 1413–1425. doi:10.14778/3583140.3583156

  34. [34]

    Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. 2020. Building An Elastic Query Engine on Disaggregated Storage. In NSDI. USENIX Association, 449–462

  35. [35]

    Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajin Tang, Gang Pan, Fei Wu, Bingsheng He, and Gustavo Alonso. 2025. FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics. CoRR abs/2503.09318 (2025). doi:10.48550/ARXIV.2503.09318

  37. [37]

    Louis Woods, Zsolt István, and Gustavo Alonso. 2014. Ibex - An Intelligent Storage Engine with Support for Advanced SQL Off-loading. Proc. VLDB Endow. 7, 11 (2014), 963–974. doi:10.14778/2732967.2732972

  38. [38]

    Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2024. FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs. VLDB J. 33, 5 (2024), 1643–1670. doi:10.1007/s00778-024-00867-8

  39. [39]

    Xiangyao Yu, Matt Youill, Matthew E. Woicik, Abdurrahman Ghanem, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2020. PushdownDB: Accelerating a DBMS Using S3 Computation. In ICDE. IEEE, 1802–1805. doi:10.1109/ICDE48307.2020.00174

  40. [40]

    Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, and Huanchen Zhang. 2023. An Empirical Evaluation of Columnar Storage Formats. Proc. VLDB Endow. 17, 2 (2023), 148–161. doi:10.14778/3626292.3626298

  41. [41]

    Andreas Zimmerer, Damien Dam, Jan Kossmann, Juliane Waack, Ismail Oukid, and Andreas Kipf. 2025. Pruning in Snowflake: Working Smarter, Not Harder. In SIGMOD. ACM, 757–770. doi:10.1145/3722212.3724447