Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines:A Comparative Analysis

Asif Imran; Punit Kumar; Tevfik Kosar

arxiv: 2511.08644 · v3 · submitted 2025-11-10 · 💻 cs.SE · cs.AI· cs.PF

Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines:A Comparative Analysis

Punit Kumar , Asif Imran , Tevfik Kosar This is my paper

Pith reviewed 2026-05-17 23:09 UTC · model grok-4.3

classification 💻 cs.SE cs.AIcs.PF

keywords dataframe librariesenergy consumptiondeep learning pipelinesPandasPolarsDaskGPU workloadsperformance analysis

0 comments

The pith

Dataframe libraries show distinct energy consumption when embedded in GPU deep learning pipelines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper compares Pandas, Polars, and Dask inside complete deep learning training and inference pipelines. It tracks how each library performs during data loading, preprocessing, and batch feeding while substantial GPU work is running. Measurements cover runtime, memory, disk usage, and energy draw on both CPU and GPU across several models and datasets. The study fills a gap because prior work rarely examined these libraries under real GPU loads instead of in isolation. If the differences hold, teams could pick the library that cuts total power use without altering the model itself.

Core claim

Through direct measurement the authors establish that Pandas, Polars, and Dask interact differently with GPU workloads during data loading, preprocessing, and batch feeding, producing quantifiable differences in runtime, memory, disk usage, and CPU plus GPU energy consumption across multiple machine learning models and datasets.

What carries the argument

End-to-end deep learning pipeline that embeds a dataframe library for data loading, preprocessing, and batch feeding, with simultaneous recording of CPU and GPU energy alongside runtime and memory metrics.

If this is right

Library choice during data preparation directly influences total CPU and GPU energy consumed by a deep learning pipeline.
Developers gain concrete data to select among Pandas, Polars, and Dask based on energy cost for their specific workloads.
Pipeline designs can incorporate library-specific energy profiles when targeting lower overall power draw.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same measurement approach could be applied to other data-processing libraries or to distributed training setups.
Energy metrics might be added to standard benchmarking tools so that sustainability becomes a routine selection criterion.
Cloud operators could use library rankings to guide default choices that lower electricity costs for customer workloads.

Load-bearing premise

That the chosen machine-learning models, datasets, and hardware configurations are representative of typical real-world deep-learning pipelines so that the measured energy differences generalize.

What would settle it

Re-running the identical pipelines on a different GPU model, a larger dataset, or an alternative set of models and observing whether the relative energy rankings among Pandas, Polars, and Dask stay the same.

Figures

Figures reproduced from arXiv: 2511.08644 by Asif Imran, Punit Kumar, Tevfik Kosar.

**Figure 2.** Figure 2: Overview of our energy profiling framework. The pipeline begins with loading datasets (text, image, or audio) using [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Energy consumption on ML-1M dataset. (a) CPU energy on Wikitext (DistilBert). (b) GPU energy on Wikitext (DistilBert) [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Energy consumption on Wikitext dataset. execution time. Interestingly, Polars’ CPU memory usage is higher (e.g., 8.03 MB vs. 2.83 MB for Random Forest) due to the Arrow buffers and parallel execution overhead, while Pandas maintains a leaner footprint. Dask incurs the highest runtime (1.027 s for Random Forest) because of task scheduling overhead, which outweighs its parallelism benefits on small data. Fo… view at source ↗

**Figure 5.** Figure 5: Energy consumption on Insurance dataset. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

read the original abstract

This paper presents a detailed comparative analysis of the performance of three major Python data manipulation libraries - Pandas, Polars, and Dask - specifically when embedded within complete deep learning (DL) training and inference pipelines. The research bridges a gap in existing literature by studying how these libraries interact with substantial GPU workloads during critical phases like data loading, preprocessing, and batch feeding. The authors measured key performance indicators including runtime, memory usage, disk usage, and energy consumption (both CPU and GPU) across various machine learning models and datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper benchmarks energy use for Pandas, Polars, and Dask in full DL pipelines but the GPU interaction claim rests on total measurements without phase isolation.

read the letter

The main takeaway is a set of comparative numbers on runtime, memory, disk, and combined CPU-plus-GPU energy for Pandas, Polars, and Dask inside complete training and inference pipelines. They ran the libraries across a few models and datasets and reported the metrics end to end. That focus on full pipelines rather than isolated dataframe operations is the clearest addition here, and it gives practitioners some concrete figures they can look at when energy matters in GPU setups. The measurements themselves appear direct and cover the libraries that people actually use for data prep before feeding tensors to the GPU. The weak part is the experimental grounding. The work does not spell out controls for hardware variability, number of runs, error bars, or how they separated CPU preprocessing time from any GPU effects. Since dataframe libraries do their work on CPU until the data transfer step, total pipeline joules can easily be driven by differences in preprocessing speed rather than anything that changes GPU power draw. Without per-phase breakdowns or utilization traces, the framing around substantial GPU workloads does not follow from the data. This is the kind of applied benchmarking paper that engineers choosing libraries for energy-sensitive pipelines might find useful as a starting point. It is not a theoretical advance and the methods need tightening before the differences can be trusted. I would send it to peer review so referees can press on the measurement details and reproducibility, but I would not cite it in its current form.

Referee Report

3 major / 2 minor

Summary. The paper claims to perform a detailed comparative analysis of Pandas, Polars, and Dask libraries embedded in complete deep learning training and inference pipelines. It measures runtime, memory usage, disk usage, and energy consumption for both CPU and GPU across various machine learning models and datasets, aiming to bridge a gap by examining interactions with substantial GPU workloads in phases like data loading, preprocessing, and batch feeding.

Significance. Should the measurements prove robust upon detailed scrutiny, this study could offer practical insights for optimizing energy efficiency in DL pipelines by guiding the choice of dataframe libraries. It contributes by shifting focus from standalone library benchmarks to their performance within integrated GPU-accelerated workflows, potentially aiding developers in resource-constrained environments.

major comments (3)

[Abstract and Experimental Methodology] The abstract states that runtime, memory, disk, and energy metrics were collected across models and datasets, but provides no information on experimental controls, statistical methods, error bars, or exclusion criteria. Without these details the measurements cannot be verified to support the comparative claims.
[Results (energy attribution)] The central claim requires that Pandas/Polars/Dask choices measurably alter energy draw during GPU-heavy phases. If the experimental design records only total CPU+GPU joules per run and does not report per-phase breakdowns or control for data-transfer overhead to GPU, any observed differences could be driven entirely by CPU-side preprocessing rather than the claimed GPU interaction.
[Hardware, Models, and Datasets] The assumption that the chosen machine-learning models, datasets, and hardware configurations are representative of typical real-world deep-learning pipelines is not sufficiently justified, limiting the generalizability of any measured energy differences.

minor comments (2)

[Tables and Figures] Clarify notation for distinguishing CPU versus GPU energy metrics in tables and figures to improve readability.
[Experimental Setup] Ensure all library versions, exact hardware specifications (e.g., GPU model, power measurement tools), and dataset sizes are explicitly listed for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and have revised the manuscript to improve transparency and address concerns about methodology, attribution, and generalizability.

read point-by-point responses

Referee: [Abstract and Experimental Methodology] The abstract states that runtime, memory, disk, and energy metrics were collected across models and datasets, but provides no information on experimental controls, statistical methods, error bars, or exclusion criteria. Without these details the measurements cannot be verified to support the comparative claims.

Authors: We acknowledge that the abstract prioritizes conciseness and omits methodological details. The full manuscript describes these elements in Section 3 (Experimental Setup), including five repeated runs per configuration, reporting of means with standard deviations for error bars, and exclusion criteria for runs exhibiting hardware anomalies or timeouts. To improve verifiability without lengthening the abstract excessively, we have added a brief clause summarizing the use of repeated measurements and statistical controls. revision: yes
Referee: [Results (energy attribution)] The central claim requires that Pandas/Polars/Dask choices measurably alter energy draw during GPU-heavy phases. If the experimental design records only total CPU+GPU joules per run and does not report per-phase breakdowns or control for data-transfer overhead to GPU, any observed differences could be driven entirely by CPU-side preprocessing rather than the claimed GPU interaction.

Authors: This concern is valid and highlights a limitation in attribution. Our setup records separate CPU and GPU energy via RAPL and NVML while standardizing batch sizes, transfer mechanisms, and pipeline structure across libraries to minimize confounding from data movement. Observed differences appear in both preprocessing and overall pipeline energy. We have added an explicit discussion paragraph acknowledging that finer per-phase instrumentation would strengthen claims of direct GPU-phase interaction and noting this as a direction for future work; no new measurements were collected. revision: partial
Referee: [Hardware, Models, and Datasets] The assumption that the chosen machine-learning models, datasets, and hardware configurations are representative of typical real-world deep-learning pipelines is not sufficiently justified, limiting the generalizability of any measured energy differences.

Authors: We agree that stronger justification is needed. The selected models (ResNet-50, VGG-16, and a small transformer) and datasets (CIFAR-10, MNIST, ImageNet subset) are standard benchmarks frequently cited in DL literature, and the hardware (Intel Xeon CPU with NVIDIA RTX 3090) represents a common single-GPU workstation. We have expanded the Hardware, Models, and Datasets subsection with supporting references to prior studies and added a short limitations paragraph discussing scope and potential differences in multi-GPU or cloud-scale environments. revision: yes

Circularity Check

0 steps flagged

Empirical measurement study with no derivation chain or self-referential reductions

full rationale

The paper performs direct experimental measurements of runtime, memory, disk, CPU and GPU energy across Pandas/Polars/Dask in end-to-end DL pipelines. No equations, fitted parameters, or predictions are derived; all claims rest on observed values from controlled runs. No self-citation is used to justify uniqueness or load-bearing premises, and the work is self-contained against external benchmarks (the hardware runs themselves). This matches the default expectation of no circularity for measurement studies.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work consists of direct measurements of existing libraries under stated workloads.

pith-pipeline@v0.9.0 · 5387 in / 1150 out tokens · 35602 ms · 2026-05-17T23:09:04.715019+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We integrate Pandas, Polars, and Dask into representative deep learning training and inference pipelines and conduct experiments across a wide range of various machine learning models and datasets, measuring key performance indicators such as runtime, memory usage, disk usage, and energy consumption (CPU and GPU).
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Polars consistently minimizes CPU energy consumption on larger workloads, while Pandas remains competitive for moderate sizes.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

Sonia Bergamaschi et al. 2024. An Empirical Study on the Energy Usage and Performance of Pandas and Polars.ACM Transactions on Data Science(2024)

work page 2024
[2]

James Bornholt, Todd Mytkowicz, and Kathryn S McKinley. 2012. The model is not enough: Understanding energy consumption in mobile devices. In2012 IEEE hot chips 24 symposium (HCS). IEEE, 1–3

work page 2012
[3]

Alex Broihier, Stefanos Baziotis, Daniel Kang, and Charith Mendis. 2025. Pan- dasBench: A Benchmark for the Pandas API.arXiv preprint arXiv:2506.02345 (2025)

work page arXiv 2025
[4]

NVIDIA Corporation. 2025. nvidia-ml-py: Python Bindings for the NVIDIA Management Library. https://pypi.org/project/nvidia-ml-py/. https://pypi.org/ project/nvidia-ml-py/ Accessed: 2025-10-20

work page 2025
[5]

Stefanos Georgiou, Maria Kechagia, Tushar Sharma, Federica Sarro, and Ying Zou

work page
[6]

InProceedings of the 44th International Conference on Software Engineering

Green ai: Do deep learning frameworks have different costs?. InProceedings of the 44th International Conference on Software Engineering. 1082–1094

work page
[7]

Pramod Gupta and Anupam Bagchi. 2024. Introduction to pandas. InEssentials of python for artificial intelligence and machine learning. Springer, 161–196

work page 2024
[8]

F Maxwell Harper and Joseph A Konstan. 2015. MovieLens 1M dataset.ACM Transactions on Interactive Intelligent Systems5 (2015), 1–19

work page 2015
[9]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InECCV. Springer, 740–755

work page 2014
[10]

Linux Kernel Community. 2024. perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/index.php/Main_Page. Accessed 2025-10-15

work page 2024
[11]

Ritchie Lutkebohmert et al. 2021. Polars: Blazingly fast dataframes in rust and python

work page 2021
[12]

Wes McKinney. 2010. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Vol. 445. Austin, TX, 51–56

work page 2010
[14]

Angelo Mozzillo, Luca Zecchini, Luca Gagliardelli, Adeel Aslam, and Giovanni Simonini. 2023. Evaluation of Dataframe Libraries for Data Preparation on a Single Machine.arXiv preprint arXiv:2312.11122(2023)

work page arXiv 2023
[15]

Felix Nahrstedt, Mehdi Karmouche, Karolina Bargieł, Pouyeh Banijamali, Apoorva Nalini Pradeep Kumar, and Ivano Malavolta. 2024. An empirical study on the energy usage and performance of pandas and polars data analysis Python libraries. InProceedings of the 28th international conference on evaluation and assessment in software engineering. 58–68. Energy Co...

work page 2024
[16]

NVIDIA Corporation. 2024. NVIDIA Management Library (NVML) and Python bindings (pynvml). https://docs.nvidia.com/deploy/nvml-api. Accessed 2025-10- 15

work page 2024
[17]

Lucas Oliveira et al. 2023. An Exploratory Study on Energy Consumption of Dataframe Processing Libraries. InProceedings of IEEE Conference

work page 2023
[18]

Matthew Rocklin. 2015. Dask: Parallel Computation with Blocked algorithms and Task Scheduling. InProceedings of the 14th Python in Science Conference. 126–132

work page 2015
[19]

Shriram Shanbhag and Sridhar Chimalakonda. 2023. An exploratory study on energy consumption of dataframe processing libraries. In2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). IEEE, 284–295

work page 2023
[20]

Douglas Souza et al. 2023. An Empirical Study on the Energy Usage and Perfor- mance of Pandas and Polars Data Analysis Python Libraries. InProceedings of the ACM Conference

work page 2023
[21]

Veeramani

V. Veeramani. 2017. Medical Cost Personal Datasets. https://www.kaggle.com/ datasets/mirichoi0218/insurance. Accessed 2025-10-15

work page 2017

[1] [1]

Sonia Bergamaschi et al. 2024. An Empirical Study on the Energy Usage and Performance of Pandas and Polars.ACM Transactions on Data Science(2024)

work page 2024

[2] [2]

James Bornholt, Todd Mytkowicz, and Kathryn S McKinley. 2012. The model is not enough: Understanding energy consumption in mobile devices. In2012 IEEE hot chips 24 symposium (HCS). IEEE, 1–3

work page 2012

[3] [3]

Alex Broihier, Stefanos Baziotis, Daniel Kang, and Charith Mendis. 2025. Pan- dasBench: A Benchmark for the Pandas API.arXiv preprint arXiv:2506.02345 (2025)

work page arXiv 2025

[4] [4]

NVIDIA Corporation. 2025. nvidia-ml-py: Python Bindings for the NVIDIA Management Library. https://pypi.org/project/nvidia-ml-py/. https://pypi.org/ project/nvidia-ml-py/ Accessed: 2025-10-20

work page 2025

[5] [5]

Stefanos Georgiou, Maria Kechagia, Tushar Sharma, Federica Sarro, and Ying Zou

work page

[6] [6]

InProceedings of the 44th International Conference on Software Engineering

Green ai: Do deep learning frameworks have different costs?. InProceedings of the 44th International Conference on Software Engineering. 1082–1094

work page

[7] [7]

Pramod Gupta and Anupam Bagchi. 2024. Introduction to pandas. InEssentials of python for artificial intelligence and machine learning. Springer, 161–196

work page 2024

[8] [8]

F Maxwell Harper and Joseph A Konstan. 2015. MovieLens 1M dataset.ACM Transactions on Interactive Intelligent Systems5 (2015), 1–19

work page 2015

[9] [9]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InECCV. Springer, 740–755

work page 2014

[10] [10]

Linux Kernel Community. 2024. perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/index.php/Main_Page. Accessed 2025-10-15

work page 2024

[11] [11]

Ritchie Lutkebohmert et al. 2021. Polars: Blazingly fast dataframes in rust and python

work page 2021

[12] [12]

Wes McKinney. 2010. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Vol. 445. Austin, TX, 51–56

work page 2010

[13] [14]

Angelo Mozzillo, Luca Zecchini, Luca Gagliardelli, Adeel Aslam, and Giovanni Simonini. 2023. Evaluation of Dataframe Libraries for Data Preparation on a Single Machine.arXiv preprint arXiv:2312.11122(2023)

work page arXiv 2023

[14] [15]

Felix Nahrstedt, Mehdi Karmouche, Karolina Bargieł, Pouyeh Banijamali, Apoorva Nalini Pradeep Kumar, and Ivano Malavolta. 2024. An empirical study on the energy usage and performance of pandas and polars data analysis Python libraries. InProceedings of the 28th international conference on evaluation and assessment in software engineering. 58–68. Energy Co...

work page 2024

[15] [16]

NVIDIA Corporation. 2024. NVIDIA Management Library (NVML) and Python bindings (pynvml). https://docs.nvidia.com/deploy/nvml-api. Accessed 2025-10- 15

work page 2024

[16] [17]

Lucas Oliveira et al. 2023. An Exploratory Study on Energy Consumption of Dataframe Processing Libraries. InProceedings of IEEE Conference

work page 2023

[17] [18]

Matthew Rocklin. 2015. Dask: Parallel Computation with Blocked algorithms and Task Scheduling. InProceedings of the 14th Python in Science Conference. 126–132

work page 2015

[18] [19]

Shriram Shanbhag and Sridhar Chimalakonda. 2023. An exploratory study on energy consumption of dataframe processing libraries. In2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). IEEE, 284–295

work page 2023

[19] [20]

Douglas Souza et al. 2023. An Empirical Study on the Energy Usage and Perfor- mance of Pandas and Polars Data Analysis Python Libraries. InProceedings of the ACM Conference

work page 2023

[20] [21]

Veeramani

V. Veeramani. 2017. Medical Cost Personal Datasets. https://www.kaggle.com/ datasets/mirichoi0218/insurance. Accessed 2025-10-15

work page 2017