Apache Parquet

Apache Software Foundation, “Apache Parquet,” [Online]

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale

cs.DC · 2026-04-23 · unverdicted · novelty 4.0

Optimizations to Petastorm and Parquet data pipelines with caching and deterministic queues reduce large-scale deep learning training time by 6x while raising GPU utilization above 60% and eliminating run-to-run variance.

citing papers explorer

Showing 1 of 1 citing paper.

Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale · cs.DC · 2026-04-23 · unverdicted · none · ref 5
Optimizations to Petastorm and Parquet data pipelines with caching and deterministic queues reduce large-scale deep learning training time by 6x while raising GPU utilization above 60% and eliminating run-to-run variance.
