arxiv: 1906.10496 · v1 · pith:KJNDNTJInew · submitted 2019-06-21 · 💻 cs.DC

The Coming Age of Pervasive Data Processing

Pith reviewed 2026-05-25 18:15 UTC · model grok-4.3

classification 💻 cs.DC
keywords pervasive data processing · big data frameworks · inefficiencies · heterogeneous devices · edge computing · data analytics · machine learning · system design directions

The pith

Current data processing frameworks must eliminate inefficiencies to support an era of pervasive computing across all device scales.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that big data analytics and machine learning applications increasingly require computational power distributed across heterogeneous devices, from high-performance clusters to small embedded systems. Existing frameworks rely on assumptions suited to homogeneous scale-out environments, which creates addressable sources of inefficiency that hinder performance in mixed settings. A sympathetic reader would care because future workloads will collect and process data everywhere from sensors to supercomputers, and without redesign the systems will not scale efficiently. The authors review the challenges of this pervasive data processing era and outline directions for next-generation frameworks.

Core claim

The central claim is that, to prepare for an era in which data collection and processing occur on a wide range of devices, from powerful HPC machines to small embedded devices, it is crucial to investigate and eliminate the potential sources of inefficiency in current state-of-the-art platforms. The paper addresses the current and upcoming challenges and presents directions for designing the next generation of large-scale data processing systems.

What carries the argument

The central mechanism is the systematic review of inefficiencies in current large-scale data processing frameworks when extended beyond homogeneous clusters into heterogeneous pervasive environments.

Load-bearing premise

Current large-scale data processing frameworks contain addressable inefficiencies that new design directions can resolve for pervasive environments.

What would settle it

An experiment showing that unmodified current frameworks maintain high efficiency and low overhead when deployed across a mix of HPC machines and embedded devices would falsify the need to eliminate sources of inefficiency.

Figures

Figures reproduced from arXiv: 1906.10496 by Apourva Parthasarathy, Dan Graur, Jan S. Rellermeyer, Sobhan Omranian Khorasani.

Figure 1: Modular Class Data Sharing for Big Data Processing
Figure 2: ResNet50 on TensorFlow using 9 nodes in different ...
Original abstract

Emerging Big Data analytics and machine learning applications require a significant amount of computational power. While there exists a plethora of large-scale data processing frameworks which thrive in handling the various complexities of data-intensive workloads, the ever-increasing demands of applications have made us reconsider the traditional ways of scaling (e.g., scale-out) and seek new opportunities for improving the performance. In order to prepare for an era where data collection and processing occur on a wide range of devices, from powerful HPC machines to small embedded devices, it is crucial to investigate and eliminate the potential sources of inefficiency in the current state-of-the-art platforms. In this paper, we address the current and upcoming challenges of pervasive data processing and present directions for designing the next generation of large-scale data processing systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript is a position paper arguing that emerging Big Data analytics and machine learning applications require reconsideration of traditional scale-out approaches. To support pervasive data processing across a spectrum of devices from HPC machines to small embedded systems, the authors call for investigation and elimination of inefficiencies in current large-scale data processing frameworks and outline directions for next-generation systems.

Significance. A clear articulation of design directions for heterogeneous, edge-to-HPC environments could usefully focus community attention on the limitations of existing frameworks when data collection and processing become truly pervasive.

minor comments (2)
  1. [Abstract] Abstract, paragraph 3: the claim that it is 'crucial to investigate and eliminate the potential sources of inefficiency' would be strengthened by at least one concrete example of an inefficiency that current frameworks exhibit in pervasive settings.
  2. The manuscript would benefit from explicit enumeration of the 'directions for designing the next generation' promised in the abstract so that readers can assess their novelty and actionability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our position paper and the recommendation for minor revision. The report contains no specific major comments to address.

Circularity Check

0 steps flagged

No significant circularity: position paper with no derivation chain

The manuscript is a position paper whose central claim is a prescriptive call to investigate inefficiencies in existing large-scale data processing frameworks for pervasive (edge-to-HPC) environments. It contains no equations, formal derivations, empirical predictions, fitted parameters, or load-bearing technical results. The text motivates future design directions without asserting any result that reduces to its own inputs by construction, self-citation, or renaming. No circular steps exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract contains no technical derivations, fitted parameters, axioms, or new entities.

pith-pipeline@v0.9.0 · 5667 in / 911 out tokens · 21404 ms · 2026-05-25T18:15:02.308017+00:00

