The Coming Age of Pervasive Data Processing
Pith reviewed 2026-05-25 18:15 UTC · model grok-4.3
The pith
Current data processing frameworks must eliminate inefficiencies to support an era of pervasive computing across all device scales.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that preparing for an era in which data collection and processing occur on a wide range of devices, from powerful HPC machines to small embedded devices, requires investigating and eliminating the potential sources of inefficiency in current state-of-the-art platforms. The paper addresses the current and upcoming challenges and presents directions for designing the next generation of large-scale data processing systems.
What carries the argument
The central mechanism is the systematic review of inefficiencies in current large-scale data processing frameworks when extended beyond homogeneous clusters into heterogeneous pervasive environments.
Load-bearing premise
Current large-scale data processing frameworks contain addressable inefficiencies that new design directions can resolve for pervasive environments.
What would settle it
An experiment showing that unmodified current frameworks maintain high efficiency and low overhead when deployed across a mix of HPC machines and embedded devices would falsify the need to eliminate sources of inefficiency.
Original abstract
Emerging Big Data analytics and machine learning applications require a significant amount of computational power. While there exists a plethora of large-scale data processing frameworks which thrive in handling the various complexities of data-intensive workloads, the ever-increasing demand of applications have made us reconsider the traditional ways of scaling (e.g., scale-out) and seek new opportunities for improving the performance. In order to prepare for an era where data collection and processing occur on a wide range of devices, from powerful HPC machines to small embedded devices, it is crucial to investigate and eliminate the potential sources of inefficiency in the current state of the art platforms. In this paper, we address the current and upcoming challenges of pervasive data processing and present directions for designing the next generation of large-scale data processing systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a position paper arguing that emerging Big Data analytics and machine learning applications require reconsideration of traditional scale-out approaches. To support pervasive data processing across a spectrum of devices from HPC machines to small embedded systems, the authors call for investigation and elimination of inefficiencies in current large-scale data processing frameworks and outline directions for next-generation systems.
Significance. A clear articulation of design directions for heterogeneous, edge-to-HPC environments could usefully focus community attention on the limitations of existing frameworks when data collection and processing become truly pervasive.
minor comments (2)
- [Abstract] Sentence 3: the claim that it is 'crucial to investigate and eliminate the potential sources of inefficiency' would be strengthened by at least one concrete example of an inefficiency that current frameworks exhibit in pervasive settings.
- The manuscript would benefit from explicit enumeration of the 'directions for designing the next generation' promised in the abstract so that readers can assess their novelty and actionability.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our position paper and the recommendation for minor revision. The report contains no specific major comments to address.
Circularity Check
No significant circularity: position paper with no derivation chain
The manuscript is a position paper whose central claim is a prescriptive call to investigate inefficiencies in existing large-scale data processing frameworks for pervasive (edge-to-HPC) environments. It contains no equations, formal derivations, empirical predictions, fitted parameters, or load-bearing technical results. The text motivates future design directions without asserting any result that reduces to its own inputs by construction, self-citation, or renaming. No circular steps exist.