pith. sign in

arxiv: 1604.08484 · v1 · pith:SYBOYR67new · submitted 2016-04-28 · 💻 cs.DC · cs.AR· cs.PF

Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study

classification 💻 cs.DC cs.ARcs.PF
keywords dataprocessingbatchperformanceanalyticsapachemicro-architecturalspark
0
0 comments X
read the original abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both, batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to only batch processing workloads. We compare micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual socket server. In our evaluation experiments, we have found that batch processing are stream processing workloads have similar micro-architectural characteristics and are bounded by the latency of frequent data access to DRAM. For data accesses we have found that simultaneous multi-threading is effective in hiding the data latencies. We have also observed that (i) data locality on NUMA nodes can improve the performance by 10% on average and(ii) disabling next-line L1-D prefetchers can reduce the execution time by up-to 14\% and (iii) multiple small executors can provide up-to 36\% speedup over single large executor.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.