Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems

Christoph Matthies; Guenter Hesse; Johannes Huegle; Kelvin Glass; Matthias Uflacker

arxiv: 1907.08302 · v1 · pith:GQ4776WGnew · submitted 2019-07-18 · 💻 cs.PF · cs.DC

Guenter Hesse, Christoph Matthies, Kelvin Glass, Johannes Huegle, Matthias Uflacker

Pith reviewed 2026-05-24 19:33 UTC · model grok-4.3

classification 💻 cs.PF cs.DC

keywords Apache Beam · data stream processing · performance evaluation · abstraction layer · benchmark · Apache Spark Streaming · Apache Flink · Apache Apex

The pith

Using Apache Beam as an abstraction layer slows streaming query execution by up to a factor of 58 compared to native code on Spark, Flink, and Apex.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a benchmark architecture to measure the runtime cost of writing streaming applications once in Apache Beam and running them on multiple engines. It executes the same queries both through the Beam layer and in native implementations on Apache Spark Streaming, Apache Flink, and Apache Apex. The measurements show large variance and slowdowns reaching 58 times when the abstraction is used. A reader would care because the stated purpose of Beam is to avoid costly rewrites when switching frameworks, yet the results indicate that this portability comes with a measurable performance price that must be weighed against the benefit.

Core claim

Usage of Apache Beam for the examined streaming applications caused a high variance of query execution times with a slowdown of up to a factor of 58 compared to queries developed without the abstraction layer on the three surveyed frameworks.
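The slowdown figure in this claim can be made concrete with a small calculation. The sketch below assumes the slowdown factor is the ratio of the Beam execution time to the native execution time for the same system and query; the function name and all timings are hypothetical and are not taken from the paper.

```python
def slowdown_factor(beam_seconds: float, native_seconds: float) -> float:
    """Ratio of Beam to native execution time; values above 1 mean Beam is slower.

    Assumption: this ratio is how the slowdown factor is defined.
    """
    if native_seconds <= 0:
        raise ValueError("native execution time must be positive")
    return beam_seconds / native_seconds


# Hypothetical average execution times in seconds, keyed by (system, query).
native_times = {("flink", "grep"): 10.0, ("spark", "identity"): 20.0}
beam_times = {("flink", "grep"): 45.0, ("spark", "identity"): 30.0}

# One slowdown factor per system-query combination.
factors = {
    key: slowdown_factor(beam_times[key], native_times[key])
    for key in native_times
}

# The combination with the largest slowdown is the "up to" figure in a headline claim.
worst_case = max(factors, key=factors.get)
```

Reporting only the maximum hides the spread across combinations, which is why the per-query factors in `factors` are worth keeping alongside `worst_case`.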

What carries the argument

A novel benchmark architecture that runs identical streaming queries with and without the Apache Beam abstraction layer on Spark Streaming, Flink, and Apex.
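To give a feel for the kind of workload being compared, here is a minimal sketch of three simple stateless queries written as plain Python generators. The query names (identity, sample, grep) come from the figure captions on this page; the semantics shown, the sampling probability, and the search pattern are assumptions for illustration, and this is neither Beam code nor the authors' implementation.

```python
import random
import re
from typing import Iterable, Iterator


def identity_query(records: Iterable[str]) -> Iterator[str]:
    """Emit every input record unchanged."""
    yield from records


def sample_query(
    records: Iterable[str], probability: float, rng: random.Random
) -> Iterator[str]:
    """Emit each record with the given probability (assumed semantics)."""
    for record in records:
        if rng.random() < probability:
            yield record


def grep_query(records: Iterable[str], pattern: str) -> Iterator[str]:
    """Emit only the records that contain a match for the pattern."""
    compiled = re.compile(pattern)
    for record in records:
        if compiled.search(record):
            yield record


# Hypothetical input records.
records = ["alpha test", "beta", "gamma test"]
matches = list(grep_query(records, "test"))
sampled = list(sample_query(records, 0.5, random.Random(0)))
```

In a benchmark of this shape, each such query would be written once against the abstraction layer and once against each engine's native API, and the two versions timed on the same input.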

If this is right

Portability across stream processors carries a concrete execution-time cost that developers must quantify for each workload.
Native code on a single framework can deliver substantially lower latency than the same logic expressed through Beam.
Performance comparisons between frameworks should include the abstraction layer when portability is a requirement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Teams prioritizing raw speed over framework flexibility may prefer direct native implementations for production workloads.
The benchmark could be extended to newer engines or additional query patterns to test whether the overhead pattern persists.
Runtime profiling of Beam-translated jobs could reveal specific operators responsible for the largest slowdowns.

Load-bearing premise

The chosen streaming applications, queries, and native implementations are representative of typical use and were optimized to the same degree as the Beam versions.

What would settle it

Re-executing the benchmark on a fresh collection of streaming workloads or with further-tuned native implementations that eliminate most of the observed gap would show whether the reported slowdowns are inherent to the abstraction.

Figures

Figures reproduced from arXiv: 1907.08302 by Christoph Matthies, Guenter Hesse, Johannes Huegle, Kelvin Glass, Matthias Uflacker.

**Figure 2.** Architecture of Apache Spark in Cluster Mode (based on [19], [22])

**Figure 4.** Architecture of an Apache Hadoop YARN (based on [24], [37])

**Figure 5.** Overview About the General Benchmark Architecture and Process

**Figure 6.** Average Execution Times - Identity Query

**Figure 7.** Average Execution Times - Sample Query. Excerpt from the accompanying text: "... between the analyzed systems and parallelism factors. Compared to identity query results, times are slightly lower overall, which could be a result of the lower number of output records as described in Section III-B. The Apex Beam implementation is an exception as there is a major difference."

**Figure 9.** Average Execution Times - Grep Query. Excerpt from the accompanying text: "By SDK it is distinguished between using Apache Beam or native system APIs for application development. Deviations for the two parallelism factors are averaged and condensed in this way. This is done since separate visualizations for different parallelisms would not reveal any further insights."

**Figure 10.** Relative Standard Deviation for System-Query-SDK Combinations

**Figure 12.** Apache Flink Execution Plan for the Grep Query

**Figure 11.** Slowdown Factor for the Analyzed Systems and Queries

**Figure 13.** Apache Flink Execution Plan for the Grep Query Implemented Using
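The relative standard deviation named in the Figure 10 caption, and the averaging of deviations over the two parallelism factors mentioned in the caption text, can be sketched as follows. This assumes the relative standard deviation is the sample standard deviation divided by the mean; the parallelism values and run times are hypothetical.

```python
import statistics
from typing import Sequence


def relative_std_dev(times: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(times) / statistics.mean(times)


# Hypothetical execution times in seconds for one system-query-SDK combination,
# keyed by parallelism factor.
runs_by_parallelism = {
    1: [10.0, 12.0, 14.0],
    2: [8.0, 8.0, 8.0],
}

# One relative standard deviation per parallelism factor...
rsd_per_parallelism = [
    relative_std_dev(times) for times in runs_by_parallelism.values()
]

# ...then averaged into a single value for the combination.
condensed_rsd = statistics.mean(rsd_per_parallelism)
```

Dividing by the mean makes the spread comparable between combinations whose absolute execution times differ by an order of magnitude or more.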

Original abstract

With the demand to process ever-growing data volumes, a variety of new data stream processing frameworks have been developed. Moving an implementation from one such system to another, e.g., for performance reasons, requires adapting existing applications to new interfaces. Apache Beam addresses these high substitution costs by providing an abstraction layer that enables executing programs on any of the supported streaming frameworks. In this paper, we present a novel benchmark architecture for comparing the performance impact of using Apache Beam on three streaming frameworks: Apache Spark Streaming, Apache Flink, and Apache Apex. We find significant performance penalties when using Apache Beam for application development in the surveyed systems. Overall, usage of Apache Beam for the examined streaming applications caused a high variance of query execution times with a slowdown of up to a factor of 58 compared to queries developed without the abstraction layer. All developed benchmark artifacts are publicly available to ensure reproducible results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Beam shows up to 58x slowdown and high variance versus native code on the three systems, with public artifacts, but the native baselines may not have received matching optimization effort.

The main point is that this paper measures a substantial performance penalty from Apache Beam: slowdowns of up to a factor of 58, with noticeably higher variance in execution times, across Spark Streaming, Flink, and Apex for the tested applications. The authors built a benchmark architecture to run the same logic both through the abstraction and directly on each framework. The work stands out for releasing all artifacts publicly, which lets others rerun or inspect the exact setups. The results come from direct benchmark executions rather than any modeling or fitted parameters, so the numbers are straightforward empirical observations that extend prior performance studies on these systems. The approach is consistent enough to make the comparisons usable for the chosen workloads.

The softer part is the assumption that the native implementations were developed and tuned to the same degree as the Beam versions. Without detailed records of profiling, tuning steps, or framework-specific optimizations applied only to the direct code, some of the measured gap could reflect differences in implementation quality instead of the abstraction layer alone. The workloads are specific streaming queries, which limits how far the exact factors generalize.

This paper is mainly for practitioners who need concrete data when deciding between Beam's portability and raw performance on these platforms. The public code makes the claims checkable, so it deserves a serious referee even if revisions would be needed to document the baseline tuning process more clearly.

Referee Report

2 major / 1 minor

Summary. The paper presents a benchmark architecture for measuring the performance overhead of Apache Beam as an abstraction layer when executing streaming applications on Apache Spark Streaming, Apache Flink, and Apache Apex. It reports that Beam usage produces high variance in query execution times and slowdowns of up to a factor of 58 relative to native implementations, while releasing all benchmark artifacts publicly.

Significance. If the native and Beam implementations are shown to have received comparable optimization effort, the quantitative results would establish a concrete performance cost for the abstraction layer, helping practitioners weigh portability against efficiency. The public release of artifacts is a clear strength that enables direct verification and reuse.

major comments (2)

[Benchmark Architecture] Benchmark Architecture section: the description of the three native implementations does not document tuning steps, profiling data, or framework-specific optimizations (e.g., custom state backends or partitioning) applied outside Beam, leaving open the possibility that part of the reported 58x slowdown arises from unequal implementation quality rather than the abstraction layer itself.
[Results] Results section (tables/figures reporting slowdown factors): the headline claim of 'high variance' and the maximum slowdown of 58x would be strengthened by explicit reporting of the number of runs, statistical measures of variance, and per-query breakdowns so that readers can assess whether the observed differences are robust across workloads.

minor comments (1)

[Abstract] Clarify in the abstract and introduction whether the selected applications and queries are intended to be representative or merely illustrative.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our benchmark results. We address each major comment below and will revise the manuscript accordingly.

Referee: [Benchmark Architecture] Benchmark Architecture section: the description of the three native implementations does not document tuning steps, profiling data, or framework-specific optimizations (e.g., custom state backends or partitioning) applied outside Beam, leaving open the possibility that part of the reported 58x slowdown arises from unequal implementation quality rather than the abstraction layer itself.

Authors: We agree that the Benchmark Architecture section would benefit from additional detail on the native implementations. In the revision we will document the exact configurations used for each native framework (Spark Streaming, Flink, Apex), any tuning steps performed, and the rationale for not applying framework-specific optimizations beyond standard settings. This will make explicit that the native versions represent typical out-of-the-box usage and allow readers to judge the degree of comparability. revision: yes
Referee: [Results] Results section (tables/figures reporting slowdown factors): the headline claim of 'high variance' and the maximum slowdown of 58x would be strengthened by explicit reporting of the number of runs, statistical measures of variance, and per-query breakdowns so that readers can assess whether the observed differences are robust across workloads.

Authors: We will revise the Results section to report the number of runs executed for each experiment, include statistical measures (e.g., standard deviation or inter-quartile range) to quantify the observed variance, and add per-query breakdowns of execution times and slowdown factors. These additions will allow readers to evaluate the robustness of the 58x maximum and the high-variance claim across the workloads. revision: yes
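A per-experiment summary of the kind promised in this response could look like the sketch below: number of runs, mean, sample standard deviation, and inter-quartile range. The `RunSummary` type, the `summarize` helper, and the timings are hypothetical; the quartiles use the default method of Python's `statistics.quantiles`, which is one of several common conventions.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunSummary:
    runs: int      # number of benchmark runs
    mean: float    # average execution time
    stdev: float   # sample standard deviation
    iqr: float     # inter-quartile range (Q3 - Q1)


def summarize(times: Sequence[float]) -> RunSummary:
    """Summarize repeated execution times for one experiment."""
    q1, _median, q3 = statistics.quantiles(times, n=4)
    return RunSummary(
        runs=len(times),
        mean=statistics.mean(times),
        stdev=statistics.stdev(times),
        iqr=q3 - q1,
    )


# Hypothetical execution times in seconds for a single query on a single system.
summary = summarize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

Computing one such summary per query and per SDK would give the per-query breakdown the referee asks for.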

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark measurements

The paper reports direct runtime measurements from executed queries on Spark Streaming, Flink, and Apex, with and without Beam. No equations, fitted parameters, predictions, or derivations appear in the abstract or described content. Claims rest on observed execution times rather than any reduction to prior fits or self-citations. The central result (up to 58x slowdown) is a raw empirical comparison, not a constructed quantity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmark study; no free parameters, mathematical axioms, or invented entities are introduced or required.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

[1] M. Stonebraker and U. Çetintemel, “"one size fits all": An idea whose time has come and gone (abstract),” in Proc. International Conference on Data Engineering, ICDE, 2005, pp. 2–11. [Online]. Available: https://doi.org/10.1109/ICDE.2005.1
[2] “Apache Beam Overview,” https://beam.apache.org/get-started/beam-overview/, accessed: 2018-10-30
[3] M. Lorenz, J. Rudolph, G. Hesse, M. Uflacker, and H. Plattner, “Object-Relational Mapping Revisited - A Quantitative Study on the Impact of Database Technology on O/R Mapping Strategies,” in Hawaii International Conference on System Sciences, HICSS, 2017
[4] P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, “Apache Flink™: Stream and Batch Processing in a Single Engine,” IEEE Data Eng. Bull., vol. 38, no. 4, pp. 28–38, 2015. [Online]. Available: http://sites.computer.org/debull/A15dec/p28.pdf
[5] M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica, “Apache Spark: A Unified Engine for Big Data Processing,” Commun. ACM, vol. 59, no. 11, pp. 56–65, 2016. [Online]. Available: http://doi.acm.org/10.1145/2934664
[6] “Apache Apex,” https://apex.apache.org/docs/apex/, accessed: 2018-09-11
[7] “Apache Beam,” https://github.com/apache/beam, accessed: 2018-08-17
[8] “Apache Beam Programming Guide,” https://beam.apache.org/documentation/programming-guide/, accessed: 2018-10-18
[9] “Runner Authoring Guide,” https://github.com/apache/beam/blob/master/website/src/contribute/runner-guide.md, accessed: 2018-10-18
[10] T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. Fernández-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, and S. Whittle, “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing,” PVLDB, vol. 8, no. 12, pp. 1792–1803, 2015. [Online]. Available: h...
[11] “Beam Capability Matrix,” https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what, accessed: 2018-09-19
[12] “Apache Gearpump,” https://gearpump.apache.org/overview.html, accessed: 2018-10-15
[13] “MapReduce Tutorial,” https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html, accessed: 2018-10-15
[14] “What is Samza?” https://samza.apache.org, accessed: 2018-10-15
[15] “Alibaba JStorm,” http://jstorm.io, accessed: 2018-10-15
[16] “IBM Streams,” https://www.ibm.com/de-en/marketplace/stream-computing, accessed: 2018-10-15
[17] “CLOUD DATAFLOW - Simplified stream and batch data processing, with equal reliability and expressiveness,” https://cloud.google.com/dataflow/, accessed: 2018-08-17
[18] “Cloud Dataflow, Apache Beam and you,” https://cloud.google.com/blog/products/gcp/cloud-dataflow-apache-beam-and-you, accessed: 2018-10-15
[19] G. Hesse and M. Lorenz, “Conceptual Survey on Data Stream Processing Systems,” in IEEE International Conference on Parallel and Distributed Systems, ICPADS, 2015, pp. 797–802. [Online]. Available: https://doi.org/10.1109/ICPADS.2015.106
[20] “Flink - Distributed Runtime Environment,” https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html, accessed: 2018-09-27
[21] “Spark Streaming Programming Guide,” https://spark.apache.org/docs/latest/streaming-programming-guide.html, accessed: 2018-09-26
[22] “Apache Spark - Cluster Mode Overview,” https://spark.apache.org/docs/2.3.1/cluster-overview.html, accessed: 2018-09-10
[23] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in Proc. USENIX Symposium on Networked Systems Design and Implementation, NSDI, ... [Online]. Available: https://www.usenix.org/conference/nsdi11/mesos-platform-fine-grained-resource-sharing-data-center
[24] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, and E. Baldeschwieler, “Apache Hadoop YARN: Yet Another Resource Negotiator,” in ACM Symposium on Cloud Computing, SOCC, 2013, pp. 5:1–5:16. [Online]. Available: http://doi.acm.org/10.1145... (doi:10.1145/2523616.2523633)
[25] E. A. Brewer, “Kubernetes and the Path to Cloud Native,” in Proc. ACM Symposium on Cloud Computing, SoCC, 2015, p. 167. [Online]. Available: http://doi.acm.org/10.1145/2806777.2809955
[26] X. Lu, M. Wasi-ur-Rahman, N. S. Islam, D. Shankar, and D. K. Panda, “Accelerating Spark with RDMA for Big Data Processing: Early Experiences,” in IEEE Annual Symposium on High-Performance Interconnects, HOTI, 2014, pp. 9–16. [Online]. Available: https://doi.org/10.1109/HOTI.2014.15
[27] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” in Proc. USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2012, pp. 15–28. [Online]. Available: https://www.usenix.org/conference/nsd...
[28] M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica, “Discretized Streams: Fault-Tolerant Streaming Computation at Scale,” in ACM SIGOPS Symposium on Operating Systems Principles, SOSP, 2013, pp. 423–438. [Online]. Available: http://doi.acm.org/10.1145/2517349.2522737
[29] “Apache Hadoop,” https://hadoop.apache.org, accessed: 2018-09-11
[30] “HDFS Architecture,” http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html, accessed: 2018-09-11
[31] M. Bhandarkar, “AdBench: A Complete Benchmark for Modern Data Pipelines,” in TPC Technology Conference, TPCTC, 2016, pp. 107–120. [Online]. Available: https://doi.org/10.1007/978-3-319-54334-5_8
[32] T. Dunning and E. Friedman, Streaming Architecture: New Designs Using Apache Kafka and MapR Streams. O’Reilly Media, 2016. [Online]. Available: https://books.google.de/books?id=EU8kDAAAQBAJ
[33] J. Kreps, N. Narkhede, and J. Rao, “Kafka: a Distributed Messaging System for Log Processing,” in Proc. International Workshop on Networking Meets Databases, NetDB, 2011, pp. 1–7
[34] B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A. C. Murthy, and C. Curino, “Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,” in Proc. International Conference on Management of Data, ACM SIGMOD, 2015, pp. 1357–... [Online]. Available: http://doi.acm.org/10.1145/2723372.2742790
[35] “Flink - YARN Setup,” https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html, accessed: 2018-09-23
[36] “Running Spark on YARN,” https://spark.apache.org/docs/latest/running-on-yarn.html, accessed: 2018-09-23
[37] “Apache Hadoop YARN,” https://hadoop.apache.org/docs/r3.0.3/hadoop-yarn/hadoop-yarn-site/YARN.html, accessed: 2018-09-25
[38] “Apache Apex Documentation - Application Developer Guide,” http://apex.apache.org/docs/apex-3.7/application_development/, accessed: 2018-09-26
[39] “Apache Flink,” https://github.com/apache/flink, accessed: 2018-10-21
[40] “Mirror of Apache Apex core,” https://github.com/apache/apex-core, accessed: 2018-10-21
[41] “Mirror of Apache Spark,” https://github.com/apache/spark, accessed: 2018-10-21
[42] “AOL Search Query Logs,” http://www.researchpipeline.com/mediawiki/index.php?title=AOL_Search_Query_Logs, accessed: 2018-09-28
[43] R. Lu, G. Wu, B. Xie, and J. Hu, “StreamBench: Towards Benchmarking Modern Distributed Stream Computing Frameworks,” in Proc. IEEE/ACM International Conference on Utility and Cloud Computing, UCC, 2014, pp. 69–78. [Online]. Available: https://doi.org/10.1109/UCC.2014.15
[44] “Documentation - Kafka 0.10.2 Documentation,” https://kafka.apache.org/documentation/, accessed: 2017-04-24
[45] “Flink - Command-Line Interface,” https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/cli.html, accessed: 2018-09-27
[46] “Spark Configuration,” https://spark.apache.org/docs/latest/configuration.html#spark-properties, accessed: 2018-09-27
[47] “Interface Context.OperatorContext,” https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.6/com/datatorrent/api/Context.OperatorContext.html, accessed: 2018-10-29
[48] J. Karimov, T. Rabl, A. Katsifodimos, R. Samarev, H. Heiskanen, and V. Markl, “Benchmarking Distributed Stream Data Processing Systems,” in IEEE International Conference on Data Engineering, ICDE, 2018, pp. 1507–1518. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ICDE.2018.00169
[49] S. Li, P. Gerver, J. Macmillan, D. Debrunner, W. Marshall, and K. Wu, “Challenges and Experiences in Building an Efficient Apache Beam Runner For IBM Streams,” PVLDB, vol. 11, no. 12, pp. 1742–1754, ... [Online]. Available: http://www.vldb.org/pvldb/vol11/p1742-li.pdf
[50] A. Arasu, S. Babu, and J. Widom, “The CQL Continuous Query Language: Semantic Foundations and Query Execution,” VLDB J., vol. 15, no. 2, pp. 121–142, 2006. [Online]. Available: https://doi.org/10.1007/s00778-004-0147-z
[51] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom, “STREAM: The Stanford Stream Data Manager,” IEEE Data Eng. Bull., vol. 26, no. 1, pp. 19–26, 2003. [Online]. Available: http://sites.computer.org/debull/A03mar/paper.ps
[52] E. Begoli, J. Camacho-Rodríguez, J. Hyde, M. J. Mior, and D. Lemire, “Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources,” in Proc. International Conference on Management of Data, ACM SIGMOD, 2018, pp. 221–... [Online]. Available: http://doi.acm.org/10.1145/3183713.3190662
[53] “Streaming,” https://calcite.apache.org/docs/stream.html, accessed: 2018-10-19
[54] N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, and S. B. Zdonik, “Towards a Streaming SQL Standard,” PVLDB, vol. 1, no. 2, pp. 1379–1390, ... [Online]. Available: http://www.vldb.org/pvldb/1/1454179.pdf
[55] “Oracle CEP CQL Language Reference 11g Release 1 (11.1.1),” https://docs.oracle.com/cd/E16764_01/doc.1111/e12048/intro.htm, accessed: 2018-10-19
[56] “StreamSQL Overview,” https://docs.tibco.com/pub/sb-lv/2.1.8/doc/html/streamsql/ssql-intro.html, accessed: 2018-10-19
[57] “KSQL and Kafka Streams,” https://docs.confluent.io/current/streams-ksql.html, accessed: 2018-10-19
[58] “SAP HANA Smart Data Streaming: Developer Guide,” https://help.sap.com/doc/25fc8560420d4d5099d6df02f7cbff9e/1.0.12/en-US/streaming_developer_guide.pdf, accessed: 2018-10-19
[59] M. Pathirage, J. Hyde, Y. Pan, and B. Plale, “SamzaSQL: Scalable Fast Data Management with Streaming SQL,” in IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS, 2016, pp. 1627–1636. [Online]. Available: https://doi.org/10.1109/IPDPSW.2016.141
[60] S. A. Noghabi, K. Paramasivam, Y. Pan, N. Ramesh, J. Bringhurst, I. Gupta, and R. H. Campbell, “Samza: Stateful Scalable Stream Processing at LinkedIn,” PVLDB, vol. 10, no. 12, pp. 1634–1645, 2017. [Online]. Available: http://www.vldb.org/pvldb/vol10/p1634-noghabi.pdf
[61] A. Arasu, M. Cherniack, E. F. Galvez, D. Maier, A. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts, “Linear Road: A Stream Data Management Benchmark,” in (e)Proc. International Conference on Very Large Data Bases, 2004, pp. 480–491. [Online]. Available: http://www.vldb.org/conf/2004/RS12P1.PDF
[62] D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. B. Zdonik, “Aurora: a new model and architecture for data stream management,” VLDB J., vol. 12, no. 2, pp. 120–139, 2003. [Online]. Available: https://doi.org/10.1007/s00778-003-0095-z
[63] “Apache Storm,” http://storm.apache.org, accessed: 2018-10-23
[64] “NEXMark Benchmark,” http://datalab.cs.pdx.edu/niagara/NEXMark/, accessed: 2018-10-20
[65] P. Tucker, K. Tufte, V. Papadimos, and D. Maier, “NEXMark – A Benchmark for Queries over Data Streams DRAFT,” http://datalab.cs.pdx.edu/niagara/pstream/nexmark.pdf, accessed: 2018-10-20
[66] “Nexmark benchmark suite,” https://beam.apache.org/documentation/sdks/java/nexmark/, accessed: 2018-10-20
[67] O. Marcu, A. Costan, G. Antoniu, and M. S. Pérez-Hernández, “Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks,” in IEEE International Conference on Cluster Computing, CLUSTER, 2016, pp. 433–442. [Online]. Available: https://doi.org/10.1109/CLUSTER.2016.22
[68] M. A. Lopez, A. G. P. Lobato, and O. C. M. B. Duarte, “A Performance Comparison of Open-Source Stream Processing Platforms,” in IEEE Global Communications Conference, GLOBECOM, 2016, pp. 1–6. [Online]. Available: https://doi.org/10.1109/GLOCOM.2016.7841533


work page 2015

[11] [11]

Beam Capability Matrix,

“Beam Capability Matrix,” https://beam.apache.org/documentation/ runners/capability-matrix/#cap-summary-what, accessed: 2018-09-19

work page 2018

[12] [12]

Apache Gearpump,

“Apache Gearpump,” https://gearpump.apache.org/overview.html, ac- cessed: 2018-10-15

work page 2018

[13] [13]

MapReduce Tutorial,

“MapReduce Tutorial,” https://hadoop.apache.org/docs/stable/ hadoop-mapreduce-client/hadoop-mapreduce-client-core/ MapReduceTutorial.html, accessed: 2018-10-15

work page 2018

[14] [14]

What is Samza?

“What is Samza?” https://samza.apache.org, accessed: 2018-10-15

work page 2018

[15] [15]

Alibaba JStorm,

“Alibaba JStorm,” http://jstorm.io, accessed: 2018-10-15

work page 2018

[16] [16]

IBM Streams,

“IBM Streams,” https://www.ibm.com/de-en/marketplace/ stream-computing, accessed: 2018-10-15

work page 2018

[17] [17]

CLOUD DATAFLOW - Simpliﬁed stream and batch data processing, with equal reliability and expressiveness,

“CLOUD DATAFLOW - Simpliﬁed stream and batch data processing, with equal reliability and expressiveness,” https://cloud.google.com/ dataﬂow/, accessed: 2018-08-17

work page 2018

[18] [18]

Cloud Dataﬂow, Apache Beam and you,

“Cloud Dataﬂow, Apache Beam and you,” https://cloud.google.com/ blog/products/gcp/cloud-dataﬂow-apache-beam-and-you, accessed: 2018-10-15

work page 2018

[19] [19]

Conceptual Survey on Data Stream Processing Systems,

G. Hesse and M. Lorenz, “Conceptual Survey on Data Stream Processing Systems,” in IEEE International Conference on Parallel and Distributed Systems, ICPADS , 2015, pp. 797–802. [Online]. Available: https://doi.org/10.1109/ICPADS.2015.106

work page doi:10.1109/icpads.2015.106 2015

[20] [20]

Flink - Distributed Runtime Environment,

“Flink - Distributed Runtime Environment,” https://ci.apache.org/ projects/ﬂink/ﬂink-docs-master/concepts/runtime.html, accessed: 2018- 09-27

work page 2018

[21] [21]

Spark Streaming Programming Guide,

“Spark Streaming Programming Guide,” https://spark.apache.org/docs/ latest/streaming-programming-guide.html, accessed: 2018-09-26

work page 2018

[22] [22]

Apache Spark - Cluster Mode Overview,

“Apache Spark - Cluster Mode Overview,” https://spark.apache.org/ docs/2.3.1/cluster-overview.html, accessed: 2018-09-10

work page 2018

[23] [23]

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,

B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in Proc. USENIX Symposium on Networked Systems Design and Implementation, NSDI ,

work page

[24] [24]

Available: https://www.usenix.org/conference/nsdi11/ mesos-platform-ﬁne-grained-resource-sharing-data-center

[Online]. Available: https://www.usenix.org/conference/nsdi11/ mesos-platform-ﬁne-grained-resource-sharing-data-center

work page

[25] [25]

Apache Hadoop Y ARN: Yet Another Resource Negotiator,

V . K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, and E. Baldeschwieler, “Apache Hadoop Y ARN: Yet Another Resource Negotiator,” inACM Symposium on Cloud Computing, SOCC , 2013, pp. 5:1–5:16. [Online]. Available: http://doi.acm.org/10.1145...

work page doi:10.1145/2523616.2523633 2013

[26] [26]

Kubernetes and the Path to Cloud Native,

E. A. Brewer, “Kubernetes and the Path to Cloud Native,” in Proc. ACM Symposium on Cloud Computing, SoCC , 2015, p. 167. [Online]. Available: http://doi.acm.org/10.1145/2806777.2809955

work page doi:10.1145/2806777.2809955 2015

[27] [27]

Accelerating Spark with RDMA for Big Data Processing: Early Experiences,

X. Lu, M. Wasi-ur-Rahman, N. S. Islam, D. Shankar, and D. K. Panda, “Accelerating Spark with RDMA for Big Data Processing: Early Experiences,” in IEEE Annual Symposium on High- Performance Interconnects, HOTI , 2014, pp. 9–16. [Online]. Available: https://doi.org/10.1109/HOTI.2014.15

work page doi:10.1109/hoti.2014.15 2014

[28] [28]

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” in Proc. USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2012, pp. 15–28. [Online]. Available: https://www.usenix.org/ conference/nsd...

work page 2012

[29] [29]

Discretized Streams: Fault-Tolerant Streaming Computation at Scale,

M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica, “Discretized Streams: Fault-Tolerant Streaming Computation at Scale,” in ACM SIGOPS Symposium on Operating Systems Principles, SOSP , 2013, pp. 423–438. [Online]. Available: http://doi.acm.org/10.1145/ 2517349.2522737

work page arXiv 2013

[30] [30]

Apache Hadoop,

“Apache Hadoop,” https://hadoop.apache.org, accessed: 2018-09-11

work page 2018

[31] [31]

HDFS Architecture,

“HDFS Architecture,” http://hadoop.apache.org/docs/current/ hadoop-project-dist/hadoop-hdfs/HdfsDesign.html, accessed: 2018- 09-11

work page 2018

[32] [32]

AdBench: A Complete Benchmark for Modern Data Pipelines,

M. Bhandarkar, “AdBench: A Complete Benchmark for Modern Data Pipelines,” in TPC Technology Conference, TPCTC , 2016, pp. 107–120. [Online]. Available: https://doi.org/10.1007/978-3-319-54334-5 8

work page doi:10.1007/978-3-319-54334-5 2016

[33] [33]

Dunning and E

T. Dunning and E. Friedman, Streaming Architecture: New Designs Us- ing Apache Kafka and MapR Streams . O’Reilly Media, 2016. [Online]. Available: https://books.google.de/books?id=EU8kDAAAQBAJ

work page 2016

[34] [34]

Kafka: a Distributed Messaging System for Log Processing,

J. Kreps, N. Narkhede, and J. Rao, “Kafka: a Distributed Messaging System for Log Processing,” in Proc. International Workshop on Net- working Meets Databases, NetDB , 2011, pp. 1–7

work page 2011

[35] [35]

Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,

B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A. C. Murthy, and C. Curino, “Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,” in Proc. International Conference on Management of Data, ACM SIGMOD , 2015, pp. 1357–

work page 2015

[36] [36]

Available: http://doi.acm.org/10.1145/2723372.2742790

[Online]. Available: http://doi.acm.org/10.1145/2723372.2742790

work page doi:10.1145/2723372.2742790

[37] [37]

Flink - Y ARN Setup,

“Flink - Y ARN Setup,” https://ci.apache.org/projects/ﬂink/ ﬂink-docs-master/ops/deployment/yarn setup.html, accessed: 2018-09- 23

work page 2018

[38] [38]

Running Spark on Y ARN,

“Running Spark on Y ARN,” https://spark.apache.org/docs/latest/ running-on-yarn.html, accessed: 2018-09-23

work page 2018

[39] [39]

Apache Hadoop Y ARN,

“Apache Hadoop Y ARN,” https://hadoop.apache.org/docs/r3.0.3/ hadoop-yarn/hadoop-yarn-site/Y ARN.html, accessed: 2018-09-25

work page 2018

[40] [40]

Apache Apex Documentation - Application Developer Guide,

“Apache Apex Documentation - Application Developer Guide,” http://apex.apache.org/docs/apex-3.7/application development/, accessed: 2018-09-26

work page 2018

[41] [41]

Apache Flink,

“Apache Flink,” https://github.com/apache/ﬂink, accessed: 2018-10-21

work page 2018

[42] [42]

Mirror of Apache Apex core,

“Mirror of Apache Apex core,” https://github.com/apache/apex-core, accessed: 2018-10-21

work page 2018

[43] [43]

Mirror of Apache Spark,

“Mirror of Apache Spark,” https://github.com/apache/spark, accessed: 2018-10-21

work page 2018

[44] [44]

AOL Search Query Logs,

“AOL Search Query Logs,” http://www.researchpipeline.com/ mediawiki/index.php?title=AOL Search Query Logs, accessed: 2018-09-28

work page 2018

[45] [45]

StreamBench: Towards Benchmarking Modern Distributed Stream Computing Frameworks,

R. Lu, G. Wu, B. Xie, and J. Hu, “StreamBench: Towards Benchmarking Modern Distributed Stream Computing Frameworks,” in Proc. IEEE/ACM International Conference on Utility and Cloud Computing, UCC , 2014, pp. 69–78. [Online]. Available: https: //doi.org/10.1109/UCC.2014.15

work page doi:10.1109/ucc.2014.15 2014

[46] [46]

Documentation - Kafka 0.10.2 Documentation,

“Documentation - Kafka 0.10.2 Documentation,” https://kafka.apache. org/documentation/, accessed: 2017-04-24

work page 2017

[47] [47]

Flink - Command-Line Interface,

“Flink - Command-Line Interface,” https://ci.apache.org/projects/ﬂink/ ﬂink-docs-release-1.6/ops/cli.html, accessed: 2018-09-27

work page 2018

[48] [48]

Spark Conﬁguration,

“Spark Conﬁguration,” https://spark.apache.org/docs/latest/ conﬁguration.html#spark-properties, accessed: 2018-09-27

work page 2018

[49] [49]

Interface Context.OperatorContext,

“Interface Context.OperatorContext,” https://ci.apache.org/projects/ apex-core/apex-core-javadoc-release-3.6/com/datatorrent/api/Context. OperatorContext.html, accessed: 2018-10-29

work page 2018

[50] [50]

Benchmarking Distributed Stream Data Processing Systems,

J. Karimov, T. Rabl, A. Katsifodimos, R. Samarev, H. Heiskanen, and V . Markl, “Benchmarking Distributed Stream Data Processing Systems,” in IEEE International Conference on Data Engineering, ICDE , 2018, pp. 1507–1518. [Online]. Available: http://doi.ieeecomputersociety.org/ 10.1109/ICDE.2018.00169

work page doi:10.1109/icde.2018.00169 2018

[51] [51]

Challenges and Experiences in Building an Efﬁcient Apache Beam Runner For IBM Streams,

S. Li, P. Gerver, J. Macmillan, D. Debrunner, W. Marshall, and K. Wu, “Challenges and Experiences in Building an Efﬁcient Apache Beam Runner For IBM Streams,” PVLDB, vol. 11, no. 12, pp. 1742–1754,

work page

[52] [52]

Available: http://www.vldb.org/pvldb/vol11/p1742-li.pdf

[Online]. Available: http://www.vldb.org/pvldb/vol11/p1742-li.pdf

work page

[53] [53]

The CQL Continuous Query Language: Semantic Foundations and Query Execution,

A. Arasu, S. Babu, and J. Widom, “The CQL Continuous Query Language: Semantic Foundations and Query Execution,” VLDB J. , vol. 15, no. 2, pp. 121–142, 2006. [Online]. Available: https://doi.org/10.1007/s00778-004-0147-z

work page doi:10.1007/s00778-004-0147-z 2006

[54] [54]

STREAM: The Stanford Stream Data Manager,

A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom, “STREAM: The Stanford Stream Data Manager,” IEEE Data Eng. Bull. , vol. 26, no. 1, pp. 19–26, 2003. [Online]. Available: http://sites.computer.org/debull/A03mar/paper.ps

work page 2003

[55] [55]

Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources,

E. Begoli, J. Camacho-Rodr ´ıguez, J. Hyde, M. J. Mior, and D. Lemire, “Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources,” in Proc. International Conference on Management of Data, ACM SIGMOD , 2018, pp. 221–

work page 2018

[56] [56]

Available: http://doi.acm.org/10.1145/3183713.3190662

[Online]. Available: http://doi.acm.org/10.1145/3183713.3190662

work page doi:10.1145/3183713.3190662

[57] [57]

Streaming,

“Streaming,” https://calcite.apache.org/docs/stream.html, accessed: 2018-10-19

work page 2018

[58] [58]

Towards a Streaming SQL Standard,

N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. C ¸ etintemel, M. Cherniack, R. Tibbetts, and S. B. Zdonik, “Towards a Streaming SQL Standard,” PVLDB, vol. 1, no. 2, pp. 1379–1390,

work page

[59] [59]

Available: http://www.vldb.org/pvldb/1/1454179.pdf

[Online]. Available: http://www.vldb.org/pvldb/1/1454179.pdf

work page

[60] [60]

Oracle CEP CQL Language Reference 11g Release 1 (11.1.1),

“Oracle CEP CQL Language Reference 11g Release 1 (11.1.1),” https: //docs.oracle.com/cd/E16764 01/doc.1111/e12048/intro.htm, accessed: 2018-10-19

work page 2018

[61] [61]

StreamSQL Overview,

“StreamSQL Overview,” https://docs.tibco.com/pub/sb-lv/2.1.8/doc/ html/streamsql/ssql-intro.html, accessed: 2018-10-19

work page 2018

[62] [62]

KSQL and Kafka Streams,

“KSQL and Kafka Streams,” https://docs.conﬂuent.io/current/ streams-ksql.html, accessed: 2018-10-19

work page 2018

[63] [63]

SAP HANA Smart Data Streaming: Developer Guide,

“SAP HANA Smart Data Streaming: Developer Guide,” https://help.sap.com/doc/25fc8560420d4d5099d6df02f7cbff9e/1.0. 12/en-US/streaming developer guide.pdf, accessed: 2018-10-19

work page 2018

[64] [64]

SamzaSQL: Scalable Fast Data Management with Streaming SQL,

M. Pathirage, J. Hyde, Y . Pan, and B. Plale, “SamzaSQL: Scalable Fast Data Management with Streaming SQL,” in IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS , 2016, pp. 1627–1636. [Online]. Available: https://doi.org/10.1109/IPDPSW.2016. 141

work page doi:10.1109/ipdpsw.2016 2016

[65] [65]

Samza: Stateful Scalable Stream Processing at LinkedIn,

S. A. Noghabi, K. Paramasivam, Y . Pan, N. Ramesh, J. Bringhurst, I. Gupta, and R. H. Campbell, “Samza: Stateful Scalable Stream Processing at LinkedIn,” PVLDB, vol. 10, no. 12, pp. 1634–1645, 2017. [Online]. Available: http://www.vldb.org/pvldb/vol10/p1634-noghabi. pdf

work page 2017

[66] [66]

Linear Road: A Stream Data Management Benchmark,

A. Arasu, M. Cherniack, E. F. Galvez, D. Maier, A. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts, “Linear Road: A Stream Data Management Benchmark,” in (e)Proc. International Conference on V ery Large Data Bases , 2004, pp. 480–491. [Online]. Available: http://www.vldb.org/conf/2004/RS12P1.PDF

work page 2004

[67] [67]

Aurora: a new model and architecture for data stream management,

D. J. Abadi, D. Carney, U. C ¸ etintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. B. Zdonik, “Aurora: a new model and architecture for data stream management,” VLDB J. , vol. 12, no. 2, pp. 120–139, 2003. [Online]. Available: https://doi.org/10.1007/s00778-003-0095-z

work page doi:10.1007/s00778-003-0095-z 2003

[68] [68]

Apache Storm,

“Apache Storm,” http://storm.apache.org, accessed: 2018-10-23

work page 2018

[69] [69]

NEXMark Benchmark,

“NEXMark Benchmark,” http://datalab.cs.pdx.edu/niagara/NEXMark/, accessed: 2018-10-20

work page 2018

[70] [70]

NEXMark – A Benchmark for Queries over Data Streams DRAFT,

P. Tucker, K. Tufte, V . Papadimos, and D. Maier, “NEXMark – A Benchmark for Queries over Data Streams DRAFT,” http://datalab.cs. pdx.edu/niagara/pstream/nexmark.pdf, accessed: 2018-10-20

work page 2018

[71] [71]

Nexmark benchmark suite,

“Nexmark benchmark suite,” https://beam.apache.org/documentation/ sdks/java/nexmark/, accessed: 2018-10-20

work page 2018

[72] [72]

Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks,

O. Marcu, A. Costan, G. Antoniu, and M. S. P ´erez-Hern´andez, “Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks,” in IEEE International Conference on Cluster Computing, CLUSTER , 2016, pp. 433–442. [Online]. Available: https://doi.org/10.1109/CLUSTER.2016.22

work page doi:10.1109/cluster.2016.22 2016

[73] [73]

A Performance Comparison of Open-Source Stream Processing Platforms,

M. A. Lopez, A. G. P. Lobato, and O. C. M. B. Duarte, “A Performance Comparison of Open-Source Stream Processing Platforms,” in IEEE Global Communications Conference, GLOBECOM , 2016, pp. 1–6. [Online]. Available: https://doi.org/10.1109/GLOCOM.2016.7841533

work page doi:10.1109/glocom.2016.7841533 2016