
arxiv: 1907.08302 · v1 · pith:GQ4776WGnew · submitted 2019-07-18 · 💻 cs.PF · cs.DC

Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems

Pith reviewed 2026-05-24 19:33 UTC · model grok-4.3

classification 💻 cs.PF cs.DC
keywords Apache Beam · data stream processing · performance evaluation · abstraction layer · benchmark · Apache Spark Streaming · Apache Flink · Apache Apex

The pith

Using Apache Beam as an abstraction layer slows streaming query execution by up to a factor of 58 compared to native code on Spark, Flink, and Apex.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a benchmark architecture to measure the runtime cost of writing streaming applications once in Apache Beam and running them on multiple engines. It executes the same queries both through the Beam layer and in native implementations on Apache Spark Streaming, Apache Flink, and Apache Apex. The measurements show large variance and slowdowns reaching 58 times when the abstraction is used. A reader would care because the stated purpose of Beam is to avoid costly rewrites when switching frameworks, yet the results indicate that this portability comes with a measurable performance price that must be weighed against the benefit.

Core claim

Usage of Apache Beam for the examined streaming applications caused a high variance of query execution times with a slowdown of up to a factor of 58 compared to queries developed without the abstraction layer on the three surveyed frameworks.
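
The slowdown factor in this claim is a ratio of execution times. A minimal sketch of that arithmetic, assuming the factor is the average Beam execution time divided by the average native execution time for the same system and query. The helper `slowdown_factor`, the system names, and all numbers below are hypothetical placeholders, not measurements from the paper.

```python
def slowdown_factor(beam_seconds: float, native_seconds: float) -> float:
    """Ratio of Beam to native execution time; values above 1 mean Beam is slower."""
    if native_seconds <= 0:
        raise ValueError("native execution time must be positive")
    return beam_seconds / native_seconds


# Hypothetical average execution times in seconds, keyed by (system, query).
# These are placeholders for illustration, not results from the paper.
avg_times = {
    ("system_a", "identity"): {"beam": 30.0, "native": 10.0},
    ("system_a", "grep"): {"beam": 12.0, "native": 8.0},
    ("system_b", "identity"): {"beam": 45.0, "native": 9.0},
}

factors = {
    key: slowdown_factor(t["beam"], t["native"]) for key, t in avg_times.items()
}
worst_case = max(factors, key=factors.get)

for key, factor in sorted(factors.items()):
    print(f"{key[0]:>8} {key[1]:<8} slowdown x{factor:.1f}")
print("largest slowdown:", worst_case, factors[worst_case])
```

A headline figure such as "up to a factor of 58" corresponds to the largest entry of such a table, so it describes the worst observed combination rather than a typical one.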

What carries the argument

A novel benchmark architecture that runs identical streaming queries with and without the Apache Beam abstraction layer on Spark Streaming, Flink, and Apex.
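
The measurement idea can be illustrated with a toy, single-process sketch: run the same query repeatedly and record each execution time. The query names (identity, sample, grep) come from the figure captions; their definitions here, the sampling fraction, the search string, and the helper `time_query` are assumptions for illustration. The paper's benchmark runs on distributed engines, which this sketch does not model.

```python
import random
import time
from typing import Callable, Iterable, List

Query = Callable[[Iterable[str]], List[str]]


def identity_query(records: Iterable[str]) -> List[str]:
    """Pass every input record through unchanged."""
    return list(records)


def sample_query(records: Iterable[str], fraction: float = 0.4, seed: int = 0) -> List[str]:
    """Keep a random subset of records (fraction and seed are assumptions)."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


def grep_query(records: Iterable[str], needle: str = "test") -> List[str]:
    """Keep only records containing the search string (needle is an assumption)."""
    return [r for r in records if needle in r]


def time_query(query: Query, records: List[str], runs: int = 5) -> List[float]:
    """Execute the query `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        query(records)
        durations.append(time.perf_counter() - start)
    return durations


# Synthetic input: every tenth record contains the search string.
records = [
    f"user{i}\tquery test {i}" if i % 10 == 0 else f"user{i}\tquery {i}"
    for i in range(1000)
]

for name, query in [("identity", identity_query), ("sample", sample_query), ("grep", grep_query)]:
    durations = time_query(query, records)
    print(f"{name:<8} mean {sum(durations) / len(durations):.6f} s over {len(durations)} runs")
```

In a real comparison, the same timing routine would be applied once to a native implementation and once to a Beam implementation of each query, keeping input data and parallelism identical so that only the SDK differs.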

If this is right

  • Portability across stream processors carries a concrete execution-time cost that developers must quantify for each workload.
  • Native code on a single framework can deliver substantially shorter execution times than the same logic expressed through Beam.
  • Performance comparisons between frameworks should include the abstraction layer when portability is a requirement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Teams prioritizing raw speed over framework flexibility may prefer direct native implementations for production workloads.
  • The benchmark could be extended to newer engines or additional query patterns to test whether the overhead pattern persists.
  • Runtime profiling of Beam-translated jobs could reveal specific operators responsible for the largest slowdowns.

Load-bearing premise

The chosen streaming applications, queries, and native implementations are representative of typical use and were optimized to the same degree as the Beam versions.

What would settle it

Re-executing the benchmark on a fresh collection of streaming workloads or with further-tuned native implementations that eliminate most of the observed gap would show whether the reported slowdowns are inherent to the abstraction.

Figures

Figures reproduced from arXiv: 1907.08302 by Christoph Matthies, Guenter Hesse, Johannes Huegle, Kelvin Glass, Matthias Uflacker.

Figure 1: Architecture of an Apache Flink Runtime (based on [4], [19])
Figure 2: Architecture of Apache Spark in Cluster Mode (based on [19], [22])
Figure 4: Architecture of an Apache Hadoop YARN (based on [24], [37])
Figure 5: Overview About the General Benchmark Architecture and Process
Figure 6: Average Execution Times - Identity Query
Figure 7: Average Execution Times - Sample Query
    Accompanying text: … between the analyzed systems and parallelism factors. Compared to identity query results, times are slightly lower overall, which could be a result of the lower number of output records as described in Section III-B. The Apex Beam implementation is an exception as there is a major difference. To be more concrete, the average execution times for the sample query amount t…
Figure 9: Average Execution Times - Grep Query
    Accompanying text: … query-SDK combination. By SDK it is distinguished between using Apache Beam or native system APIs for application development. Deviations for the two parallelism factors are averaged and condensed in this way. This is done since separate visualizations for different parallelisms would not reveal any further insights. Additionally, the reduced number of values simplifies…
Figure 10: Relative Standard Deviation for System-Query-SDK Combinations
Figure 11: Slowdown Factor for the Analyzed Systems and Queries
Figure 12: Apache Flink Execution Plan for the Grep Query
Figure 13: Apache Flink Execution Plan for the Grep Query Implemented Using …

Original abstract

With the demand to process ever-growing data volumes, a variety of new data stream processing frameworks have been developed. Moving an implementation from one such system to another, e.g., for performance reasons, requires adapting existing applications to new interfaces. Apache Beam addresses these high substitution costs by providing an abstraction layer that enables executing programs on any of the supported streaming frameworks. In this paper, we present a novel benchmark architecture for comparing the performance impact of using Apache Beam on three streaming frameworks: Apache Spark Streaming, Apache Flink, and Apache Apex. We find significant performance penalties when using Apache Beam for application development in the surveyed systems. Overall, usage of Apache Beam for the examined streaming applications caused a high variance of query execution times with a slowdown of up to a factor of 58 compared to queries developed without the abstraction layer. All developed benchmark artifacts are publicly available to ensure reproducible results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents a benchmark architecture for measuring the performance overhead of Apache Beam as an abstraction layer when executing streaming applications on Apache Spark Streaming, Apache Flink, and Apache Apex. It reports that Beam usage produces high variance in query execution times and slowdowns of up to a factor of 58 relative to native implementations, while releasing all benchmark artifacts publicly.

Significance. If the native and Beam implementations are shown to have received comparable optimization effort, the quantitative results would establish a concrete performance cost for the abstraction layer, helping practitioners weigh portability against efficiency. The public release of artifacts is a clear strength that enables direct verification and reuse.

major comments (2)
  1. [Benchmark Architecture] Benchmark Architecture section: the description of the three native implementations does not document tuning steps, profiling data, or framework-specific optimizations (e.g., custom state backends or partitioning) applied outside Beam, leaving open the possibility that part of the reported 58x slowdown arises from unequal implementation quality rather than the abstraction layer itself.
  2. [Results] Results section (tables/figures reporting slowdown factors): the headline claim of 'high variance' and the maximum slowdown of 58x would be strengthened by explicit reporting of the number of runs, statistical measures of variance, and per-query breakdowns so that readers can assess whether the observed differences are robust across workloads.
minor comments (1)
  1. [Abstract] Clarify in the abstract and introduction whether the selected applications and queries are intended to be representative or merely illustrative.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our benchmark results. We address each major comment below and will revise the manuscript accordingly.

  1. Referee: [Benchmark Architecture] Benchmark Architecture section: the description of the three native implementations does not document tuning steps, profiling data, or framework-specific optimizations (e.g., custom state backends or partitioning) applied outside Beam, leaving open the possibility that part of the reported 58x slowdown arises from unequal implementation quality rather than the abstraction layer itself.

    Authors: We agree that the Benchmark Architecture section would benefit from additional detail on the native implementations. In the revision we will document the exact configurations used for each native framework (Spark Streaming, Flink, Apex), any tuning steps performed, and the rationale for not applying framework-specific optimizations beyond standard settings. This will make explicit that the native versions represent typical out-of-the-box usage and allow readers to judge the degree of comparability. revision: yes

  2. Referee: [Results] Results section (tables/figures reporting slowdown factors): the headline claim of 'high variance' and the maximum slowdown of 58x would be strengthened by explicit reporting of the number of runs, statistical measures of variance, and per-query breakdowns so that readers can assess whether the observed differences are robust across workloads.

    Authors: We will revise the Results section to report the number of runs executed for each experiment, include statistical measures (e.g., standard deviation or inter-quartile range) to quantify the observed variance, and add per-query breakdowns of execution times and slowdown factors. These additions will allow readers to evaluate the robustness of the 58x maximum and the high-variance claim across the workloads. revision: yes
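
The statistics named in this exchange (number of runs, standard deviation, inter-quartile range) and the relative standard deviation shown in Figure 10 can be computed from repeated execution times. A minimal sketch using Python's standard library; the helper `summarize_runs` and the example durations are hypothetical, and the relative standard deviation is assumed here to be the sample standard deviation divided by the mean.

```python
import statistics
from typing import Dict, List


def summarize_runs(durations: List[float]) -> Dict[str, float]:
    """Summarize repeated execution times (seconds) of one configuration."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)  # sample standard deviation, needs >= 2 runs
    q1, _, q3 = statistics.quantiles(durations, n=4)  # quartile cut points
    return {
        "runs": len(durations),
        "mean": mean,
        "stdev": stdev,
        "relative_stdev": stdev / mean,
        "iqr": q3 - q1,
    }


# Hypothetical execution times in seconds for one system-query-SDK combination.
example_runs = [10.0, 12.0, 11.0, 13.0, 14.0]
summary = summarize_runs(example_runs)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting the relative rather than the absolute deviation makes combinations with very different mean execution times comparable, which matters when one SDK is many times slower than the other.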

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark measurements


The paper reports direct runtime measurements from executed queries on Spark Streaming, Flink, and Apex, with and without Beam. No equations, fitted parameters, predictions, or derivations appear in the abstract or described content. Claims rest on observed execution times rather than any reduction to prior fits or self-citations. The central result (up to 58x slowdown) is a raw empirical comparison, not a constructed quantity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmark study; no free parameters, mathematical axioms, or invented entities are introduced or required.



Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

  1. [1]

    “one size fits all”: An idea whose time has come and gone (abstract),

    M. Stonebraker and U. Çetintemel, ““one size fits all”: An idea whose time has come and gone (abstract),” in Proc. International Conference on Data Engineering, ICDE, 2005, pp. 2–11. [Online]. Available: https://doi.org/10.1109/ICDE.2005.1

  2. [2]

    Apache Beam Overview,

    “Apache Beam Overview,” https://beam.apache.org/get-started/beam-overview/, accessed: 2018-10-30

  3. [3]

    Object-Relational Mapping Revisited - A Quantitative Study on the Impact of Database Technology on O/R Mapping Strategies,

    M. Lorenz, J. Rudolph, G. Hesse, M. Uflacker, and H. Plattner, “Object-Relational Mapping Revisited - A Quantitative Study on the Impact of Database Technology on O/R Mapping Strategies,” in Hawaii International Conference on System Sciences, HICSS, 2017

  4. [4]

    Apache Flink™: Stream and Batch Processing in a Single Engine,

    P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, “Apache Flink™: Stream and Batch Processing in a Single Engine,” IEEE Data Eng. Bull., vol. 38, no. 4, pp. 28–38, 2015. [Online]. Available: http://sites.computer.org/debull/A15dec/p28.pdf

  5. [5]

    Apache Spark: A Unified Engine for Big Data Processing,

    M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica, “Apache Spark: A Unified Engine for Big Data Processing,” Commun. ACM, vol. 59, no. 11, pp. 56–65, 2016. [Online]. Available: http://doi.acm.org/10.1145/2934664

  6. [6]

    Apache Apex,

    “Apache Apex,” https://apex.apache.org/docs/apex/, accessed: 2018-09-11

  7. [7]

    Apache Beam,

    “Apache Beam,” https://github.com/apache/beam, accessed: 2018-08-17

  8. [8]

    Apache Beam Programming Guide,

    “Apache Beam Programming Guide,” https://beam.apache.org/documentation/programming-guide/, accessed: 2018-10-18

  9. [9]

    Runner Authoring Guide,

    “Runner Authoring Guide,” https://github.com/apache/beam/blob/master/website/src/contribute/runner-guide.md, accessed: 2018-10-18

  10. [10]

    The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing,

    T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. Fernández-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, and S. Whittle, “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing,” PVLDB, vol. 8, no. 12, pp. 1792–1803, 2015. [Online]. Available: h...

  11. [11]

    Beam Capability Matrix,

    “Beam Capability Matrix,” https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what, accessed: 2018-09-19

  12. [12]

    Apache Gearpump,

    “Apache Gearpump,” https://gearpump.apache.org/overview.html, accessed: 2018-10-15

  13. [13]

    MapReduce Tutorial,

    “MapReduce Tutorial,” https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html, accessed: 2018-10-15

  14. [14]

    What is Samza?

    “What is Samza?” https://samza.apache.org, accessed: 2018-10-15

  15. [15]

    Alibaba JStorm,

    “Alibaba JStorm,” http://jstorm.io, accessed: 2018-10-15

  16. [16]

    IBM Streams,

    “IBM Streams,” https://www.ibm.com/de-en/marketplace/stream-computing, accessed: 2018-10-15

  17. [17]

    CLOUD DATAFLOW - Simplified stream and batch data processing, with equal reliability and expressiveness,

    “CLOUD DATAFLOW - Simplified stream and batch data processing, with equal reliability and expressiveness,” https://cloud.google.com/dataflow/, accessed: 2018-08-17

  18. [18]

    Cloud Dataflow, Apache Beam and you,

    “Cloud Dataflow, Apache Beam and you,” https://cloud.google.com/blog/products/gcp/cloud-dataflow-apache-beam-and-you, accessed: 2018-10-15

  19. [19]

    Conceptual Survey on Data Stream Processing Systems,

    G. Hesse and M. Lorenz, “Conceptual Survey on Data Stream Processing Systems,” in IEEE International Conference on Parallel and Distributed Systems, ICPADS, 2015, pp. 797–802. [Online]. Available: https://doi.org/10.1109/ICPADS.2015.106

  20. [20]

    Flink - Distributed Runtime Environment,

    “Flink - Distributed Runtime Environment,” https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html, accessed: 2018-09-27

  21. [21]

    Spark Streaming Programming Guide,

    “Spark Streaming Programming Guide,” https://spark.apache.org/docs/latest/streaming-programming-guide.html, accessed: 2018-09-26

  22. [22]

    Apache Spark - Cluster Mode Overview,

    “Apache Spark - Cluster Mode Overview,” https://spark.apache.org/docs/2.3.1/cluster-overview.html, accessed: 2018-09-10

  23. [23]

    Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,

    B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in Proc. USENIX Symposium on Networked Systems Design and Implementation, NSDI, [Online]. Available: https://www.usenix.org/conference/nsdi11/mesos-platform-fine-grained-resource-sharing-data-center

  25. [25]

    Apache Hadoop YARN: Yet Another Resource Negotiator,

    V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, and E. Baldeschwieler, “Apache Hadoop YARN: Yet Another Resource Negotiator,” in ACM Symposium on Cloud Computing, SOCC, 2013, pp. 5:1–5:16. [Online]. Available: http://doi.acm.org/10.1145...

  26. [26]

    Kubernetes and the Path to Cloud Native,

    E. A. Brewer, “Kubernetes and the Path to Cloud Native,” in Proc. ACM Symposium on Cloud Computing, SoCC, 2015, p. 167. [Online]. Available: http://doi.acm.org/10.1145/2806777.2809955

  27. [27]

    Accelerating Spark with RDMA for Big Data Processing: Early Experiences,

    X. Lu, M. Wasi-ur-Rahman, N. S. Islam, D. Shankar, and D. K. Panda, “Accelerating Spark with RDMA for Big Data Processing: Early Experiences,” in IEEE Annual Symposium on High-Performance Interconnects, HOTI, 2014, pp. 9–16. [Online]. Available: https://doi.org/10.1109/HOTI.2014.15

  28. [28]

    Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,

    M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” in Proc. USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2012, pp. 15–28. [Online]. Available: https://www.usenix.org/conference/nsd...

  29. [29]

    Discretized Streams: Fault-Tolerant Streaming Computation at Scale,

    M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica, “Discretized Streams: Fault-Tolerant Streaming Computation at Scale,” in ACM SIGOPS Symposium on Operating Systems Principles, SOSP, 2013, pp. 423–438. [Online]. Available: http://doi.acm.org/10.1145/2517349.2522737

  30. [30]

    Apache Hadoop,

    “Apache Hadoop,” https://hadoop.apache.org, accessed: 2018-09-11

  31. [31]

    HDFS Architecture,

    “HDFS Architecture,” http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html, accessed: 2018-09-11

  32. [32]

    AdBench: A Complete Benchmark for Modern Data Pipelines,

    M. Bhandarkar, “AdBench: A Complete Benchmark for Modern Data Pipelines,” in TPC Technology Conference, TPCTC, 2016, pp. 107–120. [Online]. Available: https://doi.org/10.1007/978-3-319-54334-5_8

  33. [33]

    Streaming Architecture: New Designs Using Apache Kafka and MapR Streams

    T. Dunning and E. Friedman, Streaming Architecture: New Designs Using Apache Kafka and MapR Streams. O’Reilly Media, 2016. [Online]. Available: https://books.google.de/books?id=EU8kDAAAQBAJ

  34. [34]

    Kafka: a Distributed Messaging System for Log Processing,

    J. Kreps, N. Narkhede, and J. Rao, “Kafka: a Distributed Messaging System for Log Processing,” in Proc. International Workshop on Networking Meets Databases, NetDB, 2011, pp. 1–7

  35. [35]

    Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,

    B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A. C. Murthy, and C. Curino, “Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,” in Proc. International Conference on Management of Data, ACM SIGMOD, 2015, pp. 1357– [Online]. Available: http://doi.acm.org/10.1145/2723372.2742790

  37. [37]

    Flink - YARN Setup,

    “Flink - YARN Setup,” https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html, accessed: 2018-09-23

  38. [38]

    Running Spark on YARN,

    “Running Spark on YARN,” https://spark.apache.org/docs/latest/running-on-yarn.html, accessed: 2018-09-23

  39. [39]

    Apache Hadoop YARN,

    “Apache Hadoop YARN,” https://hadoop.apache.org/docs/r3.0.3/hadoop-yarn/hadoop-yarn-site/YARN.html, accessed: 2018-09-25

  40. [40]

    Apache Apex Documentation - Application Developer Guide,

    “Apache Apex Documentation - Application Developer Guide,” http://apex.apache.org/docs/apex-3.7/application_development/, accessed: 2018-09-26

  41. [41]

    Apache Flink,

    “Apache Flink,” https://github.com/apache/flink, accessed: 2018-10-21

  42. [42]

    Mirror of Apache Apex core,

    “Mirror of Apache Apex core,” https://github.com/apache/apex-core, accessed: 2018-10-21

  43. [43]

    Mirror of Apache Spark,

    “Mirror of Apache Spark,” https://github.com/apache/spark, accessed: 2018-10-21

  44. [44]

    AOL Search Query Logs,

    “AOL Search Query Logs,” http://www.researchpipeline.com/mediawiki/index.php?title=AOL_Search_Query_Logs, accessed: 2018-09-28

  45. [45]

    StreamBench: Towards Benchmarking Modern Distributed Stream Computing Frameworks,

    R. Lu, G. Wu, B. Xie, and J. Hu, “StreamBench: Towards Benchmarking Modern Distributed Stream Computing Frameworks,” in Proc. IEEE/ACM International Conference on Utility and Cloud Computing, UCC, 2014, pp. 69–78. [Online]. Available: https://doi.org/10.1109/UCC.2014.15

  46. [46]

    Documentation - Kafka 0.10.2 Documentation,

    “Documentation - Kafka 0.10.2 Documentation,” https://kafka.apache.org/documentation/, accessed: 2017-04-24

  47. [47]

    Flink - Command-Line Interface,

    “Flink - Command-Line Interface,” https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/cli.html, accessed: 2018-09-27

  48. [48]

    Spark Configuration,

    “Spark Configuration,” https://spark.apache.org/docs/latest/configuration.html#spark-properties, accessed: 2018-09-27

  49. [49]

    Interface Context.OperatorContext,

    “Interface Context.OperatorContext,” https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.6/com/datatorrent/api/Context.OperatorContext.html, accessed: 2018-10-29

  50. [50]

    Benchmarking Distributed Stream Data Processing Systems,

    J. Karimov, T. Rabl, A. Katsifodimos, R. Samarev, H. Heiskanen, and V. Markl, “Benchmarking Distributed Stream Data Processing Systems,” in IEEE International Conference on Data Engineering, ICDE, 2018, pp. 1507–1518. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ICDE.2018.00169

  51. [51]

    Challenges and Experiences in Building an Efficient Apache Beam Runner For IBM Streams,

    S. Li, P. Gerver, J. Macmillan, D. Debrunner, W. Marshall, and K. Wu, “Challenges and Experiences in Building an Efficient Apache Beam Runner For IBM Streams,” PVLDB, vol. 11, no. 12, pp. 1742–1754, [Online]. Available: http://www.vldb.org/pvldb/vol11/p1742-li.pdf

  53. [53]

    The CQL Continuous Query Language: Semantic Foundations and Query Execution,

    A. Arasu, S. Babu, and J. Widom, “The CQL Continuous Query Language: Semantic Foundations and Query Execution,” VLDB J., vol. 15, no. 2, pp. 121–142, 2006. [Online]. Available: https://doi.org/10.1007/s00778-004-0147-z

  54. [54]

    STREAM: The Stanford Stream Data Manager,

    A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom, “STREAM: The Stanford Stream Data Manager,” IEEE Data Eng. Bull., vol. 26, no. 1, pp. 19–26, 2003. [Online]. Available: http://sites.computer.org/debull/A03mar/paper.ps

  55. [55]

    Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources,

    E. Begoli, J. Camacho-Rodríguez, J. Hyde, M. J. Mior, and D. Lemire, “Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources,” in Proc. International Conference on Management of Data, ACM SIGMOD, 2018, pp. 221– [Online]. Available: http://doi.acm.org/10.1145/3183713.3190662

  57. [57]

    Streaming,

    “Streaming,” https://calcite.apache.org/docs/stream.html, accessed: 2018-10-19

  58. [58]

    Towards a Streaming SQL Standard,

    N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, and S. B. Zdonik, “Towards a Streaming SQL Standard,” PVLDB, vol. 1, no. 2, pp. 1379–1390, [Online]. Available: http://www.vldb.org/pvldb/1/1454179.pdf

  60. [60]

    Oracle CEP CQL Language Reference 11g Release 1 (11.1.1),

    “Oracle CEP CQL Language Reference 11g Release 1 (11.1.1),” https://docs.oracle.com/cd/E16764_01/doc.1111/e12048/intro.htm, accessed: 2018-10-19

  61. [61]

    StreamSQL Overview,

    “StreamSQL Overview,” https://docs.tibco.com/pub/sb-lv/2.1.8/doc/html/streamsql/ssql-intro.html, accessed: 2018-10-19

  62. [62]

    KSQL and Kafka Streams,

    “KSQL and Kafka Streams,” https://docs.confluent.io/current/streams-ksql.html, accessed: 2018-10-19

  63. [63]

    SAP HANA Smart Data Streaming: Developer Guide,

    “SAP HANA Smart Data Streaming: Developer Guide,” https://help.sap.com/doc/25fc8560420d4d5099d6df02f7cbff9e/1.0.12/en-US/streaming_developer_guide.pdf, accessed: 2018-10-19

  64. [64]

    SamzaSQL: Scalable Fast Data Management with Streaming SQL,

    M. Pathirage, J. Hyde, Y. Pan, and B. Plale, “SamzaSQL: Scalable Fast Data Management with Streaming SQL,” in IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS, 2016, pp. 1627–1636. [Online]. Available: https://doi.org/10.1109/IPDPSW.2016.141

  65. [65]

    Samza: Stateful Scalable Stream Processing at LinkedIn,

    S. A. Noghabi, K. Paramasivam, Y. Pan, N. Ramesh, J. Bringhurst, I. Gupta, and R. H. Campbell, “Samza: Stateful Scalable Stream Processing at LinkedIn,” PVLDB, vol. 10, no. 12, pp. 1634–1645, 2017. [Online]. Available: http://www.vldb.org/pvldb/vol10/p1634-noghabi.pdf

  66. [66]

    Linear Road: A Stream Data Management Benchmark,

    A. Arasu, M. Cherniack, E. F. Galvez, D. Maier, A. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts, “Linear Road: A Stream Data Management Benchmark,” in (e)Proc. International Conference on Very Large Data Bases, 2004, pp. 480–491. [Online]. Available: http://www.vldb.org/conf/2004/RS12P1.PDF

  67. [67]

    Aurora: a new model and architecture for data stream management,

    D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. B. Zdonik, “Aurora: a new model and architecture for data stream management,” VLDB J., vol. 12, no. 2, pp. 120–139, 2003. [Online]. Available: https://doi.org/10.1007/s00778-003-0095-z

  68. [68]

    Apache Storm,

    “Apache Storm,” http://storm.apache.org, accessed: 2018-10-23

  69. [69]

    NEXMark Benchmark,

    “NEXMark Benchmark,” http://datalab.cs.pdx.edu/niagara/NEXMark/, accessed: 2018-10-20

  70. [70]

    NEXMark – A Benchmark for Queries over Data Streams DRAFT,

    P. Tucker, K. Tufte, V. Papadimos, and D. Maier, “NEXMark – A Benchmark for Queries over Data Streams DRAFT,” http://datalab.cs.pdx.edu/niagara/pstream/nexmark.pdf, accessed: 2018-10-20

  71. [71]

    Nexmark benchmark suite,

    “Nexmark benchmark suite,” https://beam.apache.org/documentation/sdks/java/nexmark/, accessed: 2018-10-20

  72. [72]

    Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks,

    O. Marcu, A. Costan, G. Antoniu, and M. S. Pérez-Hernández, “Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks,” in IEEE International Conference on Cluster Computing, CLUSTER, 2016, pp. 433–442. [Online]. Available: https://doi.org/10.1109/CLUSTER.2016.22

  73. [73]

    A Performance Comparison of Open-Source Stream Processing Platforms,

    M. A. Lopez, A. G. P. Lobato, and O. C. M. B. Duarte, “A Performance Comparison of Open-Source Stream Processing Platforms,” in IEEE Global Communications Conference, GLOBECOM, 2016, pp. 1–6. [Online]. Available: https://doi.org/10.1109/GLOCOM.2016.7841533