Delivery, consistency, and determinism: rethinking guarantees in distributed stream processing
Pith reviewed 2026-05-24 21:32 UTC · model grok-4.3
The pith
Delivery, consistency, and determinism are tightly connected in distributed stream processing, enabling exactly-once guarantees via lightweight determinism with minimal overhead.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a formal framework that allows us to define streaming guarantees more regularly. We demonstrate that the properties of delivery, consistency, and determinism are tightly connected within distributed stream processing. We also show that having lightweight determinism, it is possible to provide exactly-once with almost no performance overhead. Experiments show that the proposed approach can significantly outperform alternative industrial solutions.
What carries the argument
The formal framework that redefines streaming guarantees by linking delivery, consistency, and determinism properties.
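As an illustration of why determinism and delivery are linked, here is a minimal sketch. It is not taken from the paper and does not reproduce its framework. It assumes an operator whose output depends only on its input order, and a sink that tracks the last released sequence number. The names `deterministic_operator`, `DedupSink`, and `run_with_failure` are hypothetical. Under these assumptions, a full replay after a crash regenerates the same identified outputs, so the sink can drop duplicates by identifier alone.

```python
from typing import Iterable, Iterator, List, Tuple


def deterministic_operator(
    events: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int]]:
    """Running sum over (seq, value) events; depends only on input order."""
    total = 0
    for seq, value in events:
        total += value
        # The output reuses the input sequence number, so a replay
        # regenerates exactly the same (seq, total) pairs.
        yield seq, total


class DedupSink:
    """Hypothetical sink that releases each sequence number at most once."""

    def __init__(self) -> None:
        self.last_seq = -1
        self.released: List[Tuple[int, int]] = []

    def accept(self, seq: int, result: int) -> None:
        if seq <= self.last_seq:
            return  # duplicate produced by a replay; drop it
        self.last_seq = seq
        self.released.append((seq, result))


def run_with_failure(
    events: List[Tuple[int, int]], fail_after: int
) -> List[Tuple[int, int]]:
    """Process events, crash after `fail_after` outputs, then replay everything."""
    sink = DedupSink()
    # First attempt: simulated crash once `fail_after` outputs were delivered.
    for i, (seq, result) in enumerate(deterministic_operator(events)):
        if i == fail_after:
            break  # operator state is lost here
        sink.accept(seq, result)
    # Recovery: replay the whole input from the start.
    for seq, result in deterministic_operator(events):
        sink.accept(seq, result)
    return sink.released


if __name__ == "__main__":
    events = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(run_with_failure(events, fail_after=2))
```

With the sample input, the released output matches a failure-free run: `[(0, 1), (1, 3), (2, 6), (3, 10)]`. If the operator were non-deterministic (for example, if its result depended on arrival timing), the replayed pairs could differ from those already released and identifier-based deduplication would no longer be sufficient.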
Load-bearing premise
The formal framework accurately models all relevant properties of real distributed stream processing systems.
What would settle it
A deployment where the proposed lightweight determinism approach incurs significant performance overhead or fails to maintain exactly-once guarantees under failures would disprove the claim.
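One way such a check could be phrased, as a hedged sketch rather than the paper's evaluation method: compare the multiset of outputs delivered in a failure-injected run against a failure-free reference run. The function name `delivery_violations` is hypothetical, and the sketch covers only delivery counts, not performance overhead.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def delivery_violations(
    reference: Iterable[Hashable], observed: Iterable[Hashable]
) -> Dict[Hashable, Tuple[int, int]]:
    """Map each output to (expected count, observed count) where they differ."""
    ref, obs = Counter(reference), Counter(observed)
    return {
        item: (ref[item], obs[item])
        for item in set(ref) | set(obs)
        if ref[item] != obs[item]
    }


if __name__ == "__main__":
    reference = ["a", "b", "c"]          # outputs of a failure-free run
    faulty = ["a", "b", "b"]             # "b" duplicated, "c" lost
    print(delivery_violations(reference, reference))
    print(delivery_violations(reference, faulty))
```

An empty result means every output was delivered exactly as often as in the reference run. Any non-empty result, such as a duplicated or missing item, would count as evidence against the exactly-once claim for that run.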
Original abstract
Consistency requirements for state-of-the-art stream processing systems are defined in terms of delivery guarantees. Exactly-once is the strongest one and the most desirable for end-user. However, there are several issues regarding this concept. Commonly used techniques that enforce exactly-once produce significant performance overhead. Besides, the notion of exactly-once is not formally defined and does not capture all properties that provide stream processing systems supporting this guarantee. In this paper, we introduce a formal framework that allows us to define streaming guarantees more regularly. We demonstrate that the properties of delivery, consistency, and determinism are tightly connected within distributed stream processing. We also show that having lightweight determinism, it is possible to provide exactly-once with almost no performance overhead. Experiments show that the proposed approach can significantly outperform alternative industrial solutions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a formal framework for defining guarantees in distributed stream processing systems. It demonstrates that delivery, consistency, and determinism are tightly interconnected, and shows that lightweight determinism enables exactly-once semantics with negligible performance overhead. Experiments indicate that the proposed approach significantly outperforms alternative industrial solutions.
Significance. If the framework and results hold, the work offers a more rigorous basis for reasoning about streaming guarantees and a practical path to strong consistency at low cost. The formal linkage of the three properties and the empirical demonstration of low-overhead exactly-once processing are the primary contributions; the explicit generalization caveats noted in the manuscript strengthen the assessment.
minor comments (3)
- [Abstract] The phrase 'define streaming guarantees more regularly' is likely intended as 'more rigorously'; this should be corrected for precision.
- [Experiments] The experimental section would benefit from explicit discussion of how the tested workloads relate to the assumptions of the formal framework (e.g., failure models and network conditions).
- [Formal Framework] Notation for the formal definitions could be clarified with a small glossary or running example to aid readers unfamiliar with the specific stream-processing model.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work, the recognition of the formal framework linking delivery, consistency, and determinism, and the recommendation for minor revision. The report accurately captures the core contributions regarding lightweight determinism for exactly-once semantics with low overhead.
Circularity Check
No significant circularity; new framework is self-contained
The paper introduces a novel formal framework to redefine streaming guarantees, explicitly linking delivery, consistency, and determinism without reducing any core claim to a fitted parameter, self-citation chain, or definitional tautology. The abstract and skeptic analysis confirm the framework is presented as newly introduced, with experimental overhead numbers obtained under stated assumptions rather than by construction. No load-bearing step matches any enumerated circularity pattern; the derivation chain remains independent of its own outputs.