arXiv: 2512.12314 · v1 · submitted 2025-12-13 · cs.SE · cs.DC · cs.PF · cs.SY · eess.SY

Evaluating Asynchronous Semantics in Trace-Discovered Resilience Models: A Case Study on the OpenTelemetry Demo

Pith reviewed 2026-05-16 22:48 UTC · model grok-4.3

classification cs.SE · cs.DC · cs.PF · cs.SY · eess.SY
keywords resilience modeling · distributed tracing · microservices availability · OpenTelemetry · Monte Carlo simulation · chaos engineering · service dependency graphs · asynchronous semantics

The pith

Adding asynchronous semantics for Kafka edges changes predicted HTTP availability by at most 0.001 percentage points in a trace-derived model of the OpenTelemetry demo.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a dependency graph extracted from OpenTelemetry traces, combined with Monte Carlo simulation of fail-stop failures, needs explicit non-blocking rules for message queues to predict endpoint availability accurately. It derives the graph directly from raw traces, attaches success predicates per endpoint, and compares model variants with and without asynchronous treatment of Kafka links. The simulation reproduces the overall shape of the availability drop seen in real chaos experiments, in which services are randomly killed in a Docker Compose deployment. Adding async semantics shifts the predicted availabilities by at most about 10^{-5} (0.001 percentage points), suggesting that connectivity alone captures the immediate HTTP behavior of this application. The result supports using the simpler model when the goal is rapid assessment of resilience under sudden outages.

Core claim

The trace-derived connectivity model reproduces the overall availability degradation curve observed in chaos experiments. Introducing asynchronous semantics for Kafka edges changes the predicted availabilities by at most about 10^{-5}, or 0.001 percentage points. Therefore, for immediate HTTP availability in this case study, a simpler connectivity-only model is sufficient.

What carries the argument

Monte Carlo simulation over a service dependency graph extracted from raw OpenTelemetry traces, using endpoint-specific success predicates and optional non-blocking treatment of Kafka edges under a fail-stop failure model.
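
The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the four-service graph, the `sync`/`async` edge labels, the function names, and the rule that an endpoint succeeds only when every service reachable over blocking edges is alive are all hypothetical stand-ins for the trace-derived graph and the endpoint-specific predicates.

```python
import random

# Hypothetical toy graph (NOT the OpenTelemetry Demo topology).
# caller -> list of (callee, kind); kind is "sync" (blocking call)
# or "async" (e.g. a publish to a Kafka topic).
EDGES = {
    "frontend": [("checkout", "sync")],
    "checkout": [("payment", "sync"), ("accounting", "async")],
    "payment": [],
    "accounting": [],
}


def required_services(entry, edges, async_nonblocking):
    """Services that must be alive for an immediate success at `entry`.

    Assumed predicate: every service reachable from the entry point must be
    up. When `async_nonblocking` is True, async edges are not followed.
    """
    needed, stack = set(), [entry]
    while stack:
        svc = stack.pop()
        if svc in needed:
            continue
        needed.add(svc)
        for callee, kind in edges.get(svc, []):
            if kind == "async" and async_nonblocking:
                continue  # callee is not needed for the immediate reply
            stack.append(callee)
    return needed


def estimate_availability(entry, edges, fail_fraction, async_nonblocking,
                          runs=20000, seed=0):
    """Monte Carlo estimate of endpoint availability under fail-stop kills."""
    rng = random.Random(seed)
    services = sorted(edges)
    k = round(fail_fraction * len(services))  # services killed per trial
    needed = required_services(entry, edges, async_nonblocking)
    ok = 0
    for _ in range(runs):
        dead = set(rng.sample(services, k))  # fail-stop: dead for the whole trial
        if not (needed & dead):
            ok += 1
    return ok / runs
```

In this toy graph, killing one of four services per trial makes the connectivity-only variant fail every time, because every service is required, while the async variant succeeds whenever the killed service is `accounting`. The gap is large here by construction; on the real demo the paper reports a difference of at most about 10^{-5}.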

If this is right

  • Availability estimates stay essentially unchanged across the tested failure fractions when async details are omitted.
  • For the HTTP endpoints studied here, effort spent on modeling the non-blocking behavior of Kafka edges yields negligible improvement in immediate-success predictions.
  • A connectivity-only graph extracted from traces is adequate for reproducing the observed degradation curve in this deployment.
  • Computational cost of the resilience analysis can be reduced by dropping the asynchronous rules without loss of accuracy for the studied metric.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trace-to-graph pipeline could be applied to other observability-heavy systems to test whether the negligible async effect holds more broadly.
  • Systems that rely on long-running asynchronous workflows rather than immediate HTTP replies might show larger differences once the same modeling choice is examined.
  • Extending the predicates to include partial failure modes or latency bounds would be a direct next measurement to check the limits of the connectivity-only simplification.

Load-bearing premise

The trace-derived graph plus endpoint success predicates plus fail-stop failure model accurately represent the real behavior of the demo under the chaos experiments performed in Docker Compose.

What would settle it

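Running the same random service-kill patterns on the live demo and finding that the async variant of the model matches the measured endpoint success rates materially better than the connectivity-only variant, which would require the two variants' predictions to differ by far more than the reported 0.001 percentage points.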

Figures

Figures reproduced from arXiv: 2512.12314 by Anatoly A. Krasnovsky.

Figure 1: Comparison of model predictions with live measurements. (a) Probe …
Figure 2: Distribution of relative error between model and live availability for each …

Original abstract

While distributed tracing and chaos engineering are becoming standard for microservices, resilience models remain largely manual and bespoke. We revisit a trace-discovered connectivity model that derives a service dependency graph from traces and uses Monte Carlo simulation to estimate endpoint availability under fail-stop service failures. Compared to earlier work, we (i) derive the graph directly from raw OpenTelemetry traces, (ii) attach endpoint-specific success predicates, and (iii) add a simple asynchronous semantics that treats Kafka edges as non-blocking for immediate HTTP success. We apply this model to the OpenTelemetry Demo ("Astronomy Shop") using a GitHub Actions workflow that discovers the graph, runs simulations, and executes chaos experiments that randomly kill microservices in a Docker Compose deployment. Across the studied failure fractions, the model reproduces the overall availability degradation curve, while asynchronous semantics for Kafka edges change predicted availabilities by at most about 10^(-5) (0.001 percentage points). This null result suggests that for immediate HTTP availability in this case study, explicitly modeling asynchronous dependencies is not warranted, and a simpler connectivity-only model is sufficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript evaluates a trace-derived resilience model for microservices using the OpenTelemetry Demo. It derives a service dependency graph directly from raw OpenTelemetry traces, attaches endpoint-specific success predicates, and employs Monte Carlo simulation under a fail-stop failure model to estimate endpoint availability. The model is compared against chaos experiments that randomly kill microservices in a Docker Compose deployment. The central claims are that the simulation reproduces the observed availability degradation curve across studied failure fractions and that introducing simple asynchronous semantics for Kafka edges alters predicted availabilities by at most 10^{-5} (0.001 percentage points), implying that a connectivity-only model suffices for immediate HTTP availability in this case study.

Significance. If the reproduction of the degradation curve holds under the stated assumptions, the work provides concrete evidence from a realistic open-source demo that explicit modeling of asynchronous dependencies is unnecessary for short-term availability predictions in HTTP-centric microservices. The GitHub Actions workflow for trace discovery, simulation, and chaos execution is a strength that supports reproducibility. The null result on async semantics could inform simpler modeling practices, though its generality is limited to the studied system and failure model.

major comments (2)
  1. Abstract: The claim that the model reproduces the availability degradation curve and that async semantics change predictions by at most 10^{-5} is presented without error bars, confidence intervals, number of Monte Carlo runs, or discussion of simulation variance; this information is load-bearing for interpreting whether the tiny delta is distinguishable from noise and for supporting the conclusion that async modeling is unwarranted.
  2. Methods (trace processing and predicate attachment): Endpoint-specific success predicates are invoked to determine simulation outcomes but their exact definitions, how they are extracted from traces, and their mapping to the discovered graph are not specified in sufficient detail to assess fidelity to the Docker Compose chaos experiments or to enable independent replication.
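
The concern in major comment 1 can be made concrete with the usual binomial standard error of a Monte Carlo estimate. The sketch below is an illustration only: it assumes independent succeed/fail trials, two independently simulated model variants with similar availability, and an availability of 0.9 chosen purely as an example. The function names are hypothetical and none of the numbers come from the paper.

```python
import math


def binomial_standard_error(p, runs):
    """Standard error of an availability estimate p from `runs` i.i.d. trials."""
    return math.sqrt(p * (1.0 - p) / runs)


def runs_to_resolve(delta, p, z=1.96):
    """Runs per variant so that the z-sigma half-width on the difference of
    two independent estimates (each with availability near p) is at most delta.

    Uses Var(p1_hat - p2_hat) ~= 2 * p * (1 - p) / N.
    """
    return math.ceil(2.0 * p * (1.0 - p) * (z / delta) ** 2)


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    print(binomial_standard_error(0.9, 10_000))
    print(runs_to_resolve(1e-5, 0.9))
```

Under these assumptions, separating a 10^{-5} difference from sampling noise with independent runs would take on the order of 10^9 to 10^10 trials per variant. Evaluating both variants on the same sampled failure sets would avoid most of that noise, which is one reason the number of runs and the sampling scheme matter for interpreting the reported delta.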
minor comments (3)
  1. The manuscript would benefit from a dedicated limitations section discussing the fail-stop assumption and potential mismatches with real partial-failure or timeout behaviors observed in the demo.
  2. Figure captions and axis labels for the degradation curves should explicitly state the number of simulation trials and any aggregation method used to produce the plotted points.
  3. A brief comparison table contrasting the connectivity-only model versus the async variant (e.g., per-endpoint availability at each failure fraction) would make the 10^{-5} delta easier to evaluate.
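
The comparison table suggested in minor comment 3 could be produced along the following lines. This is a sketch with placeholder inputs: the endpoint name, the failure fractions, and every availability value below are invented for illustration and are not results from the paper; the function names are hypothetical as well.

```python
def delta_table(connectivity, asynchronous):
    """Rows of (endpoint, failure_fraction, A_conn, A_async, delta, delta_pp).

    Both inputs map (endpoint, failure_fraction) -> predicted availability
    in [0, 1]. delta_pp is the difference in percentage points.
    """
    rows = []
    for key in sorted(connectivity):
        a_conn = connectivity[key]
        a_async = asynchronous[key]
        delta = a_async - a_conn
        rows.append((key[0], key[1], a_conn, a_async, delta, delta * 100.0))
    return rows


def max_abs_delta(rows):
    """Largest absolute availability difference across all rows."""
    return max(abs(row[4]) for row in rows)


# Placeholder predictions (made up for this example, NOT from the paper).
CONNECTIVITY = {("endpoint_a", 0.1): 0.91233, ("endpoint_a", 0.3): 0.70000}
ASYNCHRONOUS = {("endpoint_a", 0.1): 0.91234, ("endpoint_a", 0.3): 0.70000}

if __name__ == "__main__":
    table = delta_table(CONNECTIVITY, ASYNCHRONOUS)
    for row in table:
        print(row)
    print("max |delta|:", max_abs_delta(table))
```

The percentage-point column makes the unit conversion explicit: an availability difference of 10^{-5} on a 0-to-1 scale corresponds to 0.001 percentage points.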

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's reproducibility strengths. We address each major comment below and will revise the manuscript accordingly to enhance clarity and support independent replication.

  1. Referee: Abstract: The claim that the model reproduces the availability degradation curve and that async semantics change predictions by at most 10^{-5} is presented without error bars, confidence intervals, number of Monte Carlo runs, or discussion of simulation variance; this information is load-bearing for interpreting whether the tiny delta is distinguishable from noise and for supporting the conclusion that async modeling is unwarranted.

    Authors: We agree that reporting simulation parameters and variance is necessary to substantiate the claims. In the revised manuscript we will state the exact number of Monte Carlo runs used for each availability estimate, include error bars (or confidence intervals) derived from the simulation replicates, and add a short discussion comparing the maximum observed difference of 10^{-5} with the Monte Carlo standard error of the estimates, to establish whether it is distinguishable from sampling noise under the fail-stop model. revision: yes

  2. Referee: Methods (trace processing and predicate attachment): Endpoint-specific success predicates are invoked to determine simulation outcomes but their exact definitions, how they are extracted from traces, and their mapping to the discovered graph are not specified in sufficient detail to assess fidelity to the Docker Compose chaos experiments or to enable independent replication.

    Authors: We acknowledge the need for greater detail. The revised Methods section will explicitly list the success predicate for each endpoint (defined from trace attributes such as HTTP status codes and span status), describe the automated extraction rules applied to the raw OpenTelemetry traces, and specify the mapping from predicates to nodes in the discovered dependency graph. These additions will allow readers to verify fidelity to the chaos experiments and to replicate the simulation outcomes. revision: yes
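
The kind of extraction rule and success predicate discussed in this exchange might look like the sketch below. It is an assumption-laden illustration, not the paper's pipeline: spans are represented as flat dictionaries with hypothetical keys (`span_id`, `parent_span_id`, `service`, `kind`, `http_status`, `status`), a cross-service child span of kind `CONSUMER` is assumed to mark an asynchronous edge, and an HTTP status below 400 with a non-error span status is assumed to count as immediate success.

```python
def build_edges(spans):
    """Derive (caller, callee, kind) service edges from parent/child spans."""
    by_id = {s["span_id"]: s for s in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span.get("parent_span_id"))
        if parent is None or parent["service"] == span["service"]:
            continue  # root span or intra-service span: no service edge
        # Assumed rule: a consumer-side child span means a message-queue hop.
        kind = "async" if span.get("kind") == "CONSUMER" else "sync"
        edges.add((parent["service"], span["service"], kind))
    return edges


def immediate_success(root_span):
    """Assumed endpoint predicate: HTTP status < 400 and span status not ERROR."""
    code = root_span.get("http_status")
    return (code is not None and 200 <= code < 400
            and root_span.get("status") != "ERROR")


# Hypothetical spans from a single trace (service names are illustrative).
EXAMPLE_SPANS = [
    {"span_id": "1", "parent_span_id": None, "service": "frontend",
     "kind": "SERVER", "http_status": 200, "status": "OK"},
    {"span_id": "2", "parent_span_id": "1", "service": "checkout",
     "kind": "SERVER"},
    {"span_id": "3", "parent_span_id": "2", "service": "checkout",
     "kind": "PRODUCER"},
    {"span_id": "4", "parent_span_id": "3", "service": "accounting",
     "kind": "CONSUMER"},
]

if __name__ == "__main__":
    print(sorted(build_edges(EXAMPLE_SPANS)))
    print(immediate_success(EXAMPLE_SPANS[0]))
```

Writing the rules down in this form is what would let a reader check them against the chaos experiments: each predicate names the trace attributes it reads, and each edge records whether the simulation may treat it as non-blocking.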

Circularity Check

0 steps flagged

Direct Monte Carlo simulation on trace-derived graph yields null result with no fitted inputs or self-referential reduction

The paper derives the dependency graph directly from raw OpenTelemetry traces, attaches endpoint-specific success predicates, and runs fail-stop Monte Carlo simulation to estimate availability under random service kills. These estimates are compared to separate chaos experiments in Docker Compose. The reported reproduction of the degradation curve and the delta of at most about 10^{-5} from adding non-blocking Kafka semantics are outputs of the simulation itself, not parameters fitted to the target curve or defined in terms of the result. The only self-reference is to the author's prior model definition, which is not load-bearing for the null finding on asynchronous semantics. No step reduces by construction to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the assumption that raw traces capture all relevant dependencies, that fail-stop failures are representative, and that the chosen success predicates correctly reflect endpoint behavior; no new entities are postulated and no parameters are fitted to the availability data itself.

free parameters (1)
  • studied failure fractions
    Discrete fractions of services killed in the chaos experiments; chosen to trace the degradation curve rather than fitted to any target availability value.
axioms (2)
  • domain assumption: Services fail in a fail-stop manner
    Invoked when Monte Carlo simulation estimates endpoint availability under service failures.
  • domain assumption: Traces from the demo capture the complete dependency graph for the studied endpoints
    Used when deriving the graph directly from raw OpenTelemetry traces.

