Pith Number

pith:PUR7J55P

pith:2026:PUR7J55P3CJDPLJECWPX354HIX

not attested · not anchored · not stored · refs resolved

Duet instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking

David Bermbach, Nils Japke, Sebastian Koch

Duet instrumentation uses LLMs to target performance measurements at code changes, detecting regressions at up to 5 times lower severity than standard benchmarks.

arxiv:2605.18397 v1 · 2026-05-18 · cs.DC


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{PUR7J55P3CJDPLJECWPX354HIX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
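The record does not spell out the merge algorithm, so the following is only a minimal sketch of what a deterministic merge over events could look like. The event fields (`id`, `ts`, `field`, `value`) and the last-write-wins rule are assumptions made for illustration, not the Pith schema. The point it shows is that deduplicating by a stable identifier and sorting by a total order makes the result independent of the order in which a mirror received the events.

```python
from typing import Any, Dict, Iterable


def merge_events(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": Any}.
    Assumes two events with the same "id" are identical copies.
    """
    # Drop duplicate deliveries of the same event.
    unique = {event["id"]: event for event in events}
    # Sort by (timestamp, id) so ties on timestamp are broken the same way everywhere.
    ordered = sorted(unique.values(), key=lambda event: (event["ts"], event["id"]))
    state: Dict[str, Any] = {}
    for event in ordered:
        # Last write wins per field (an assumed policy).
        state[event["field"]] = event["value"]
    return state


# Illustrative events only; the values are made up.
events = [
    {"id": "e1", "ts": "2026-05-20T00:05:58Z", "field": "status", "value": "a"},
    {"id": "e2", "ts": "2026-05-21T00:00:00Z", "field": "status", "value": "b"},
    {"id": "e3", "ts": "2026-05-21T00:00:00Z", "field": "note", "value": "c"},
]
merged = merge_events(events)
```

Feeding the same events in a different order, or with duplicates, yields the same `merged` state under these assumptions.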

Claims

C1 strongest claim

Our prototype can detect performance regressions at up to 5x lower injected severity compared to a traditional duet application benchmark while preserving similar A/A latency distributions.

C2 weakest assumption

The LLM can reliably identify performance-relevant code changes between versions with enough accuracy (reported 58% precision, 93% recall at a line-distance threshold of five) that the added instrumentation actually improves downstream regression detection sensitivity.
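To make the precision and recall figures concrete, here is a minimal sketch of scoring predicted line numbers against ground-truth line numbers with a distance tolerance. How the paper matches lines is not stated on this page, so the matching rule (a prediction counts as a hit if it lies within `max_distance` lines of any relevant line) and the example line numbers are assumptions.

```python
from typing import List, Tuple


def precision_recall(
    predicted: List[int], relevant: List[int], max_distance: int = 5
) -> Tuple[float, float]:
    """Score predicted line numbers against relevant ones with a line tolerance."""
    # A prediction is correct if some relevant line is within the tolerance.
    correct_predictions = [
        p for p in predicted if any(abs(p - r) <= max_distance for r in relevant)
    ]
    # A relevant line is found if some prediction is within the tolerance.
    found_relevant = [
        r for r in relevant if any(abs(p - r) <= max_distance for p in predicted)
    ]
    precision = len(correct_predictions) / len(predicted) if predicted else 0.0
    recall = len(found_relevant) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up line numbers: two of four predictions land near a relevant line,
# and two of three relevant lines are covered.
precision, recall = precision_recall([10, 42, 100, 250], [12, 45, 300])
```

Under this reading, high recall with moderate precision means most relevant changes get instrumented, at the cost of also instrumenting some lines that do not matter.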

C3 one line summary

Duet instrumentation uses LLM-driven code analysis to instrument performance-relevant changes between two app versions, detecting regressions at up to 5x lower severity than standard duet benchmarks in a testbed evaluation.

References

33 extracted · 33 resolved · 2 Pith anchors

[1] Bifrost: Supporting continuous deployment with automated enactment of multi-phase live testing strategies, 2016

[2] Continuous benchmarking: Using system benchmarking in build pipelines, 2019

[3] Creating a virtuous cycle in performance testing at mongodb, 2021 · doi:10.1145/3427921.3450234

[4] Patterns in the chaos - A study of performance variation and predictability in public iaas clouds, 2016 · doi:10.1145/2885497

[5] D. Bermbach, E. Wittern, and S. Tai, Cloud Service Benchmarking: Measuring Quality of Cloud Services from a Client Perspective, 1st ed. Springer Publishing Company, Incorporated, 2017

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:05:58.735861Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

7d23f4f7afd89237ad24159f7df78745e801bdfe4a9394dbccf82d56437fa15a

Aliases

arxiv: 2605.18397 · arxiv_version: 2605.18397v1 · doi: 10.48550/arxiv.2605.18397 · pith_short_12: PUR7J55P3CJD · pith_short_16: PUR7J55P3CJDPLJE · pith_short_8: PUR7J55P
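The identifier and its short aliases on this page appear to be related to the canonical hash in a simple way: Base32-encoding the SHA-256 digest and keeping the first 26 characters reproduces the Pith Number, and the short aliases are prefixes of it. This relationship is inferred from the values shown here, not taken from the Pith schema, so treat the sketch below as an observation about this record rather than a specification. The function name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "7d23f4f7afd89237ad24159f7df78745e801bdfe4a9394dbccf82d56437fa15a"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters (assumed rule)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith = pith_from_hash(CANONICAL_HASH)
# Short aliases as prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}
```

For this record, `pith` comes out as `PUR7J55P3CJDPLJECWPX354HIX`, and the three prefixes match the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases listed above.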

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/PUR7J55P3CJDPLJECWPX354HIX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7d23f4f7afd89237ad24159f7df78745e801bdfe4a9394dbccf82d56437fa15a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7794d89199a2347e89749baf70c889eb35f1bb7be0b76c7ff9a4e45a46f1be86",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2026-05-18T13:43:10Z",
    "title_canon_sha256": "449f7816b6a7c55e27325358f339add8ad72e0c91abc0bde55200555ee2f08da"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.18397",
    "kind": "arxiv",
    "version": 1
  }
}
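The hashing step of the verification command can also be run offline against the record printed above. The sketch below uses the same serialization settings as the Python one-liner in that command (sorted keys, compact separators, no ASCII escaping); the function name `canonical_sha256` is hypothetical. Whether the digest matches the listed canonical hash depends on the record above being the complete canonical record, so compare the output rather than assuming it.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7794d89199a2347e89749baf70c889eb35f1bb7be0b76c7ff9a4e45a46f1be86",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.DC",
        "submitted_at": "2026-05-18T13:43:10Z",
        "title_canon_sha256": "449f7816b6a7c55e27325358f339add8ad72e0c91abc0bde55200555ee2f08da",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.18397", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON fields were written.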