Software microbenchmarking in the cloud. How bad is it really?
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Citing papers
- Duet instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking
Duet instrumentation uses LLM-driven code analysis to instrument performance-relevant changes between two app versions, detecting regressions at up to 5x lower severity than standard duet benchmarks in a testbed evaluation.
- Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla
Ensemble voting strategies for change point detection improve F1-score by 11% over Mozilla's T-test method on a new ground-truth dataset of 174 performance time series annotated by practitioners.
- Misleading Microbenchmarks on the Java Virtual Machines
Microbenchmarks on the JVM can produce misleading results due to unrealistic profiles collected during isolated execution despite following JMH guidelines.