pith. machine review for the scientific record.

arxiv: 2605.02690 · v1 · submitted 2026-05-04 · 💻 cs.DC · cs.AI

Recognition: unknown

Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:20 UTC · model grok-4.3

classification 💻 cs.DC cs.AI
keywords Hyperledger Fabric · performance tuning · Bayesian optimization · black-box optimization · Caliper · dimensionality reduction · throughput · configuration optimization

The pith

Bayesian optimization with dimensionality reduction improves Hyperledger Fabric throughput by 12 percent over the initial configuration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats Hyperledger Fabric performance tuning as a noisy black-box optimization problem and applies Bayesian optimization combined with dimensionality reduction. It builds a pipeline that deploys candidate configurations, runs them through the Caliper benchmark, and feeds the measured throughput back to update the optimizer. The search space comes directly from Fabric configuration files and contains 317 dimensions. Among sixteen BO+DR variants and a random-search baseline tested on a cloud testbed, the strongest performer, DYCORS-PCA, reaches a 12 percent gain in transactions per second relative to the first evaluated configuration. This matters because Fabric parameters interact in ways that defeat manual tuning, and the results show that automated methods can locate better settings despite benchmark noise.
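
The closed loop described here, propose a configuration, deploy it, benchmark it with Caliper, and feed the measured throughput back to the optimizer, can be sketched as follows. All function names are hypothetical stand-ins for the paper's real pipeline, and a random-search placeholder plays the optimizer's role:

```python
import random

def deploy(config):
    """Stand-in for deploying a Fabric network with this configuration."""
    return config

def run_caliper(config):
    """Stand-in for a Caliper benchmark: a noisy throughput measurement."""
    base_tps = 1000.0 - sum((v - 0.5) ** 2 for v in config.values())
    return base_tps + random.gauss(0.0, 5.0)  # benchmark noise

def suggest(history, dims):
    """Placeholder optimizer: random search over normalized parameters.
    The paper swaps in Bayesian optimization variants here."""
    return {d: random.random() for d in dims}

def tune(dims, budget=20):
    history = []  # (configuration, observed TPS) pairs
    for _ in range(budget):
        cfg = suggest(history, dims)
        deploy(cfg)
        tps = run_caliper(cfg)
        history.append((cfg, tps))  # feedback that updates the optimizer
    return max(history, key=lambda h: h[1])

best_cfg, best_tps = tune([f"param_{i}" for i in range(10)])
```

The paper's pipeline replaces the stand-ins with real Fabric deployments, Caliper runs over the 317-dimensional space, and BO+DR in place of random search.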

Core claim

We study automated throughput tuning by treating benchmarking as a noisy black-box optimization problem and applying Bayesian optimization with dimensionality reduction. We implement an end-to-end Caliper-in-the-loop pipeline that deploys candidate configurations, benchmarks them, and updates the optimizer from observed throughput. The search space has 317 dimensions. In a cloud testbed, the best method, DYCORS-PCA, achieves a 12% TPS improvement relative to the first evaluated configuration, while MPI-REMBO achieves 9%. These results suggest that BO with DR is a practical approach for high-dimensional Hyperledger Fabric tuning.

What carries the argument

The Caliper-in-the-loop pipeline that couples Bayesian optimization variants with dimensionality reduction to search the 317-dimensional configuration space and improve measured throughput.
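
One common way to make a 317-dimensional space tractable, and roughly the idea behind the REMBO-style variants the paper tests, is to optimize in a low-dimensional embedding and project each candidate up through a random matrix. A minimal sketch; the dimensions come from the paper, everything else is illustrative:

```python
import numpy as np

D, d = 317, 10  # full Fabric config dimension vs. embedded search dimension
rng = np.random.default_rng(0)
A = rng.normal(size=(D, d))  # random embedding matrix (REMBO-style)

def to_full_config(z):
    """Project a low-dimensional candidate z into the full configuration
    space, clipping to a unit box of normalized parameter ranges."""
    return np.clip(A @ z, 0.0, 1.0)

# The optimizer searches over z in R^d; only the projected point is deployed.
z = rng.uniform(-1.0, 1.0, size=d)
x = to_full_config(z)
```

The Gaussian-process surrogate then only ever models a 10-dimensional function, which is what makes BO feasible at this scale.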

If this is right

  • Dimensionality reduction makes Bayesian optimization feasible in configuration spaces with hundreds of interacting parameters.
  • The strongest BO+DR variant outperforms both random search and other tested combinations on the same benchmark workload.
  • Measurement noise must be considered when deciding whether an observed gain reflects a genuine improvement.
  • An automated pipeline can locate higher-throughput settings without requiring expert manual adjustment of each parameter.
  • The same loop structure can be reused for other blockchain platforms that expose large configuration files.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Extending the pipeline to optimize multiple objectives such as latency alongside throughput would address common production trade-offs.
  • Embedding the optimizer inside a running network could enable continuous self-tuning as load patterns change.
  • The approach may generalize to tuning other distributed systems whose performance depends on dozens or hundreds of interdependent settings.
  • Combining the method with cheaper surrogate models could reduce the number of full benchmarks required.

Load-bearing premise

Caliper benchmark measurements are consistent enough that the observed throughput gains can be attributed to the optimization process rather than to testbed noise or the choice of starting configuration.
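
A quick way to probe that premise is to rerun one configuration several times and compare the run-to-run spread against the size of the gain being claimed. A sketch with made-up repeated measurements:

```python
import statistics

# Illustrative repeated Caliper measurements of a single configuration.
runs = [1002.3, 975.6, 1031.9, 998.4, 1019.7,
        984.2, 1008.8, 992.5, 1024.1, 1001.0]

mean_tps = statistics.mean(runs)
cv = statistics.stdev(runs) / mean_tps  # coefficient of variation

claimed_gain = 0.12
# A gain well inside the run-to-run spread cannot be credited to tuning.
distinguishable = claimed_gain > 2 * cv
```

With this spread the 12 percent figure clears the noise floor comfortably; the open question is whether the real testbed's spread is as small.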

What would settle it

Repeating the full optimization sequence multiple times from varied initial configurations and confirming whether the reported 12 percent TPS gain appears reliably would test whether the improvement is due to the method or to measurement variability.

Figures

Figures reproduced from arXiv: 2605.02690 by Arseny Bolotnikov, Artem Barger, Irina Lebedeva, Ivan Laishevskiy, Mark Prikhno, Vladimir Gorgadze, Yash Madhwal, Yury Yanovich.

Figure 1: Caliper-in-the-Loop closed-loop optimization framework for Hyper…
Figure 2: Best achieved throughput improvement factor by method (higher is …
Figure 3: Maximum TPS observed in each trial (absolute throughput). Bars …
Figure 4: Noise scores across AF–DR trials (lower indicates more stable mea…
Figure 5: Batched norm differences between successive proposed configurations …
original abstract

Hyperledger Fabric performance depends on many interacting configuration parameters, making manual tuning difficult. We study automated throughput tuning by treating benchmarking as a noisy black-box optimization problem and applying Bayesian optimization (BO) with dimensionality reduction (DR). We implement an end-to-end Caliper-in-the-loop pipeline that deploys candidate configurations, benchmarks them, and updates the optimizer from observed throughput. The search space, derived from Fabric configuration files, has 317 dimensions. In a cloud testbed, we evaluate 16 BO+DR variants and a random-search baseline. The best method, DYCORS-PCA, achieves a 12% TPS improvement relative to the first evaluated configuration, while MPI-REMBO achieves 9%. These results suggest that BO with DR is a practical approach for high-dimensional Hyperledger Fabric tuning, while also highlighting the role of measurement noise in interpreting gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a Caliper-in-the-loop pipeline for automated performance tuning of Hyperledger Fabric using Bayesian optimization with dimensionality reduction in a 317-dimensional parameter space. Through cloud-based experiments comparing 16 BO+DR variants to random search, it reports that DYCORS-PCA achieves a 12% TPS improvement and MPI-REMBO a 9% improvement relative to the initial configuration.

Significance. Should the reported throughput gains prove statistically significant and reproducible, this work would offer a practical, automated solution for optimizing complex, high-dimensional configurations in blockchain systems, addressing a key challenge in Hyperledger Fabric deployments. It highlights the utility of combining BO with DR techniques for noisy black-box problems and underscores the importance of accounting for measurement variability in such optimizations.

major comments (3)
  1. [Abstract and Experimental Results] The 12% TPS gain for DYCORS-PCA and 9% for MPI-REMBO are reported relative to the single first-evaluated configuration without any mention of repeated runs, standard deviations, confidence intervals, or statistical tests comparing against the random-search baseline after the same number of evaluations. This is particularly concerning given the abstract's explicit mention of measurement noise in interpreting gains.
  2. [Experimental Setup] Insufficient details are provided on the experimental controls, such as the number of independent Caliper benchmark repetitions per configuration, handling of testbed variability (e.g., VM scheduling, network jitter), and exact protocol for ensuring consistent Fabric state across runs. Without these, it is difficult to attribute observed improvements to the optimization methods rather than noise.
  3. [Results] While 16 BO+DR variants and a random search baseline are evaluated, there is no direct evidence or statistical comparison demonstrating that the top BO+DR methods reliably outperform random search in terms of final TPS or convergence speed in this noisy setting.
minor comments (2)
  1. [Abstract] The search space dimensionality of 317 is stated but the derivation from Fabric configuration files could be clarified for reproducibility.
  2. Consider adding a table summarizing the performance of all 16 variants with key metrics including mean TPS and any available variance measures.
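
The kind of reporting the referee asks for can be sketched in a few lines: a mean with a 95% confidence interval from repeated benchmark runs of one configuration. The TPS values are illustrative; the t critical value is the standard two-sided 95% figure for n − 1 = 4 degrees of freedom:

```python
import math
import statistics

# Five illustrative repeated Caliper measurements of one configuration.
runs = [1012.0, 998.5, 1025.3, 1004.1, 1017.8]
n = len(runs)
mean = statistics.mean(runs)
sem = statistics.stdev(runs) / math.sqrt(n)  # standard error of the mean
t_crit = 2.776  # two-sided 95% Student-t value for df = 4
ci = (mean - t_crit * sem, mean + t_crit * sem)
```

Two configurations whose intervals overlap should not be reported as an improvement without a formal test.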

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight key issues of statistical rigor and experimental reproducibility in our study of Bayesian optimization for Hyperledger Fabric tuning. We have revised the manuscript to strengthen these aspects while remaining faithful to the experiments performed. Below we respond to each major comment.

point-by-point responses
  1. Referee: [Abstract and Experimental Results] The 12% TPS gain for DYCORS-PCA and 9% for MPI-REMBO are reported relative to the single first-evaluated configuration without any mention of repeated runs, standard deviations, confidence intervals, or statistical tests comparing against the random-search baseline after the same number of evaluations. This is particularly concerning given the abstract's explicit mention of measurement noise in interpreting gains.

    Authors: We agree that reporting gains relative to a single initial configuration, without error bars or formal tests, is insufficient given the acknowledged measurement noise. The 12% and 9% figures reflect the best observed TPS in the single optimization trajectory for each method. In the revised manuscript we have added a dedicated subsection on statistical analysis: we report standard deviations from the repeated Caliper benchmarks per configuration, include 95% confidence intervals on final TPS values, and provide paired statistical comparisons (t-tests) of the top BO+DR methods versus random search after an equal number of evaluations. We also explicitly discuss the limitation that full independent replications of the entire optimization loop were not feasible due to cloud resource costs, and we temper the abstract and conclusions accordingly. revision: partial

  2. Referee: [Experimental Setup] Insufficient details are provided on the experimental controls, such as the number of independent Caliper benchmark repetitions per configuration, handling of testbed variability (e.g., VM scheduling, network jitter), and exact protocol for ensuring consistent Fabric state across runs. Without these, it is difficult to attribute observed improvements to the optimization methods rather than noise.

    Authors: We appreciate this call for greater transparency. The revised Experimental Setup section now specifies that each candidate configuration was evaluated with five independent Caliper benchmark repetitions, with throughput averaged to reduce per-run noise. We describe the use of dedicated cloud VM instances with fixed resource allocation to limit scheduling jitter, periodic network monitoring to flag anomalous conditions, and a deterministic reset protocol (ledger purge, node restart, and warm-up transactions) that restores Fabric state to a consistent initial condition before each new configuration. These controls are now documented with sufficient detail for reproducibility. revision: yes

  3. Referee: [Results] While 16 BO+DR variants and a random search baseline are evaluated, there is no direct evidence or statistical comparison demonstrating that the top BO+DR methods reliably outperform random search in terms of final TPS or convergence speed in this noisy setting.

    Authors: We concur that direct, quantitative comparisons are necessary. The revised Results section now contains convergence curves for DYCORS-PCA, MPI-REMBO, and random search plotted against number of evaluations, together with tables of final TPS values (mean and standard deviation) and the outcomes of statistical tests (Wilcoxon rank-sum) performed at the end of the budget. These additions show that the leading BO+DR methods reach higher final TPS than random search after the same number of evaluations, while also illustrating the impact of noise on convergence speed. We note that the advantage is statistically significant for the best method but acknowledge variability across the noisy landscape. revision: yes
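
The Wilcoxon rank-sum comparison the responses describe can be sketched without a statistics library, using the normal approximation. The two samples below are illustrative, not the paper's measurements:

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test via the normal
    approximation; returns the z statistic and approximate p-value."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = {}
    j = 0
    while j < len(combined):  # assign average ranks to tied values
        k = j
        while k + 1 < len(combined) and combined[k + 1][0] == combined[j][0]:
            k += 1
        avg = (j + k) / 2 + 1
        for m in range(j, k + 1):
            ranks[combined[m][1]] = avg
        j = k + 1
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[i] for i in range(n1))  # rank sum of the first sample
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Illustrative final-TPS samples: BO+DR runs vs. random-search runs.
bo_dr = [1120, 1135, 1128, 1142, 1117, 1131]
rand = [1080, 1095, 1088, 1102, 1076, 1091]
z, p = rank_sum_test(bo_dr, rand)
```

With samples this cleanly separated the test rejects easily; on noisy real trajectories the same machinery decides whether the BO+DR advantage survives.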

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking study with no derivation chain

full rationale

The paper reports results from running 16 BO+DR variants plus random search on a 317-dimensional Hyperledger Fabric configuration space using a Caliper-in-the-loop pipeline on a cloud testbed. All performance numbers (e.g., 12% TPS gain for DYCORS-PCA) are direct measurements of observed throughput relative to the first evaluated point. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation load-bearing uniqueness theorems appear in the abstract or described content. The work is self-contained as an applied experimental comparison; any statistical concerns about noise or repeats belong to correctness risk, not circularity.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions of Bayesian optimization for noisy black-box functions and the reliability of Caliper as a benchmarking tool; no new entities are introduced.

free parameters (1)
  • Dimensionality reduction hyperparameters
    Parameters for PCA and other DR methods are chosen or tuned as part of the 16 variants tested.
axioms (2)
  • domain assumption Fabric throughput is a noisy black-box function of its configuration parameters.
    Invoked to justify the use of BO instead of direct modeling.
  • domain assumption Caliper provides repeatable throughput measurements suitable for optimization feedback.
    Core premise of the Caliper-in-the-loop pipeline.

pith-pipeline@v0.9.0 · 5479 in / 1301 out tokens · 44254 ms · 2026-05-08T17:20:19.634896+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    The case of hyperledger fabric as a blockchain solution for healthcare applications,

    M. Antwi, A. Adnane, F. Ahmad, R. Hussain, M. H. ur Rehman, and C. A. Kerrache, “The case of hyperledger fabric as a blockchain solution for healthcare applications,” Blockchain: Research and Applications, vol. 2, no. 1, p. 100012, 2021

  2. [2]

    An attribute-based access control model for internet of things using hyperledger fabric blockchain,

    E. A. Shammar, A. T. Zahary, and A. A. Al-Shargabi, “An attribute-based access control model for internet of things using hyperledger fabric blockchain,” Wireless Communications and Mobile Computing, vol. 2022, no. 1, p. 6926408, 2022

  3. [3]

    A survey on blockchain for enterprise using hyperledger fabric and composer,

    D. Li, W. E. Wong, and J. Guo, “A survey on blockchain for enterprise using hyperledger fabric and composer,” in 2019 6th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2020, pp. 71–80

  4. [4]

    Hyperledger fabric: A distributed operating system for permissioned blockchains,

    E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich et al., “Hyperledger fabric: A distributed operating system for permissioned blockchains,” arXiv preprint arXiv:1801.10228, 2018. [Online]. Available: https://arxiv.org/abs/1801.10228

  5. [5]

    The ordering service (hyperledger fabric documentation),

    “The ordering service (hyperledger fabric documentation),” Hyperledger Fabric Documentation, accessed: 2026-01-09. [Online]. Available: https://hyperledger-fabric.readthedocs.io/en/release-2.2/orderer/ordering_service.html

  6. [6]

    Auto-tuning with reinforcement learning for permissioned blockchain systems,

    M. Li et al., “Auto-tuning with reinforcement learning for permissioned blockchain systems,” Proceedings of the VLDB Endowment, 2023. [Online]. Available: https://www.vldb.org/pvldb/vol16/p1000-li.pdf

  7. [7]

    S. A. Baset, L. Desrosiers, N. Gaur, P. Novotny, A. O’Dowd, and V. Ramakrishna, Hands-on blockchain with Hyperledger: building decentralized applications with Hyperledger Fabric and composer. Packt Publishing Ltd, 2018

  8. [8]

    Performance benchmarking and optimizing hyperledger fabric blockchain platform,

    P. Thakkar, S. Nathan, and B. Vishwanathan, “Performance benchmarking and optimizing hyperledger fabric blockchain platform,” arXiv preprint arXiv:1805.11390, 2018. [Online]. Available: https://arxiv.org/abs/1805.11390

  9. [9]

    Performance characterization and bottleneck analysis of hyperledger fabric,

    C. Wang et al., “Performance characterization and bottleneck analysis of hyperledger fabric,” arXiv preprint arXiv:2008.05946, 2020. [Online]. Available: https://arxiv.org/pdf/2008.05946

  10. [10]

    Exploring hyperledger caliper benchmarking tool to measure the performance of blockchain based solutions,

    R. K. Kaushal and N. Kumar, “Exploring hyperledger caliper benchmarking tool to measure the performance of blockchain based solutions,” in 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE, 2024, pp. 1–6

  11. [11]

    Efficient global optimization of expensive black-box functions,

    D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, 1998

  12. [12]

    Performance considerations,

    “Performance considerations,” Hyperledger Fabric Documentation, accessed: 2026-01-10. [Online]. Available: https://hyperledger-fabric.readthedocs.io/en/latest/performance.html

  13. [14]

    Practical Bayesian Optimization of Machine Learning Algorithms

    J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” arXiv preprint arXiv:1206.2944, 2012. [Online]. Available: https://arxiv.org/abs/1206.2944

  14. [15]

    A Tutorial on Bayesian Optimization

    P. I. Frazier, “A tutorial on bayesian optimization,” arXiv preprint arXiv:1807.02811, 2018. [Online]. Available: https://arxiv.org/pdf/1807.02811

  15. [16]

    High-dimensional bayesian optimization with sparse axis-aligned subspaces,

    D. Eriksson and M. Jankowiak, “High-dimensional bayesian optimization with sparse axis-aligned subspaces,” arXiv preprint arXiv:2103.00349, 2021. [Online]. Available: https://arxiv.org/pdf/2103.00349

  16. [17]

    Scalable global optimization via local Bayesian optimization,

    D. Eriksson, M. Pearce, J. R. Gardner, R. Turner, and M. Poloczek, “Scalable global optimization via local Bayesian optimization,” in Advances in Neural Information Processing Systems (NeurIPS), 2019. [Online]. Available: https://proceedings.neurips.cc/paper/2019/file/6c990b7aca7bc7058f5e98ea909e924b-Paper.pdf

  18. [19]

    Bayesian optimization in high dimensions via random embeddings,

    Z. Wang, M. Zoghi, F. Hutter, D. Matheson, and N. de Freitas, “Bayesian optimization in high dimensions via random embeddings,” in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), 2013. [Online]. Available: https://www.ijcai.org/Proceedings/13/Papers/263.pdf

  19. [20]

    Performance characterization of hyperledger fabric,

    A. Baliga, N. Solanki et al., “Performance characterization of hyperledger fabric,” https://www.persistent.com/wp-content/uploads/2020/09/research-paper-performance-characterization-of-hyperledger-fabric.pdf, accessed 2026-01-09

  20. [21]

    Architecture,

    “Architecture,” Hyperledger Caliper documentation / archived repository, accessed: 2026-01-03. [Online]. Available: https://github.com/hyperledger-archives/caliper/blob/master/docs/Architecture.md

  21. [22]

    Gaussian process optimization in the bandit setting: No regret and experimental design,

    N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” in Proceedings of the 27th International Conference on Machine Learning (ICML), 2010. [Online]. Available: https://icml.cc/Conferences/2010/papers/422.pdf

  22. [23]

    Should my blockchain learn to drive? a study of self-driving parameter tuning for permissioned blockchains,

    J. A. Chacko et al., “Should my blockchain learn to drive? a study of self-driving parameter tuning for permissioned blockchains,” arXiv preprint arXiv:2406.06318, 2024. [Online]. Available: https://arxiv.org/pdf/2406.06318

  23. [24]

    Vm platforms,

    “Vm platforms,” Yandex Cloud Documentation, accessed: 2026-01-. [Online]. Available: https://yandex.cloud/en/docs/compute/concepts/vm-platforms

  25. [26]

    vcpu performance levels,

    “vcpu performance levels,” Yandex Cloud Documentation, accessed: 2026-01-03. [Online]. Available: https://yandex.cloud/en/docs/compute/concepts/performance-levels

  26. [27]

    Transaction flow,

    “Transaction flow,” Hyperledger Fabric Documentation, accessed: 2026-01-03. [Online]. Available: https://hyperledger-fabric.readthedocs.io/en/release-2.2/txflow.html

  27. [28]

    Hyperledger caliper: A blockchain performance benchmark framework,

    “Hyperledger caliper: A blockchain performance benchmark framework,” GitHub repository, accessed: 2026-01-03. [Online]. Available: https://github.com/hyperledger-caliper/caliper

  28. [29]

    Caliper architecture,

    “Caliper architecture,” Hyperledger Caliper documentation (archived), accessed: 2026-01-03. [Online]. Available: https://github.com/hyperledger-archives/caliper/blob/master/docs/Architecture.md

  29. [30]

    Hyperledger caliper: Benchmarking framework,

    “Hyperledger caliper: Benchmarking framework,” Project website, accessed: 2026-01-03. [Online]. Available: https://hyperledger-caliper.github.io/caliper/

  30. [31]

    Measuring blockchain performance with hyperledger caliper,

    “Measuring blockchain performance with hyperledger caliper,” LF Decentralized Trust blog, accessed: 2026-01-03. [Online]. Available: https://www.lfdecentralizedtrust.org/blog/2018/03/19/measuring-blockchain-performance-with-hyperledger-caliper

  31. [33]

    Blockchain performance metrics white paper,

    “Blockchain performance metrics white paper,” LF Decentralized Trust publication, accessed: 2026-01-03. [Online]. Available: https://www.lfdecentralizedtrust.org/learn/publications/blockchain-performance-metrics

  32. [34]

    Benchmark configuration,

    “Benchmark configuration,” Hyperledger Caliper documentation, accessed: 2026-01-03. [Online]. Available: https://aklenik.github.io/caliper/v0.4.2/bench-config/

  33. [35]

    Performance testing smart contracts developed within vs code using hyperledger caliper,

    “Performance testing smart contracts developed within vs code using hyperledger caliper,” IBM Developer tutorial, accessed: 2026-01-03. [Online]. Available: https://developer.ibm.com/tutorials/blockchain-performance-testing-smart-contracts-vscode-caliper/