Vibe Calibration: Autonomous Bring-up of a 112-Qubit Superconducting Quantum Processor by a Skill-Orchestrating Language Agent
Pith reviewed 2026-06-26 10:40 UTC · model grok-4.3
The pith
Language model agents distill expert knowledge into decision-tree Skills and autonomously calibrate 108 of 112 qubits in 4.7 hours.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Vibe Calibration structures expert tacit knowledge into auditable Skills, each a decision tree that packages parameterized measurement commands, quantitative acceptance criteria, and audit records. After a large language model is fine-tuned on validated trajectories, the resulting agent-orchestrated system autonomously calibrates 108 of 112 qubits on a frequency-tunable transmon processor in 4.7 hours. That is a 4–5 times speedup over manual calibration of the full array, and the autonomous results agree with expert manual results on 14 of 16 qubits in a cross-validated subset. The core workflows generalize to new processors after only minor interface adaptations.
What carries the argument
The Skill, a decision tree that packages parameterized measurement commands, quantitative acceptance criteria, and audit records to enable autonomous execution and self-healing.
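To make the Skill abstraction concrete, here is a minimal Python sketch of one way such a decision tree could be represented, assuming each node pairs a parameterized measurement with a numeric acceptance test and an optional fallback branch. The class names (`Node`, `Skill`, `AuditEntry`), the fit-quality threshold, and the toy measurement lambdas are hypothetical illustrations, not the paper's schema. The sketch only shows how execution, branching on acceptance, a self-healing fallback, and an audit trail can fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical Skill representation: names and fields are illustrative assumptions.


@dataclass
class Node:
    """One decision-tree node: run a measurement, then branch on acceptance."""
    name: str
    measure: Callable[[Dict[str, float]], float]  # parameterized measurement command
    params: Dict[str, float]
    accept: Callable[[float], bool]               # quantitative acceptance criterion
    on_pass: Optional["Node"] = None              # next node if the result is accepted
    on_fail: Optional["Node"] = None              # self-healing fallback if rejected


@dataclass
class AuditEntry:
    node: str
    params: Dict[str, float]
    value: float
    accepted: bool


@dataclass
class Skill:
    root: Node
    audit: List[AuditEntry] = field(default_factory=list)

    def run(self, max_steps: int = 20) -> bool:
        """Walk the tree, logging every node visit; return the last acceptance result."""
        node = self.root
        for _ in range(max_steps):
            value = node.measure(node.params)
            ok = node.accept(value)
            self.audit.append(AuditEntry(node.name, dict(node.params), value, ok))
            nxt = node.on_pass if ok else node.on_fail
            if nxt is None:      # leaf reached: stop here
                return ok
            node = nxt
        return False             # step budget exhausted: escalate to a human


# Toy usage: a coarse scan fails its fit-quality check, so the fallback narrows the span.
fine = Node("rabi_fine", lambda p: 0.97 if p["span"] <= 0.5 else 0.4,
            {"span": 0.5}, accept=lambda r2: r2 >= 0.9)
coarse = Node("rabi_coarse", lambda p: 0.4, {"span": 1.0},
              accept=lambda r2: r2 >= 0.9, on_fail=fine)
skill = Skill(coarse)
print(skill.run())                    # True
print([e.node for e in skill.audit])  # ['rabi_coarse', 'rabi_fine']
```

Logging every node visit, including rejected attempts, is what makes such a tree auditable: the `audit` list records which branch was taken, with which parameters, and why.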
If this is right
- The system calibrates 108 qubits in 4.7 hours, 4–5× faster than manual calibration of all 112 qubits.
- Cross-validation on a 16-qubit subset shows agreement between autonomous and expert manual results on 14 qubits.
- Low-level control scripts require only minor interface adaptation, while core decision logic and task orchestration transfer to new processors.
- The approach removes a critical barrier to scalable quantum hardware operation by providing a reusable and auditable laboratory interface.
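The headline numbers imply a rough manual baseline. The sketch below assumes that speedup means manual wall-clock time divided by autonomous wall-clock time; under that assumption a 4.7-hour run at 4–5× corresponds to roughly 18.8–23.5 hours of manual work. Because the autonomous run finished 108 qubits while the manual baseline covers 112, the per-qubit throughput ratio comes out slightly lower than the headline speedup.

```python
def implied_manual_hours(auto_hours: float, speedup: float) -> float:
    """Assumes speedup = T_manual / T_auto (wall-clock); returns T_manual."""
    return auto_hours * speedup


AUTO_HOURS = 4.7                    # reported autonomous run time
AUTO_QUBITS, MANUAL_QUBITS = 108, 112

for s in (4, 5):
    manual = implied_manual_hours(AUTO_HOURS, s)
    # Per-qubit throughput ratio corrects for the 108-vs-112 mismatch.
    throughput_ratio = (AUTO_QUBITS / AUTO_HOURS) / (MANUAL_QUBITS / manual)
    print(f"{s}x: manual ~ {manual:.1f} h, per-qubit throughput ratio ~ {throughput_ratio:.1f}x")
# 4x: manual ~ 18.8 h, per-qubit throughput ratio ~ 3.9x
# 5x: manual ~ 23.5 h, per-qubit throughput ratio ~ 4.8x
```

The throughput ratio is simply the speedup scaled by 108/112, so it is a small correction, but it makes the comparison like-for-like.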
Where Pith is reading between the lines
- If the Skills prove robust, the same distillation method could be applied to bring-up tasks on other quantum hardware platforms such as trapped ions or photonic circuits.
- The auditable nature of the Skills creates a record that could be mined to identify and correct systematic weaknesses in calibration procedures over multiple devices.
- Faster calibration cycles would allow more rapid iteration between processor fabrication and testing, potentially shortening development timelines for larger arrays.
Load-bearing premise
The three-phase human-in-the-loop distillation process fully captures expert tacit judgment into the Skills without critical omissions or biases that would cause autonomous execution to fail on unseen hardware states or anomalous signals.
What would settle it
A direct side-by-side run on the same 112-qubit processor in which the autonomous system and the expert manual team produce differing calibration outcomes on more than two of the 16 cross-validated qubits, or a run on a new processor in which the autonomous system fails to reach a comparable success rate within the reported time.
Original abstract
Superconducting quantum computing is one of the most mature solid-state platforms for quantum computation, with processors exceeding one hundred qubits. Yet further scaling toward fault-tolerant quantum computing is increasingly constrained by calibration complexity. Conventional scripts are brittle to anomalous signals, and expert judgment is bounded by cognitive bandwidth and serial operation time, failing to keep pace with system scale. Here we report Vibe Calibration, an autonomous calibration system orchestrated by large language model agents, which distills expert tacit knowledge into reusable Skills. Each Skill is organized as a decision tree that packages parameterized measurement commands, quantitative acceptance criteria, and audit records, enabling autonomous execution and self-healing. We capture this knowledge through a three-phase human-in-the-loop distillation process and fine-tune a large language model on validated trajectories. On a 112-qubit processor with frequency-tunable transmons, the system autonomously completes calibration of 108 out of 112 qubits in 4.7 hours, achieving a 4–5× speedup over manual calibration of the full 112 qubits. A cross-validated comparison with expert manual calibration on a 16-qubit subset shows agreement on 14 out of 16 qubits. More importantly, the model demonstrates transferable calibration workflows across devices. While low-level control scripts require minor interface adaptation for different hardware platforms, the core decision logic and task orchestration generalize to new processors, demonstrating a reusable laboratory interface rather than a memorized script. This work demonstrates, for the first time, fully autonomous calibration of a hundred-qubit superconducting processor through reusable and auditable Skills, removing a critical barrier to scalable quantum hardware operation.
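The supplementary agent transcript excerpted in the reference graph below shows a five-step characterization pipeline (Spectrum, Power Rabi, SingleShot, T1, Ramsey) run over four qubit groups, ending with a Group 0 Ramsey fit failure for which the agent proposed a re-run with a shorter `delay_max` or more repeats. The Python sketch below mimics that control flow. The `run` callback, its `(n_ok, n_total)` return value, the zero-good-fits retry trigger, the halve-`delay_max`/double-`repeats` rule, and the toy group sizes are all assumptions for illustration; in the transcript the agent asked before re-running rather than retrying automatically.

```python
from typing import Callable, Dict, List, Tuple

# Step order as it appears in the supplementary agent transcript.
STEPS = ["Spectrum", "Power Rabi", "SingleShot", "T1", "Ramsey"]

# Hypothetical runner signature: (step, group, params) -> (n_ok_fits, n_qubits)
Runner = Callable[[str, int, Dict[str, float]], Tuple[int, int]]


def run_pipeline(run: Runner, groups: List[int],
                 params: Dict[str, Dict[str, float]],
                 max_retries: int = 1) -> Dict[str, Dict[int, Tuple[int, int]]]:
    """Run every step on every group in order, retrying groups with zero good fits."""
    results: Dict[str, Dict[int, Tuple[int, int]]] = {}
    for step in STEPS:
        results[step] = {}
        for g in groups:
            p = dict(params.get(step, {}))  # copy so retries don't mutate the config
            n_ok, n_total = run(step, g, p)
            for _ in range(max_retries):
                if n_ok > 0:                # assumed trigger: only all-failed groups retry
                    break
                # Assumed healing rule, echoing the agent's proposal in the transcript:
                # shorter delay_max (if the step has one) and more repeats.
                if "delay_max" in p:
                    p["delay_max"] /= 2
                p["repeats"] = p.get("repeats", 1) * 2
                n_ok, n_total = run(step, g, p)
            results[step][g] = (n_ok, n_total)
    return results


def fake_run(step: str, group: int, p: Dict[str, float]) -> Tuple[int, int]:
    """Toy stand-in for hardware: Group 0 Ramsey fits fail unless delay_max is short."""
    if step == "Ramsey" and group == 0 and p.get("delay_max", 0.0) > 50e-6:
        return 0, 25
    return 25, 25


res = run_pipeline(fake_run, [0, 1, 2, 3], {"Ramsey": {"delay_max": 80e-6}})
print(res["Ramsey"][0])  # (25, 25) after one retry with delay_max = 40e-6
```

Keeping the retry budget small and explicit (`max_retries`) is one way to bound self-healing so that a persistently failing group is surfaced to a human rather than retried indefinitely.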
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce Vibe Calibration, an autonomous calibration system for a 112-qubit superconducting quantum processor orchestrated by LLM agents. Expert tacit knowledge is distilled into reusable 'Skills' (decision trees with parameterized commands, acceptance criteria, and audit records) via a three-phase human-in-the-loop process and fine-tuning. Key results include autonomous calibration of 108 out of 112 qubits in 4.7 hours (4-5× speedup over manual), 14/16 agreement with expert manual calibration on a 16-qubit subset, and transferability of workflows across devices.
Significance. If the empirical results hold under fuller validation, this would represent a meaningful advance in addressing calibration bottlenecks for scaling superconducting processors beyond 100 qubits. The distillation of expert judgment into auditable, reusable Skills with demonstrated speedup and partial cross-device transferability provides concrete evidence of a practical laboratory automation approach, which could reduce reliance on serial expert time.
major comments (2)
- [Abstract] Abstract: The headline metrics (108/112 qubits calibrated autonomously in 4.7 hours, 4–5× speedup) rest on the three-phase distillation fully encoding expert acceptance criteria, yet no details are given on measurement protocols, statistical analysis of outcomes, failure modes for the 4 unsuccessful qubits, or how criteria were validated across the full dataset.
- [Abstract] Abstract: Cross-validation is reported only on a 16-qubit subset (agreement on 14/16), with no independent expert audit described for the 108 autonomous results or the 4 failures; this is load-bearing for the central claim of equivalence to expert calibration on the full processor.
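One way to quantify the concern about the 16-qubit subset is a binomial confidence interval on the agreement rate. The sketch below uses the Wilson score interval, a standard choice for small samples and not a method the paper reports. For 14 of 16 it gives roughly 0.64 to 0.97 at 95% confidence, so the subset alone is compatible with agreement rates anywhere from about 64% to near-perfect.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion k/n (z = 1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


lo, hi = wilson_interval(14, 16)
print(f"agreement 14/16 = {14 / 16:.3f}, 95% CI ~ [{lo:.2f}, {hi:.2f}]")
# agreement 14/16 = 0.875, 95% CI ~ [0.64, 0.97]
```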
minor comments (1)
- The term 'Skill' is presented as a novel packaging of decision trees; a brief comparison to existing automated calibration frameworks or scripted decision systems in the quantum hardware literature would improve context.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential impact of Vibe Calibration on scaling superconducting processors. We address each major comment below. We agree that the abstract and main text would benefit from expanded details on protocols, statistical methods, failure modes, and validation procedures, and we will revise the manuscript to include these.
Point-by-point responses
-
Referee: [Abstract] Abstract: The headline metrics (108/112 qubits calibrated autonomously in 4.7 hours, 4–5× speedup) rest on the three-phase distillation fully encoding expert acceptance criteria, yet no details are given on measurement protocols, statistical analysis of outcomes, failure modes for the 4 unsuccessful qubits, or how criteria were validated across the full dataset.
Authors: We agree that the abstract is concise and omits these specifics. The main text (Sections 3 and 4) describes the three-phase human-in-the-loop distillation, the structure of Skills (including parameterized commands and quantitative acceptance criteria), and the overall workflow. However, explicit discussion of measurement protocols (e.g., pulse sequences and readout settings), statistical analysis (e.g., how qubit success was determined across repeated runs), failure modes for the four qubits, and full-dataset validation is not sufficiently detailed. We will add a dedicated subsection (likely in Results or a new Methods appendix) providing these elements, including a table summarizing outcomes and failure analysis. revision: yes
-
Referee: [Abstract] Abstract: Cross-validation is reported only on a 16-qubit subset (agreement on 14/16), with no independent expert audit described for the 108 autonomous results or the 4 failures; this is load-bearing for the central claim of equivalence to expert calibration on the full processor.
Authors: The 16-qubit cross-validation was designed as a direct head-to-head comparison between autonomous Skill execution and expert manual calibration on the same device. For the full 112-qubit run, the claim of equivalence rests on the Skills having been distilled and validated through the three-phase process under expert oversight. We acknowledge that an independent post-hoc expert audit of all 108 successful and 4 failed calibrations is not described and would strengthen the claim. We will revise the text to explicitly state the scope of expert validation performed, clarify the limitations of the subset comparison, and add any post-run audit data or metrics available in our records. revision: partial
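The first response promises to explain how qubit success was determined across repeated runs. As a purely hypothetical illustration of such a rule (the paper's actual criterion is not given here), a strict majority vote over per-run acceptance outcomes could look like this:

```python
from typing import Dict, List


def qubit_success(runs: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Hypothetical rule: a qubit counts as calibrated if it passed its acceptance
    criteria in a strict majority of repeated runs (no runs means failure)."""
    return {q: 2 * sum(r) > len(r) for q, r in runs.items()}


runs = {
    "Q1": [True, True, False],   # 2 of 3 passes -> success
    "Q2": [False, False, True],  # 1 of 3 -> failure
    "Q3": [],                    # never completed -> failure
}
status = qubit_success(runs)
print(status)  # {'Q1': True, 'Q2': False, 'Q3': False}
print(f"{sum(status.values())}/{len(status)} qubits calibrated")  # 1/3 qubits calibrated
```

A strict majority treats ties as failures, which errs on the side of reporting fewer calibrated qubits; a published rule would need to state this choice explicitly.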
Circularity Check
No circularity: purely empirical hardware metrics
Full rationale
The paper reports measured outcomes from running an LLM-orchestrated calibration system on physical hardware (108/112 qubits completed in 4.7 h, 4–5× speedup, 14/16 subset agreement). No equations, derivations, fitted parameters, or self-citations appear in the abstract or described claims. The central results are direct experimental observations rather than any quantity computed from prior outputs by construction. The distillation process is an engineering workflow whose validity is assessed by external hardware performance, not by internal consistency with its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: expert tacit knowledge for qubit calibration can be captured without critical loss in structured decision trees via a three-phase human-in-the-loop process.
invented entities (1)
- Skill (no independent evidence)
Reference graph
Works this paper leans on
-
[1]
M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, Superconducting qubits: Current state of play, Annual Review of Condensed Matter Physics 11, 369 (2020)
2020
-
[2]
Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. Van Den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, et al., Evidence for the utility of quantum computing before fault tolerance, Nature 618, 500 (2023)
2023
-
[3]
F. Jin, S. Jiang, X. Zhu, Z. Bao, F. Shen, K. Wang, Z. Zhu, S. Xu, Z. Song, J. Chen, et al., Topological prethermal strong zero modes on superconducting processors, Nature 645, 626 (2025)
2025
-
[4]
D. Gao, D. Fan, C. Zha, J. Bei, G. Cai, J. Cai, S. Cao, F. Chen, J. Chen, K. Chen, et al., Establishing a new benchmark in quantum computational advantage with 105-qubit Zuchongzhi 3.0 processor, Physical Review Letters 134, 090601 (2025)
2025
-
[5]
S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, and T. J. Yoder, High-threshold and low-overhead fault-tolerant quantum memory, Nature 627, 778 (2024)
2024
-
[6]
Google Quantum AI, Quantum error correction below the surface code threshold, Nature 638, 920 (2025)
2025
-
[7]
T. He, W. Lin, R. Wang, Y. Li, J. Bei, J. Cai, S. Cao, D. Chen, K. Chen, X. Chen, et al., Experimental quantum error correction below the surface code threshold via all-microwave leakage suppression, Physical Review Letters 135, 260601 (2025)
2025
-
[8]
A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Physical Review A 86, 032324 (2012)
2012
-
[9]
J. Preskill, Quantum computing in the NISQ era and beyond, Quantum 2, 79 (2018)
2018
-
[11]
Y. Y. Gao, M. A. Rol, S. Touzard, and C. Wang, Practical guide for building superconducting quantum devices, PRX Quantum 2, 040202 (2021)
2021
-
[12]
N. Wittler, F. Roy, K. Pack, M. Werninghaus, A. S. Roy, D. J. Egger, S. Filipp, F. K. Wilhelm, and S. Machnes, Integrated tool set for control, calibration, and characterization of quantum devices applied to superconducting qubits, Physical Review Applied 15, 034080 (2021)
2021
-
[13]
J. Kelly, R. Barends, A. Fowler, A. Megrant, E. Jeffrey, T. White, D. Sank, J. Mutus, B. Campbell, Y. Chen, et al., Scalable in situ qubit calibration during repetitive error detection, Physical Review A 94, 032321 (2016)
2016
-
[14]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017)
2017
-
[15]
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33, 1877 (2020)
2020
-
[16]
D. A. Boiko, R. MacKnight, B. Kline, and G. Gomes, Autonomous chemical research with large language models, Nature 624, 570 (2023)
2023
-
[17]
N. J. Szymanski, B. Rendy, Y. Fei, R. E. Kumar, T. He, D. Milsted, M. J. McDermott, M. Gallant, E. D. Cubuk, A. Merchant, et al., An autonomous laboratory for the accelerated synthesis of inorganic materials, Nature 624, 86 (2023)
2023
-
[18]
S. Cao, Z. Zhang, M. Alghadeer, S. D. Fasciati, M. Piscitelli, M. Bakr, P. Leek, and A. Aspuru-Guzik, Agents for self-driving laboratories applied to quantum computing, arXiv preprint arXiv:2412.07978 (2024)
arXiv 2024
-
[19]
S. Li, J. M. Miller, P. J. Lee, G. Andersson, C. R. Conner, Y. J. Joshi, B. Karimi, A. M. King, H. L. Malc, H. Mishra, et al. , Large language model-assisted superconducting qubit experiments, arXiv preprint arXiv:2603.08801 (2026)
arXiv 2026
-
[21]
NVIDIA, NVIDIA Ising Calibration 1: A 35b vision-language model for quantum calibration, https://build.nvidia.com/nvidia/ising-calibration-1-35b-a3b (2026), accessed: 2026-05-29
2026
-
[22]
NVIDIA, Quantum calibration agent blueprint, https://github.com/NVIDIA/Quantum-Calibration-Agent-Blueprint (2026), accessed: 2026-05-29
2026
-
[23]
S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall Signal Processing Series (Prentice Hall, Upper Saddle River, NJ, 1993)
1993
-
[31]
Moonshot AI, Kimi K2.6, https://huggingface.co/moonshotai/Kimi-K2.6 (2026)
2026
-
[34]
Moonshot AI, Kimi Code: Getting started with the cli, https://www.kimi.com/help/kimi-code/cli-getting-started (2026), accessed: 2026-05-19
2026
-
[36]
QuarkStudio Contributors, QuarkStudio: A measurement framework for quantum experiments, https://quarkstudio.readthedocs.io/en/latest/ (2026), accessed: 2026-05-28.
Supplemental Material for “Vibe Calibration: Autonomous Bring-up of a 112-Qubit Superconducting Quantum Processor by a Skill-Orchestrating Language Agent”: Huikai Xu, Jiaxiu Han, ...
2026
-
[37]
Power Rabi (home/run_powerrabi.py)
-
[38]
SingleShot (home/run_singleshot.py)
-
[39]
Run Spectrum on s8 (first step in pipeline)
Ramsey (home/run_ramsey.py) I’ll proceed with those when Spectrum finishes. * Brewed for 1m 20s * Background command "Run Spectrum on s8 (first step in pipeline)" completed (exit code 0) * Spectrum step completed. Let me check the results and proceed to Power Rabi. Read 1 file * The Spectrum step completed. Let me read the full output. ...
2026
-
[43]
bash -l -c 'cd ~/autolab && QUARK_USER=baqis uv run python home/run_t1.py'
[...] Ramsey Running T1 now. * Bash(ssh s8 "bash -l -c 'cd ~/autolab && QUARK_USER=baqis uv run python home/run_t1.py'" 2>&1 | tail -40) > Load settings from /Users/baqis/autolab/quark.json Database path: /Users/baqis/autolab/home/checkpoint.db [Fig] Saved: results/characterization_20260506/T1_group0.png ... +425 lines * T1 is running. ...
-
[45]
[OK] Power Rabi -- done
-
[46]
[OK] SingleShot -- done
-
[47]
[OK] T1 -- done (103/108 OK across all groups)
-
[48]
bash -l -c 'cd ~/autolab && QUARK_USER=baqis uv run python home/run_ramsey.py'
[...] Ramsey -- last step Running Ramsey now. * Bash(ssh s8 "bash -l -c 'cd ~/autolab && QUARK_USER=baqis uv run python home/run_ramsey.py'" 2>&1 | tail -40) > Running in the background (down to manage) * Ramsey is running (final step, background). Let me check progress. * Task Output(non-blocking) bnyk6r51b > (No output) * Ramsey is still running. Let me...
-
[49]
[OK] Spectrum -- done
-
[50]
[OK] Power Rabi -- done (0 bad fits)
-
[51]
[OK] SingleShot -- done (median visibility ˜1.6)
-
[52]
[OK] T1 -- done (103/108 OK)
-
[53]
The /loop will automatically check again in 10 minutes and proceed once done
[...] Ramsey -- running (final step) I’ll wait for Ramsey to complete. The /loop will automatically check again in 10 minutes and proceed once done. * Cogitated for 27m 7s * Running scheduled task (May 10 10:44pm) * Checking on the Ramsey step -- it’s been running in the background. * Task Output(non-blocking) bnyk6r51b > (No output) * The /loop checks in...
-
[54]
* The /loop dynamic mode isn’t available, but I’m still in the session
[...] Ramsey -- still running (background task bnyk6r51b) I’ll check again on the next loop cycle. * The /loop dynamic mode isn’t available, but I’m still in the session. Let me check the Ramsey task directly. * Task Output(non-blocking) bnyk6r51b > (No output) * The /loop fired again (every 10m, job 40560d9d). Ramsey is still running -- the background ta...
-
[55]
Run Ramsey on s8 (final step)
[...] Ramsey -- still in progress (task bnyk6r51b) * Churned for 45s * Background command "Run Ramsey on s8 (final step)" completed (exit code 0) * Ramsey step completed! Let me check the results. Searched for 1 pattern * All 5 characterization steps completed. Full pipeline summary: Step: Spectrum Status: [OK] Done Results: All 4 gr...
-
[56]
[OK] Spectrum -- all 4 groups done
-
[57]
[OK] Power Rabi -- all 4 groups done (0 bad fits)
-
[58]
[OK] SingleShot -- all 4 groups done
-
[59]
[OK] T1 -- 103/108 OK
-
[60]
bash -l -c 'cd ~/autolab && QUARK_USER=baqis uv run python -c \
[OK] Ramsey -- Groups 1-3 done (83/83 OK), Group 0 fit failures No remaining steps. The only incomplete item is Group 0 Ramsey (0/25 fit) -- want me to re-run that group with different parameters (e.g. shorter delay_max or more repeats)? * Baked for 12s * Running scheduled task (May 10 11:01pm) * Pipeline already fully complete. No remaining steps: +--...
-
[61]
Qwen Team, Qwen3.6-35B-A3B: Agentic coding power, now open to all (2026)
2026
-
[62]
Qwen Team, Qwen3.5: Towards native multimodal agents (2026)
2026
-
[63]
Kimi Team, Kimi K2: Open agentic intelligence (2025), arXiv:2507.20534 [cs.LG]
Pith/arXiv arXiv 2025
-
[64]
Kimi Team, Kimi K2.5: Visual agentic intelligence (2026), arXiv:2602.02276 [cs.CL]
Pith/arXiv arXiv 2026
-
[65]
Moonshot AI, Kimi K2.6, https://huggingface.co/moonshotai/Kimi-K2.6 (2026)
2026
-
[66]
Moonshot AI, Kimi Code: Getting started with the cli, https://www.kimi.com/help/kimi-code/cli-getting-started (2026), accessed: 2026-05-19
2026
-
[67]
W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica, in Proceedings of the 29th Symposium on Operating Systems Principles, SOSP ’23 (Association for Computing Machinery, New York, NY, USA, 2023) p. 611–626
2023
-
[68]
Anthropic, Claude Code, https://www.claude.com/product/claude-code (2025), accessed: 2026-05-19
2025
-
[69]
T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom, in Advances in Neural Information Processing Systems (NeurIPS) (2023) arXiv:2302.04761
Pith/arXiv arXiv 2023
-
[70]
Qwen Team, Qwen3.6 model family, Model card, https://huggingface.co/Qwen/Qwen3.6-27B (2026), accessed: 2026-05-19
2026
-
[71]
DeepSeek-AI, Deepseek-v4: Towards highly efficient million-token context intelligence (2026), arXiv:2606.19348 [cs.CL]
arXiv 2026
-
[72]
Z. Allen-Zhu and Y. Li, in Proceedings of the International Conference on Machine Learning (ICML) (2024) arXiv:2309.14316
arXiv 2024
- [73]
-
[74]
J. Gao, Z.-Y. Lu, P. Zhang, and Z.-F. Gao, Strategic over-parameterization for generalizable low-rank adaptation (2026), arXiv:2605.16470 [cs.LG]
Pith/arXiv arXiv 2026
-
[75]
Anthropic, Equipping agents for the real world with agent skills, https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills (2025), accessed: 2026-05-19
2025
-
[76]
Anthropic, Agent skills, https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview (2026), accessed: 2026-05-19
2026
-
[77]
J. Koch, T. M. Yu, J. Gambetta, A. A. Houck, D. I. Schuster, J. Majer, A. Blais, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf, Physical Review A 76, 042319 (2007), arXiv:cond-mat/0703002
Pith/arXiv arXiv 2007
-
[78]
P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Applied Physics Reviews 6, 10.1063/1.5089550 (2019)
-
[79]
A. Blais, A. L. Grimsmo, S. M. Girvin, and A. Wallraff, Reviews of Modern Physics 93, 025005 (2021)
2021
-
[80]
M. Boissonneault, J. M. Gambetta, and A. Blais, Physical Review A 79, 013819 (2009), arXiv:0810.1336 [quant-ph]
Pith/arXiv arXiv 2009
-
[81]
M. S. Khalil, M. J. A. Stoutimore, F. C. Wellstood, and K. D. Osborn, Journal of Applied Physics 111, 054510 (2012), arXiv:1108.3117 [physics.ins-det]
Pith/arXiv arXiv 2012
-
[82]
S. Probst, F. B. Song, P. A. Bushev, A. V. Ustinov, and M. Weides, Review of Scientific Instruments 86, 024706 (2015), arXiv:1410.3365 [physics.ins-det]
Pith/arXiv arXiv 2015
-
[83]
S. Cao, Z. Zhang, A. Agarwal, G. Bratrud, N. R. Beysengulov, D. C. Cole, A. G. Frieiro, E. O. Glen, H. Hsu, G. Huang, R. Jow, G. Shaji, T. Lubowe, L. Zhu, L. M. Calderón, N. Pancotti, J. Pendleton, B. Severin, C. E. Staub, S. Sussman, A. Vepsäläinen, N. R. Vora, Y. Xu, V. Bernales, D. Bowring, E. Kyoseva, I. Rungger, G. Semeghini, S. Stanwyck, T. Cos...
Pith/arXiv arXiv 2026
-
[84]
NVIDIA, NVIDIA Ising Calibration 1: A 35b vision-language model for quantum calibration, https://build.nvidia.com/nvidia/ising-calibration-1-35b-a3b (2026), accessed: 2026-05-29
2026
-
[85]
NVIDIA, Quantum calibration agent blueprint, https://github.com/NVIDIA/Quantum-Calibration-Agent-Blueprint (2026), accessed: 2026-05-29
2026