pith. sign in

arxiv: 2606.23704 · v1 · pith:IIGI3AELnew · submitted 2026-06-10 · 💻 cs.AR

PCB-QA: Evaluating LLMs over the First Printed Circuit Board Design Question-Answer Dataset

Pith reviewed 2026-06-27 08:13 UTC · model grok-4.3

classification 💻 cs.AR
keywords PCB designLarge Language ModelsQuestion AnsweringElectronic Design AutomationNetlistsSchematicsBenchmark DatasetJSON format
0
0 comments X

The pith

A JSON textual format lets LLMs answer PCB design questions at 93 percent accuracy on a new benchmark of 480 pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates PCB-QA, the first manually built set of 480 question-answer pairs taken from eight open-source PCB projects that differ in complexity. It tests how well several LLMs handle questions on component connections, datasheets, and SPICE simulation data when the input is given as graphical PDFs, native KiCAD files, or a proposed JSON text version. The best result comes from Gemini 3 Flash Preview at 93 percent accuracy on the JSON format, while other representations score lower. This shows that text-based PCB files can be evaluated by language models even though no such dataset existed before. A reader would care because it supplies the missing data and method needed to bring LLMs into board-level electronic design work.

Core claim

The central claim is that LLMs can understand and answer questions about printed circuit board designs when the designs are supplied in a JSON-based textual format, with Gemini 3 Flash Preview reaching 93 percent accuracy on the PCB-QA dataset of 480 pairs drawn from eight projects of varying complexity; native graphical PDFs and KiCAD files produce lower scores, establishing that text representations are usable for LLM evaluation in PCB tasks.

What carries the argument

The PCB-QA dataset of 480 manually authored question-answer pairs that probe component connections, datasheet details, and SPICE simulation outputs, paired with experiments that feed LLMs the same designs in PDF, KiCAD, and JSON forms.

If this is right

  • Text-based formats such as the proposed JSON representation outperform graphical and native file formats for LLM comprehension of schematics and netlists.
  • Commercial models can be benchmarked on PCB tasks for the first time using this dataset.
  • The open questionnaire enables repeated evaluation of future LLMs on board design questions.
  • LLM assistance becomes feasible for routine PCB checks involving connections and simulation data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Design tools could embed the JSON format and an LLM to flag connection errors before fabrication.
  • Extending the dataset to include more layers or proprietary boards would test whether the 93 percent result generalizes.
  • The same prompting approach might apply to other hardware formats such as FPGA or mechanical CAD files.

Load-bearing premise

The 480 question-answer pairs drawn from eight projects of varying complexities are representative of real PCB design tasks and the chosen file representations fairly measure what LLMs actually understand.

What would settle it

Running the same four LLMs on a fresh collection of PCB projects that include multi-layer boards or designs absent from the original eight would show whether accuracy stays near 93 percent or falls sharply.

Figures

Figures reproduced from arXiv: 2606.23704 by Benjamin Tan, Benjamin Turnbull, Hammond Pearce, Sahana Srinivasan.

Figure 1
Figure 1. Figure 1: Overview of the methods to prepare the different [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Different format sets of PCB design files. After Step 1 of Figure 1, to provide a comprehensive overview of the PCB [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Templates for (a) each question category in the [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Accuracy of responses per model, evaluated against [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
read the original abstract

Large Language Models (LLMs) have demonstrated capabilities in electronic design automation (EDA) for integrated circuits. However, their applications in printed circuit board (PCB) design and analysis tasks remain underexplored. In part, this is due to a (1) a lack of text-based PCB datasets to evaluate LLMs and (2) a lack of methodologies for prompting LLMs with different types of PCB design files. To address this gap, our paper proposes PCB-QA: a manually created questionnaire dataset amounting to 480 question-answer pairs for PCBs, derived from 8 different open-source hardware projects of varying complexities. We examine multiple aspects of PCB designs and cover questions about component connections, datasheet examination, and simulation data obtainable via SPICE. Using our dataset as a benchmark, we prompt LLMs with different representations (and combinations) of PCB design files and record observations. This allows us to measure, for the first time, if LLMs can understand schematics and netlists in their "native forms" (i.e. graphical PDFs, KiCAD-format design files) or if textual formats are preferred. Including both commercial and open-weight models, we benchmark 4 state-of-the-art LLMs on our dataset, finding that Gemini 3 Flash Preview can answer questions with an accuracy of 93% using a proposed JSON-based textual format. This demonstrates that text-based PCB design formats can be evaluated by LLMs. Our open-source questionnaire is the first step towards enabling LLM integrations within the PCB design life cycle.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces PCB-QA, the first manually created dataset of 480 question-answer pairs derived from 8 open-source PCB projects of varying complexity. It benchmarks four LLMs (including commercial and open-weight models) on multiple PCB design representations—graphical PDFs, KiCAD files, and a proposed JSON-based textual format—reporting that Gemini 3 Flash Preview achieves 93% accuracy on the JSON format. The work positions this as evidence that text-based PCB representations can be effectively evaluated by LLMs and as an initial step toward LLM integration in the PCB design lifecycle.

Significance. If the dataset is shown to be representative and the performance claims are validated, the open-sourced PCB-QA dataset would be a valuable contribution by filling a noted gap in benchmarks for LLM-based EDA on PCBs. The explicit comparison of native graphical vs. textual formats and the release of the questionnaire enable follow-on work; these strengths are noted even though the current evidence for generalization remains limited.

major comments (3)
  1. [Dataset Construction] Dataset section: No description is provided of the question-generation process, inter-annotator agreement, error analysis, or coverage metrics for the 480 pairs across the eight projects. This directly undermines the central claim that the 93% accuracy demonstrates LLM understanding of PCB designs, as the pairs' representativeness for tasks such as connectivity, datasheets, SPICE, routing, and constraints is unverified.
  2. [Results] Results and Evaluation sections: The headline 93% accuracy for Gemini 3 Flash Preview on the JSON format is reported without statistical testing, confidence intervals, per-question-type breakdowns, or hold-out project evaluation. This makes it impossible to determine whether the result is robust or merely reflects narrow patterns in the eight source projects.
  3. [Introduction] Introduction and Dataset sections: Reliance on only eight open-source projects leaves open the possibility of training-data overlap and incomplete coverage of real-world PCB issues (e.g., manufacturing rules, high-speed constraints, thermal/EMI). No analysis addresses these risks, which are load-bearing for the generalization that text-based formats are preferred for LLM evaluation.
minor comments (2)
  1. [Experimental Setup] Clarify the exact model version and prompting details (temperature, few-shot examples, JSON schema definition) in the experimental setup to improve reproducibility.
  2. [Results] The abstract states the dataset size and 93% result but the main text should include a table summarizing accuracy by model and representation for quick comparison.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to improve the manuscript.

read point-by-point responses
  1. Referee: [Dataset Construction] Dataset section: No description is provided of the question-generation process, inter-annotator agreement, error analysis, or coverage metrics for the 480 pairs across the eight projects. This directly undermines the central claim that the 93% accuracy demonstrates LLM understanding of PCB designs, as the pairs' representativeness for tasks such as connectivity, datasheets, SPICE, routing, and constraints is unverified.

    Authors: We agree that the current manuscript lacks sufficient detail on dataset construction. In the revision we will expand the Dataset section with a description of the question-generation process (expert review of each project to create questions on connectivity, datasheets, and SPICE), categorization of the 480 pairs, any inter-annotator agreement metrics, error analysis, and coverage statistics across the eight projects. This will directly support the representativeness claim. revision: yes

  2. Referee: [Results] Results and Evaluation sections: The headline 93% accuracy for Gemini 3 Flash Preview on the JSON format is reported without statistical testing, confidence intervals, per-question-type breakdowns, or hold-out project evaluation. This makes it impossible to determine whether the result is robust or merely reflects narrow patterns in the eight source projects.

    Authors: We accept that additional statistical rigor is required. The revised Results section will include statistical significance tests, confidence intervals (via bootstrap resampling), and per-question-type accuracy breakdowns. We will also add leave-one-project-out evaluation to assess robustness across the eight source projects. revision: yes

  3. Referee: [Introduction] Introduction and Dataset sections: Reliance on only eight open-source projects leaves open the possibility of training-data overlap and incomplete coverage of real-world PCB issues (e.g., manufacturing rules, high-speed constraints, thermal/EMI). No analysis addresses these risks, which are load-bearing for the generalization that text-based formats are preferred for LLM evaluation.

    Authors: The work is framed as an initial benchmark using publicly available projects of varying complexity. We will add an explicit Limitations subsection discussing potential training-data overlap and incomplete coverage of advanced topics such as manufacturing rules, high-speed constraints, and EMI. The core empirical result on JSON vs. native formats will be presented with these caveats. revision: partial

standing simulated objections not resolved
  • Complete verification of training-data overlap is not feasible for closed-source models whose training corpora are not disclosed.

Circularity Check

0 steps flagged

No circularity: empirical benchmark on newly authored dataset with direct measurements

full rationale

The paper constructs a new 480-pair QA dataset from 8 open-source projects and reports LLM accuracies via direct prompting experiments. No equations, fitted parameters, self-citations as load-bearing premises, ansatzes, or uniqueness theorems appear in the provided text. The 93% accuracy figure is a direct empirical measurement on the authors' own test items rather than a derived quantity that reduces to prior inputs by construction. The central claim (text-based PCB formats are evaluable by LLMs) follows from the benchmark results without self-referential reduction. This is the expected non-finding for a dataset/benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical dataset-creation and benchmarking paper; no free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5813 in / 1117 out tokens · 24843 ms · 2026-06-27T08:13:47.418405+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 23 canonical work pages · 4 internal anchors

  1. [1]

    Sequoia Alexander. 2021. Twisted-Fields/acorn-robot-electronics. https: //github.com/Twisted-Fields/acorn-robot-electronics original-date: 2021-01- 31T00:37:35Z

  2. [2]

    Anthropic. 2026. Claude Sonnet 4.6. https://www.anthropic.com/claude/sonnet

  3. [3]

    Arya. 2019. CRImier/MyKiCad. https://github.com/CRImier/MyKiCad original- date: 2019-07-22T02:53:26Z

  4. [4]

    awgh. 2022. Meshinger/Meshinger. https://github.com/Meshinger/Meshinger

  5. [5]

    Dylan Bristot. 2025. Compare LLMs Side-by-Side | WhatLLM.org. https:// whatllm.org/compare

  6. [6]

    Maurizio Calabrese, Leonardo Agnusdei, Gianmauro Fontana, Gabriele Papadia, and Antonio Del Prete. 2025. Application of Mask R-CNN and YOLOv8 algorithms for defect detection in printed circuit board manufacturing.Discover Applied Sciences7, 4 (March 2025), 257. doi:10.1007/s42452-025-06641-x

  7. [7]

    Kaiyan Chang, Ying Wang, Haimeng Ren, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, and Xiaowei Li. 2023. ChipGPT: How far are we from natural language hardware design. doi:10.48550/arXiv.2305.14019 arXiv:2305.14019 [cs]

  8. [8]

    Mark Chen et al . 2021. Evaluating Large Language Models Trained on Code. doi:10.48550/arXiv.2107.03374 arXiv:2107.03374 [cs]

  9. [9]

    Smith, and Matt Gard- ner

    Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gard- ner. 2021. A Dataset of Information-Seeking Questions and Answers Anchored Table 4: Binary classification metrics of the selected LLMs on a 25% subset of the entire dataset, with different input combinations from design sets➁and➂. NNet and NCir denote native KiCAD exports, PNe...

  10. [10]

    Alekos Filini. 2024. TwentyTwoHW/portal-hardware. https://github.com/ TwentyTwoHW/portal-hardware

  11. [11]

    Shajib Ghosh, Antika Roy, Nitin Varshney, Patrick Craig, Md Mahfuz Al Hasan, and Navid Asadizanjani. 2025. Advanced metrics for high-precision PCB inspec- tion using 3D x-ray reconstruction. InMetrology, Inspection, and Process Control XXXIX, Vol. 13426. SPIE, 835–844. doi:10.1117/12.3058653

  12. [12]

    Google. 2025. Gemini 3 Flash Preview. https://ollama.com/gemini-3-flash- preview

  13. [13]

    Taojun Hu and Xiao-Hua Zhou. 2024. Unveiling LLM Evaluation Focused on Met- rics: Challenges and Solutions. doi:10.48550/arXiv.2404.09135 arXiv:2404.09135 [cs]

  14. [14]

    Hugging Face. 2024. sentence-transformers/all-MiniLM-L6-v2·Hugging Face. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

  15. [15]

    Shinya Ishikawa. 2021. stack-chan/stack-chan. https://github.com/stack-chan/ stack-chan

  16. [16]

    Peter Jansen. 2023. From Words to Wires: Generating Functioning Electronic Devices from Natural Language Descriptions. doi:10.48550/arXiv.2305.14874 arXiv:2305.14874 [cs]

  17. [17]

    Pan, and Ping Luo

    Yao Lai, Sungyoung Lee, Guojin Chen, Souradip Poddar, Mengkang Hu, David Z. Pan, and Ping Luo. 2025. AnalogCoder: Analog Circuit Design via Training-Free Code Generation.Proceedings of the AAAI Conference on Artificial Intelligence39, 1 (April 2025), 379–387. doi:10.1609/aaai.v39i1.32016 Number: 1

  18. [18]

    Jindong Li, Lianrong Chen, Bin Yang, Jiadong Zhu, Ying Wang, Yuzhe Ma, and Menglin Yang. 2025. PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing. https://openreview.net/forum?id=Q5QLu7XTWx

  19. [19]

    Junyan Li, Sam-Zaak Wong, Gwok-Waa Wan, Xi Wang, and Jun Yang. 2025. EDA-Debugger: An LLM-Based Framework for Automated EDA Runtime Issue Resolution. In2025 26th International Symposium on Quality Electronic Design (ISQED). 1–7. doi:10.1109/ISQED65160.2025.11014463

  20. [20]

    Ming Li, Jike Zhong, Tianle Chen, Yuxiang Lai, and Konstantinos Psounis

  21. [21]

    13337–13349

    EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark. 13337–13349. https://openaccess.thecvf.com/content/ CVPR2025/html/Li_EEE-Bench_A_Comprehensive_Multimodal_Electrical_ And_Electronics_Engineering_Benchmark_CVPR_2025_paper.html

  22. [22]

    Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, and Zhaoyu Zhang. 2025. What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips. doi:10.48550/ arXiv.2505.05794 arXiv:2505.05794 [cs]

  23. [23]

    Rodriguez-Andina, Josep M

    Fanfan Lin, Xinze Li, Weihao Lei, Juan J. Rodriguez-Andina, Josep M. Guerrero, Changyun Wen, Xin Zhang, and Hao Ma. 2025. PE-GPT: A New Paradigm for Power Electronics Design.IEEE Transactions on Industrial Electronics72, 4 (April 2025), 3778–3791. doi:10.1109/TIE.2024.3454408

  24. [24]

    Junhua Liu, Fanfan Lin, Xinze Li, Kwan Hui Lim, and Shuai Zhao. 2024. Physics- Informed LLM-Agent for Automated Modulation Design in Power Electronics Systems. doi:10.48550/arXiv.2411.14214 arXiv:2411.14214 [cs]

  25. [25]

    Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, and Haoxing Ren. 2023. VerilogEval: Evaluating Large Language Models for Verilog Code Generation. doi:10.48550/arXiv.2309.07544 arXiv:2309.07544 [cs]

  26. [26]

    Shang Liu, Wenji Fang, Yao Lu, Jing Wang, Qijun Zhang, Hongce Zhang, and Zhiyao Xie. 2025. RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems44, 4 (April 2025), 1448–1461. doi:10.1109/TCAD. 2024.3483089

  27. [27]

    Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, and Graham Neubig. 2022. Language Models of Code are Few-Shot Commonsense Learners. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 1384–1403. doi:10.18653/v1/2022.emnlp-main.90

  28. [28]

    Shane Mattner. 2025. https://circuit-synth.readthedocs.io/en/latest/JSON_ SCHEMA.html

  29. [29]

    Dizon-Paradis, Nathan Jessurun, Damon L

    Dhwani Mehta, John True, Olivia P. Dizon-Paradis, Nathan Jessurun, Damon L. Woodard, Navid Asadizanjani, and Mark Tehranipoor. 2022. FICS PCB X-ray: A dataset for automated printed circuit board inter-layers inspection. https: //eprint.iacr.org/2022/924 Publication info: Preprint

  30. [30]

    Pragati Shuddhodhan Meshram, Swetha Karthikeyan, Bhavya, and Suma Bhat

  31. [31]

    ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering? doi:10.48550/arXiv.2412.00102 arXiv:2412.00102 [cs]

  32. [32]

    Meta. 2024. meta-llama/Llama-3.3-70B-Instruct·Hugging Face. https:// huggingface.co/meta-llama/Llama-3.3-70B-Instruct

  33. [33]

    Long Phan et al. 2025. Humanity’s Last Exam. doi:10.48550/arXiv.2501.14249 arXiv:2501.14249 [cs] version: 1

  34. [34]

    Weibo Qiu, Yinhao Xiao, and Runyu Pan. 2026. HWE-Bench: Can Language Models Perform Board-level Schematic Designs? doi:10.48550/arXiv.2603.18102 arXiv:2603.18102 [cs]

  35. [35]

    Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know What You Don’t Know: Unanswerable Questions for SQuAD. doi:10.48550/arXiv.1806.03822 arXiv:1806.03822 [cs]

  36. [36]

    Andrew Reilley. 2025. reilleya/CF-Chef. https://github.com/reilleya/CF-Chef

  37. [37]

    Philip Salmony. 2019. pms67/HadesFCS. https://github.com/pms67/HadesFCS original-date: 2019-12-01T22:27:22Z

  38. [38]

    Lejla Skelic, Yan Xu, Matthew Cox, Wenjie Lu, Tao Yu, and Ruonan Han. 2025. CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs. doi:10.48550/arXiv.2502.07980 arXiv:2502.07980 [cs]

  39. [39]

    Minqing Sun, Lanqi Ding, Huifeng Zhu, Yier Jin, An Zou, Minqing Sun, Lanqi Ding, Huifeng Zhu, Yier Jin, and An Zou. 2026. From ICs to Device: A Survey on Hardware Tampering Detection via Power Delivery Network and Signal Trace. ACM Trans. Des. Autom. Electron. Syst.31, 2 (Jan. 2026), 37:1–37:31. doi:10.1145/ 3779441

  40. [40]

    Sanli Tang, Fan He, Xiaolin Huang, and Jie Yang. 2019. Online PCB De- fect Detector On A New PCB Defect Dataset. doi:10.48550/arXiv.1902.06197 arXiv:1902.06197 [cs]

  41. [41]

    Shailja Thakur, Jason Blocklove, Hammond Pearce, Benjamin Tan, Siddharth Garg, and Ramesh Karri. 2024. AutoChip: Automating HDL Generation Using LLM Feedback. doi:10.48550/arXiv.2311.04887 arXiv:2311.04887 [cs]

  42. [42]

    Coert Vonk. 2026. OPNhydro/hardware at main·cvonk/OPNhydro. https: //github.com/cvonk/OPNhydro/tree/main/hardware

  43. [43]

    Deepak Vungarala, Md Hasibul Amin, Pietro Mercati, Arnob Ghosh, Arman Roohi, Ramtin Zand, and Shaahin Angizi. 2025. LIMCA: LLM for Automating Analog In-Memory Computing Architecture Design Exploration. doi:10.48550/ arXiv.2503.13301 arXiv:2503.13301 [cs]

  44. [44]

    Deepak Vungarala, Md Hasibul Amin, Arman Roohi, Arnob Ghosh, Ramtin Zand, and Shaahin Angizi. 2025. From Prompt to Accelerator: A Perspective on LLM- Based Analog In-Memory Accelerator Design Automation. InProceedings of the Great Lakes Symposium on VLSI 2025 (GLSVLSI ’25). Association for Computing Machinery, New York, NY, USA, 811–816. doi:10.1145/37163...

  45. [45]

    Sam-Zaak Wong, Gwok-Waa Wan, Dongping Liu, and Xi Wang. 2024. VGV: Verilog Generation using Visual Capabilities of Multi-Modal Large Language Models. In2024 IEEE LLM Aided Design Workshop (LAD). 1–5. doi:10.1109/ LAD62341.2024.10691753 MLCAD ’26, Sep. 07–09, 2026, Jeju, South Korea Sahana Srinivasan, Benjamin Tan, Benjamin Turnbull, and Hammond Pearce

  46. [46]

    Youzhi Xu, Hao Wu, Yulong Liu, and Xiaoming Liu. 2025. Printed Circuit Board Sample Expansion and Automatic Defect Detection Based on Diffusion Models and ConvNeXt.Micromachines16, 3 (Feb. 2025), 261. doi:10.3390/mi16030261

  47. [47]

    Jianfeng Zheng, Xiaopeng Sun, Haixiang Zhou, Chenyang Tian, and Hao Qiang

  48. [48]

    doi:10.1109/ ACCESS.2022.3214306 Received 29 May 2026

    Printed Circuit Boards Defect Detection Method Based on Improved Fully Convolutional Networks.IEEE Access10 (2022), 109908–109918. doi:10.1109/ ACCESS.2022.3214306 Received 29 May 2026