A PennyLane-Centric Dataset to Enhance LLM-based Quantum Code Generation using RAG

2025 · cs.SE · arXiv 2503.02497

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large Language Models (LLMs) offer powerful capabilities in code generation, natural language understanding, and domain-specific reasoning. Their application to quantum software development remains limited, in part because of the lack of high-quality datasets both for LLM training and as dependable knowledge sources. To bridge this gap, we introduce PennyLang, an off-the-shelf, high-quality dataset of 3,347 PennyLane-specific quantum code samples with contextual descriptions, curated from textbooks, official documentation, and open-source repositories. Our contributions are threefold: (1) the creation and open-source release of PennyLang, a purpose-built dataset for quantum programming with PennyLane; (2) a framework for automated quantum code dataset construction that systematizes curation, annotation, and formatting to maximize downstream LLM usability; and (3) a baseline evaluation of the dataset across multiple open-source and commercial models, including ablation studies, all conducted within a retrieval-augmented generation (RAG) pipeline. Using PennyLang with RAG substantially improves performance: for example, Qwen 7B's success rate rises from 8.7% without retrieval to 41.7% with full-context augmentation, and LLaMa 4 improves from 78.8% to 84.8%, while also reducing hallucinations and enhancing quantum code correctness. Moving beyond Qiskit-focused studies, we bring LLM-based tools and reproducible methods to PennyLane for advancing AI-assisted quantum development.
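The retrieval step of a RAG pipeline like the one the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the three sample entries, the bag-of-words cosine scoring, and the function names `retrieve` and `build_prompt` are all assumptions made for illustration, and a real pipeline would more likely use learned embeddings and a vector store.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for dataset entries (description + code).
# These are illustrative only and are not taken from PennyLang.
SAMPLES = [
    {"description": "Create a Bell state on two qubits and return measurement probabilities",
     "code": "qml.Hadamard(wires=0)\nqml.CNOT(wires=[0, 1])\nreturn qml.probs(wires=[0, 1])"},
    {"description": "Apply an RX rotation and return the expectation value of PauliZ",
     "code": "qml.RX(theta, wires=0)\nreturn qml.expval(qml.PauliZ(0))"},
    {"description": "Compute the gradient of a circuit with respect to a parameter",
     "code": "grad_fn = qml.grad(circuit)\nreturn grad_fn(theta)"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, samples, k=1):
    """Return the k samples whose descriptions are most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        samples,
        key=lambda s: cosine(query_vec, Counter(tokenize(s["description"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, samples, k=1):
    """Prepend the retrieved examples to the task so an LLM can condition on them."""
    context = "\n\n".join(
        f"# {s['description']}\n{s['code']}" for s in retrieve(query, samples, k)
    )
    return f"Reference examples:\n{context}\n\nTask: {query}\n"


if __name__ == "__main__":
    print(build_prompt("Prepare a Bell state and measure probabilities", SAMPLES))
```

The resulting prompt string would then be passed to whichever model is being evaluated; the no-retrieval baseline in the abstract corresponds to sending the task alone, without the reference examples.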

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation

cs.SE · 2026-05-08 · unverdicted · novelty 4.0 · 2 refs

Iterative refinement boosts LLM success in generating quantum solvers that match classical results, but more advanced models shift from execution errors to hard-to-detect numerical inaccuracies.

Automated Quantum Software and AI Engineering

cs.SE · 2026-04-21 · unverdicted · novelty 4.0

A systematic literature review maps trends in automated approaches to quantum software engineering and quantum AI, highlighting their role in hybrid quantum-classical systems.

citing papers explorer

Showing 2 of 2 citing papers.

Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation

cs.SE · 2026-05-08 · unverdicted · none · ref 10 · 2 links · internal anchor

Automated Quantum Software and AI Engineering

cs.SE · 2026-04-21 · unverdicted · none · ref 68 · internal anchor
