Debug Like a Human: Scaling LLM-based Fault Localization to Processor Design via Block-Level Instruction-Oriented Slicing

Deheng Yang; Guangda Zhang; Jiang Wu; Jianjun Xu; Jiayu He; Xiaoguang Mao; Yan Lei; Yihao Qin; Zizhen Liu

arXiv: 2605.17290 · v1 · pith:YF5ULMXE · submitted 2026-05-17 · 💻 cs.SE


Pith reviewed 2026-05-19 22:51 UTC · model grok-4.3

classification 💻 cs.SE

keywords: fault localization · LLM-based debugging · processor design · RISC-V · SystemVerilog · hardware verification · bug localization · code slicing


The pith

BluesFL more than triples Top-1 bug localization (24 bugs vs. 7) in large processor designs using block-level instruction-oriented slicing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents BluesFL to address the challenge of localizing bugs in large-scale processor designs using LLMs. It proposes a dataflow-based code blockization that focuses the LLM on critical local code context, and a Block-Level Instruction-Oriented Slicing algorithm that lets LLMs analyze instruction execution paths and processor states the way human debuggers do. Evaluated on a 19K-line RISC-V SystemVerilog core, the approach correctly localizes 24 bugs at Top-1, a 242.9% improvement over the state of the art (7 bugs). This makes automated fault localization more practical and cost-effective for hardware verification.

Core claim

BluesFL is a block-level LLM-based fault localization framework for processor designs that uses dataflow-based blockization and the Blues slicing algorithm to enable LLMs to mimic human reasoning on instruction paths and processor states, achieving correct localization of 24 bugs at Top-1 in a real-world 19K-line RISC-V core.

What carries the argument

The Block-Level Instruction-Oriented Slicing (Blues) algorithm that guides LLMs to focus on relevant code blocks derived from dataflow analysis and to examine instruction execution paths and processor states.
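To make the idea concrete, here is a minimal Python sketch of what instruction-oriented slicing at block granularity could look like. It is not the paper's Blues algorithm, whose details are not reproduced on this page. It assumes a hypothetical simulation trace in which each record says which code block processed which instruction in which cycle; `TraceEvent`, `instruction_slice`, and the block names are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical trace record: in `cycle`, block `block_id` processed instruction `instr_id`."""
    cycle: int
    instr_id: int
    block_id: str


def instruction_slice(trace: list[TraceEvent], target_instr: int) -> list[tuple[str, int]]:
    """Return the (block, cycle) states one instruction passed through, in cycle order.

    Illustrative stand-in only: it filters the trace down to a single
    (e.g. failing) instruction, so that only the blocks on its execution
    path, rather than the whole design, would be shown to an LLM.
    """
    states = [(e.block_id, e.cycle) for e in trace if e.instr_id == target_instr]
    # Order by cycle so the execution path reads front to back.
    return sorted(states, key=lambda state: state[1])


if __name__ == "__main__":
    trace = [
        TraceEvent(0, 1, "if_stage"),
        TraceEvent(1, 2, "if_stage"),
        TraceEvent(1, 1, "id_decoder"),
        TraceEvent(2, 1, "ex_alu"),
        TraceEvent(2, 2, "id_decoder"),
    ]
    print(instruction_slice(trace, 1))
    # [('if_stage', 0), ('id_decoder', 1), ('ex_alu', 2)]
```

The slice keeps the states of instruction 1 and drops everything instruction 2 touched, which is the kind of context reduction the pith attributes to Blues.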

If this is right

Reduces the manual effort in processor verification by automating a key step.
Lowers the average cost of localizing a bug to under 30 cents ($0.257 on average).
Outperforms existing module-level LLM approaches by a large margin on project-level designs.
Provides a scalable method applicable to other complex hardware designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The slicing technique might apply to debugging other large codebases in software engineering.
Future work could combine this with simulation traces for even better accuracy.
It suggests LLMs can handle hardware-specific reasoning when given structured context.

Load-bearing premise

That the dataflow-based blockization and Blues algorithm supply the right local context for LLMs to effectively replicate human debugging without overlooking important dependencies across blocks.
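The premise is easiest to see as a reachability question over blocks. The sketch below assumes a hypothetical block-level dependency map and computes the backward closure of a suspicious block with breadth-first search; any upstream block left out of what the LLM is shown is a place where a root cause could be missed. `backward_closure` and the block names are illustrative, not taken from the paper.

```python
from collections import deque


def backward_closure(depends_on: dict[str, set[str]], start: str) -> set[str]:
    """Return `start` plus every block it transitively reads from.

    `depends_on[b]` is a hypothetical set of blocks whose outputs block `b`
    reads. Breadth-first search with a `seen` set terminates even when the
    dependency graph has cycles (e.g. pipeline feedback paths).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for upstream in depends_on.get(block, set()):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen


if __name__ == "__main__":
    deps = {
        "wb_mux": {"ex_alu", "lsu"},
        "ex_alu": {"id_decoder"},
        "lsu": {"id_decoder"},
        "id_decoder": {"if_stage", "wb_mux"},  # toy forwarding path creates a cycle
        "csr": {"id_decoder"},
    }
    print(sorted(backward_closure(deps, "ex_alu")))
    # ['ex_alu', 'id_decoder', 'if_stage', 'lsu', 'wb_mux']
```

In this toy graph, a slice that showed only `ex_alu` would hide four blocks that can influence it, while `csr` is correctly excluded because nothing on the path reads from it.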

What would settle it

Testing the framework on another large processor design with a known set of injected bugs and measuring whether the top-1 hit rate remains significantly higher than baseline methods.
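The Top-1 metric itself is simple to state in code. A minimal sketch, assuming each method produces a ranked list of candidate locations per bug and that ground truth is a single faulty location per bug (both assumptions; the page does not describe the evaluation format). The bug IDs and locations in the usage lines are hypothetical.

```python
def top_k_hits(rankings: dict[str, list[str]], ground_truth: dict[str, str], k: int = 1) -> int:
    """Count bugs whose true faulty location is among the first k ranked candidates."""
    return sum(
        1
        for bug_id, faulty_loc in ground_truth.items()
        if faulty_loc in rankings.get(bug_id, [])[:k]
    )


def top_k_rate(rankings: dict[str, list[str]], ground_truth: dict[str, str], k: int = 1) -> float:
    """Fraction of bugs localized within the top k."""
    return top_k_hits(rankings, ground_truth, k) / len(ground_truth)


if __name__ == "__main__":
    truth = {"bug1": "id_decoder", "bug2": "lsu", "bug3": "csr"}
    ranks = {"bug1": ["id_decoder", "ex_alu"], "bug2": ["ex_alu", "lsu"], "bug3": []}
    print(top_k_hits(ranks, truth, k=1), top_k_hits(ranks, truth, k=2))  # 1 2
```

Comparing methods on a second design would mean computing this count for each method over the same bug set.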

Figures

Figures reproduced from arXiv: 2605.17290 by Deheng Yang, Guangda Zhang, Jiang Wu, Jianjun Xu, Jiayu He, Xiaoguang Mao, Yan Lei, Yihao Qin, Zizhen Liu.

**Figure 1.** Overview of BluesFL.

3.2 Dataflow-based Code Blockization. Definition 1 (Code Block). Let $L = \{l_1, l_2, \dots, l_m\}$ denote the set of all code lines in the HDL source code. A code block $b$ is defined as a subset of lines $b \subseteq L$ such that for any two distinct blocks $b_i, b_j$, we have $b_i \cap b_j = \emptyset$. The set of all code blocks is denoted as $B = \{b_1, b_2, \dots, b_n\}$. Based on Definition 1, we propose a dataflow-base…
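Definition 1 only requires blocks to be pairwise-disjoint subsets of $L$. The sketch below checks that property and pairs it with a deliberately simple, hypothetical blockization that groups lines by the signal they drive. This is not the paper's dataflow-based blockization (its description is truncated here); `blockize_by_driven_signal`, `satisfies_definition_1`, and the regex heuristic are assumptions for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Toy left-hand-side matcher: `assign x = ...`, `x <= ...`, or `x = ...` at line start.
# The negative lookahead keeps `x == y` from being read as an assignment.
_LHS = re.compile(r"^\s*(?:assign\s+)?([A-Za-z_]\w*)\s*(?:<=|=(?!=))")


def blockize_by_driven_signal(lines: list[str]) -> dict[str, set[int]]:
    """Group 1-based line numbers by the signal each line drives (hypothetical heuristic)."""
    blocks: dict[str, set[int]] = defaultdict(set)
    for line_no, text in enumerate(lines, start=1):
        match = _LHS.match(text)
        if match:
            blocks[match.group(1)].add(line_no)
    return dict(blocks)


def satisfies_definition_1(blocks: list[set[int]], all_lines: set[int]) -> bool:
    """Definition 1: each block is a subset of L, and distinct blocks are pairwise disjoint."""
    if any(not block <= all_lines for block in blocks):
        return False
    return all(a.isdisjoint(b) for a, b in combinations(blocks, 2))


if __name__ == "__main__":
    rtl = [
        "assign sum = a ^ b;",
        "always_ff @(posedge clk) begin",
        "  count <= count + 1;",
        "  valid <= 1'b1;",
        "end",
        "assign carry = a & b;",
    ]
    blocks = blockize_by_driven_signal(rtl)
    print(satisfies_definition_1(list(blocks.values()), set(range(1, len(rtl) + 1))))  # True
```

Because each line lands in at most one group, the result satisfies Definition 1 by construction. Lines that drive nothing (such as `begin`/`end`) are left out, which the definition as quoted permits, since it does not require the blocks to cover all of $L$.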

**Figure 3.** The prompt template of BluesFL. We prompt the LLM to reason and make decisions via tool calls at each state $(b, t)$. As shown in …

**Figure 4.** Histogram of Block Sizes in Ibex (Log Y Scale).

**Figure 5.** Distribution of the number of checked blocks across …

**Figure 6.** A case to show how …

Original abstract

Fault localization in modern processor design code is a critical yet time-consuming step during processor verification. While recent advances in LLM-based techniques for module-level hardware design have shown promising results, automatically localizing bugs in large-scale, project-level processor designs remains challenging. In this paper, we present BluesFL, a novel block-level LLM-based fault localization framework for processor designs. Inspired by the way engineers debug processors, we first propose a dataflow-based code blockization approach to guide LLMs to focus on critical local code context. We further propose a Block-Level Instruction-Oriented Slicing (Blues) algorithm that enables LLMs to mimic human reasoning by analyzing instruction execution paths and processor states. We evaluate BluesFL on a real-world RISC-V processor core comprising 19K lines of SystemVerilog code. Experimental results demonstrate that BluesFL correctly localizes 24 bugs at Top-1, achieving 242.9% improvement over the existing state-of-the-art (7 bugs). Cost analysis shows that BluesFL requires an average of only $0.257 to localize a single bug.
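The 242.9% figure is the relative gain in Top-1 count over the baseline's 7 bugs; equivalently, BluesFL's Top-1 count is about 3.4 times the baseline's:

$$
\frac{24 - 7}{7} = \frac{17}{7} \approx 2.429 \;\Rightarrow\; 242.9\%\ \text{improvement}, \qquad \frac{24}{7} \approx 3.43\times
$$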

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

BluesFL shows a slicing technique that lifts LLM fault localization on a real 19k-line processor core, but the large reported gain over prior work needs checking for whether the baseline got equivalent input context.


BluesFL is a framework that uses dataflow blockization and instruction-oriented slicing to help LLMs localize faults in large processor designs. On a 19K-line RISC-V core it achieves 24 Top-1 localizations, a big step up from the 7 reported for the prior state of the art. The work does a decent job of moving from module-level experiments to a full project, and the slicing approach is a reasonable way to give the model relevant context about execution paths and state, which matches how people actually debug hardware. The main concern is whether the baseline got the same treatment. If the state-of-the-art method was run on raw code while BluesFL used pre-sliced blocks, the 242.9% improvement mixes the effect of the new algorithm with the effect of better prompts or input. The paper would be stronger with an ablation that turns the slicing on and off, and with explicit confirmation that the comparison used the same LLM and the same set of bugs. This paper is for researchers building LLM assistants for hardware verification; anyone trying to scale these tools beyond small modules will find the case study useful. I would send it to peer review. The result on a real design is concrete enough to merit referee feedback, even if the evaluation details need more work.

Referee Report

3 major / 2 minor

Summary. The paper presents BluesFL, a block-level LLM-based fault localization framework for processor designs. It proposes a dataflow-based code blockization approach and a Block-Level Instruction-Oriented Slicing (Blues) algorithm to enable LLMs to focus on critical local contexts and mimic human debugging by analyzing instruction execution paths and processor states. Evaluated on a real-world 19K-line RISC-V SystemVerilog core, BluesFL achieves Top-1 localization of 24 bugs, a 242.9% improvement over the prior state-of-the-art (7 bugs), with an average cost of $0.257 per bug.

Significance. If the central performance claims hold under fair and controlled comparisons, the work has moderate-to-high significance for scaling automated fault localization to project-level hardware designs. The human-inspired slicing and blockization techniques address a genuine scalability gap in LLM-based hardware debugging, and the low per-bug cost supports practical adoption. The evaluation on a real 19K-line core is a strength, though the absence of detailed ablations and baseline equivalence details limits the immediate impact.

major comments (3)

[Abstract and Evaluation] Abstract and Evaluation section: the headline claim of 242.9% improvement (24 vs. 7 Top-1 localizations) is load-bearing for the paper's contribution, yet the manuscript does not state whether the SOTA baseline received equivalent dataflow-based blockization and Blues slicing inputs. If the baseline operated on raw module- or file-level code while BluesFL used pre-sliced blocks, the reported gain conflates algorithmic innovation with input-format differences.
[§4 and §5] §4 (Experimental Setup) and §5 (Results): no ablation is reported that isolates the effect of the Block-Level Instruction-Oriented Slicing algorithm from the blockization step or from standard LLM prompting. Without such controls, it is unclear whether the increase from 7 to 24 correctly localized bugs is attributable to the proposed Blues algorithm rather than other factors.
[Evaluation] Evaluation section: the comparison lacks explicit details on bug selection criteria, whether identical bug instances were used across methods, the LLM backend and prompt configurations applied to the SOTA baseline, and any statistical significance testing for the Top-1 counts.
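One standard way to test Top-1 counts from two methods run on the same bugs is the exact McNemar test on paired per-bug outcomes. It only applies if identical bug instances were used, which is exactly what the referee asks the authors to confirm, and it needs the discordant counts (bugs one method localized and the other missed), which the totals 24 and 7 alone do not determine. A minimal sketch; the numbers in the usage line are a hypothetical scenario, not reported results.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on paired per-bug Top-1 outcomes.

    b: bugs localized at Top-1 by method A but not by method B.
    c: bugs localized at Top-1 by method B but not by method A.
    Under H0 (no difference), the b + c discordant bugs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant bugs: no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical: every bug the baseline found at Top-1 was also found by BluesFL,
    # so b = 24 - 7 = 17 and c = 0.
    print(mcnemar_exact_p(17, 0))
```

If the overlap were smaller (some bugs found only by the baseline), `c` would grow and the p-value would rise, which is why the per-bug outcomes, not just the totals, need to be reported.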

minor comments (2)

[Abstract] Abstract: the reference to 'existing state-of-the-art (7 bugs)' should include a citation to the specific prior work being compared.
[Figures/Tables] Figures: additional figures illustrating the slicing process or example bug localizations would make it easier to follow how the instruction-oriented analysis operates.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity and completeness of the evaluation.


Referee: [Abstract and Evaluation] the headline claim of 242.9% improvement (24 vs. 7 Top-1 localizations) is load-bearing for the paper's contribution, yet the manuscript does not state whether the SOTA baseline received equivalent dataflow-based blockization and Blues slicing inputs. If the baseline operated on raw module- or file-level code while BluesFL used pre-sliced blocks, the reported gain conflates algorithmic innovation with input-format differences.

Authors: The SOTA baseline was evaluated using the original module- and file-level code inputs as described in the prior work, without our dataflow-based blockization or Blues slicing. This is the appropriate comparison to demonstrate the benefit of the proposed techniques. We will revise the abstract and evaluation section to explicitly state the input formats used for the baseline versus BluesFL. revision: yes
Referee: [§4 and §5] no ablation is reported that isolates the effect of the Block-Level Instruction-Oriented Slicing algorithm from the blockization step or from standard LLM prompting. Without such controls, it is unclear whether the increase from 7 to 24 correctly localized bugs is attributable to the proposed Blues algorithm rather than other factors.

Authors: We agree that isolating the contributions would strengthen the claims. In the revised manuscript we will add an ablation study comparing standard LLM prompting on raw code, blockization without Blues slicing, and the full BluesFL pipeline to attribute performance gains to each component. revision: yes
Referee: [Evaluation] the comparison lacks explicit details on bug selection criteria, whether identical bug instances were used across methods, the LLM backend and prompt configurations applied to the SOTA baseline, and any statistical significance testing for the Top-1 counts.

Authors: We will expand the evaluation section to detail bug selection from real verification failures, confirm identical bug instances across methods, specify the LLM backend and prompt configurations used for the baseline, and report statistical significance testing on the Top-1 results. revision: yes

Circularity Check

0 steps flagged

Empirical framework with no derivation chain or fitted inputs

full rationale

The paper proposes BluesFL as an empirical LLM-based fault localization method using dataflow blockization and the Blues slicing algorithm, then reports experimental Top-1 localization counts (24 vs. 7) on a 19K-line RISC-V core. No equations, parameters fitted to subsets of data, or self-citation chains are described that would reduce any claimed result to its own inputs by construction. The performance numbers are presented as direct experimental outcomes rather than derivations, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract; the contribution is an empirical engineering method rather than a theoretical derivation.

pith-pipeline@v0.9.0 · 5748 in / 1076 out tokens · 31672 ms · 2026-05-19T22:51:51.565150+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

BluesFL correctly localizes 24 bugs at Top-1... on a real-world RISC-V processor core comprising 19K lines of SystemVerilog code

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1] 2025. CoreMark. https://github.com/lowRISC/ibex/tree/master/examples/sw/benchmarks/coremark

[2] 2025. cva6. https://github.com/openhwgroup/cva6

[3] 2025. Ibex. https://github.com/lowRISC/ibex

[4] 2025. Rocket Chip Generator. https://github.com/chipsalliance/rocket-chip

[5] 2025. sv-parser. https://github.com/dalance/sv-parser

[6] 2025. Verilator. https://github.com/verilator/verilator

[7] Hammad Ahmad, Yu Huang, and Westley Weimer. 2022. CirFix: Automatically repairing defects in hardware design code. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 990–1003.

[8] Desire Athow. 2014. Pentium FDIV: The processor bug that shook the world. https://www.techradar.com/news/computing-components/processors/pentium-fdiv-the-processor-bug-that-shook-the-world-1270773

[9] Erick Carvajal Barboza, Sara Jacob, Mahesh Ketkar, Michael Kishinevsky, Paul Gratz, and Jiang Hu. 2021. Automatic microprocessor performance bug detection. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 545–556.

[10] Harry Foster. 2024. 2024-siemens-eda-and-wilson-research-group-ic-asic-functional-verification-trend-report. https://verificationacademy.com/topics/planning-measurement-and-analysis/wrg-industry-data-and-trends/2024-siemens-eda-and-wilson-research-group-ic-asic-functional-verification-trend-report/. Last accessed: Feb 2025.

[11] Xiaolong Guo, Raj Gautam Dutta, Yier Jin, Farimah Farahmandi, and Prabhat Mishra. 2015. Pre-silicon security verification and validation: A formal perspective. In Proceedings of the 52nd Annual Design Automation Conference. 1–6.

[12] Jaewon Hur, Suhwan Song, Dongup Kwon, Eunjin Baek, Jangwoo Kim, and Byoungyoung Lee. 2021. DifuzzRTL: Differential fuzz testing to find CPU bugs. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 1286–1303.

[13] Nursultan Kabylkas, Tommy Thorn, Shreesha Srinath, Polychronis Xekalakis, and Jose Renau. 2021. Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (Virtual Event, Greece) (MICRO '21). Association for Computing Machinery, New York, NY, USA, 667–678. doi:10.1145/3466752.3480092

[14] Sungmin Kang, Gabin An, and Shin Yoo. 2024. A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization. Proc. ACM Softw. Eng. 1, FSE, Article 64 (July 2024), 23 pages. doi:10.1145/3660771

[15] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, et al. 2020. Spectre attacks: Exploiting speculative execution. Commun. ACM 63, 7 (2020), 93–101.

[16, 17] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown. arXiv preprint arXiv:1801.01207.

[18] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics 12 (2024), 157–173. doi:10.1162/tacl_a_00638

[19] Jiacheng Ma, Gefei Zuo, Kevin Loughlin, Haoyang Zhang, Andrew Quinn, and Baris Kasikci. 2022. Debugging in the brave new world of reconfigurable hardware. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 946–962.

[20] Samit Shahnawaz Miftah, Amisha Srivastava, Hyunmin Kim, Shiyi Wei, and Kanad Basu. 2025. SymbFuzz: Symbolic Execution Guided Hardware Fuzzing. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture®. 1477–1490.

[21] Sangeetha Sudakrishnan, Janaki Madhavan, E James Whitehead Jr, and Jose Renau. 2008. Understanding bug fix patterns in Verilog. In Proceedings of the 2008 International Working Conference on Mining Software Repositories. 39–42.

[22] Stuart Sutherland and Don Mills. 2013. Synthesizing SystemVerilog: Busting the myth that SystemVerilog is only for verification. SNUG Silicon Valley 24 (2013).

[23] Ilya Wagner, Valeria Bertacco, and Todd Austin. 2005. StressTest: An automatic approach to test generation via activity monitors. In Proceedings of the 42nd Annual Design Automation Conference. 783–788.

[24] W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A Survey on Software Fault Localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707–740. doi:10.1109/TSE.2016.2521368

[25] Jiang Wu, Zhuo Zhang, Deheng Yang, Xiankai Meng, Jiayu He, Xiaoguang Mao, and Yan Lei. 2022. Fault Localization for Hardware Design Code with Time-Aware Program Spectrum. In 2022 IEEE 40th International Conference on Computer Design (ICCD). 537–544. doi:10.1109/ICCD56317.2022.00085

[26] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2025. Demystifying LLM-Based Software Engineering Agents. Proc. ACM Softw. Eng. 2, FSE, Article FSE037 (June 2025), 24 pages. doi:10.1145/3715754

[27, 28] Deheng Yang, Jiayu He, Xiaoguang Mao, Tun Li, Yan Lei, Xin Yi, and Jiang Wu. 2023. STRIDER: Signal value transition-guided defect repair for HDL programming assignments. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 5 (2023), 1594–1607.

[29] Bingkun Yao, Ning Wang, Jie Zhou, Xi Wang, Hong Gao, Zhe Jiang, and Nan Guan. 2025. Location is Key: Leveraging LLM for Functional Bug Localization in Verilog Design. In 2025 62nd ACM/IEEE Design Automation Conference (DAC). 1–7. doi:10.1109/DAC63849.2025.11133280

[30] Yanhong Zhou, Tiancheng Wang, Huawei Li, Tao Lv, and Xiaowei Li. 2015. Functional test generation for hard-to-reach states using path constraint solving. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35, 6 (2015), 999–1011.
