A-IC3: Learning-Guided Adaptive Inductive Generalization for Hardware Model Checking

Guangyu Hu; Hongce Zhang; Wei Zhang; Xiaofeng Zhou

arxiv: 2604.21688 · v1 · submitted 2026-04-23 · 💻 cs.LO · cs.LG

A-IC3: Learning-Guided Adaptive Inductive Generalization for Hardware Model Checking

Xiaofeng Zhou , Guangyu Hu , Hongce Zhang , Wei Zhang This is my paper

Pith reviewed 2026-05-08 13:45 UTC · model grok-4.3

classification 💻 cs.LO cs.LG

keywords IC3 algorithminductive generalizationhardware model checkingmulti-armed banditadaptive strategy selectionformal verificationcounterexample to inductiveness

0 comments

The pith

A multi-armed bandit selects inductive generalization strategies on the fly to strengthen the IC3 hardware model checker.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The IC3 algorithm proves hardware safety by building inductive clauses from counterexamples to inductiveness. Its success hinges on how broadly and usefully each counterexample is generalized into a clause. Existing IC3 implementations apply the same generalization rule regardless of the current verification state. The paper replaces that fixed choice with a lightweight multi-armed bandit that picks among several generalization strategies according to immediate feedback on clause quality. On 914 standard benchmarks the adaptive version solves more instances and records better overall scores than the fixed baselines.

Core claim

A multi-armed bandit algorithm is used to adaptively choose among inductive generalization strategies inside IC3 by updating its preferences from real-time measurements of how well each chosen generalization performs in the current verification context.

What carries the argument

The multi-armed bandit that learns strategy preferences from the quality of each generalization outcome and selects the next strategy accordingly.

If this is right

The same IC3 implementation solves between 26 and 50 additional benchmark instances.
PAR-2 scores improve by 194 to 389 points because fewer instances remain unsolved.
The learning component can be added to an existing checker such as rIC3 with only modest extra code.
Clause quality improves over time as the bandit accumulates context-specific preferences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bandit-driven selection could be applied to other choice points inside model checkers whenever multiple heuristics compete.
If stable feedback signals can be defined, the approach might transfer to SAT solvers or software verifiers that also rely on generalization steps.
Over many runs the learned preferences could be distilled into a static policy that avoids the bandit entirely while retaining most of the benefit.

Load-bearing premise

Real-time signals about whether a chosen generalization produced a useful clause are stable and informative enough for the bandit to learn preferences without the learning overhead canceling the gains.

What would settle it

On the same 914-instance suite, replace the bandit with random strategy selection and observe whether the number of solved cases and PAR-2 score drop back to the level of the best fixed-strategy baseline.

Figures

Figures reproduced from arXiv: 2604.21688 by Guangyu Hu, Hongce Zhang, Wei Zhang, Xiaofeng Zhou.

**Figure 1.** Figure 1: Overview of A-IC3. fails to discover sufficiently broad blocking clauses, severely hindering proof convergence. A-IC3 overcomes this limitation by treating generalization strategy selection as a contextual multi-armed bandit problem, as illustrated in view at source ↗

**Figure 2.** Figure 2: Context vector extraction from IC3 proof state. The figure illustrates view at source ↗

**Figure 3.** Figure 3: Performance comparison of rIC3 with and without MAB: view at source ↗

**Figure 4.** Figure 4: Cactus plots comparing solver performance. view at source ↗

**Figure 5.** Figure 5: Comparison of the termination level count. (a) view at source ↗

read the original abstract

The IC3 algorithm represents the state-of-the-art (SOTA) hardware model checking technique, owing to its robust performance and scalability. A significant body of research has focused on enhancing the solving efficiency of the IC3 algorithm, with particular attention to the inductive generalization process: a critical phase wherein the algorithm seeks to generalize a counterexample to inductiveness (CTI), which typically is a state leading to a bad state, into a broader set of states. This inductive generalization is a primary source of clauses in IC3 and thus plays a pivotal role in determining the overall effectiveness of the algorithm. Despite its importance, existing approaches often rely on fixed inductive generalization strategies, overlooking the dynamic and context-sensitive nature of the verification environment in which spurious counterexamples arise. This rigidity can limit the quality of generated clauses and, consequently, the performance of IC3. To address this limitation, we propose a lightweight machine-learning-based framework that dynamically selects appropriate inductive generalization strategies in response to the evolving verification context. Specifically, we employ a multi-armed bandit (MAB) algorithm to adaptively choose inductive generalization strategies based on real-time feedback from the verification process. The agent is updated by evaluating the quality of generalization outcomes, thereby refining its strategy selection over time. Empirical evaluation on a benchmark suite comprising 914 instances, primarily drawn from the latest HWMCC collection, demonstrates the efficacy of our approach. When implemented on the state-of-the-art model checker rIC3, our method solves 26 to 50 more cases than the baselines and improves the PAR-2 score by 194.72 to 389.29.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A-IC3 adds an online multi-armed bandit to pick inductive generalization strategies inside IC3 and claims 26-50 extra solved cases plus PAR-2 gains on 914 HWMCC instances.

read the letter

The paper's main contribution is a lightweight online adaptation layer: a multi-armed bandit that selects among inductive generalization strategies for CTIs based on immediate feedback from how well each generalization performs during the run. This is presented as a response to the rigidity of fixed strategies in IC3, and the authors integrate it into the state-of-the-art rIC3 checker without offline training or heavy machinery. The approach stays inside the existing IC3 loop, which keeps the circularity burden low since the bandit uses live verification signals rather than final labels. On the positive side, the benchmark is large and drawn from recent HWMCC collections, and the reported improvements—extra solved instances and better PAR-2—are concrete enough to matter for people who already run IC3 variants in hardware verification. The idea of context-sensitive strategy selection is a natural incremental step given how much clause quality drives overall performance. The soft spots sit mostly in the experimental reporting. The abstract gives no definition of the reward signal, the number or identity of the bandit arms, the update schedule, or how generalization quality is quantified. It also omits details on baseline implementations, run variance, or statistical tests. The stress-test concern about noisy or costly feedback therefore lands: if the per-step signals are unstable or if bandit overhead accumulates in the timing numbers, the net gains could shrink. Until those sections are checked, the soundness claim rests on the headline numbers alone. This work is for researchers and tool builders already focused on SAT-based hardware model checking who want a practical tweak to IC3 rather than a new algorithm family. A reader interested in adaptive heuristics inside mature solvers will get value from the concrete integration. It deserves a serious referee because the motivation is clear, the domain has industrial weight, and the core loop is simple enough to evaluate. I would send it to review but ask the authors to add precise reward definitions, arm descriptions, and overhead breakdowns before acceptance.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes A-IC3, a lightweight framework that augments the IC3 hardware model checking algorithm with a multi-armed bandit to dynamically select among inductive generalization strategies based on real-time feedback from CTI generalization outcomes. When integrated with the rIC3 model checker and evaluated on 914 instances drawn primarily from the HWMCC collection, the approach is claimed to solve 26–50 additional cases and improve PAR-2 scores by 194.72–389.29 relative to fixed-strategy baselines.

Significance. If the reported gains prove robust and reproducible, the work demonstrates a practical, online-learning method for making a core, performance-critical component of IC3 context-adaptive without offline training or heavy computational overhead. This could meaningfully advance the state of the art in hardware model checking by addressing the rigidity of fixed inductive generalization heuristics.

major comments (3)

[§3] §3 (Approach): the reward signal used to update the bandit—derived from per-step generalization outcomes—is never formally defined, nor is the number of arms, the update schedule, or the precise mapping from clause quality to reward; without these, it is impossible to assess whether the feedback is stable enough to converge on useful preferences or whether bandit overhead offsets the claimed net gains.
[§5] §5 (Experiments): the headline results (26–50 extra solved instances, PAR-2 improvements of 194.72–389.29) are presented without any description of the exact baseline rIC3 configurations, ablation isolating the adaptive component, measured CPU overhead of the bandit updates, or statistical significance tests and run-to-run variance; these omissions leave the central performance claim only weakly supported.
[§5] §5 (Experiments): no information is given on how the 914 instances were selected or partitioned, nor on whether the reported improvements hold under different random seeds or instance orderings; this is load-bearing because the skeptic concern that noisy or costly feedback could erase the gains cannot be evaluated from the given data.

minor comments (2)

[Abstract] Abstract: the ranges '26 to 50' and '194.72 to 389.29' should be explicitly tied to specific baseline variants so readers can interpret the magnitude of improvement.
[Introduction] Notation: the acronym CTI is used without expansion on first occurrence; a brief parenthetical definition would improve readability for readers outside the immediate IC3 community.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript to provide the requested details and strengthen the presentation of our results.

read point-by-point responses

Referee: [§3] §3 (Approach): the reward signal used to update the bandit—derived from per-step generalization outcomes—is never formally defined, nor is the number of arms, the update schedule, or the precise mapping from clause quality to reward; without these, it is impossible to assess whether the feedback is stable enough to converge on useful preferences or whether bandit overhead offsets the claimed net gains.

Authors: We acknowledge that the current description in §3 is high-level and lacks formal details. In the revised manuscript, we will formally define the reward signal used to update the bandit, specify the number of arms corresponding to the inductive generalization strategies, detail the update schedule, and provide the precise mapping from clause quality to reward. These additions will allow assessment of feedback stability and ensure that the bandit overhead is accounted for in the net gains. revision: yes
Referee: [§5] §5 (Experiments): the headline results (26–50 extra solved instances, PAR-2 improvements of 194.72–389.29) are presented without any description of the exact baseline rIC3 configurations, ablation isolating the adaptive component, measured CPU overhead of the bandit updates, or statistical significance tests and run-to-run variance; these omissions leave the central performance claim only weakly supported.

Authors: We agree that these details are required to fully support the performance claims. In the revised §5, we will describe the exact baseline rIC3 configurations used, include an ablation study isolating the adaptive MAB component, report the measured CPU overhead of the bandit updates, and provide statistical significance tests along with run-to-run variance. revision: yes
Referee: [§5] §5 (Experiments): no information is given on how the 914 instances were selected or partitioned, nor on whether the reported improvements hold under different random seeds or instance orderings; this is load-bearing because the skeptic concern that noisy or costly feedback could erase the gains cannot be evaluated from the given data.

Authors: We will clarify in the revised manuscript how the 914 instances were selected from the HWMCC collection and any partitioning applied. Additionally, we will present results demonstrating that the improvements hold under different random seeds for the bandit and discuss sensitivity to instance orderings. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks

full rationale

The paper introduces an algorithmic framework that uses a multi-armed bandit to select among inductive generalization strategies in IC3, with the bandit updated from real-time quality signals generated during the ongoing verification run. The central claims are empirical performance gains (26-50 additional solved instances and PAR-2 improvements) measured on an external benchmark suite of 914 HWMCC instances. No derivation chain, equations, or first-principles argument is presented that reduces by construction to fitted parameters, self-citations, or renamed inputs. The feedback loop draws from intermediate verification outcomes rather than final solved/unsolved labels, so it does not create a self-definitional or fitted-input-called-prediction circularity. Any self-citations to prior IC3 work are not load-bearing for the reported gains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that generalization quality can be measured on the fly to supply bandit rewards; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Real-time feedback from generalization outcomes is sufficiently informative and stable to drive bandit learning.
This premise is required for the MAB update rule to produce useful strategy preferences during a single verification run.

pith-pipeline@v0.9.0 · 5606 in / 1392 out tokens · 63596 ms · 2026-05-08T13:45:01.788363+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 9 canonical work pages

[1]

In: Proceedings 1999 Design Automation Con- ference (Cat

Biere, A., Cimatti, A., Clarke, E., Fujita, M., Zhu, Y.: Symbolic model checking us- ing sat procedures instead of bdds. In: Proceedings 1999 Design Automation Con- ference (Cat. No. 99CH36361). pp. 317–320 (1999).https://doi.org/10.1109/ DAC.1999.781333

work page arXiv 1999
[2]

Biere, A., Froleyks, N., Preiner, M.: HWMCC’20 Benchmarks (2020),https: //fmv.jku.at/hwmcc20/hwmcc20benchmarks.tar.xz, benchmark archive for the Hardware Model Checking Competition 2020 (BTOR2 and bit-blasted AIGER formats)

2020
[3]

In: 2024 Formal Methods in Computer-Aided Design (FMCAD)

Biere, A., Froleyks, N., Preiner, M.: Hardware model checking competition 2024. In: 2024 Formal Methods in Computer-Aided Design (FMCAD). pp. 1–1 (2024). https://doi.org/10.34727/2024/isbn.978-3-85448-065-5_6

work page doi:10.34727/2024/isbn.978-3-85448-065-5_6 2024
[4]

Bradley,A.R.:Sat-basedmodelcheckingwithoutunrolling.In:InternationalWork- shop on Verification, Model Checking, and Abstract Interpretation. pp. 70–87. Springer (2011)

2011
[5]

In: 2011 Formal Methods in Computer-Aided Design (FM- CAD)

Een, N., Mishchenko, A., Brayton, R.: Efficient implementation of property di- rected reachability. In: 2011 Formal Methods in Computer-Aided Design (FM- CAD). pp. 125–134 (2011)

2011
[6]

Hassan,Z.,Bradley,A.R.,Somenzi,F.:Bettergeneralizationinic3.In:2013Formal Methods in Computer-Aided Design. pp. 157–164 (2013).https://doi.org/10. 1109/FMCAD.2013.6679405

work page arXiv 2013
[7]

In: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)

Hu, G., Tang, J., Yu, C., Zhang, W., Zhang, H.: Deepic3: Guiding ic3 algorithms by graph neural network clause prediction. In: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). pp. 262–268 (2024).https://doi. org/10.1109/ASP-DAC58780.2024.10473807

work page doi:10.1109/asp-dac58780.2024.10473807 2024
[8]

Hu, G., Zhang, W., Zhang, H.: Neuropdr: Integrating neural networks in the pdr algorithmforhardwaremodelchecking.In:2023ACM/IEEE5thWorkshoponMa- chine Learning for CAD (MLCAD). pp. 1–6 (2023).https://doi.org/10.1109/ MLCAD58807.2023.10299875

work page arXiv 2023
[9]

In: FMCAD

Le, N., Si, X., Gurfinkel, A.: Data-driven optimization of inductive generalization. In: FMCAD. pp. 86–95 (2021)

2021
[10]

In: Proceedings of the 19th international conference on World wide web

Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to per- sonalized news article recommendation. In: Proceedings of the 19th international conference on World wide web. pp. 661–670. ACM (2010)

2010
[11]

In: Computer Aided Verification: 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003

McMillan, K.L.: Interpolation and sat-based model checking. In: Computer Aided Verification: 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003. Proceedings 15. pp. 1–13. Springer (2003)

2003
[12]

Preiner, M., Froleyks, N., Biere, A.: Hwmcc’24 benchmarks and results.https: //doi.org/10.5281/zenodo.14156844,https://zenodo.org/records/14156844

work page doi:10.5281/zenodo.14156844
[13]

Preiner, M., Froleyks, N., Biere, A.: HWMCC’25 Benchmarks and Re- sults (2025).https://doi.org/10.5281/zenodo.17428464,https://zenodo.org/ records/17428464, dataset, version v1, published 2025-10-23

work page doi:10.5281/zenodo.17428464 2025
[14]

Su, Y., Yang, Q., Ci, Y., Bu, T., Huang, Z.: The ric3 hardware model checker (2025),https://arxiv.org/abs/2502.13605

work page arXiv 2025
[15]

Su, Y., Yang, Q., Ci, Y., Huang, Z.: Extended ctg generalization and dynamic adjustment of generalization strategies in ic3 (2025),https://arxiv.org/abs/ 2501.02480

work page arXiv 2025
[16]

Zhou and G

Vediramana Krishnan, H.G., Chen, Y., Shoham, S., Gurfinkel, A.: Global guidance forlocalgeneralizationinmodelchecking.FormalMethodsinSystemDesign63(1), 81–109 (2024) 20 X. Zhou and G. Hu et al

2024
[17]

In: 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

VK, H.G., Fedyukovich, G., Gurfinkel, A.: Word level property directed reacha- bility. In: 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD). pp. 1–9. IEEE (2020)

2020
[18]

In: MBMV 2021; 24th Workshop

Winterer, F., Seufert, T., Scheibler, K., Teige, T., Scholl, C., Becker, B.: Icp and ic3 with stronger generalization. In: MBMV 2021; 24th Workshop. pp. 1–12. VDE (2021)

2021

[1] [1]

In: Proceedings 1999 Design Automation Con- ference (Cat

Biere, A., Cimatti, A., Clarke, E., Fujita, M., Zhu, Y.: Symbolic model checking us- ing sat procedures instead of bdds. In: Proceedings 1999 Design Automation Con- ference (Cat. No. 99CH36361). pp. 317–320 (1999).https://doi.org/10.1109/ DAC.1999.781333

work page arXiv 1999

[2] [2]

Biere, A., Froleyks, N., Preiner, M.: HWMCC’20 Benchmarks (2020),https: //fmv.jku.at/hwmcc20/hwmcc20benchmarks.tar.xz, benchmark archive for the Hardware Model Checking Competition 2020 (BTOR2 and bit-blasted AIGER formats)

2020

[3] [3]

In: 2024 Formal Methods in Computer-Aided Design (FMCAD)

Biere, A., Froleyks, N., Preiner, M.: Hardware model checking competition 2024. In: 2024 Formal Methods in Computer-Aided Design (FMCAD). pp. 1–1 (2024). https://doi.org/10.34727/2024/isbn.978-3-85448-065-5_6

work page doi:10.34727/2024/isbn.978-3-85448-065-5_6 2024

[4] [4]

Bradley,A.R.:Sat-basedmodelcheckingwithoutunrolling.In:InternationalWork- shop on Verification, Model Checking, and Abstract Interpretation. pp. 70–87. Springer (2011)

2011

[5] [5]

In: 2011 Formal Methods in Computer-Aided Design (FM- CAD)

Een, N., Mishchenko, A., Brayton, R.: Efficient implementation of property di- rected reachability. In: 2011 Formal Methods in Computer-Aided Design (FM- CAD). pp. 125–134 (2011)

2011

[6] [6]

Hassan,Z.,Bradley,A.R.,Somenzi,F.:Bettergeneralizationinic3.In:2013Formal Methods in Computer-Aided Design. pp. 157–164 (2013).https://doi.org/10. 1109/FMCAD.2013.6679405

work page arXiv 2013

[7] [7]

In: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)

Hu, G., Tang, J., Yu, C., Zhang, W., Zhang, H.: Deepic3: Guiding ic3 algorithms by graph neural network clause prediction. In: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). pp. 262–268 (2024).https://doi. org/10.1109/ASP-DAC58780.2024.10473807

work page doi:10.1109/asp-dac58780.2024.10473807 2024

[8] [8]

Hu, G., Zhang, W., Zhang, H.: Neuropdr: Integrating neural networks in the pdr algorithmforhardwaremodelchecking.In:2023ACM/IEEE5thWorkshoponMa- chine Learning for CAD (MLCAD). pp. 1–6 (2023).https://doi.org/10.1109/ MLCAD58807.2023.10299875

work page arXiv 2023

[9] [9]

In: FMCAD

Le, N., Si, X., Gurfinkel, A.: Data-driven optimization of inductive generalization. In: FMCAD. pp. 86–95 (2021)

2021

[10] [10]

In: Proceedings of the 19th international conference on World wide web

Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to per- sonalized news article recommendation. In: Proceedings of the 19th international conference on World wide web. pp. 661–670. ACM (2010)

2010

[11] [11]

In: Computer Aided Verification: 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003

McMillan, K.L.: Interpolation and sat-based model checking. In: Computer Aided Verification: 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003. Proceedings 15. pp. 1–13. Springer (2003)

2003

[12] [12]

Preiner, M., Froleyks, N., Biere, A.: Hwmcc’24 benchmarks and results.https: //doi.org/10.5281/zenodo.14156844,https://zenodo.org/records/14156844

work page doi:10.5281/zenodo.14156844

[13] [13]

Preiner, M., Froleyks, N., Biere, A.: HWMCC’25 Benchmarks and Re- sults (2025).https://doi.org/10.5281/zenodo.17428464,https://zenodo.org/ records/17428464, dataset, version v1, published 2025-10-23

work page doi:10.5281/zenodo.17428464 2025

[14] [14]

Su, Y., Yang, Q., Ci, Y., Bu, T., Huang, Z.: The ric3 hardware model checker (2025),https://arxiv.org/abs/2502.13605

work page arXiv 2025

[15] [15]

Su, Y., Yang, Q., Ci, Y., Huang, Z.: Extended ctg generalization and dynamic adjustment of generalization strategies in ic3 (2025),https://arxiv.org/abs/ 2501.02480

work page arXiv 2025

[16] [16]

Zhou and G

Vediramana Krishnan, H.G., Chen, Y., Shoham, S., Gurfinkel, A.: Global guidance forlocalgeneralizationinmodelchecking.FormalMethodsinSystemDesign63(1), 81–109 (2024) 20 X. Zhou and G. Hu et al

2024

[17] [17]

In: 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

VK, H.G., Fedyukovich, G., Gurfinkel, A.: Word level property directed reacha- bility. In: 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD). pp. 1–9. IEEE (2020)

2020

[18] [18]

In: MBMV 2021; 24th Workshop

Winterer, F., Seufert, T., Scheibler, K., Teige, T., Scholl, C., Becker, B.: Icp and ic3 with stronger generalization. In: MBMV 2021; 24th Workshop. pp. 1–12. VDE (2021)

2021