Neuroforger: certified violation witnesses for smart contracts verification via LLMs
Pith reviewed 2026-06-28 20:02 UTC · model grok-4.3
The pith
A Solidity extension with abstract types turns LLM outputs into certified counterexamples for smart contract properties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a specification can be expressed as a Solidity test function containing existentially quantified variables of abstract type. The LLM provides an instantiation of these variables with concrete values of the appropriate types. Type checking and subsequent concrete execution of the test then certify whether the instantiation demonstrates a violation of the target property, yielding a reliable proof-of-concept counterexample.
What carries the argument
The Solidity test with existentially quantified abstract-type variables, whose instantiation by the LLM is validated through type checking and execution to produce a certified violation witness.
If this is right
- Certified counterexamples can be directly executed to demonstrate property violations in smart contracts.
- The method combines the generative power of LLMs with the reliability of type systems and runtime checking.
- It applies to a range of properties expressible through test-like specifications in Solidity.
- Violation witnesses become portable PoCs that auditors can use without trusting the LLM directly.
Where Pith is reading between the lines
- This approach might allow verification of more complex properties by leveraging LLM creativity for finding edge cases.
- Similar techniques could apply to other programming languages with test frameworks.
- The workflow could be iterated, using failed executions to refine LLM prompts.
Load-bearing premise
That type checking and concrete execution of the instantiated test will always detect genuine violations without being fooled by incorrect modeling of the contract or incomplete property specification.
What would settle it
A counterexample where an LLM instantiation type-checks and the test passes execution, yet the contract does not actually violate the property due to differences in how the abstract test models the real execution environment.
Figures
read the original abstract
Recent large language models (LLMs) incorporate reasoning capabilities that allow them to perform well in predicting whether a smart contract respects a certain property, suggesting a complementary approach to traditional formal-methods-based techniques for smart contract verification. However, the application of LLMs in such context has two major issues: 1) properties expressed in natural language are intrinsically ambiguous, and 2) answers returned by LLMs have no guarantee of correctness. In this paper, we address both issues simultaneously by: 1) introducing a new formal specification language that extends Solidity with abstract types, and 2) designing a workflow that combines LLMs with type checking and concrete execution to generate and validate violation witnesses (i.e., counterexamples). The key idea is to represent a specification as a Solidity test with (existentially quantified) variables of abstract type; finding an instantiation of these variables to concrete values (of the correct type) concretizes the test into an executable counterexample (PoC) for the target property. We implemented our procedure in the tool Neuroforger, experimentally evaluating it on a smart-contract verification dataset drawn from literature, obtaining promising results that demonstrate its potential applicability in the wild.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Neuroforger, a workflow and tool that extends Solidity with abstract types to express smart-contract properties as tests containing existentially quantified variables. LLMs are used to propose concrete instantiations of these variables; the resulting test is then type-checked and executed on the EVM to produce a certified violation witness (PoC). The approach is claimed to address both the ambiguity of natural-language properties and the lack of correctness guarantees in LLM outputs, with an experimental evaluation on a literature-derived dataset reported to yield promising results.
Significance. If the central claim holds, the work offers a practical hybrid technique that could complement traditional formal-methods tools for smart-contract verification by generating certified counterexamples. The combination of an extended specification language with external type checkers and execution engines for certification is a notable strength, and the provision of reproducible code or falsifiable predictions would further strengthen its contribution.
major comments (2)
- [Abstract / workflow description] Abstract and workflow description: the manuscript states that type checking plus concrete execution certifies an LLM-proposed instantiation as a genuine violation witness, yet supplies no soundness argument addressing potential semantic gaps between the extended Solidity test and the target property (e.g., omitted reentrancy, ordering constraints on abstract types, or external-call effects). This assumption is load-bearing for the certification claim.
- [Abstract] Abstract: the evaluation is described only as 'promising results' on a literature dataset; no quantitative metrics, error rates, baseline comparisons, or analysis of edge cases such as non-determinism or incomplete contract models are provided, preventing assessment of whether the central claim is supported.
minor comments (1)
- [Abstract] The acronym 'PoC' is introduced without an explicit definition in the abstract.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the presentation of the certification claim and evaluation details.
read point-by-point responses
-
Referee: [Abstract / workflow description] Abstract and workflow description: the manuscript states that type checking plus concrete execution certifies an LLM-proposed instantiation as a genuine violation witness, yet supplies no soundness argument addressing potential semantic gaps between the extended Solidity test and the target property (e.g., omitted reentrancy, ordering constraints on abstract types, or external-call effects). This assumption is load-bearing for the certification claim.
Authors: We agree that an explicit soundness argument is necessary to support the certification claim. The workflow certifies that a concrete instantiation violates the test as written (via type checking and EVM execution), and the test is intended as a direct formalization of the target property using abstract types for existential quantification. However, we acknowledge that the manuscript does not currently provide a detailed argument addressing potential gaps such as reentrancy, ordering constraints, or external-call effects. In the revision we will add a dedicated soundness section that (1) formalizes the relationship between the abstract test and the property, (2) states the assumptions under which the certification holds, and (3) discusses how the test language can express the cited concerns when they are part of the property. This will make the load-bearing assumption explicit and defensible. revision: yes
-
Referee: [Abstract] Abstract: the evaluation is described only as 'promising results' on a literature dataset; no quantitative metrics, error rates, baseline comparisons, or analysis of edge cases such as non-determinism or incomplete contract models are provided, preventing assessment of whether the central claim is supported.
Authors: The abstract is written at a high level, using the phrase 'promising results' to summarize the experimental outcome. The full manuscript contains a dedicated evaluation section on the literature-derived dataset. To address the concern and enable direct assessment of the central claim, we will revise the abstract to report key quantitative metrics (e.g., witness-generation success rate, false-positive rate after certification) and will expand the evaluation section with explicit analysis of edge cases including LLM non-determinism, incomplete contract models, and any baseline comparisons performed. These additions will be included in the revised manuscript. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper describes a workflow that encodes specifications as Solidity tests with abstract-typed existentially quantified variables, then uses LLMs to propose instantiations, followed by external type checking and concrete execution to produce certified counterexamples. No equations, fitted parameters, or mathematical derivations appear. The certification step relies on independent external tools (Solidity type checker and EVM execution) rather than any internal definition that loops back to the LLM output or the input property. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The method is therefore self-contained against external benchmarks and does not reduce any claim to its own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Neuroforger: certified violation witnesses for smart contracts ver- ification via LLMs
Public data for “Neuroforger: certified violation witnesses for smart contracts ver- ification via LLMs”.https://github.com/elipparini/Neuroforger(2026)
2026
-
[2]
Albert, E., Correas, J., Gordillo, P., Rom´ an-D´ ıez, G., Rubio, A.: GASOL: Gas analysis and optimization for Ethereum smart contracts. In: TACAS (2020). https://doi.org/10.1007/978-3-030-45237-7_7
-
[3]
Alt, L., Blicha, M., Hyv¨ arinen, A.E.J., Sharygina, N.: SolCMC: Solidity compiler’s model checker. In: CAV (2022).https://doi.org/10.1007/978-3-031-13185-1_ 16 18 Massimo Bartoletti and Enrico Lipparini
-
[4]
PoCo: Agentic Proof-of-Concept Exploit Generation for Smart Contracts
Andersson, V., Bobadilla, S., Hobbelhagen, H., Monperrus, M.: Poco: Agentic proof-of-concept exploit generation for smart contracts. CoRR (2026).https: //doi.org/10.48550/ARXIV.2511.02780,https://arxiv.org/abs/2511.02780
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2511.02780 2026
-
[5]
In: POST (2017).https://doi.org/10.1007/978-3-662-54455-6_8
Atzei, N., Bartoletti, M., Cimoli, T.: A survey of attacks on Ethereum smart con- tracts (sok). In: POST (2017).https://doi.org/10.1007/978-3-662-54455-6_8
-
[6]
In: IEEE S&P (2023).https://doi.org/10.1109/SP46215.2023.10179435
Babel, K., Daian, P., Kelkar, M., Juels, A.: Clockwork finance: Automated analysis of economic security in smart contracts. In: IEEE S&P (2023).https://doi.org/ 10.1109/SP46215.2023.10179346
-
[7]
Bartoletti, M., Benetollo, L., Bugliesi, M., Crafa, S., Sasso, G.D., Pettinau, R., Pinna, A., Piras, M., Rossi, S., Salis, S., Span` o, A., Tkachenko, V., Tonelli, R., Zunino, R.: Smart contract languages: A comparative analysis. Future Gener. Com- put. Syst.164(2025).https://doi.org/10.1016/J.FUTURE.2024.107563
-
[8]
In: FMBC (2025).https://doi.org/10
Bartoletti, M., Crafa, S., Lipparini, E.: Formal Verification in Solidity and Move: Insights from a Comparative Analysis. In: FMBC (2025).https://doi.org/10. 4230/OASIcs.FMBC.2025.3
2025
-
[9]
In: Integrated Formal Methods (2024).https://doi
Bartoletti, M., Ferrando, A., Lipparini, E., Malvone, V.: Solvent: Liquidity veri- fication of smart contracts. In: Integrated Formal Methods (2024).https://doi. org/10.1007/978-3-031-76554-4_14
-
[10]
KindHML: formal verification of smart contracts based on Hennessy-Milner logic
Bartoletti, M., Ferrando, A., Lipparini, E., Malvone, V.: KindHML: formal verifi- cation of smart contracts based on Hennessy-Milner logic. CoRR (2026).https: //doi.org/10.48550/ARXIV.2604.14038,https://arxiv.org/abs/2604.14038
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.14038 2026
-
[11]
In: FMBC (2024).https://doi.org/ 10.4230/OASICS.FMBC.2024.6
Bartoletti, M., Fioravanti, F., Matricardi, G., Pettinau, R., Sainas, F.: Towards benchmarking of Solidity verification tools. In: FMBC (2024).https://doi.org/ 10.4230/OASICS.FMBC.2024.6
-
[12]
In: Financial Cryptography (2026)
Bartoletti, M., Lipparini, E., Pompianu, L.: LLMs as verification oracles for Solid- ity. In: Financial Cryptography (2026)
2026
-
[13]
Bernardi, T., Dor, N., Fedotov, A., Grossman, S., Immerman, N., Jackson, D., Nutz, A., Oppenheim, L., Pistiner, O., Rinetzky, N., Sagiv, M., Taube, M., Toman, J.A., Wilcox, J.R.: WIP: Finding bugs automatically in smart contracts with parameterized invariants.https://groups.csail.mit.edu/sdg/pubs/2020/ sbc2020.pdf(2020)
2020
-
[14]
In: Conference on Artificial Intelligence and Theorem Proving (2019)
Blanchette, J.C., Ouraoui, D.E., Fontaine, P., Kaliszyk, C.: Machine Learning for Instance Selection in SMT Solving. In: Conference on Artificial Intelligence and Theorem Proving (2019)
2019
-
[15]
Certora verification reports.https://docs.certora.com/en/latest/docs/ prover/dashboard/report.html(2025)
2025
-
[16]
Certora: Hooks.https://docs.certora.com/en/latest/docs/cvl/hooks.html (2025)
2025
-
[17]
In: ICSE (2024).https://doi.org/10
Chaliasos, S., Charalambous, M.A., Zhou, L., Galanopoulou, R., Gervais, A., Mitropoulos, D., Livshits, B.: Smart contract and DeFi security: Insights from tool evaluations and practitioner surveys. In: ICSE (2024).https://doi.org/10. 1145/3597503.3623302
-
[18]
Chen, T.Y., Kuo, F., Liu, H., Poon, P., Towey, D., Tse, T.H., Zhou, Z.Q.: Meta- morphic testing: A review of challenges and opportunities. ACM Comput. Surv. 51(1) (2018).https://doi.org/10.1145/3143561
- [19]
-
[20]
In: Financial Cryptography (2020).https://doi.org/10
Eskandari, S., Moosavi, S., Clark, J.: SoK: Transparent Dishonesty: Front-Running Attacks on Blockchain. In: Financial Cryptography (2020).https://doi.org/10. 1007/978-3-030-43725-1_13 Certified violation witnesses for smart contracts verification via LLMs 19
2020
-
[21]
Foundry: A blazing fast, portable and modular toolkit for Ethereum application development written in Rust.https://github.com/foundry-rs/foundry(2025)
2025
-
[22]
Foundry: Forge.https://www.getfoundry.sh/forge(2026)
2026
-
[23]
Foundry: snapshotState cheatcodes.https://getfoundry.sh/reference/ cheatcodes/state-snapshots(2026)
2026
-
[24]
Foundry: startStateDiffRecording cheatcodes.https://www.getfoundry.sh/ reference/cheatcodes/start-state-diff-recording(2026)
2026
-
[25]
In: Financial Cryptography (2026)
Gervais, A., Zhou, L.: AI agent smart contract exploit generation. In: Financial Cryptography (2026)
2026
-
[26]
In: FMCAD (2018).https: //doi.org/10.23919/FMCAD.2018.8603013
Hojjat, H., R¨ ummer, P.: The ELDARICA Horn solver. In: FMCAD (2018).https: //doi.org/10.23919/FMCAD.2018.8603013
-
[27]
certora.com/en/latest/docs/whitepaper/index.html(2022)
Jackson, D., Nandi, C., Sagiv, M.: Certora technology white paper.https://docs. certora.com/en/latest/docs/whitepaper/index.html(2022)
2022
-
[28]
International Journal of Approximate Reasoning189(2026)
Jakub˚ uv, J., Janota, M., Piepenbrock, J., Urban, J.: Machine learning for quantifier selection in cvc5. International Journal of Approximate Reasoning189(2026). https://doi.org/https://doi.org/10.1016/j.ijar.2025.109602
-
[29]
In: FMCAD (2024).https://doi.org/10.34727/2024/ISBN.978-3-85448-065-5_16
Kamath, A., Senthilnathan, A., Chakraborty, S., Deligiannis, P., Lahiri, S., Lal, A., Rastogi, A., Roy, S., Sharma, R.: Leveraging LLMs for program verification. In: FMCAD (2024).https://doi.org/10.34727/2024/ISBN.978-3-85448-065-5_16
-
[30]
In: NDSS (2025)
Liu, Y., Xue, Y., Wu, D., Sun, Y., Li, Y., Shi, M., Liu, Y.: PropertyGPT: LLM- driven formal verification of smart contracts through retrieval-augmented property generation. In: NDSS (2025)
2025
-
[31]
de Moura, L.M., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS (2008). https://doi.org/10.1007/978-3-540-78800-3_24
-
[32]
OpenAI: Gpt-5 docs (2025),https://platform.openai.com/docs/models/gpt-5 [Accessed: 2025-09-12]
2025
-
[33]
Pirzada, M.A.A., Reger, G., Bhayat, A., Cordeiro, L.C.: LLM-generated invariants for bounded model checking without loop unrolling. In: IEEE/ACM ASE (2024). https://doi.org/10.1145/3691620.3695512
-
[34]
In: Financial Cryptography (2021).https://doi
Qin, K., Zhou, L., Livshits, B., Gervais, A.: Attacking the DeFi ecosystem with Flash Loans for fun and profit. In: Financial Cryptography (2021).https://doi. org/10.1007/978-3-662-64322-8_1
-
[35]
In: FMBC (2024).https://doi.org/10.4230/OASICS.FMBC.2024.8
Schiffl, J., Beckert, B.: A practical notion of liveness in smart contract applications. In: FMBC (2024).https://doi.org/10.4230/OASICS.FMBC.2024.8
-
[36]
In: TACAS (2021).https:// doi.org/10.1007/978-3-030-72013-1_16
Scott, J., Niemetz, A., Preiner, M., Nejati, S., Ganesh, V.: MachSMT: A machine learning-based algorithm selector for SMT solvers. In: TACAS (2021).https:// doi.org/10.1007/978-3-030-72013-1_16
-
[37]
In: IEEE S&P (2024).https://doi.org/ 10.1109/SP54263.2024.00230
Sendner, C., Petzi, L., Stang, J., Dmitrienko, A.: Large-scale study of vulnerability scanners for Ethereum smart contracts. In: IEEE S&P (2024).https://doi.org/ 10.1109/SP54263.2024.00230
-
[38]
The Solidity Authors: Contracts.https://docs.soliditylang.org/en/latest/ contracts.html#contracts(2025)
2025
-
[39]
The Solidity Authors: Receive Ether Function.https://docs.soliditylang.org/ en/latest/contracts.html#receive-ether-function(2025)
2025
-
[40]
In: AST (2025).https://doi.org/10.1109/AST66626.2025
Tihanyi, N., Charalambous, Y., Jain, R., Ferrag, M.A., Cordeiro, L.C.: A new era in software security: Towards self-healing software via large language models and formal verification. In: AST (2025).https://doi.org/10.1109/AST66626.2025. 00020
-
[41]
Wu, G., Cao, W., Yao, Y., Wei, H., Chen, T., Ma, X.: LLM meets bounded model checking: Neuro-symbolic loop invariant inference. In: ASE (2024).https://doi. org/10.1145/3691620.3695014 20 Massimo Bartoletti and Enrico Lipparini
-
[42]
In: ICLR (2024),https://openreview.net/ forum?id=Q3YaCghZNt
Wu, H., Barrett, C., Narodytska, N.: Lemur: Integrating large language models in automated program verification. In: ICLR (2024),https://openreview.net/ forum?id=Q3YaCghZNt
2024
-
[43]
Xiao, Z., Li, Y., Wang, Q., Chen, S.: Prompt to pwn: Automated exploit generation for smart contracts (2025),https://arxiv.org/abs/2508.01371
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[44]
Zhou, B., Wang, H., Yao, Y., Chen, T., Xu, F., Ma, X.: Simulate, refine and inte- grate: Strategy synthesis for efficient SMT solving. In: IJCAI (2025).https://doi. org/10.24963/ijcai.2025/887,https://doi.org/10.24963/ijcai.2025/887
-
[45]
In: IEEE S&P (2023).https://doi.org/10.1109/SP46215.2023.10179435
Zhou, L., Xiong, X., Ernstberger, J., Chaliasos, S., Wang, Z., Wang, Y., Qin, K., Wattenhofer, R., Song, D., Gervais, A.: Sok: Decentralized finance (DeFi) attacks. In: IEEE S&P (2023).https://doi.org/10.1109/SP46215.2023.10179435
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.