pith. sign in

arxiv: 2605.31389 · v1 · pith:TQ6PHJHXnew · submitted 2026-05-29 · 💻 cs.PL · cs.CR

Neuroforger: certified violation witnesses for smart contracts verification via LLMs

Pith reviewed 2026-06-28 20:02 UTC · model grok-4.3

classification 💻 cs.PL cs.CR
keywords smart contract verificationlarge language modelscounterexamplesviolation witnessesSolidityabstract typescertified witnesses
0
0 comments X

The pith

A Solidity extension with abstract types turns LLM outputs into certified counterexamples for smart contract properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to make LLM-based verification of smart contracts reliable by solving two problems at once: natural language properties are ambiguous, and LLM answers lack correctness guarantees. It does this by defining a new way to write specifications as Solidity tests that include variables of abstract types, which represent possible values that could cause a violation. An LLM suggests concrete values for these variables, after which type checking confirms they are valid, and running the test on the contract shows if a violation actually occurs. This produces executable counterexamples that serve as certified witnesses.

Core claim

The central claim is that a specification can be expressed as a Solidity test function containing existentially quantified variables of abstract type. The LLM provides an instantiation of these variables with concrete values of the appropriate types. Type checking and subsequent concrete execution of the test then certify whether the instantiation demonstrates a violation of the target property, yielding a reliable proof-of-concept counterexample.

What carries the argument

The Solidity test with existentially quantified abstract-type variables, whose instantiation by the LLM is validated through type checking and execution to produce a certified violation witness.

If this is right

  • Certified counterexamples can be directly executed to demonstrate property violations in smart contracts.
  • The method combines the generative power of LLMs with the reliability of type systems and runtime checking.
  • It applies to a range of properties expressible through test-like specifications in Solidity.
  • Violation witnesses become portable PoCs that auditors can use without trusting the LLM directly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach might allow verification of more complex properties by leveraging LLM creativity for finding edge cases.
  • Similar techniques could apply to other programming languages with test frameworks.
  • The workflow could be iterated, using failed executions to refine LLM prompts.

Load-bearing premise

That type checking and concrete execution of the instantiated test will always detect genuine violations without being fooled by incorrect modeling of the contract or incomplete property specification.

What would settle it

A counterexample where an LLM instantiation type-checks and the test passes execution, yet the contract does not actually violate the property due to differences in how the abstract test models the real execution environment.

Figures

Figures reproduced from arXiv: 2605.31389 by Enrico Lipparini, Massimo Bartoletti.

Figure 1
Figure 1. Figure 1: GPT-5 output of [12] for the query withdraw-assets-credit-others. Answer : FALSE Explanation : withdraw decreases only credits [ msg . sender ] and then performs an external call to msg . sender . During that external call , arbitrary reentrant code can execute and invoke deposit from a different address , changing another user ’s credit . Thus , other users ’ credits are not necessarily preserved . See wi… view at source ↗
Figure 2
Figure 2. Figure 2: Workflow of the procedure. Contract(s) Spec LLM type check & forge test true? (False, ⟨CEX ⟩) ⟨CEX ⟩? True? PASS FAIL 4 Producing certified counterexamples We present in this section our method to produce certified counterexamples. The high-level workflow of our procedure is displayed in [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
read the original abstract

Recent large language models (LLMs) incorporate reasoning capabilities that allow them to perform well in predicting whether a smart contract respects a certain property, suggesting a complementary approach to traditional formal-methods-based techniques for smart contract verification. However, the application of LLMs in such context has two major issues: 1) properties expressed in natural language are intrinsically ambiguous, and 2) answers returned by LLMs have no guarantee of correctness. In this paper, we address both issues simultaneously by: 1) introducing a new formal specification language that extends Solidity with abstract types, and 2) designing a workflow that combines LLMs with type checking and concrete execution to generate and validate violation witnesses (i.e., counterexamples). The key idea is to represent a specification as a Solidity test with (existentially quantified) variables of abstract type; finding an instantiation of these variables to concrete values (of the correct type) concretizes the test into an executable counterexample (PoC) for the target property. We implemented our procedure in the tool Neuroforger, experimentally evaluating it on a smart-contract verification dataset drawn from literature, obtaining promising results that demonstrate its potential applicability in the wild.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Neuroforger, a workflow and tool that extends Solidity with abstract types to express smart-contract properties as tests containing existentially quantified variables. LLMs are used to propose concrete instantiations of these variables; the resulting test is then type-checked and executed on the EVM to produce a certified violation witness (PoC). The approach is claimed to address both the ambiguity of natural-language properties and the lack of correctness guarantees in LLM outputs, with an experimental evaluation on a literature-derived dataset reported to yield promising results.

Significance. If the central claim holds, the work offers a practical hybrid technique that could complement traditional formal-methods tools for smart-contract verification by generating certified counterexamples. The combination of an extended specification language with external type checkers and execution engines for certification is a notable strength, and the provision of reproducible code or falsifiable predictions would further strengthen its contribution.

major comments (2)
  1. [Abstract / workflow description] Abstract and workflow description: the manuscript states that type checking plus concrete execution certifies an LLM-proposed instantiation as a genuine violation witness, yet supplies no soundness argument addressing potential semantic gaps between the extended Solidity test and the target property (e.g., omitted reentrancy, ordering constraints on abstract types, or external-call effects). This assumption is load-bearing for the certification claim.
  2. [Abstract] Abstract: the evaluation is described only as 'promising results' on a literature dataset; no quantitative metrics, error rates, baseline comparisons, or analysis of edge cases such as non-determinism or incomplete contract models are provided, preventing assessment of whether the central claim is supported.
minor comments (1)
  1. [Abstract] The acronym 'PoC' is introduced without an explicit definition in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the presentation of the certification claim and evaluation details.

read point-by-point responses
  1. Referee: [Abstract / workflow description] Abstract and workflow description: the manuscript states that type checking plus concrete execution certifies an LLM-proposed instantiation as a genuine violation witness, yet supplies no soundness argument addressing potential semantic gaps between the extended Solidity test and the target property (e.g., omitted reentrancy, ordering constraints on abstract types, or external-call effects). This assumption is load-bearing for the certification claim.

    Authors: We agree that an explicit soundness argument is necessary to support the certification claim. The workflow certifies that a concrete instantiation violates the test as written (via type checking and EVM execution), and the test is intended as a direct formalization of the target property using abstract types for existential quantification. However, we acknowledge that the manuscript does not currently provide a detailed argument addressing potential gaps such as reentrancy, ordering constraints, or external-call effects. In the revision we will add a dedicated soundness section that (1) formalizes the relationship between the abstract test and the property, (2) states the assumptions under which the certification holds, and (3) discusses how the test language can express the cited concerns when they are part of the property. This will make the load-bearing assumption explicit and defensible. revision: yes

  2. Referee: [Abstract] Abstract: the evaluation is described only as 'promising results' on a literature dataset; no quantitative metrics, error rates, baseline comparisons, or analysis of edge cases such as non-determinism or incomplete contract models are provided, preventing assessment of whether the central claim is supported.

    Authors: The abstract is written at a high level, using the phrase 'promising results' to summarize the experimental outcome. The full manuscript contains a dedicated evaluation section on the literature-derived dataset. To address the concern and enable direct assessment of the central claim, we will revise the abstract to report key quantitative metrics (e.g., witness-generation success rate, false-positive rate after certification) and will expand the evaluation section with explicit analysis of edge cases including LLM non-determinism, incomplete contract models, and any baseline comparisons performed. These additions will be included in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes a workflow that encodes specifications as Solidity tests with abstract-typed existentially quantified variables, then uses LLMs to propose instantiations, followed by external type checking and concrete execution to produce certified counterexamples. No equations, fitted parameters, or mathematical derivations appear. The certification step relies on independent external tools (Solidity type checker and EVM execution) rather than any internal definition that loops back to the LLM output or the input property. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The method is therefore self-contained against external benchmarks and does not reduce any claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Reviewed from abstract only; no free parameters, axioms, or invented entities are specified in the provided text.

pith-pipeline@v0.9.1-grok · 5736 in / 1172 out tokens · 28732 ms · 2026-06-28T20:02:47.009192+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 26 canonical work pages · 3 internal anchors

  1. [1]

    Neuroforger: certified violation witnesses for smart contracts ver- ification via LLMs

    Public data for “Neuroforger: certified violation witnesses for smart contracts ver- ification via LLMs”.https://github.com/elipparini/Neuroforger(2026)

  2. [2]

    In: TACAS (2020)

    Albert, E., Correas, J., Gordillo, P., Rom´ an-D´ ıez, G., Rubio, A.: GASOL: Gas analysis and optimization for Ethereum smart contracts. In: TACAS (2020). https://doi.org/10.1007/978-3-030-45237-7_7

  3. [3]

    In: CAV (2022).https://doi.org/10.1007/978-3-031-13185-1_ 16 18 Massimo Bartoletti and Enrico Lipparini

    Alt, L., Blicha, M., Hyv¨ arinen, A.E.J., Sharygina, N.: SolCMC: Solidity compiler’s model checker. In: CAV (2022).https://doi.org/10.1007/978-3-031-13185-1_ 16 18 Massimo Bartoletti and Enrico Lipparini

  4. [4]

    PoCo: Agentic Proof-of-Concept Exploit Generation for Smart Contracts

    Andersson, V., Bobadilla, S., Hobbelhagen, H., Monperrus, M.: Poco: Agentic proof-of-concept exploit generation for smart contracts. CoRR (2026).https: //doi.org/10.48550/ARXIV.2511.02780,https://arxiv.org/abs/2511.02780

  5. [5]

    In: POST (2017).https://doi.org/10.1007/978-3-662-54455-6_8

    Atzei, N., Bartoletti, M., Cimoli, T.: A survey of attacks on Ethereum smart con- tracts (sok). In: POST (2017).https://doi.org/10.1007/978-3-662-54455-6_8

  6. [6]

    In: IEEE S&P (2023).https://doi.org/10.1109/SP46215.2023.10179435

    Babel, K., Daian, P., Kelkar, M., Juels, A.: Clockwork finance: Automated analysis of economic security in smart contracts. In: IEEE S&P (2023).https://doi.org/ 10.1109/SP46215.2023.10179346

  7. [7]

    Future Gener

    Bartoletti, M., Benetollo, L., Bugliesi, M., Crafa, S., Sasso, G.D., Pettinau, R., Pinna, A., Piras, M., Rossi, S., Salis, S., Span` o, A., Tkachenko, V., Tonelli, R., Zunino, R.: Smart contract languages: A comparative analysis. Future Gener. Com- put. Syst.164(2025).https://doi.org/10.1016/J.FUTURE.2024.107563

  8. [8]

    In: FMBC (2025).https://doi.org/10

    Bartoletti, M., Crafa, S., Lipparini, E.: Formal Verification in Solidity and Move: Insights from a Comparative Analysis. In: FMBC (2025).https://doi.org/10. 4230/OASIcs.FMBC.2025.3

  9. [9]

    In: Integrated Formal Methods (2024).https://doi

    Bartoletti, M., Ferrando, A., Lipparini, E., Malvone, V.: Solvent: Liquidity veri- fication of smart contracts. In: Integrated Formal Methods (2024).https://doi. org/10.1007/978-3-031-76554-4_14

  10. [10]

    KindHML: formal verification of smart contracts based on Hennessy-Milner logic

    Bartoletti, M., Ferrando, A., Lipparini, E., Malvone, V.: KindHML: formal verifi- cation of smart contracts based on Hennessy-Milner logic. CoRR (2026).https: //doi.org/10.48550/ARXIV.2604.14038,https://arxiv.org/abs/2604.14038

  11. [11]

    In: FMBC (2024).https://doi.org/ 10.4230/OASICS.FMBC.2024.6

    Bartoletti, M., Fioravanti, F., Matricardi, G., Pettinau, R., Sainas, F.: Towards benchmarking of Solidity verification tools. In: FMBC (2024).https://doi.org/ 10.4230/OASICS.FMBC.2024.6

  12. [12]

    In: Financial Cryptography (2026)

    Bartoletti, M., Lipparini, E., Pompianu, L.: LLMs as verification oracles for Solid- ity. In: Financial Cryptography (2026)

  13. [13]

    Bernardi, T., Dor, N., Fedotov, A., Grossman, S., Immerman, N., Jackson, D., Nutz, A., Oppenheim, L., Pistiner, O., Rinetzky, N., Sagiv, M., Taube, M., Toman, J.A., Wilcox, J.R.: WIP: Finding bugs automatically in smart contracts with parameterized invariants.https://groups.csail.mit.edu/sdg/pubs/2020/ sbc2020.pdf(2020)

  14. [14]

    In: Conference on Artificial Intelligence and Theorem Proving (2019)

    Blanchette, J.C., Ouraoui, D.E., Fontaine, P., Kaliszyk, C.: Machine Learning for Instance Selection in SMT Solving. In: Conference on Artificial Intelligence and Theorem Proving (2019)

  15. [15]

    Certora verification reports.https://docs.certora.com/en/latest/docs/ prover/dashboard/report.html(2025)

  16. [16]

    Certora: Hooks.https://docs.certora.com/en/latest/docs/cvl/hooks.html (2025)

  17. [17]

    In: ICSE (2024).https://doi.org/10

    Chaliasos, S., Charalambous, M.A., Zhou, L., Galanopoulou, R., Gervais, A., Mitropoulos, D., Livshits, B.: Smart contract and DeFi security: Insights from tool evaluations and practitioner surveys. In: ICSE (2024).https://doi.org/10. 1145/3597503.3623302

  18. [18]

    ACM Comput

    Chen, T.Y., Kuo, F., Liu, H., Poon, P., Towey, D., Tse, T.H., Zhou, Z.Q.: Meta- morphic testing: A review of challenges and opportunities. ACM Comput. Surv. 51(1) (2018).https://doi.org/10.1145/3143561

  19. [19]

    Cramer, M., McIntyre, L.: Verifying LLM-generated code in the context of software verification with Ada/SPARK (2025),https://arxiv.org/abs/2502.07728

  20. [20]

    In: Financial Cryptography (2020).https://doi.org/10

    Eskandari, S., Moosavi, S., Clark, J.: SoK: Transparent Dishonesty: Front-Running Attacks on Blockchain. In: Financial Cryptography (2020).https://doi.org/10. 1007/978-3-030-43725-1_13 Certified violation witnesses for smart contracts verification via LLMs 19

  21. [21]

    Foundry: A blazing fast, portable and modular toolkit for Ethereum application development written in Rust.https://github.com/foundry-rs/foundry(2025)

  22. [22]

    Foundry: Forge.https://www.getfoundry.sh/forge(2026)

  23. [23]

    Foundry: snapshotState cheatcodes.https://getfoundry.sh/reference/ cheatcodes/state-snapshots(2026)

  24. [24]

    Foundry: startStateDiffRecording cheatcodes.https://www.getfoundry.sh/ reference/cheatcodes/start-state-diff-recording(2026)

  25. [25]

    In: Financial Cryptography (2026)

    Gervais, A., Zhou, L.: AI agent smart contract exploit generation. In: Financial Cryptography (2026)

  26. [26]

    In: FMCAD (2018).https: //doi.org/10.23919/FMCAD.2018.8603013

    Hojjat, H., R¨ ummer, P.: The ELDARICA Horn solver. In: FMCAD (2018).https: //doi.org/10.23919/FMCAD.2018.8603013

  27. [27]

    certora.com/en/latest/docs/whitepaper/index.html(2022)

    Jackson, D., Nandi, C., Sagiv, M.: Certora technology white paper.https://docs. certora.com/en/latest/docs/whitepaper/index.html(2022)

  28. [28]

    International Journal of Approximate Reasoning189(2026)

    Jakub˚ uv, J., Janota, M., Piepenbrock, J., Urban, J.: Machine learning for quantifier selection in cvc5. International Journal of Approximate Reasoning189(2026). https://doi.org/https://doi.org/10.1016/j.ijar.2025.109602

  29. [29]

    In: FMCAD (2024).https://doi.org/10.34727/2024/ISBN.978-3-85448-065-5_16

    Kamath, A., Senthilnathan, A., Chakraborty, S., Deligiannis, P., Lahiri, S., Lal, A., Rastogi, A., Roy, S., Sharma, R.: Leveraging LLMs for program verification. In: FMCAD (2024).https://doi.org/10.34727/2024/ISBN.978-3-85448-065-5_16

  30. [30]

    In: NDSS (2025)

    Liu, Y., Xue, Y., Wu, D., Sun, Y., Li, Y., Shi, M., Liu, Y.: PropertyGPT: LLM- driven formal verification of smart contracts through retrieval-augmented property generation. In: NDSS (2025)

  31. [31]

    In: TACAS (2008)

    de Moura, L.M., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS (2008). https://doi.org/10.1007/978-3-540-78800-3_24

  32. [32]

    OpenAI: Gpt-5 docs (2025),https://platform.openai.com/docs/models/gpt-5 [Accessed: 2025-09-12]

  33. [33]

    In: IEEE/ACM ASE (2024)

    Pirzada, M.A.A., Reger, G., Bhayat, A., Cordeiro, L.C.: LLM-generated invariants for bounded model checking without loop unrolling. In: IEEE/ACM ASE (2024). https://doi.org/10.1145/3691620.3695512

  34. [34]

    In: Financial Cryptography (2021).https://doi

    Qin, K., Zhou, L., Livshits, B., Gervais, A.: Attacking the DeFi ecosystem with Flash Loans for fun and profit. In: Financial Cryptography (2021).https://doi. org/10.1007/978-3-662-64322-8_1

  35. [35]

    In: FMBC (2024).https://doi.org/10.4230/OASICS.FMBC.2024.8

    Schiffl, J., Beckert, B.: A practical notion of liveness in smart contract applications. In: FMBC (2024).https://doi.org/10.4230/OASICS.FMBC.2024.8

  36. [36]

    In: TACAS (2021).https:// doi.org/10.1007/978-3-030-72013-1_16

    Scott, J., Niemetz, A., Preiner, M., Nejati, S., Ganesh, V.: MachSMT: A machine learning-based algorithm selector for SMT solvers. In: TACAS (2021).https:// doi.org/10.1007/978-3-030-72013-1_16

  37. [37]

    In: IEEE S&P (2024).https://doi.org/ 10.1109/SP54263.2024.00230

    Sendner, C., Petzi, L., Stang, J., Dmitrienko, A.: Large-scale study of vulnerability scanners for Ethereum smart contracts. In: IEEE S&P (2024).https://doi.org/ 10.1109/SP54263.2024.00230

  38. [38]

    The Solidity Authors: Contracts.https://docs.soliditylang.org/en/latest/ contracts.html#contracts(2025)

  39. [39]

    The Solidity Authors: Receive Ether Function.https://docs.soliditylang.org/ en/latest/contracts.html#receive-ether-function(2025)

  40. [40]

    In: AST (2025).https://doi.org/10.1109/AST66626.2025

    Tihanyi, N., Charalambous, Y., Jain, R., Ferrag, M.A., Cordeiro, L.C.: A new era in software security: Towards self-healing software via large language models and formal verification. In: AST (2025).https://doi.org/10.1109/AST66626.2025. 00020

  41. [41]

    In: ASE (2024).https://doi

    Wu, G., Cao, W., Yao, Y., Wei, H., Chen, T., Ma, X.: LLM meets bounded model checking: Neuro-symbolic loop invariant inference. In: ASE (2024).https://doi. org/10.1145/3691620.3695014 20 Massimo Bartoletti and Enrico Lipparini

  42. [42]

    In: ICLR (2024),https://openreview.net/ forum?id=Q3YaCghZNt

    Wu, H., Barrett, C., Narodytska, N.: Lemur: Integrating large language models in automated program verification. In: ICLR (2024),https://openreview.net/ forum?id=Q3YaCghZNt

  43. [43]

    Xiao, Z., Li, Y., Wang, Q., Chen, S.: Prompt to pwn: Automated exploit generation for smart contracts (2025),https://arxiv.org/abs/2508.01371

  44. [44]

    In: IJCAI (2025).https://doi

    Zhou, B., Wang, H., Yao, Y., Chen, T., Xu, F., Ma, X.: Simulate, refine and inte- grate: Strategy synthesis for efficient SMT solving. In: IJCAI (2025).https://doi. org/10.24963/ijcai.2025/887,https://doi.org/10.24963/ijcai.2025/887

  45. [45]

    In: IEEE S&P (2023).https://doi.org/10.1109/SP46215.2023.10179435

    Zhou, L., Xiong, X., Ernstberger, J., Chaliasos, S., Wang, Z., Wang, Y., Qin, K., Wattenhofer, R., Song, D., Gervais, A.: Sok: Decentralized finance (DeFi) attacks. In: IEEE S&P (2023).https://doi.org/10.1109/SP46215.2023.10179435