
arxiv: 2605.09350 · v1 · submitted 2026-05-10 · 💻 cs.AI

CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing

Pith reviewed 2026-05-12 03:47 UTC · model grok-4.3

classification 💻 cs.AI
keywords smart contract auditing · vulnerability detection · large language models · automated security analysis · solidity · cross-contract interactions · false positive reduction · structural verification

The pith

Chaintrix requires every LLM-generated claim about a smart contract to be discharged against a deterministic structural model of its interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an auditing framework that first parses Solidity source into a Cross-Contract Interaction Model capturing function reads, writes, modifiers, and resolved calls across contracts. This model then serves as the shared reference for twelve deterministic signal engines and parallel LLM pipelines. A staged reduction process, ending with a Structural Verdict Engine that applies further deterministic checks, filters the combined findings, after which selected results receive symbolic execution or fuzz testing. The central commitment is that language-model flexibility must always be constrained by verifiable structural facts extracted from the code itself. If this holds, automated reviews could become both faster and less prone to the hallucinations and unverified reports that currently limit each technique used alone.

Core claim

The framework constructs a Cross-Contract Interaction Model that turns Solidity code into an explicit map of state accesses and cross-contract dependencies. Every output from the LLM audit pipelines must be checked against this map by the deterministic engines and the Structural Verdict Engine; only findings that survive these structural tests are retained. Selected findings from the resulting merged set are then validated further through symbolic execution and fuzz testing.

What carries the argument

The Cross-Contract Interaction Model (CCIM), a parsed representation of function-level reads, writes, modifiers, and resolved cross-contract calls that serves as the single substrate for all deterministic checks and LLM claims.
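The sketch below illustrates what a per-function CCIM record and a derived state-dependency index might look like. It is a minimal Python sketch, not the paper's design. The names `FunctionRecord`, `CCIM` and `state_dependencies` are hypothetical. It assumes that storage variables are identified by contract-qualified strings and that call targets are already resolved.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FunctionRecord:
    """Hypothetical per-function CCIM record (field names are illustrative)."""
    contract: str
    name: str
    visibility: str  # "public", "external", "internal" or "private"
    mutability: str  # "nonpayable", "payable", "view" or "pure"
    modifiers: frozenset[str] = frozenset()
    reads: frozenset[str] = frozenset()   # contract-qualified storage variables read
    writes: frozenset[str] = frozenset()  # contract-qualified storage variables written
    calls: frozenset[str] = frozenset()   # resolved callees, e.g. "Token.transfer"

    @property
    def qualified_name(self) -> str:
        return f"{self.contract}.{self.name}"


@dataclass
class CCIM:
    """Hypothetical container holding every function record by qualified name."""
    functions: dict[str, FunctionRecord] = field(default_factory=dict)

    def add(self, record: FunctionRecord) -> None:
        self.functions[record.qualified_name] = record

    def state_dependencies(self) -> dict[str, dict[str, set[str]]]:
        """Index each storage variable by the functions that read or write it."""
        deps = defaultdict(lambda: {"readers": set(), "writers": set()})
        for qname, record in self.functions.items():
            for var in record.reads:
                deps[var]["readers"].add(qname)
            for var in record.writes:
                deps[var]["writers"].add(qname)
        return dict(deps)
```

To use it, register records with `add` and then call `state_dependencies()`. For each variable, this returns its reader and writer sets, which are the raw material for pairing writers with readers.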

If this is right

  • LLM outputs that contradict the parsed contract map are rejected before they reach the auditor.
  • Cross-contract vulnerabilities become detectable because the model explicitly resolves calls between contracts.
  • A single structural representation can support both fixed rule engines and language-model analysis without requiring separate parsers.
  • Staged deterministic filtering after LLM generation reduces the volume of findings that require manual review.
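The rejection rule in the first bullet can be made concrete. An LLM finding is reduced to structural claims (reads, writes, calls, a missing modifier), and the parsed model must confirm each claim. The `Claim` schema, the relation names and the dict-shaped model are hypothetical assumptions, not the paper's interface. This version also rejects any claim the model cannot confirm, which is stricter than rejecting only outright contradictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One structural assertion extracted from an LLM finding (hypothetical schema)."""
    function: str  # contract-qualified function, e.g. "Vault.withdraw"
    relation: str  # "reads", "writes", "calls" or "lacks_modifier"
    target: str    # storage variable, callee, or modifier name


# Assumed model shape: qualified function name -> {"reads", "writes", "calls", "modifiers"} sets.
Model = dict[str, dict[str, set[str]]]


def discharge(claim: Claim, model: Model) -> bool:
    """True only if the parsed model confirms the claim."""
    record = model.get(claim.function)
    if record is None:
        return False  # the claim names a function the parser never saw
    if claim.relation == "lacks_modifier":
        return claim.target not in record["modifiers"]
    if claim.relation in ("reads", "writes", "calls"):
        return claim.target in record[claim.relation]
    return False  # unknown relation: it cannot be checked, so it is not discharged


def filter_findings(findings: list[dict], model: Model) -> list[dict]:
    """Keep a finding only if every one of its structural claims is discharged.

    A finding with no claims passes vacuously; a stricter policy could reject it.
    """
    return [f for f in findings if all(discharge(c, model) for c in f["claims"])]
```

Consider a reentrancy finding that claims `withdraw` calls out, writes `shares` and lacks `nonReentrant`. It survives if the model agrees on all three points. A finding claiming a write to a variable the function never touches is dropped.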

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same grounding pattern could be tested on other languages used for on-chain code to check whether the reduction in unverified claims generalizes.
  • If the CCIM parsing step can be made incremental, repeated audits of evolving contracts would become cheaper.
  • The staged pipeline suggests that purely symbolic or fuzzing tools might also benefit from an initial LLM pass that proposes candidate locations rather than scanning the entire contract.

Load-bearing premise

The Cross-Contract Interaction Model correctly and completely parses every relevant Solidity behavior that could produce a vulnerability.

What would settle it

A Solidity contract containing a high-severity vulnerability whose enabling interaction is present in the source yet omitted from the CCIM or incorrectly rejected by the Structural Verdict Engine.

Figures

Figures reproduced from arXiv: 2605.09350 by Adela Bara, Gabriela Dobrita, Simona-Vasilica Oprea.

Figure 1. Chaintrix end-to-end architecture.
From §3.2 System architecture, 3.2.1 Cross-contract interaction model: The CCIM is the system’s deterministic ground-truth layer: a regex-only parser that compiles the entire codebase, in under a second, into a single structured object that every downstream verifier checks the LLM against. Its atomic unit is a per-function record carrying the function’s visibility, mutability, mod…
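The excerpt describes the CCIM as a regex-only parser. The sketch below is a toy version of the header-extraction step; the pattern and helper names are illustrative, not the authors' code. It assumes well-formed Solidity with no comments inside function headers and no nested parentheses in parameter lists. It also ignores constructors, `fallback` and `receive`.

```python
import re

# Simplifying assumptions: no comments inside headers, no nested parentheses in
# parameter lists, and only named `function` declarations (no constructor,
# fallback or receive).
FUNC_HEADER = re.compile(
    r"\bfunction\s+(?P<name>\w+)\s*\((?P<params>[^)]*)\)(?P<tail>[^{;]*)[{;]"
)
# One word in the header tail, optionally followed by a parenthesised argument
# list such as onlyRole(ADMIN) or override(IVault); the arguments are skipped.
TAIL_WORD = re.compile(r"(\w+)(?:\s*\([^)]*\))?")
VISIBILITY = {"public", "external", "internal", "private"}
MUTABILITY = {"pure", "view", "payable"}
KEYWORDS = VISIBILITY | MUTABILITY | {"virtual", "override"}


def parse_function_headers(source: str) -> list[dict]:
    """Extract name, visibility, mutability and modifiers for each function header."""
    records = []
    for match in FUNC_HEADER.finditer(source):
        # Drop the `returns (...)` clause so return types are not mistaken for modifiers.
        tail = re.split(r"\breturns\b", match.group("tail"), maxsplit=1)[0]
        words = TAIL_WORD.findall(tail)
        records.append({
            "name": match.group("name"),
            "visibility": next((w for w in words if w in VISIBILITY), "unspecified"),
            # Solidity's default state mutability is nonpayable.
            "mutability": next((w for w in words if w in MUTABILITY), "nonpayable"),
            "modifiers": [w for w in words if w not in KEYWORDS],
        })
    return records
```

The parser silently skips constructs the pattern does not anticipate instead of reporting them. This is why parser coverage matters for downstream recall.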
Figure 3. The DD phased audit.
Phase C (cross-contract verification). Phase C verifies the cross-contract interactions identified in CCIM Layer 2. From the state-dependency graph and the call graph, it constructs both pairs (a writer of a variable paired with each reader of the same variable, or a caller paired with each callee) and N-way groups (three or more functions that all touch the same storage variable), so th…
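Phase C's construction of verification units can be sketched as follows. The sketch assumes the state-dependency graph is given as per-variable reader and writer maps, and the call graph as (caller, callee) edges. The function name and return shapes are hypothetical. Excluding a function paired with itself is an added assumption that the excerpt does not state.

```python
from itertools import product


def build_phase_c_units(readers, writers, call_edges, min_group=3):
    """Hypothetical reconstruction of Phase C unit construction.

    readers, writers: dict[str, set[str]] mapping a storage variable to the
        functions that read / write it (the state-dependency graph).
    call_edges: iterable of (caller, callee) pairs (the call graph).
    Returns (pairs, groups): pairs are (kind, left, right, via) tuples; groups
    map a storage variable to the sorted functions that touch it.
    """
    pairs = []
    # Writer/reader pairs: every writer of a variable with every reader of it.
    for var in sorted(writers):
        for w, r in product(sorted(writers[var]), sorted(readers.get(var, ()))):
            if w != r:  # assumption: a function reading its own write is skipped
                pairs.append(("state", w, r, var))
    # Caller/callee pairs taken directly from the call graph.
    for caller, callee in sorted(set(call_edges)):
        pairs.append(("call", caller, callee, ""))
    # N-way groups: at least `min_group` functions touching the same variable.
    groups = {}
    for var in sorted(set(readers) | set(writers)):
        touching = set(readers.get(var, ())) | set(writers.get(var, ()))
        if len(touching) >= min_group:
            groups[var] = sorted(touching)
    return pairs, groups
```

Groups matter because some bugs need three or more functions to interleave on one variable. A pairwise check alone would not surface those.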
Figure 4. The resulting reduction funnel.
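Figure 4's reduction funnel and the referee's request for a recall-preservation analysis suggest tracking two numbers at each stage. One is how many findings survive; the other is how many known true positives survive. The sketch below assumes benchmark ground truth is available as a set of finding ids and that each stage is a simple keep/drop predicate. Stage names and the finding schema are invented for illustration.

```python
from typing import Callable

Stage = tuple[str, Callable[[dict], bool]]


def run_funnel(findings: list[dict], stages: list[Stage], ground_truth: set[str]):
    """Apply keep/drop stages in order and record, per stage, how many findings
    and how many known true positives remain.

    Returns (surviving_findings, report) where report rows are
    (stage_name, findings_left, true_positives_left).
    """
    def true_positives(items: list[dict]) -> int:
        return sum(f["id"] in ground_truth for f in items)

    surviving = list(findings)
    report = [("input", len(surviving), true_positives(surviving))]
    for name, keep in stages:
        surviving = [f for f in surviving if keep(f)]
        report.append((name, len(surviving), true_positives(surviving)))
    return surviving, report


def tp_retention(report: list[tuple[str, int, int]]) -> float:
    """Fraction of input true positives that survive every stage."""
    initial, final = report[0][2], report[-1][2]
    return final / initial if initial else 1.0
```

Take a toy input of four findings, two of them real (F1 and F3). A confidence gate followed by a structural stage that rejects F3 yields the report `[("input", 4, 2), ("confidence_gate", 3, 2), ("structural_verdict", 2, 1)]`. `tp_retention` is then 0.5: the last stage cut volume but also halved recall. This is the kind of loss an ablation would expose.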
Original abstract

Smart-contract exploits have caused billions of USD in cumulative losses, yet audits remain expensive and slow. Automated tools have emerged to close this gap, but each class has a characteristic failure mode. Static analyzers report findings that fail manual triage at high rates, while large language models (LLMs) hallucinate findings that contradict the source code. Thus, we propose Chaintrix, an end-to-end auditing framework whose central architectural commitment is that every LLM-generated claim must be discharged against a deterministic structural contract representation. We introduce a Cross-Contract Interaction Model (CCIM) that parses Solidity into a structured map of function-level reads, writes, modifiers and resolved cross-contract calls. CCIM serves as the substrate against which all 12 of Chaintrix's deterministic signal engines and the parallel LLM audit pipelines operate. A staged false-positive-reduction pipeline, terminating in a Structural Verdict Engine (SVE) that applies deterministic structural checks against parsed code, filters the merged finding set, with selected high-confidence findings further validated through symbolic execution and fuzz testing. We evaluate Chaintrix on EVMbench, the smart-contract security benchmark by OpenAI, Paradigm, and OtterSec. Chaintrix detects 86 of 120 high-severity vulnerabilities (71.7% recall), with 25 audits scoring 100% recall, placing Chaintrix 26 percentage points above the strongest frontier-model baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Chaintrix, a multi-pipeline framework for automated smart-contract auditing that combines LLM-based analysis with a deterministic Cross-Contract Interaction Model (CCIM) for parsing Solidity into structured maps of reads/writes/modifiers/cross-contract calls, 12 signal engines, and a staged false-positive reduction pipeline ending in a Structural Verdict Engine (SVE). Selected findings are further checked via symbolic execution and fuzzing. On the EVMbench benchmark, the system is reported to detect 86 of 120 high-severity vulnerabilities (71.7% recall), with 25 audits at 100% recall, outperforming the strongest frontier-model baseline by 26 percentage points.

Significance. If the empirical claims are substantiated, the work would demonstrate a viable architecture for mitigating LLM hallucinations in code security analysis through grounding in a structural contract model. This hybrid approach could meaningfully advance practical automated auditing tools for blockchain applications, where both static analyzers and pure LLM methods have known limitations.

major comments (3)
  1. [Evaluation] Evaluation section: The headline result (86/120 detections, 71.7% recall, 26 pp improvement) is stated without baseline model names and scores, implementation details, statistical tests, confidence intervals, or per-vulnerability-type breakdown. This absence prevents verification that the reported gain is attributable to the full Chaintrix pipeline rather than subset effects.
  2. [Cross-Contract Interaction Model] Cross-Contract Interaction Model description: The CCIM is asserted to correctly parse all relevant Solidity behaviors (reads/writes, modifiers, resolved cross-contract calls including delegatecall, inheritance, storage collisions). No coverage metrics, edge-case test suite, or failure analysis on EVMbench contracts is provided, yet any parsing omissions would directly reduce the effective recall of the downstream engines and SVE.
  3. [Structural Verdict Engine] Structural Verdict Engine and false-positive pipeline: The SVE applies deterministic structural checks to filter findings, but no recall-preservation analysis (i.e., fraction of true vulnerabilities discarded) or ablation showing the pipeline's net effect on true-positive rate is reported. This assumption is load-bearing for the overall 71.7% recall claim.
minor comments (2)
  1. [Abstract and System Overview] The abstract refers to '12 of Chaintrix's deterministic signal engines' but the manuscript does not provide a clear enumerated list or table mapping each engine to its input from CCIM.
  2. [Pipeline Description] Notation for the merged finding set and staged filtering steps could be clarified with a diagram or pseudocode to improve readability of the pipeline flow.
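Major comment 1 asks for confidence intervals. The headline recall and one standard 95% interval for it can be computed directly from the reported counts. This sketch uses the Wilson score interval, a common choice; the authors do not commit to any particular method. It treats the 120 vulnerabilities as independent trials. That assumption likely understates uncertainty, because vulnerabilities cluster within audits.

```python
import math


def recall(detected: int, total: int) -> float:
    """Fraction of ground-truth vulnerabilities that were detected."""
    return detected / total


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center - half, center + half
```

`recall(86, 120)` gives about 0.717. `wilson_interval(86, 120)` gives roughly (0.630, 0.790), or about 63% to 79%.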

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important areas for strengthening the empirical rigor and technical transparency of the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The headline result (86/120 detections, 71.7% recall, 26 pp improvement) is stated without baseline model names and scores, implementation details, statistical tests, confidence intervals, or per-vulnerability-type breakdown. This absence prevents verification that the reported gain is attributable to the full Chaintrix pipeline rather than subset effects.

    Authors: We agree that the current evaluation section does not provide sufficient detail to allow independent verification of the headline results. In the revised manuscript we will add a dedicated evaluation subsection that (1) names all baseline models and reports their exact recall scores, (2) documents implementation details including model versions, prompting templates, and temperature settings, (3) includes statistical significance tests (e.g., McNemar’s test) comparing Chaintrix against the strongest baseline, (4) reports 95% confidence intervals for the 71.7% recall figure, and (5) presents a per-vulnerability-type breakdown table. These additions will make clear that the 26-percentage-point improvement is produced by the full multi-pipeline architecture rather than any subset of the data or components. revision: yes

  2. Referee: [Cross-Contract Interaction Model] Cross-Contract Interaction Model description: The CCIM is asserted to correctly parse all relevant Solidity behaviors (reads/writes, modifiers, resolved cross-contract calls including delegatecall, inheritance, storage collisions). No coverage metrics, edge-case test suite, or failure analysis on EVMbench contracts is provided, yet any parsing omissions would directly reduce the effective recall of the downstream engines and SVE.

    Authors: The referee is correct that the submitted manuscript contains no quantitative coverage metrics or systematic failure analysis for the CCIM. While the CCIM was engineered to handle the listed Solidity constructs and was exercised on the EVMbench contracts during development, we did not report coverage statistics or edge-case results. In the revision we will insert a new subsection that (a) states the measured coverage (percentage of functions and contracts successfully parsed), (b) describes the edge-case test suite (including targeted tests for delegatecall, inheritance, and storage collisions), and (c) provides a concise failure analysis of any parsing limitations observed on EVMbench. This will allow readers to evaluate the potential downstream impact on recall. revision: yes

  3. Referee: [Structural Verdict Engine] Structural Verdict Engine and false-positive pipeline: The SVE applies deterministic structural checks to filter findings, but no recall-preservation analysis (i.e., fraction of true vulnerabilities discarded) or ablation showing the pipeline's net effect on true-positive rate is reported. This assumption is load-bearing for the overall 71.7% recall claim.

    Authors: We acknowledge that the manuscript does not contain an ablation or recall-preservation study for the SVE and the preceding false-positive reduction stages. Because the SVE is a central component of the claimed recall, we will add an ablation experiment in the revised evaluation section. The new analysis will report (1) recall measured before and after the full false-positive pipeline, (2) the exact fraction of ground-truth vulnerabilities discarded by the SVE’s structural checks, and (3) the net change in true-positive rate attributable to the pipeline. These results will be presented alongside the headline 71.7% recall figure. revision: yes
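The McNemar comparison promised in the first response needs per-vulnerability outcomes for both systems, not just aggregate counts. Below is a minimal sketch of the exact (binomial) form of the test; the helper names are hypothetical.

```python
from math import comb


def discordant_counts(ours: dict[str, bool], baseline: dict[str, bool]) -> tuple[int, int]:
    """Count (found only by ours, found only by baseline) over shared vulnerability ids."""
    shared = ours.keys() & baseline.keys()
    only_ours = sum(ours[v] and not baseline[v] for v in shared)
    only_baseline = sum(baseline[v] and not ours[v] for v in shared)
    return only_ours, only_baseline


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant vulnerability is equally likely
    to favour either system, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant vulnerabilities (found by one system but not the other) affect the result. The reported 86/120 figure alone is therefore not enough to run the test.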

Circularity Check

0 steps flagged

No circularity: empirical benchmark result on external EVMbench data

full rationale

The paper describes an architectural framework (CCIM parsing, deterministic signal engines, LLM pipelines, and SVE filtering) and reports its performance as a direct empirical measurement (86/120 high-severity detections, 71.7% recall) on the external EVMbench benchmark. No equations, parameter fitting, self-citations, or derivations are present that would reduce the reported recall to a tautology or fitted input by construction. The central claim is a measured outcome on external benchmark contracts rather than a self-referential prediction, satisfying the criteria for a non-circular empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review; detailed methods, assumptions, and any fitted parameters are not visible. The framework introduces new named components whose correctness is presupposed.

axioms (1)
  • domain assumption: The Cross-Contract Interaction Model accurately extracts and represents function-level reads, writes, modifiers, and resolved cross-contract calls from Solidity source code.
    Invoked as the central substrate against which all LLM claims and deterministic engines operate.
invented entities (2)
  • Cross-Contract Interaction Model (CCIM): no independent evidence
    purpose: Structured map of contract interactions used to ground and verify LLM-generated findings
    New component introduced by the framework; no independent external validation is described in the abstract.
  • Structural Verdict Engine (SVE): no independent evidence
    purpose: Deterministic structural checks that filter the merged finding set
    Part of the false-positive-reduction pipeline; correctness assumed rather than demonstrated.

pith-pipeline@v0.9.0 · 5560 in / 1474 out tokens · 88419 ms · 2026-05-12T03:47:28.295017+00:00 · methodology

