CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing
Pith reviewed 2026-05-12 03:47 UTC · model grok-4.3
The pith
Chaintrix requires every LLM-generated claim about a smart contract to be discharged against a deterministic structural model of its interactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework constructs a Cross-Contract Interaction Model that turns Solidity code into an explicit map of state accesses and cross-contract dependencies. Every output from the LLM audit pipelines must be checked against this map by the deterministic engines and the Structural Verdict Engine; only findings that survive these structural tests are retained. The resulting merged set is further validated on selected items through symbolic execution and fuzz testing.
What carries the argument
The Cross-Contract Interaction Model (CCIM), a parsed representation of function-level reads, writes, modifiers, and resolved cross-contract calls that functions as the single substrate for all deterministic checks and LLM claims.
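To make the idea of a single structural substrate concrete, here is a minimal Python sketch; it is not the paper's implementation. The class names `FunctionFacts`, `Claim`, and `InteractionModel`, the `discharge` method, and the example `Vault` and `Token` contracts are all hypothetical, since the page does not give the CCIM schema. The sketch only illustrates the pattern the paper describes: a claim is kept when the parsed facts support it and rejected otherwise.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FunctionFacts:
    """Parsed facts about one function (hypothetical schema)."""
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    modifiers: frozenset = frozenset()
    calls: frozenset = frozenset()  # (target_contract, target_function) pairs


@dataclass(frozen=True)
class Claim:
    """One structural assertion extracted from an LLM finding (hypothetical)."""
    contract: str
    function: str
    kind: str       # "reads", "writes", "modifier", or "calls"
    target: object  # variable name, modifier name, or (contract, function) pair


@dataclass
class InteractionModel:
    """Maps (contract, function) to the facts parsed for that function."""
    functions: dict = field(default_factory=dict)

    def discharge(self, claim: Claim) -> bool:
        """Return True only if the parsed facts support the claim."""
        facts = self.functions.get((claim.contract, claim.function))
        if facts is None:
            return False  # the claim names a function the parser never saw
        observed = {
            "reads": facts.reads,
            "writes": facts.writes,
            "modifier": facts.modifiers,
            "calls": facts.calls,
        }.get(claim.kind)
        return observed is not None and claim.target in observed


# Toy model: Vault.withdraw touches `balances` and calls Token.transfer.
model = InteractionModel({
    ("Vault", "withdraw"): FunctionFacts(
        reads=frozenset({"balances"}),
        writes=frozenset({"balances"}),
        calls=frozenset({("Token", "transfer")}),
    ),
})

grounded = Claim("Vault", "withdraw", "calls", ("Token", "transfer"))
hallucinated = Claim("Vault", "withdraw", "modifier", "nonReentrant")

print(model.discharge(grounded))      # supported by the parsed facts
print(model.discharge(hallucinated))  # contradicts the parsed facts
```

A real model would also need to resolve inheritance, proxies, and `delegatecall` targets before such a lookup is trustworthy, which is the coverage question raised in the load-bearing premise.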
If this is right
- LLM outputs that contradict the parsed contract map are rejected before they reach the auditor.
- Cross-contract vulnerabilities become detectable because the model explicitly resolves calls between contracts.
- A single structural representation can support both fixed rule engines and language-model analysis without requiring separate parsers.
- Staged deterministic filtering after LLM generation reduces the volume of findings that require manual review.
Where Pith is reading between the lines
- The same grounding pattern could be tested on other languages used for on-chain code to check whether the reduction in unverified claims generalizes.
- If the CCIM parsing step can be made incremental, repeated audits of evolving contracts would become cheaper.
- The staged pipeline suggests that purely symbolic or fuzzing tools might also benefit from an initial LLM pass that proposes candidate locations rather than scanning the entire contract.
Load-bearing premise
The Cross-Contract Interaction Model correctly and completely parses every relevant Solidity behavior that could produce a vulnerability.
What would settle it
A Solidity contract containing a high-severity vulnerability whose enabling interaction is present in the source yet omitted from the CCIM or incorrectly rejected by the Structural Verdict Engine.
Original abstract
Smart-contract exploits have caused billions of USD in cumulative losses, yet audits remain expensive and slow. Automated tools have emerged to close this gap, but each class has a characteristic failure mode. Static analyzers report findings that frequently fail manual triage at high rates, while large language models (LLMs) hallucinate findings that contradict the source code. Thus, we propose Chaintrix, an end-to-end auditing framework whose central architectural commitment is that every LLM-generated claim must be discharged against a deterministic structural contract representation. We introduce a Cross-Contract Interaction Model (CCIM) that parses Solidity into a structured map of function-level reads, writes, modifiers and resolved cross-contract calls. CCIM serves as the substrate against which all 12 of Chaintrix's deterministic signal engines and the parallel LLM audit pipelines operate. A staged false-positive-reduction pipeline, terminating in a Structural Verdict Engine (SVE) that applies deterministic structural checks against parsed code, filters the merged finding set, with selected high-confidence findings further validated through symbolic execution and fuzz testing. We evaluate Chaintrix on EVMbench, the smart-contract security benchmark by OpenAI, Paradigm, OtterSec. Chaintrix detects 86 of 120 high-severity vulnerabilities (71.7% recall), with 25 audits scoring 100% recall, placing Chaintrix 26 percentage points above the strongest frontier-model baseline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Chaintrix, a multi-pipeline framework for automated smart-contract auditing that combines LLM-based analysis with a deterministic Cross-Contract Interaction Model (CCIM) for parsing Solidity into structured maps of reads/writes/modifiers/cross-contract calls, 12 signal engines, and a staged false-positive reduction pipeline ending in a Structural Verdict Engine (SVE). Selected findings are further checked via symbolic execution and fuzzing. On the EVMbench benchmark, the system is reported to detect 86 of 120 high-severity vulnerabilities (71.7% recall), with 25 audits at 100% recall, outperforming the strongest frontier-model baseline by 26 percentage points.
Significance. If the empirical claims are substantiated, the work would demonstrate a viable architecture for mitigating LLM hallucinations in code security analysis through grounding in a structural contract model. This hybrid approach could meaningfully advance practical automated auditing tools for blockchain applications, where both static analyzers and pure LLM methods have known limitations.
major comments (3)
- [Evaluation] Evaluation section: The headline result (86/120 detections, 71.7% recall, 26 pp improvement) is stated without baseline model names and scores, implementation details, statistical tests, confidence intervals, or per-vulnerability-type breakdown. This absence prevents verification that the reported gain is attributable to the full Chaintrix pipeline rather than subset effects.
- [Cross-Contract Interaction Model] Cross-Contract Interaction Model description: The CCIM is asserted to correctly parse all relevant Solidity behaviors (reads/writes, modifiers, resolved cross-contract calls including delegatecall, inheritance, storage collisions). No coverage metrics, edge-case test suite, or failure analysis on EVMbench contracts is provided, yet any parsing omissions would directly reduce the effective recall of the downstream engines and SVE.
- [Structural Verdict Engine] Structural Verdict Engine and false-positive pipeline: The SVE applies deterministic structural checks to filter findings, but no recall-preservation analysis (i.e., fraction of true vulnerabilities discarded) or ablation showing the pipeline's net effect on true-positive rate is reported. The implicit assumption that filtering discards few true vulnerabilities is load-bearing for the overall 71.7% recall claim.
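The recall-preservation analysis requested in the third comment reduces to a small set computation. The sketch below is one way to frame it, assuming findings and ground-truth vulnerabilities can be matched by identifier; the function name `recall_preservation` and the identifiers `V1`, `F1`, and so on are hypothetical, and the numbers are toy values rather than results from the paper.

```python
def recall_preservation(before: set, after: set, truth: set) -> dict:
    """Compare recall before and after a filtering pipeline.

    before: finding ids entering the false-positive pipeline
    after:  finding ids that survive it
    truth:  ids of ground-truth vulnerabilities
    """
    if not truth:
        raise ValueError("ground truth must not be empty")
    tp_before = before & truth
    tp_after = after & truth
    discarded = tp_before - tp_after  # true vulnerabilities the filter removed
    return {
        "recall_before": len(tp_before) / len(truth),
        "recall_after": len(tp_after) / len(truth),
        "discarded_fraction": len(discarded) / len(tp_before) if tp_before else 0.0,
    }


# Toy data: four real vulnerabilities, three found, one lost during filtering.
truth = {"V1", "V2", "V3", "V4"}
before = {"V1", "V2", "V3", "F1", "F2"}  # F* are false positives
after = {"V1", "V2", "F1"}

report = recall_preservation(before, after, truth)
print(report)
```

Reporting `recall_before` next to the headline figure would show how much recall the filtering stages cost, which is what the comment asks for.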
minor comments (2)
- [Abstract and System Overview] The abstract refers to '12 of Chaintrix's deterministic signal engines' but the manuscript does not provide a clear enumerated list or table mapping each engine to its input from CCIM.
- [Pipeline Description] Notation for the merged finding set and staged filtering steps could be clarified with a diagram or pseudocode to improve readability of the pipeline flow.
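As an illustration of the kind of pseudocode the second minor comment asks for, the following Python sketch shows a merged finding set passing through ordered filter stages. It is an assumption-laden outline, not the authors' pipeline: the `Finding` tuple layout, the stage names, and the two toy predicates are invented for the example.

```python
from typing import Callable, Iterable

# A finding is (contract, function, issue label); this layout is hypothetical.
Finding = tuple
Stage = tuple  # (stage name, predicate returning True to keep a finding)


def merge(*sources: Iterable[Finding]) -> list:
    """Union the findings from several sources, preserving first-seen order."""
    seen, merged = set(), []
    for source in sources:
        for finding in source:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)
    return merged


def run_stages(findings: list, stages: list) -> tuple:
    """Apply each stage in order; record which stage dropped each finding."""
    kept, dropped = list(findings), {}
    for name, keep in stages:
        survivors = []
        for finding in kept:
            if keep(finding):
                survivors.append(finding)
            else:
                dropped[finding] = name
        kept = survivors
    return kept, dropped


# Toy structural facts standing in for a parsed contract model.
known_functions = {("Vault", "withdraw"), ("Vault", "deposit")}
external_callers = {("Vault", "withdraw")}

stages = [
    # Reject findings about functions that do not exist in the parsed code.
    ("function-exists", lambda f: (f[0], f[1]) in known_functions),
    # Reject reentrancy claims on functions that make no external call.
    ("structural-verdict",
     lambda f: f[2] != "reentrancy" or (f[0], f[1]) in external_callers),
]

engine_findings = [("Vault", "withdraw", "reentrancy")]
llm_findings = [
    ("Vault", "withdraw", "reentrancy"),
    ("Vault", "mint", "access-control"),
    ("Vault", "deposit", "reentrancy"),
]

kept, dropped = run_stages(merge(engine_findings, llm_findings), stages)
print(kept)
print(dropped)
```

Recording the stage at which each finding is dropped makes it straightforward to audit whether a discarded item was a true vulnerability.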
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important areas for strengthening the empirical rigor and technical transparency of the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: The headline result (86/120 detections, 71.7% recall, 26 pp improvement) is stated without baseline model names and scores, implementation details, statistical tests, confidence intervals, or per-vulnerability-type breakdown. This absence prevents verification that the reported gain is attributable to the full Chaintrix pipeline rather than subset effects.
  Authors: We agree that the current evaluation section does not provide sufficient detail to allow independent verification of the headline results. In the revised manuscript we will add a dedicated evaluation subsection that (1) names all baseline models and reports their exact recall scores, (2) documents implementation details including model versions, prompting templates, and temperature settings, (3) includes statistical significance tests (e.g., McNemar’s test) comparing Chaintrix against the strongest baseline, (4) reports 95% confidence intervals for the 71.7% recall figure, and (5) presents a per-vulnerability-type breakdown table. These additions will make clear that the 26-percentage-point improvement is produced by the full multi-pipeline architecture rather than any subset of the data or components.
  Revision: yes
- Referee: [Cross-Contract Interaction Model] Cross-Contract Interaction Model description: The CCIM is asserted to correctly parse all relevant Solidity behaviors (reads/writes, modifiers, resolved cross-contract calls including delegatecall, inheritance, storage collisions). No coverage metrics, edge-case test suite, or failure analysis on EVMbench contracts is provided, yet any parsing omissions would directly reduce the effective recall of the downstream engines and SVE.
  Authors: The referee is correct that the submitted manuscript contains no quantitative coverage metrics or systematic failure analysis for the CCIM. While the CCIM was engineered to handle the listed Solidity constructs and was exercised on the EVMbench contracts during development, we did not report coverage statistics or edge-case results. In the revision we will insert a new subsection that (a) states the measured coverage (percentage of functions and contracts successfully parsed), (b) describes the edge-case test suite (including targeted tests for delegatecall, inheritance, and storage collisions), and (c) provides a concise failure analysis of any parsing limitations observed on EVMbench. This will allow readers to evaluate the potential downstream impact on recall.
  Revision: yes
- Referee: [Structural Verdict Engine] Structural Verdict Engine and false-positive pipeline: The SVE applies deterministic structural checks to filter findings, but no recall-preservation analysis (i.e., fraction of true vulnerabilities discarded) or ablation showing the pipeline's net effect on true-positive rate is reported. The implicit assumption that filtering discards few true vulnerabilities is load-bearing for the overall 71.7% recall claim.
  Authors: We acknowledge that the manuscript does not contain an ablation or recall-preservation study for the SVE and the preceding false-positive reduction stages. Because the SVE is a central component of the claimed recall, we will add an ablation experiment in the revised evaluation section. The new analysis will report (1) recall measured before and after the full false-positive pipeline, (2) the exact fraction of ground-truth vulnerabilities discarded by the SVE’s structural checks, and (3) the net change in true-positive rate attributable to the pipeline. These results will be presented alongside the headline 71.7% recall figure.
  Revision: yes
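The statistics promised in the first response can be computed with the standard library alone. The sketch below shows a Wilson score interval for the reported 86 of 120 detections and an exact two-sided McNemar test on paired detections. The function names are hypothetical, the choice of the Wilson interval is an assumption (the rebuttal does not say which interval will be used), and the discordant counts passed to `mcnemar_exact` are made-up placeholders because the page reports no per-vulnerability baseline outcomes.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - half, center + half


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b: vulnerabilities found only by system A
    c: vulnerabilities found only by system B
    Under the null hypothesis, the discordant pairs split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Reported headline: 86 of 120 high-severity vulnerabilities detected.
recall = 86 / 120
low, high = wilson_interval(86, 120)
print(f"recall = {recall:.3f}, 95% Wilson interval = ({low:.3f}, {high:.3f})")

# Placeholder discordant counts, NOT taken from the paper.
print(f"McNemar exact p = {mcnemar_exact(9, 2):.4f}")
```

With 120 items the interval spans roughly 63% to 79%, which indicates how much uncertainty sits behind the point estimate. McNemar's test needs the per-vulnerability outcomes of both systems, so it cannot be run from the aggregate recall figures alone.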
Circularity Check
No circularity: empirical benchmark result on external EVMbench data
Full rationale
The paper describes an architectural framework (CCIM parsing, deterministic signal engines, LLM pipelines, and SVE filtering) and reports its performance as a direct empirical measurement (86/120 high-severity detections, 71.7% recall) on the external EVMbench benchmark. No equations, parameter fitting, self-citations, or derivations are present that would reduce the reported recall to a tautology or fitted input by construction. The central claim is a measured outcome on held-out contracts rather than a self-referential prediction, satisfying the criteria for a non-circular empirical result.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The Cross-Contract Interaction Model accurately extracts and represents function-level reads, writes, modifiers, and resolved cross-contract calls from Solidity source code.
invented entities (2)
- Cross-Contract Interaction Model (CCIM): no independent evidence
- Structural Verdict Engine (SVE): no independent evidence