pith.

arxiv: 2605.19668 · v1 · pith:DXPRU3JK · submitted 2026-05-19 · 💻 cs.CR · cs.SE

SCARA: A Semantics-Constrained Autonomous Remediation Agent for Opaque Industrial Software Vulnerabilities

Pith reviewed 2026-05-20 04:50 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords: opaque industrial software · vulnerability remediation · binary analysis · autonomous agent · industrial control systems · source unavailable · ICS security · protocol mitigation

The pith

SCARA links binary vulnerability candidates to conditionally validated remedies for opaque industrial software without source code or builds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces SCARA, an autonomous agent that remediates vulnerabilities in opaque industrial software, where source code, symbols, and build environments are unavailable. Its four-stage pipeline normalizes heterogeneous vulnerability evidence, filters out candidates that are infeasible in the system's real operating states, synthesizes a repair from protocol-mitigation, binary-hardening, or constrained source-patch options, and finally checks that the change preserves expected behavior. A sympathetic reader would care because critical infrastructure frequently relies on such stripped firmware and proprietary protocol handlers, leaving a gap between binary-level discovery and safe, deployable fixes. On a 15-case benchmark, the authors report 100 percent observed precision, refutation of 20 percent of cases as operationally infeasible, and an 88.9 percent remediation success rate after targeted reruns.

Core claim

SCARA operates under a source-unavailable defender model and connects upstream binary vulnerability candidates to conditionally validated remedies through a four-stage pipeline. Operational-state-aware verification (OSVA) filters infeasible candidates using a nine-component industrial state model; remediation synthesis (RSA) selects the strongest available remedy across protocol mitigation, binary hardening, and SSCKG-constrained source patches; and correctness validation (CVA) provides conditional correctness evidence via behavioral-coverage preservation, independent replay, and typed rejection feedback. On OIS-RemedBench, SCARA achieves observed 100% precision with no false positives, refutes 20.0% of cases as operationally infeasible, and reaches 88.9% remediation success after targeted reruns.
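The headline figures are simple ratios. The sketch below assumes the standard definitions, precision = TP / (TP + FP) and refutation rate = refuted cases / total cases; the page does not spell out SCARA's exact counting rules, and the true-positive count used is a placeholder rather than a number from the paper. It shows that zero false positives yields an observed precision of 1.0 for any nonzero number of true positives, and that 20.0 percent of a 15-case benchmark is exactly 3 cases.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Standard precision, TP / (TP + FP); undefined when nothing is flagged."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when no candidates are flagged")
    return true_positives / flagged


def refutation_rate(refuted_cases: int, total_cases: int) -> float:
    """Share of benchmark cases refuted as operationally infeasible."""
    return refuted_cases / total_cases


# Placeholder TP count (not from the paper): with FP = 0, precision is 1.0.
print(precision(true_positives=7, false_positives=0))  # 1.0

# 3 refuted cases out of 15 gives the reported 20.0 percent.
print(refutation_rate(refuted_cases=3, total_cases=15))  # 0.2
```

An observed precision of 1.0 says nothing about missed vulnerabilities (false negatives); that is a recall question, which is why the per-partition FNR shown in Figure 13 matters alongside it.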

What carries the argument

The four-stage pipeline: evidence normalization (CACA), operational-state-aware verification (OSVA) backed by a nine-component industrial state model, remediation synthesis (RSA), and correctness validation (CVA).
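One way to picture how the four stages chain together is the hypothetical skeleton below. Each stage is passed in as a plain function, so nothing here claims to reproduce SCARA's internals: `normalize`, `is_feasible`, `synthesize`, and `validate` stand in for CACA, OSVA, RSA, and CVA respectively, the rejection label is invented, and the bounded retry loop that feeds a typed rejection back into synthesis is an assumption loosely modeled on the "typed rejection feedback" and "targeted reruns" mentioned in the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    REFUTED_INFEASIBLE = "refuted_infeasible"  # dropped at the feasibility stage
    REMEDIATED = "remediated"                  # a remedy passed validation
    UNRESOLVED = "unresolved"                  # every attempt was rejected


@dataclass
class Result:
    outcome: Outcome
    remedy: Optional[str] = None
    last_rejection: Optional[str] = None


def run_pipeline(
    raw_evidence: dict,
    normalize: Callable[[dict], dict],                 # stands in for CACA
    is_feasible: Callable[[dict], bool],               # stands in for OSVA
    synthesize: Callable[[dict, Optional[str]], str],  # stands in for RSA
    validate: Callable[[dict, str], Optional[str]],    # stands in for CVA
    max_attempts: int = 3,
) -> Result:
    """Chain the four stages; validate() returns None to accept or a rejection label."""
    candidate = normalize(raw_evidence)
    if not is_feasible(candidate):
        return Result(Outcome.REFUTED_INFEASIBLE)
    rejection: Optional[str] = None
    for _ in range(max_attempts):
        # The previous rejection (if any) is passed back so synthesis can adapt.
        remedy = synthesize(candidate, rejection)
        rejection = validate(candidate, remedy)
        if rejection is None:
            return Result(Outcome.REMEDIATED, remedy=remedy)
    return Result(Outcome.UNRESOLVED, last_rejection=rejection)


# Toy run: the first remedy is rejected, the second (informed by the rejection) passes.
result = run_pipeline(
    {"cwe": "CWE-787", "reachable_in_run_mode": True},
    normalize=lambda evidence: dict(evidence),
    is_feasible=lambda cand: cand["reachable_in_run_mode"],
    synthesize=lambda cand, prior: "binary_hardening" if prior else "protocol_mitigation",
    validate=lambda cand, remedy: None if remedy == "binary_hardening" else "COVERAGE_DROP",
)
print(result.outcome, result.remedy)  # Outcome.REMEDIATED binary_hardening
```

Returning a typed outcome rather than a boolean keeps candidates refuted as infeasible separate from failed remediations, mirroring how the page reports the two figures (20 percent refuted versus 88.9 percent remediation success).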

If this is right

  • Binary-discovered vulnerability candidates can be automatically filtered for operational feasibility before any repair attempt.
  • Remedies can be chosen from protocol mitigation, binary hardening, or constrained source patches depending on what is available.
  • Conditional correctness evidence can be produced through behavioral coverage checks and independent replay without full instrumentation.
  • A substantial share of apparent vulnerabilities can be refuted as operationally infeasible; on the benchmark, 20 percent of cases were.
  • The approach covers firmware, protocol handlers, and ICS/PLC artifacts, at least across the 15 cases in the provided benchmark.
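The "strongest available remedy" rule can be sketched as an ordered fallback over tiers. This is a hypothetical illustration: the page does not say how SCARA ranks protocol mitigation, binary hardening, and SSCKG-constrained source patches against each other, so the strongest-first order below (source patch, then binary hardening, then protocol mitigation) and the evidence labels each tier is assumed to require are inventions for the example.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical tiers, strongest first, each with the evidence it is assumed to need.
# Neither the order nor the prerequisite labels come from the paper.
TIERS = [
    ("source_patch", frozenset({"recovered_source"})),        # SSCKG-constrained patch
    ("binary_hardening", frozenset({"rewritable_binary"})),
    ("protocol_mitigation", frozenset({"protocol_filter_point"})),
]


def select_tier(available_evidence: set[str]) -> Optional[str]:
    """Return the strongest tier whose prerequisites are all available, else None."""
    for tier, prerequisites in TIERS:
        if prerequisites <= available_evidence:
            return tier
    return None


print(select_tier({"rewritable_binary", "protocol_filter_point"}))  # binary_hardening
print(select_tier({"protocol_filter_point"}))                        # protocol_mitigation
print(select_tier(set()))                                             # None
```

A `None` result corresponds to a candidate for which no tier is feasible; how SCARA reports such cases is not described on this page.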

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same state-model filtering could reduce false remediation attempts in other embedded or proprietary systems beyond industrial control.
  • Combining SCARA with upstream binary scanners would create a closed-loop detection-to-remediation process for operators lacking development access.
  • Expanding the nine-component state model with additional timing or network-state elements might raise the infeasibility refutation rate further.
  • The conditional validation outputs could serve as audit artifacts for compliance in regulated critical-infrastructure environments.

Load-bearing premise

The nine-component industrial state model used in operational-state-aware verification is sufficient to correctly identify which vulnerability candidates are infeasible in real deployments.
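The premise can be made concrete with a toy feasibility check. In the sketch below, a candidate's trigger condition constrains some state components, and the candidate survives only if at least one reachable operational state satisfies it. The nine component names, the state values, and the idea of checking against an explicit list of reachable states are all hypothetical; the page says only that OSVA uses a nine-component industrial state model.

```python
from typing import Dict, Iterable, Set

# Hypothetical component names: the paper's nine components are not listed on this page.
COMPONENTS = (
    "controller_mode", "scan_phase", "session_state",
    "auth_level", "io_config", "safety_interlock",
    "update_state", "network_zone", "maintenance_window",
)

State = Dict[str, str]         # one value per component
Trigger = Dict[str, Set[str]]  # allowed values for the components a candidate needs


def satisfies(state: State, trigger: Trigger) -> bool:
    """True if every component constrained by the trigger has an allowed value."""
    return all(state[name] in allowed for name, allowed in trigger.items())


def is_operationally_feasible(trigger: Trigger, reachable_states: Iterable[State]) -> bool:
    """Keep a candidate only if some reachable operational state can trigger it."""
    unknown = set(trigger) - set(COMPONENTS)
    if unknown:
        raise KeyError(f"trigger uses components outside the model: {sorted(unknown)}")
    return any(satisfies(state, trigger) for state in reachable_states)


base = {name: "nominal" for name in COMPONENTS}
run_state = {**base, "controller_mode": "RUN", "maintenance_window": "closed"}
program_state = {**base, "controller_mode": "PROGRAM", "maintenance_window": "open"}
reachable = [run_state, program_state]

needs_program_outside_maintenance = {"controller_mode": {"PROGRAM"}, "maintenance_window": {"closed"}}
needs_run_mode = {"controller_mode": {"RUN"}}

print(is_operationally_feasible(needs_program_outside_maintenance, reachable))  # False: refuted
print(is_operationally_feasible(needs_run_mode, reachable))                      # True: kept
print(is_operationally_feasible(needs_run_mode, [program_state]))                # False: model gap
```

The last line shows why the premise is load-bearing: if the list of reachable states omits the RUN-mode state, a genuinely triggerable candidate is refuted. Any gap in the state model turns directly into a wrongly refuted candidate.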

What would settle it

Deploy a SCARA-generated remedy on a previously unseen opaque industrial control system in a live testbed and check whether any operational failure occurs that the correctness validation stage did not flag.

Figures

Figures reproduced from arXiv: 2605.19668 by Bowei Ning, Guogang Wang, Jinyang Liu, Kan He, Lian Lian, Xuejun Zong, Yifei Sun.

Figure 1: SCARA four-stage pipeline. CACA normalises heterogeneous candidate evidence; OSVA verifies operational …
Figure 2: CACA normalisation funnel. Heterogeneous vulnerability evidence (static-analyser candidates, taint-derived …
Figure 3: Operational-state reachability analysis in OSVA. A candidate source-to-sink path is projected from the …
Figure 4: Availability-aware remediation synthesis in RSA. SCARA selects the strongest feasible remediation tier …
Figure 5: OIS-RemedBench evidence-availability tile chart. Each row of tiles within a partition shows the fraction of …
Figure 6: CWE coverage across the three OIS-RemedBench partitions. Regions list the CWEs unique to a partition or …
Figure 7: Baseline applicability heatmap on OIS-RemedBench (…
Figure 8: Three-column flow diagram from research question to SCARA stage to primary metric family.
Figure 9: Ablation-to-RQ arc diagram. Component brackets along the bottom group ablations by the SCARA …
Figure 10: Experimental architecture diagram of the SCARA implementation.
Figure 11: Per-case outcome waterfall on OIS-RemedBench. Each row is one of the 15 cases, ordered by partition …
Figure 12: Cliff's δ forest plot summarising …
Figure 13: FPR and FNR by partition for SCARA (reconciled after-rerun rates), the deduplicated static-analysis union, …
Figure 14: Per-dimension contribution heatmap. Left panel: …
Figure 15: Recall@K on D_rank (n = 11) for the three scheduling strategies; K is on a log axis. SSCKG-guided ranking dominates static-risk and random for every K < 50 and reaches Recall@50 = 1.0, supporting the design intent that SSCKG guidance is path prioritisation rather than pruning.
Figure 16: Time-to-first-SAT-STRICT-witness distribution on D_rank. Boxes show median/IQR; overlaid dots are per-case-per-seed measurements with horizontal jitter; red triangles mark censored timeouts at the budget ceiling. SSCKG-guided (n = 55, 0 timeouts) sits well below static-risk and random (n = 11 each, 1 timeout each at seed 42).
Figure 17: Tier distribution and per-tier remediation success on …
Figure 18: SAN2PATCH vs SCARA behavioural coverage preservation on the SAN2PATCH-applicable OIS-ICS …
Figure 19: CVA quality on D_CVA-audit, with all four variants re-scored under the same full-CVA oracle. Bars are per-variant rates for root-cause removed, BCP ≥ τ_cov, no-new-vulnerability (NVR), replay confirmation, and final CVA acceptance. Variants that disable a CVA component score 0% on the dependent components, supporting the §3.5 argument that each component is load-bearing.
Figure 20: Hyperparameter sensitivity grid. Top: τ_p (path-priority temperature) and T_total (solver budget). Bottom-left: α (CACA ranking weight); both recall and Recall@10 peak at α = 0.6 and stay flat to α = 0.9 before degrading at α = 1.0. Bottom-right: joint (τ_cov, τ_block) sweep with the operating point at (0.95, 0.05). The dashed vertical line in each panel marks the default operating point.
Figure 21: Per-case analyst-hour comparison on D_PLC (n = 5). X-axis: PLCverif total analyst-hours (property authoring + model construction + debugging); y-axis: SCARA operational-context review hours. Marker shape encodes the PLCverif outcome (circle = VERIFIED, triangle = TIMEOUT, diamond = VERIFIED-INFEASIBLE). The y = x and y = x/5 reference lines bracket the case-by-case throughput gap; every OIS-ICS case sits …
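Two statistics named in the captions, Recall@K (Figure 15) and Cliff's δ (Figure 12), have standard definitions that the sketch below implements. Whether the paper computes them exactly this way (for example, what counts as a relevant item in the Figure 15 ranking set) is not stated on this page, and the rankings and timings used here are toy values, not data from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of relevant items that appear in the top-k positions of a ranking."""
    if not relevant:
        raise ValueError("Recall@K is undefined when there are no relevant items")
    return len(set(ranking[:k]) & relevant) / len(relevant)


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta, P(X > Y) - P(X < Y) over all cross-group pairs; ranges over [-1, 1]."""
    if not xs or not ys:
        raise ValueError("both groups need at least one observation")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Toy ranking of candidate paths; two of them are assumed to be the relevant ones.
ranking = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2"}
print(recall_at_k(ranking, relevant, k=2))  # 0.5
print(recall_at_k(ranking, relevant, k=4))  # 1.0

# Toy time-to-witness samples (seconds): every value in the first group is smaller.
print(cliffs_delta([2.0, 3.5, 1.0], [6.0, 9.0, 4.0]))  # -1.0
```

For a lower-is-better quantity such as time to first witness, a δ near -1 means the first group is almost always faster than the second.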
Original abstract

Critical-infrastructure operators are increasingly expected to assess and remediate vulnerabilities in deployed industrial software. However, much of this software exists as opaque industrial software (OIS), including stripped firmware, proprietary protocol handlers, and compiled control logic without source code, symbols, build environments, or hardware interfaces. While binary analysis can identify vulnerability candidates, existing automated repair systems largely rely on source code, compilable artifacts, sanitizer feedback, or instrumentable builds, leaving a gap between binary-level discovery and validated remediation. This paper presents SCARA, a Semantics-Constrained Autonomous Remediation Agent for OIS. SCARA operates under a source-unavailable defender model and connects upstream binary vulnerability candidates to conditionally validated remedies through a four-stage pipeline. Operational-state-aware verification (OSVA) filters infeasible candidates using a nine-component industrial state model; remediation synthesis (RSA) selects the strongest available remedy across protocol mitigation, binary hardening, and SSCKG-constrained source patches; and correctness validation (CVA) provides conditional correctness evidence via behavioral-coverage preservation, independent replay, and typed rejection feedback. On OIS-RemedBench, a 15-case benchmark spanning firmware, protocol handlers, and ICS/PLC artifacts, SCARA achieves observed 100% precision with no false positives, refutes 20.0% of cases as operationally infeasible, and reaches 88.9% remediation success after targeted reruns. To our knowledge, SCARA is the first end-to-end framework that connects binary vulnerability candidates to conditionally validated remediation for opaque industrial software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents SCARA, a four-stage autonomous remediation pipeline for opaque industrial software (OIS) under a source-unavailable defender model. After normalizing candidate evidence (CACA), the pipeline applies operational-state-aware verification (OSVA), which uses a nine-component industrial state model to filter infeasible vulnerability candidates; remediation synthesis (RSA), which selects among protocol mitigation, binary hardening, and SSCKG-constrained source patches; and correctness validation (CVA), which checks behavioral-coverage preservation and independent replay. On the 15-case OIS-RemedBench benchmark spanning firmware, protocol handlers, and ICS/PLC artifacts, the system reports 100% observed precision with no false positives, refutes 20% of cases as operationally infeasible, and achieves 88.9% remediation success after targeted reruns.

Significance. If the OSVA state model and conditional validation steps hold under real deployments, SCARA would represent a meaningful advance in bridging binary vulnerability discovery to validated remediation for critical-infrastructure software that lacks source or build artifacts. The reported precision and success rates on a dedicated benchmark indicate potential practical impact, and the explicit handling of operational infeasibility is a distinguishing feature relative to source-dependent repair systems.

major comments (2)
  1. [Section 3 (OSVA)] OSVA description (Section 3): the claim of 100% precision and 20% infeasible refutations rests on the nine-component industrial state model correctly identifying operationally infeasible candidates. The manuscript provides no external validation, cross-check against deployed systems, or sensitivity analysis showing that the model captures timing, hardware interactions, proprietary protocol semantics, and runtime configurations that binary analysis cannot observe. If the model under-approximates feasible states, both the precision figure and the infeasibility refutations on OIS-RemedBench become unreliable.
  2. [Section 5 (Evaluation)] Evaluation section (Section 5): the benchmark results are presented without details on case selection criteria, how success and precision were measured, or error analysis for the three failed remediation cases. This information is required to assess whether the 88.9% success rate and zero false positives are robust or sensitive to benchmark construction.
minor comments (2)
  1. [Abstract and §1] The abstract and introduction use the term 'conditionally validated remedies' without a concise definition of what 'conditional' means in the context of CVA; a short clarifying sentence would improve readability.
  2. [Section 5] A table or figure summarizing the 15 OIS-RemedBench cases (e.g., artifact type, vulnerability class, and outcome) is missing; adding one would make the experimental claims easier to inspect.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating planned revisions to the manuscript where the concerns can be directly addressed through clarification, additional analysis, or expanded discussion.

Point-by-point responses
  1. Referee: [Section 3 (OSVA)] OSVA description (Section 3): the claim of 100% precision and 20% infeasible refutations rests on the nine-component industrial state model correctly identifying operationally infeasible candidates. The manuscript provides no external validation, cross-check against deployed systems, or sensitivity analysis showing that the model captures timing, hardware interactions, proprietary protocol semantics, and runtime configurations that binary analysis cannot observe. If the model under-approximates feasible states, both the precision figure and the infeasibility refutations on OIS-RemedBench become unreliable.

    Authors: We acknowledge that the reliability of the reported precision and infeasibility refutations depends on the fidelity of the nine-component state model. The model is synthesized from established ICS operational semantics, timing diagrams, and protocol specifications drawn from standards and prior security literature. We agree that a sensitivity analysis and explicit discussion of assumptions would strengthen the presentation. In the revised manuscript we will add a subsection detailing the model's construction, include sensitivity analysis on parameters such as timing windows and configuration variability, and clarify that the 100% precision and 20% refutation figures are empirical results obtained on OIS-RemedBench under the stated model. We will also note the inherent limitations in observing proprietary hardware interactions from binary artifacts alone. revision: partial

  2. Referee: [Section 5 (Evaluation)] Evaluation section (Section 5): the benchmark results are presented without details on case selection criteria, how success and precision were measured, or error analysis for the three failed remediation cases. This information is required to assess whether the 88.9% success rate and zero false positives are robust or sensitive to benchmark construction.

    Authors: We agree that greater transparency on benchmark construction and measurement is necessary. The 15 cases were chosen to span firmware, protocol handlers, and ICS/PLC artifacts drawn from publicly documented vulnerability patterns and synthetic constructions that emulate opaque industrial environments. Success is defined as remediation that preserves behavioral coverage under CVA with no new issues introduced; precision is measured by the absence of false positives among candidates that pass OSVA. We will revise Section 5 to include explicit case-selection criteria, formal definitions of the success and precision metrics, and a dedicated error analysis for the three unsuccessful cases, highlighting the dominant factors such as timing dependencies and configuration complexity that exceeded current model coverage. revision: yes
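The success definition above, together with the CVA checks listed in the Figure 19 caption (root cause removed, behavioral coverage preservation at or above τcov, no new vulnerability, replay confirmation), can be sketched as an acceptance gate that returns a typed rejection. This is a hypothetical sketch: BCP is assumed here to be the fraction of pre-remedy behaviors still observed afterwards, the check order and rejection labels are invented, τcov = 0.95 is taken from the default operating point in the Figure 20 caption, and τblock is not modeled because its role is not described on this page.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

TAU_COV = 0.95  # default operating point quoted in the Figure 20 caption


@dataclass(frozen=True)
class Evidence:
    root_cause_removed: bool
    behaviors_before: FrozenSet[str]  # behaviors observed on the unremediated artifact
    behaviors_after: FrozenSet[str]   # behaviors observed after applying the remedy
    new_vulnerabilities: int          # findings introduced by the remedy
    replay_confirmed: bool            # independent replay agrees with the verdict


def coverage_preservation(ev: Evidence) -> float:
    """Assumed BCP: share of pre-remedy behaviors that are still observed."""
    if not ev.behaviors_before:
        return 1.0
    kept = ev.behaviors_before & ev.behaviors_after
    return len(kept) / len(ev.behaviors_before)


def cva_rejection(ev: Evidence, tau_cov: float = TAU_COV) -> Optional[str]:
    """Return None to accept the remedy, else a label for the first failed check."""
    if not ev.root_cause_removed:
        return "ROOT_CAUSE_PRESENT"
    if coverage_preservation(ev) < tau_cov:
        return "COVERAGE_DROP"
    if ev.new_vulnerabilities > 0:
        return "NEW_VULNERABILITY"
    if not ev.replay_confirmed:
        return "REPLAY_NOT_CONFIRMED"
    return None


before = frozenset(f"b{i}" for i in range(20))
ok = Evidence(True, before, before - {"b0"}, 0, True)                # BCP = 19/20
lossy = Evidence(True, before, before - {"b0", "b1", "b2"}, 0, True)  # BCP = 17/20
print(cva_rejection(ok))     # None
print(cva_rejection(lossy))  # COVERAGE_DROP
```

Returning the first failed check, rather than a bare boolean, is what would let a rejection be fed back into synthesis as typed feedback.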

standing simulated objections not resolved
  • Direct external validation or cross-check of the state model against live, proprietary critical-infrastructure deployments, which is precluded by access, safety, and legal constraints inherent to academic evaluation of opaque industrial systems.

Circularity Check

0 steps flagged

No significant circularity; results are direct experimental outcomes

Full rationale

The paper describes SCARA as a four-stage pipeline evaluated directly on the OIS-RemedBench benchmark. Reported metrics (100% precision, 20% infeasible refutations, 88.9% success) are presented as observed experimental results rather than quantities derived from equations, fitted parameters, or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the provided text to support the core claims; the nine-component state model is introduced as a design element for OSVA without reducing to a fit or prior self-result. The derivation chain remains self-contained through explicit pipeline stages and benchmark evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the nine-component state model and SSCKG constraints are referenced but not detailed enough to classify.

pith-pipeline@v0.9.0 · 5835 in / 1186 out tokens · 32060 ms · 2026-05-20T04:50:54.462958+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 3 internal anchors

  1. [1]

    Executive order 14028 of may 12, 2021: Improving the nation’s cybersecu- rity

    Executive Office of the President. Executive order 14028 of may 12, 2021: Improving the nation’s cybersecu- rity. Federal Register, vol. 86, no. 93, pp. 26633–26647, May 2021. URL https://www.federalregister. gov/documents/2021/05/17/2021-10460/improving-the-nations-cybersecurity . Accessed: 2025- 01-15

  2. [2]

    NIST special publication 800-82 rev

    Keith Stouffer, Michael Pease, CheeYee Tang, Timothy Zimmerman, Victoria Pillitteri, Suzanne Lightman, Adam Hahn, Stephanie Saravia, Aslam Sherule, and Michael Thompson. NIST special publication 800-82 rev. 3: Guide to operational technology (OT) security. NIST Special Publication 800-82 Rev. 3, National Institute of Standards and Technology, September 20...

  3. [3]

    IEC 62443-4-1:2018: Security for industrial automation and control systems — part 4-1: Secure product development lifecycle requirements, 2018

    International Electrotechnical Commission. IEC 62443-4-1:2018: Security for industrial automation and control systems — part 4-1: Secure product development lifecycle requirements, 2018. URL https://webstore.iec. ch/en/publication/33615. International standard, Edition 1.0

  4. [4]

    Howie Huang

    Yuede Ji, Lei Cui, and H. Howie Huang. BugGraph: Differentiating source-binary code similarity with graph triplet-loss network. InProceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pages 702–715. ACM, 2021. doi:10.1145/3433210.3437533

  5. [5]

    KARONTE: Detecting insecure multi-binary interactions in embedded firmware

    Nilo Redini, Aravind Machiry, Ruoyu Wang, Chad Spensky, Andrea Continella, Yan Shoshitaishvili, Christo- pher Kruegel, and Giovanni Vigna. KARONTE: Detecting insecure multi-binary interactions in embedded firmware. InProceedings of the 2020 IEEE Symposium on Security and Privacy, pages 1544–1561. IEEE, 2020. doi:10.1109/SP40000.2020.00036

  6. [6]

    Sharing more and checking less: Leveraging common input keywords to detect bugs in em- bedded systems

    Libo Chen, Yanhao Wang, Quanpu Cai, Yunfan Zhan, Hong Hu, Jiaqi Linghu, Qinsheng Hou, Chao Zhang, Haixin Duan, and Zhi Xue. Sharing more and checking less: Leveraging common input keywords to detect bugs in em- bedded systems. InProceedings of the 30th USENIX Security Symposium, pages 303–319. USENIX Association,

  7. [7]

    URLhttps://www.usenix.org/conference/usenixsecurity21/presentation/chen-libo

  8. [8]

    Securing the dark matter: A semantic-enhanced neuro-symbolic framework for supply chain analysis of opaque industrial software,

    Bowei Ning, Xuejun Zong, Lian Lian, Kan He, Yifei Sun, Yuxiang Lei, and Plamen Vasilev. Securing the dark matter: A semantic-enhanced neuro-symbolic framework for supply chain analysis of opaque industrial software,

  9. [9]

    URLhttps://arxiv.org/abs/2605.07737

  10. [10]

    SoK: Automated vulnerability repair: Methods, tools, and assessments

    Yiwei Hu, Zhen Li, Kedie Shu, Shenghua Guan, Deqing Zou, Shouhuai Xu, Bin Yuan, and Hai Jin. SoK: Automated vulnerability repair: Methods, tools, and assessments. InProceedings of the 34th USENIX Security Symposium, pages 4421–4440. USENIX Association, 2025. URL https://www.usenix.org/conference/ usenixsecurity25/presentation/hu-yiwei

  11. [11]

    SoK: Towards effective automated vul- nerability repair

    Ying Li, Faysal Hossain Shezan, Bomin Wei, Gang Wang, and Yuan Tian. SoK: Towards effective automated vul- nerability repair. InProceedings of the 34th USENIX Security Symposium, pages 4441–4462. USENIX Association,

  12. [12]

    URLhttps://www.usenix.org/conference/usenixsecurity25/presentation/li-ying

  13. [13]

    VulShield: Protecting vulnerable code before deploying patches

    Yuan Li, Chao Zhang, Jinhao Zhu, Penghui Li, Chenyang Li, Songtao Yang, and Wende Tan. VulShield: Protecting vulnerable code before deploying patches. InProceedings of the Network and Distributed System Security Symposium. Internet Society, 2025. doi:10.14722/ndss.2025.240298. URL https://www.ndss-symposium. org/ndss-paper/vulshield-protecting-vulnerable-...

  14. [14]

    Timperley, Yannic Noller, Claire Le Goues, and Abhik Roychoudhury

    Ridwan Shariffdeen, Christopher S. Timperley, Yannic Noller, Claire Le Goues, and Abhik Roychoudhury. Vulnerability repair via concolic execution and code mutations.ACM Transactions on Software Engineering and Methodology, 2025. doi:10.1145/3707454

  15. [15]

    Neural transfer learning for repairing se- curity vulnerabilities in C code.IEEE Transactions on Software Engineering, 49(1):147–165, 2023

    Zimin Chen, Steve Kommrusch, and Martin Monperrus. Neural transfer learning for repairing se- curity vulnerabilities in C code.IEEE Transactions on Software Engineering, 49(1):147–165, 2023. doi:10.1109/TSE.2022.3147265

  16. [16]

    VulRepair: A T5-based automated software vulnerability repair

    Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Van Nguyen, and Dinh Phung. VulRepair: A T5-based automated software vulnerability repair. InProceedings of the 30th ACM Joint European Software Engineer- ing Conference and Symposium on the Foundations of Software Engineering, pages 935–947. ACM, 2022. doi:10.1145/3540250.3549098. 35 SCARAA PREPRINT

  17. [17]

    Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources

    Xin Zhou, Kisub Kim, Bowen Xu, DongGyun Han, and David Lo. Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. ACM, 2024. doi:10.1145/3597503.3639222

  18. [18]

    Logs in, patches out: Automated vulnerability repair via tree-of-thought LLM analysis

    Youngjoon Kim, Sunguk Shin, Hyoungshick Kim, and Jiwon Yoon. Logs in, patches out: Automated vulnerability repair via tree-of-thought LLM analysis. InProceedings of the 34th USENIX Security Symposium, pages 4401–

  19. [19]

    URL https://www.usenix.org/conference/usenixsecurity25/ presentation/kim-youngjoon

    USENIX Association, 2025. URL https://www.usenix.org/conference/usenixsecurity25/ presentation/kim-youngjoon

  20. [20]

    APPATCH: Automated adaptive prompting large language models for real-world software vulnerability patching

    Yu Nong, Haoran Yang, Long Cheng, Hongxin Hu, and Haipeng Cai. APPATCH: Automated adaptive prompting large language models for real-world software vulnerability patching. InProceedings of the 34th USENIX Security Symposium, pages 4481–4500. USENIX Association, 2025. URL https://www.usenix.org/conference/ usenixsecurity25/presentation/nong

  21. [21]

    PATCHA- GENT: A practical program repair agent mimicking human expertise

    Zheng Yu, Ziyi Guo, Yuhang Wu, Jiahao Yu, Meng Xu, Dongliang Mu, Yan Chen, and Xinyu Xing. PATCHA- GENT: A practical program repair agent mimicking human expertise. InProceedings of the 34th USENIX Security Symposium, pages 4381–4400. USENIX Association, 2025. URL https://www.usenix.org/conference/ usenixsecurity25/presentation/yu-zheng

  22. [22]

    Bridging research and practice in simulation-based testing of industrial robot navigation systems,

    Xin-Cheng Wen, Zirui Lin, Yijun Yang, Cuiyun Gao, and Deheng Ye. Vul-R2: A reasoning LLM for automated vulnerability repair. InProceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering, pages 26–38. IEEE, 2025. doi:10.1109/ASE63991.2025.00011

  23. [23]

    What IF is not enough? fixing null pointer dereference with contextual check

    Yunlong Xing, Shu Wang, Shiyu Sun, Xu He, Kun Sun, and Qi Li. What IF is not enough? fixing null pointer dereference with contextual check. InProceedings of the 33rd USENIX Security Symposium, pages 1367–

  24. [24]

    URL https://www.usenix.org/conference/usenixsecurity24/ presentation/xing

    USENIX Association, 2024. URL https://www.usenix.org/conference/usenixsecurity24/ presentation/xing

  25. [25]

    Chen, Manuel Egele, Maverick Woo, and Maverick Brumley

    Daming D. Chen, Manuel Egele, Maverick Woo, and Maverick Brumley. Towards automated dynamic analysis for Linux-based embedded firmware. InProceedings of the Network and Distributed System Security Symposium. Inter- net Society, 2016. doi:10.14722/ndss.2016.23415. URL https://www.ndss-symposium.org/wp-content/ uploads/2017/09/towards-automated-dynamic-anal...

  26. [26]

    FirmAE: Towards large-scale emulation of IoT firmware for dynamic analysis

    Mingeun Kim, Dongkwan Kim, Eunsoo Kim, Suryeon Kim, Yeongjin Jang, and Yongdae Kim. FirmAE: Towards large-scale emulation of IoT firmware for dynamic analysis. InProceedings of the 36th Annual Computer Security Applications Conference, pages 733–745. ACM, 2020. doi:10.1145/3427228.3427294

  27. [27]

    Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer

    Abraham A. Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer. HALucinator: Firmware re-hosting through abstraction layer emulation. InProceedings of the 29th USENIX Security Symposium, pages 1201–1218. USENIX Association,

  28. [28]

    URLhttps://www.usenix.org/conference/usenixsecurity20/presentation/clements

  29. [29]

    Fuzzware: Using precise MMIO modeling for effective firmware fuzzing

    Tobias Scharnowski, Nils Bars, Moritz Schloegel, Eric Gustafson, Marius Muench, Giovanni Vigna, Christopher Kruegel, Thorsten Holz, and Ali Abbasi. Fuzzware: Using precise MMIO modeling for effective firmware fuzzing. InProceedings of the 31st USENIX Security Symposium, pages 1239–1256. USENIX Association, 2022. URL https://www.usenix.org/conference/useni...

  30. [30]

    FirmSolo: Enabling dynamic analysis of binary Linux-based IoT kernel modules

    Ioannis Angelakopoulos, Gianluca Stringhini, and Manuel Egele. FirmSolo: Enabling dynamic analysis of binary Linux-based IoT kernel modules. InProceedings of the 32nd USENIX Security Symposium, pages 5021–

  31. [31]

    URL https://www.usenix.org/conference/usenixsecurity23/ presentation/angelakopoulos

    USENIX Association, 2023. URL https://www.usenix.org/conference/usenixsecurity23/ presentation/angelakopoulos

  32. [32]

    Forming faster firmware fuzzers

    Lukas Seidel, Dominik Christian Maier, and Marius Muench. Forming faster firmware fuzzers. InProceedings of the 32nd USENIX Security Symposium, pages 2903–2920. USENIX Association, 2023. URL https://www. usenix.org/conference/usenixsecurity23/presentation/seidel

  33. [33]

    Moyne, and Z

    Mu Zhang, Chien-Ying Chen, Bin-Chou Kao, Yassine Qamsane, Yuru Shao, Yikai Lin, Elaine Shi, Sibin Mohan, Kira Barton, James R. Moyne, and Z. Morley Mao. Towards automated safety vetting of PLC code in real-world plants. InProceedings of the 2019 IEEE Symposium on Security and Privacy, pages 522–538. IEEE, 2019. doi:10.1109/SP.2019.00034

  34. [34]

    Symbolic execution of programmable logic controller code

    Shengjian Guo, Meng Wu, and Chao Wang. Symbolic execution of programmable logic controller code. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 326–336. ACM, 2017. doi:10.1145/3106237.3106245

  35. [35]

    Automated test generation for IEC 61131-3 ST programs via dynamic symbolic execution.Science of Computer Programming, 206:102608, 2021

    Weigang He, Jianqi Shi, Ting Su, Zeyu Lu, Li Hao, and Yanhong Huang. Automated test generation for IEC 61131-3 ST programs via dynamic symbolic execution.Science of Computer Programming, 206:102608, 2021. doi:10.1016/j.scico.2021.102608. 36 SCARAA PREPRINT

  36. [36]

    ICSQuartz: Scan cycle-aware and vendor-agnostic fuzzing for industrial control systems

    Corban Villa, Constantine Doumanidis, Hithem Lamri, Prashant Hari Narayan Rajput, and Michail Maniatakos. ICSQuartz: Scan cycle-aware and vendor-agnostic fuzzing for industrial control systems. InProceedings of the Network and Distributed System Security Symposium. Internet Society, 2025. doi:10.14722/ndss.2025.240795

  37. [37]

    Dániel Darvas, Enrique Blanco Viñuela, and Borja Fernández Adiego. PLCverif: A tool to verify PLC programs based on model checking techniques. In Proceedings of the 15th International Conference on Accelerator and Large Experimental Physics Control Systems, pages 911–914. JACoW Publishing, 2015. doi:10.18429/JACoW-ICALEPCS2015-WEPGF092. URL https://jacow.o...

  38. [38]

    Mário de Sousa and Adriano Carvalho. An IEC 61131-3 compiler for the MatPLC. In EFTA 2003. 2003 IEEE Conference on Emerging Technologies and Factory Automation. Proceedings (Cat. No. 03TH8696), volume 1, pages 485–490. IEEE, 2003

  39. [39]

    Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of the 2014 IEEE Symposium on Security and Privacy, pages 590–604. IEEE, 2014. doi:10.1109/SP.2014.44

  40. [40]

    MITRE Corporation. MITRE ATT&CK for industrial control systems: ICS matrix, 2024. URL https://attack.mitre.org/matrices/ics/. Accessed: 2025-01-15

  41. [41]

    National Institute of Standards and Technology. National vulnerability database (NVD), 2024. URL https://nvd.nist.gov/

  42. [42]

    Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. SoK: (state of) the art of war: Offensive techniques in binary analysis. In Proceedings of the 2016 IEEE Symposium on Security and Privacy, pages 138–157. IEEE, 2016. doi:10.110...

  43. [43]

    Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, pages 209–224. USENIX Association, 2008. URL https://www.usenix.org/legacy/event/osdi08/tech/full_papers/cadar/cadar.pdf

  44. [44]

    Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 3982–3992. Association for Computational Linguistics, 2019. doi:10.18653/v1/D19-1410

  45. [45]

    Sushant Dinesh, Nathan Burow, Dongyan Xu, and Mathias Payer. RetroWrite: Statically instrumenting COTS binaries for fuzzing and sanitization. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1497–1511. IEEE, 2020

  46. [46]

    Gregory J. Duck, Xiang Gao, and Abhik Roychoudhury. Binary rewriting without control flow recovery. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 151–163, 2020

  47. [47]

    Thiago Rodrigues Alves, Mario Buratto, Flavio Mauricio de Souza, and Thelma Virginia Rodrigues. OpenPLC: An open source alternative to automation. In Proceedings of the 2014 IEEE Global Humanitarian Technology Conference, pages 585–589. IEEE, 2014. doi:10.1109/GHTC.2014.6970342

  48. [48]

    Modbus Organization. Modbus application protocol specification V1.1b3, April 2012. URL https://www.modbus.org/file/secure/modbusprotocolspecification.pdf. Published April 26, 2012

  49. [49]

    DNP Technical Committee. DNP3 technical bulletin TB2016-002: Addressing deficiencies in DNP3-SAv5, 2016. URL https://www.witsprotocol.org/01-sep-2016-dnp3-technical-bulletin-tb2016-002-information-for-wits-members/. Public information page; full bulletin available to DNP Users Group members

  50. [50]

    International Electrotechnical Commission. IEC 60870-5-104:2006: Telecontrol equipment and systems — part 5-104: Transmission protocols — network access for IEC 60870-5-101 using standard transport profiles, 2006. URL https://webstore.iec.ch/en/publication/3746. International standard

  51. [51]

    International Electrotechnical Commission. IEC 61131-3:2013: Programmable controllers — part 3: Programming languages, 2013. URL https://webstore.iec.ch/en/publication/4552. International standard, Third edition

  52. [52]

    Quang-Cuong Bui, Ranindya Paramitha, Duc-Ly Vu, Fabio Massacci, and Riccardo Scandariato. APR4Vul: An empirical study of automatic program repair techniques on real-world Java vulnerabilities. Empirical Software Engineering, 29(1):18, 2024. doi:10.1007/s10664-023-10415-7

  53. [53]

    Edmund Clarke, Daniel Kroening, and Flavio Lerda. A tool for checking ANSI-C programs. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 168–176. Springer, 2004

  54. [54]

    Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945. doi:10.2307/3001968

  55. [55]

    Gregory J. Duck, Xiang Gao, and Abhik Roychoudhury. Binary rewriting without control flow recovery. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 151–164. ACM, 2020. doi:10.1145/3385412.3385972

  56. [56]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025. URL https://arxiv.org/abs/2505.09388

  57. [57]

    DeepSeek-AI. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024. URL https://arxiv.org/abs/2412.19437

  58. [58]

    Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008. doi:10.1007/978-3-540-78800-3_24

  59. [59]

    Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Hillsdale, NJ, 2nd edition, 1988. ISBN 0805802835

  60. [60]

    Sofia Bobadilla, Monica Jin, and Martin Monperrus. Do automated fixes truly mitigate smart contract exploits? IEEE Transactions on Software Engineering, 52(1):100–115, 2026. doi:10.1109/TSE.2025.3618123

  61. [61]

    Bo Feng, Alejandro Mera, and Long Lu. P2IM: Scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In Proceedings of the 29th USENIX Security Symposium, pages 1237–. USENIX Association, 2020. URL https://www.usenix.org/conference/usenixsecurity20/presentation/feng

  63. [63]

    Zhizhuang Jia, Chao Yang, Pengbin Feng, Xiaoyun Zhao, Xinghua Li, and Jianfeng Ma. Impact assessment of third-party library vulnerabilities through vulnerability reachability analysis. Computers & Security, page 104546, 2025

  64. [64]

    Dimitrios Tychalas, Hadjer Benkraouda, and Michail Maniatakos. ICSFuzz: Manipulating I/Os and repurposing binary code to enable instrumented fuzzing in ICS control applications. In Proceedings of the 30th USENIX Security Symposium, pages 2847–2862. USENIX Association, 2021. URL https://www.usenix.org/conference/usenixsecurity21/presentation/tychalas