PLC-BinX: A Cross-Platform Binary Code Analysis Framework for PLC Binaries
Pith reviewed 2026-06-30 19:26 UTC · model grok-4.3
The pith
PLC-BinX applies a three-stage workflow to convert PLC binaries from four platforms into function-level semantic representations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PLC-BinX applies a three-stage PLC binary analysis workflow, including cross-platform reverse engineering, core function identification, and function-level semantic representation, to analyze PLC binaries from four platforms: CODESYS v3, GEB, OpenPLC v2, and OpenPLC v3. We evaluate PLC-BinX on PLC-BEAD, which contains 2,431 PLC binaries across four platforms, using two downstream tasks: toolchain prediction and functionality prediction. Experimental results show that PLC-BinX achieves 100.00% precision, recall, and F1 in toolchain prediction, and 51.43% precision, 49.38% recall, and 49.18% F1 in functionality prediction over 22 labels.
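A note on reading the functionality scores: the reported F1 (49.18%) is lower than both precision (51.43%) and recall (49.38%). That cannot happen if F1 is the harmonic mean of those two aggregate numbers, which would be about 50.38%. One reading consistent with the figures is per-class (macro) averaging, where each metric is averaged over the labels separately. The text here does not state the averaging mode, so treat that as an assumption. A minimal sketch of macro averaging in plain Python, with a hypothetical helper name `macro_prf`:

```python
from collections import Counter

def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 for single-label predictions.

    Assumption: each sample has exactly one true label and one predicted
    label. The averaging mode used in the paper is not stated in the source.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy data (not from the paper): macro F1 lands below both macro P and macro R.
p, r, f = macro_prf(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

On the toy data, macro precision is 5/6, macro recall is 3/4, and macro F1 is 11/15, which is below both, the same pattern as the reported scores.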
What carries the argument
The three-stage workflow of cross-platform reverse engineering, core function identification, and function-level semantic representation.
If this is right
- Raw PLC binaries from the four tested platforms can be turned into representations that support perfect toolchain identification.
- The same representations allow functionality classification at 49.18% F1 across 22 labels.
- The approach directly enables downstream security tasks such as deployed-binary auditing for industrial control systems.
- The workflow produces usable function-level features without requiring source code or platform-specific manual tuning.
Where Pith is reading between the lines
- The representations may transfer to other embedded control devices that share similar binary heterogeneity.
- Further downstream tasks such as vulnerability detection or malware identification could be tested on the same extracted features.
- The modest functionality prediction scores leave open the possibility that additional function identification rules would raise them.
Load-bearing premise
The three-stage workflow of cross-platform reverse engineering, core function identification, and function-level semantic representation is sufficient to overcome heterogeneous formats, entangled program semantics, and limited semantic representations.
What would settle it
Running the same toolchain and functionality prediction tasks on a fresh collection of PLC binaries from a fifth platform or with a different set of 22 functionality labels would show whether the reported precision, recall, and F1 scores are preserved.
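Short of collecting a fifth platform, a leave-one-platform-out split over the existing four gives a rough proxy for the test described above. This is a sketch under assumptions, not the paper's protocol: the function name `leave_one_platform_out` and the `(binary_id, platform)` sample layout are hypothetical. It is meaningful only for functionality prediction, since a held-out platform's toolchain label would never appear in training.

```python
from collections import defaultdict

def leave_one_platform_out(samples):
    """Yield (held_out_platform, train_ids, test_ids) for each platform.

    samples: iterable of (binary_id, platform) pairs. This layout is an
    assumption for illustration, not the PLC-BEAD schema.
    """
    by_platform = defaultdict(list)
    for binary_id, platform in samples:
        by_platform[platform].append(binary_id)

    for held_out in sorted(by_platform):
        # Train on every platform except the held-out one.
        train = [b for p, ids in by_platform.items() if p != held_out for b in ids]
        test = list(by_platform[held_out])
        yield held_out, train, test

# Toy sample list; the binary IDs are made up.
samples = [("b1", "CODESYS v3"), ("b2", "GEB"), ("b3", "GEB"), ("b4", "OpenPLC v2")]
splits = list(leave_one_platform_out(samples))
```

A large drop in functionality F1 on the held-out platform, relative to the mixed-platform score, would suggest the representations carry platform-specific signal rather than function semantics.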
Original abstract
As emerging attacks increasingly target Industrial Control Systems (ICS), the security of Programmable Logic Controllers (PLCs) has become a critical concern. Binary Code Analysis (BCA), which enables analysts to analyze compiled programs, is essential for ICS security tasks such as deployed-binary auditing. However, automated BCA for PLC binaries remains challenging due to three key issues: heterogeneous binary formats across PLC platforms, entangled program semantics with runtime code, and limited semantic representations for downstream tasks. To resolve these challenges, we present PLC-BinX, a cross-platform BCA framework for PLC binaries. PLC-BinX applies a three-stage PLC binary analysis workflow, including cross-platform reverse engineering, core function identification, and function-level semantic representation, to analyze PLC binaries from four platforms: CODESYS v3, GEB, OpenPLC v2, and OpenPLC v3. We evaluate PLC-BinX on PLC-BEAD, which contains 2,431 PLC binaries across four platforms, using two downstream tasks: toolchain prediction and functionality prediction. Experimental results show that PLC-BinX achieves 100.00% precision, recall, and F1 in toolchain prediction, and 51.43% precision, 49.38% recall, and 49.18% F1 in functionality prediction over 22 labels. These results demonstrate that PLC-BinX can transform raw PLC binaries into effective function-level semantic representations for PLC binary code analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PLC-BinX, a cross-platform binary code analysis framework for PLC binaries that uses a three-stage workflow (cross-platform reverse engineering, core function identification, and function-level semantic representation) to address heterogeneous formats, entangled semantics, and limited representations. It is evaluated on the PLC-BEAD dataset of 2,431 binaries from four platforms (CODESYS v3, GEB, OpenPLC v2, OpenPLC v3) on two tasks: toolchain prediction (reported 100.00% precision/recall/F1) and functionality prediction (51.43% precision, 49.38% recall, 49.18% F1 over 22 labels). The central claim is that the framework transforms raw PLC binaries into effective function-level semantic representations.
Significance. If the results hold under proper validation, the work could contribute to ICS security by enabling binary analysis across PLC platforms and by releasing the PLC-BEAD dataset. The three-stage workflow targets real challenges in the domain. However, the reported metrics do not yet establish that the semantic stages are load-bearing or superior to simpler methods.
major comments (3)
- [Abstract, Evaluation] Abstract and Evaluation section: The 100.00% precision/recall/F1 on toolchain prediction does not validate the full workflow or the semantic representation stages. Toolchain prediction is equivalent to platform identification, which can be performed by format detection (headers, magic bytes, or string constants) in the cross-platform reverse engineering stage alone; the core function identification and function-level semantic representation stages are not shown to be necessary for this result.
- [Evaluation] Evaluation section: The functionality prediction task reports modest performance (51.43% precision, 49.38% recall, 49.18% F1 over 22 labels) but provides no baseline comparisons, ablations isolating the contribution of each workflow stage, or error analysis. Without these, it is not possible to determine whether the semantic representations are effective or whether the workflow overcomes entangled program semantics.
- [Abstract, §3] Abstract and §3 (Workflow): The claim that the three-stage workflow is sufficient to overcome heterogeneous formats, entangled semantics, and limited representations rests on the assumption that core function identification and semantic representation add value beyond the first stage, but no evidence (e.g., stage-wise performance or comparison to format-only baselines) is supplied to support this.
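The format-only baseline that the major comments ask for can be sketched as a prefix match on the first bytes of each file. This is an illustration only: the ELF magic is the one real signature in the table, and the other entries are hypothetical placeholders, not the actual headers of CODESYS v3, GEB, or OpenPLC binaries. The names `SIGNATURES` and `detect_format` are likewise made up for this sketch.

```python
from typing import Dict, Optional

# Only the ELF magic is a real signature. The remaining entries are
# hypothetical placeholders and do NOT describe any actual PLC platform.
SIGNATURES: Dict[bytes, str] = {
    b"\x7fELF": "elf-based-toolchain",
    b"HYPO_A": "hypothetical-platform-a",
    b"HYPO_B": "hypothetical-platform-b",
}

def detect_format(blob: bytes, signatures: Dict[bytes, str] = SIGNATURES) -> Optional[str]:
    """Return the label of the first signature that prefixes the blob, else None."""
    for magic, label in signatures.items():
        if blob.startswith(magic):
            return label
    return None

# Usage on in-memory bytes; a real baseline would read the first bytes of each file.
label = detect_format(b"\x7fELF\x02\x01\x01")
```

If a baseline of this kind also reaches 100% on toolchain prediction, that result says nothing about the later two stages, which is the referee's point.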
minor comments (2)
- [Abstract] The abstract states concrete performance numbers but does not describe dataset construction criteria, label definitions for the 22 functionality classes, or the exact method used to obtain the reported metrics.
- [Evaluation] Notation for the semantic representation vectors and the downstream classifiers is not introduced in the provided abstract; this should be clarified in the main text for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point-by-point below, agreeing with the need for clarification and additional analysis where the current evaluation falls short, and outlining the revisions we will make.
- Referee: [Abstract, Evaluation] Abstract and Evaluation section: The 100.00% precision/recall/F1 on toolchain prediction does not validate the full workflow or the semantic representation stages. Toolchain prediction is equivalent to platform identification, which can be performed by format detection (headers, magic bytes, or string constants) in the cross-platform reverse engineering stage alone; the core function identification and function-level semantic representation stages are not shown to be necessary for this result.
  Authors: We agree that the 100% toolchain prediction result can be achieved via format detection in the first stage alone and does not demonstrate the necessity of the later stages. This metric primarily validates the cross-platform reverse engineering component of the workflow. The full three-stage approach is motivated by the functionality prediction task. We will revise the abstract and evaluation section to explicitly distinguish the role of each stage and clarify that toolchain prediction validates the initial reverse engineering process rather than the semantic stages.
  Revision: yes
- Referee: [Evaluation] Evaluation section: The functionality prediction task reports modest performance (51.43% precision, 49.38% recall, 49.18% F1 over 22 labels) but provides no baseline comparisons, ablations isolating the contribution of each workflow stage, or error analysis. Without these, it is not possible to determine whether the semantic representations are effective or whether the workflow overcomes entangled program semantics.
  Authors: We acknowledge that the evaluation lacks baseline comparisons, ablations, and error analysis, making it difficult to isolate the contribution of the core function identification and semantic representation stages. In the revised manuscript, we will add baseline experiments (e.g., format-only or first-stage-only approaches), ablation studies removing individual stages, and error analysis across the 22 labels to better demonstrate whether the semantic stages address entangled semantics.
  Revision: yes
- Referee: [Abstract, §3] Abstract and §3 (Workflow): The claim that the three-stage workflow is sufficient to overcome heterogeneous formats, entangled semantics, and limited representations rests on the assumption that core function identification and semantic representation add value beyond the first stage, but no evidence (e.g., stage-wise performance or comparison to format-only baselines) is supplied to support this.
  Authors: We recognize that the manuscript does not currently provide stage-wise performance or format-only baseline comparisons to support the added value of the later stages. We will incorporate these analyses into the evaluation section of the revised manuscript, including stage-wise metrics and direct comparisons to format-only methods, to substantiate the claim that the full workflow is required for effective function-level semantic representations.
  Revision: yes
Circularity Check
No circularity; empirical results on external dataset
The paper describes a three-stage framework evaluated via direct experiments on the PLC-BEAD dataset (2,431 binaries) for toolchain and functionality prediction tasks. No equations, fitted parameters, self-definitional constructs, or load-bearing self-citations are present in the provided text. Claims reduce to reported precision/recall/F1 metrics rather than any derivation that collapses to inputs by construction. This matches the default case of a self-contained empirical paper against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The four listed platforms and the PLC-BEAD collection sufficiently capture the heterogeneity, entanglement, and representation challenges described.