pith. machine review for the scientific record.

arxiv: 2604.15390 · v3 · submitted 2026-04-16 · 💻 cs.SE · cs.AI

Recognition: unknown

Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 11:21 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords code deobfuscation · chain of thought prompting · control flow obfuscation · large language models · control flow graph reconstruction · program semantics preservation · reverse engineering · opaque predicates

The pith

Chain-of-thought prompting improves large language models' recovery of control flow and semantics in obfuscated C code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether step-by-step reasoning prompts help large language models undo control-flow obfuscation such as flattening and opaque predicates. It evaluates five models on standard C benchmarks, measuring both how well the original control-flow graph is reconstructed and how faithfully the program's behavior is preserved through output matching. CoT prompting raises performance over direct prompting, with the largest model showing average gains of 16 percent on graph recovery and 20.5 percent on semantic preservation. A reader would care because manual deobfuscation is slow and costly, and better automated assistance could shorten reverse-engineering time for security and maintenance tasks.

Core claim

The paper establishes that Chain-of-Thought prompting, which directs the model through explicit reasoning steps for code analysis, produces higher-quality deobfuscated output than zero-shot prompting. Across the tested models and obfuscation types, this yields clearer control-flow graphs and outputs that match the original program on test inputs more closely. GPT5 records the strongest results, with roughly 16 percent better control-flow graph reconstruction and 20.5 percent better semantic preservation than simple prompting. Results also vary with the original program's control-flow complexity and the specific obfuscator used.

What carries the argument

Chain-of-Thought (CoT) prompting, which supplies the model with explicit, task-specific reasoning steps before producing the deobfuscated code.

If this is right

  • CoT prompting reduces reliance on manual inspection for recovering readable control flow from flattened or predicate-protected code.
  • Model performance scales with both the degree of obfuscation and the intrinsic complexity of the original control-flow graph.
  • The approach supplies more faithful graph reconstructions and behavior-preserving code than direct prompting alone.
  • CoT can serve as an assistive layer that shortens the time needed for reverse engineering while improving code explainability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same prompting pattern could be tried on other obfuscation families such as data obfuscation or virtualization-based protection.
  • Hybrid pipelines that feed CoT output into existing static-analysis tools might further raise reconstruction accuracy.
  • Testing on additional languages and larger, real malware samples would show how far the observed gains extend.

Load-bearing premise

That matching program outputs on a fixed set of test inputs is enough to confirm the deobfuscated code still carries the original semantics.

What would settle it

An obfuscated program where the CoT output matches the original on every test input yet produces different behavior on an untested input, or a real-world control-flow-obfuscated binary outside the chosen C benchmark set on which CoT fails to recover the original graph structure.

Figures

Figures reproduced from arXiv: 2604.15390 by Edward Raff, Manas Gaur, Sarvesh Baskar, Seyedreza Mohseni.

Figure 1. The predicate in the highlighted region (red) is invariant and evaluates to …
Figure 2. Control Flow Graph (CFG) of (a) original code, (b) obfuscated code by Tigress with Control Flow Flattening (CFF), and (c) deobfuscated code by DeepSeek-V2, which shows some hallucination.
Figure 3. BLEU and SRS scores for Group 2, experiment 2.
Figure 5. Average BLEU and SRS scores across models.
Figure 6. (Left) Original code snippet. (Right) Deobfuscated code snippet by DeepSeek-V2. Headers and the Node structure declaration are missing. Original code is obfuscated by Tigress.
Figure 7. Illustrative semantic hallucination during code deobfuscation. The original function computes the height of a …
Original abstract

Code deobfuscation is the task of recovering a readable version of a program while preserving its original behavior. In practice, this often requires days or even months of manual work with complex and expensive analysis tools. In this paper, we explore an alternative approach based on Chain-of-Thought (CoT) prompting, where a large language model is guided through explicit, step-by-step reasoning tailored for code analysis. We focus on control flow obfuscation, including Control Flow Flattening (CFF), Opaque Predicates, and their combination, and we measure both structural recovery of the control flow graph and preservation of program semantics. We evaluate five state-of-the-art large language models and show that CoT prompting significantly improves deobfuscation quality compared with simple prompting. We validate our approach on a diverse set of standard C benchmarks and report results using both structural metrics for control flow graphs and semantic metrics based on output similarity. Among the tested models and by applying CoT, GPT5 achieves the strongest overall performance, with an average gain of about 16% in control-flow graph reconstruction and about 20.5% in semantic preservation across our benchmarks compared to zero-shot prompting. Our results also show that model performance depends not only on the obfuscation level and the chosen obfuscator but also on the intrinsic complexity of the original control flow graph. Collectively, these findings suggest that CoT-guided large language models can serve as effective assistants for code deobfuscation, providing improved code explainability, more faithful control flow graph reconstruction, and better preservation of program behavior while potentially reducing the manual effort needed for reverse engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that Chain-of-Thought (CoT) prompting substantially improves large language model performance on control-flow deobfuscation tasks (Control Flow Flattening, Opaque Predicates, and combinations) relative to zero-shot prompting. It evaluates five state-of-the-art LLMs on standard C benchmarks, reports that GPT5 with CoT yields average gains of ~16% in control-flow graph reconstruction and ~20.5% in semantic preservation (measured by output similarity), and notes that results vary with obfuscation level and original CFG complexity.

Significance. If the evaluation protocol and semantic metric are shown to be reliable, the work would provide useful empirical evidence that structured prompting can assist automated deobfuscation and reduce manual reverse-engineering effort. The observation that LLM success depends on both obfuscation technique and intrinsic program complexity offers a practical insight for tool builders in software security and program analysis.

major comments (2)
  1. Abstract: the central quantitative claims (~16% CFG reconstruction gain and ~20.5% semantic preservation gain for GPT5) are stated without any description of the experimental protocol, number or identity of benchmarks, number of runs, statistical tests, or error bars, preventing assessment of whether the reported improvements are robust or reproducible.
  2. Abstract: semantic preservation is defined solely via 'output similarity' on test inputs, yet no information is supplied on test-suite coverage, equivalence tolerance, handling of side effects, memory layout, or non-determinism. This proxy is load-bearing for the claim of improved deobfuscation quality, especially under CFF and opaque predicates where unexercised paths may diverge.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract's clarity and the need for more experimental context. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: Abstract: the central quantitative claims (~16% CFG reconstruction gain and ~20.5% semantic preservation gain for GPT5) are stated without any description of the experimental protocol, number or identity of benchmarks, number of runs, statistical tests, or error bars, preventing assessment of whether the reported improvements are robust or reproducible.

    Authors: We agree that the abstract would benefit from additional context on the evaluation setup to support assessment of the claims. In the revised version, we will expand the abstract to note that the results are based on a diverse set of standard C benchmarks, five state-of-the-art LLMs, and averages across multiple runs per configuration, with full details on the protocol, benchmark identities, and any statistical measures provided in the experimental section of the paper. This revision will be made while respecting abstract length limits. revision: yes

  2. Referee: Abstract: semantic preservation is defined solely via 'output similarity' on test inputs, yet no information is supplied on test-suite coverage, equivalence tolerance, handling of side effects, memory layout, or non-determinism. This proxy is load-bearing for the claim of improved deobfuscation quality, especially under CFF and opaque predicates where unexercised paths may diverge.

    Authors: We acknowledge that the abstract's description of the semantic metric is high-level. The full manuscript provides details on the test inputs, coverage considerations, and handling of program behavior in the evaluation methodology section. We will revise the abstract to clarify that semantic preservation is assessed via output equivalence on test suites targeting the original program's observable behavior, including notes on equivalence tolerance and non-determinism where relevant. This addresses the concern without altering the core metric, which we maintain is suitable for verifying deobfuscation quality on exercised paths. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical comparison of prompting strategies

full rationale

The paper reports experimental results from evaluating five LLMs on control-flow deobfuscation benchmarks using CoT versus zero-shot prompting. It measures CFG reconstruction accuracy and semantic preservation via output similarity on standard C programs. No equations, derivations, fitted parameters, uniqueness theorems, or self-citations are used to derive the central claims. All quantitative gains (e.g., 16% CFG, 20.5% semantic) are direct empirical observations from the described test suite. The evaluation rests on external benchmarks and contains no load-bearing steps that reduce to prior fitted quantities or self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical prompting study with no mathematical model, free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5609 in / 1076 out tokens · 44773 ms · 2026-05-10T11:21:06.477379+00:00 · methodology

discussion (0)

