Teaching LLMs Program Semantics via Symbolic Execution Traces
Pith reviewed 2026-05-08 08:51 UTC · model grok-4.3
The pith
Training an 8B LLM on ~3,000 symbolic execution traces from C code, paired with chain-of-thought reasoning at inference, raises violation-detection accuracy by more than 17 percentage points and lets it surpass a 4× larger 32B model on that task.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Continued pretraining of Qwen3-8B on symbolic execution traces that record paths to property violations in C code, followed by chain-of-thought reasoning at inference time, produces superadditive improvements in violation detection. The resulting model achieves one of the most balanced accuracy profiles across confirmation and violation tasks and outperforms the larger Qwen3-32B model on violation detection while approaching it overall. These gains arise from the semantics captured in the traces rather than from additional code exposure alone.
What carries the argument
Symbolic execution traces produced by running the Soteria engine on generic open-source C code; these traces encode concrete execution paths and states leading to violations and are used for continued pretraining.
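The review never shows what such a trace contains, so here is a toy sketch of the kind of artifact involved. Everything below is hypothetical and illustrative: it is not the Soteria engine or its trace format, just a brute-force path enumerator over one tiny hard-coded program that records, for the violating path, the branch decisions, the path condition, and the state at the violation.

```python
# Toy sketch of a symbolic-execution-style bug trace (hypothetical;
# NOT the Soteria engine or its trace format). We explore both
# branches of a tiny fixed program and emit, for each path that can
# violate the assertion, a trace record: input witness, path
# condition, and final state.

def symexec(x_domain):
    """Program under analysis (as pseudocode):
         y = x * 2
         if y > 10: assert y != 14   # fails when x == 7
    """
    traces = []
    for branch in (True, False):                # enumerate both paths
        path_cond = ["y > 10"] if branch else ["y <= 10"]
        # find a concrete input that follows this path AND violates
        # the assertion (the assert sits on the 'then' path only)
        for x in x_domain:
            y = x * 2
            if (y > 10) == branch and branch and y == 14:
                traces.append({
                    "input": {"x": x},
                    "path_condition": path_cond + ["y == 14"],
                    "state": {"x": x, "y": y},
                    "violation": "assert y != 14",
                })
    return traces

print(symexec(range(0, 20)))   # one violating path found: x == 7
```

A real engine replaces the brute-force input search with constraint solving, but the trace payload, a path condition plus the states leading to the violation, is the kind of object the continued pretraining consumes.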
If this is right
- Violation detection gains transfer across all five property types even when the training traces target only some of them.
- The combination of trace-based training and chain-of-thought reasoning at inference produces larger improvements than either component alone.
- Performance improvements depend on the curation and format of the traces rather than the sheer volume of code seen during training.
- Smaller models can reach or exceed larger models on this task after targeted trace training.
Where Pith is reading between the lines
- Similar trace-based training might help LLMs with other formal reasoning tasks such as theorem proving if analogous execution artifacts can be generated.
- The approach could be tested on codebases from different domains to check whether the learned semantics generalize beyond the open-source C programs used for training.
- Integrating these traces into retrieval-augmented generation pipelines might further boost performance on long programs where base models degrade.
Load-bearing premise
The symbolic traces from generic open-source C code convey general program semantics that transfer to the SV-COMP benchmark tasks rather than superficial statistical patterns specific to the training distribution.
What would settle it
Running the trained model on a new set of C programs with violation patterns or lengths that differ systematically from both the trace generation corpus and the SV-COMP tasks, then measuring whether the 17-point violation detection gain disappears.
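That settling experiment amounts to a stratified comparison: bucket held-out programs by length (or violation pattern), compute the base-vs-trained accuracy gain per bucket, and check whether the 17-point gain survives out of distribution. A minimal sketch, with invented placeholder data rather than results from the paper:

```python
# Sketch of the settling experiment: per-bucket violation-detection
# gain (trained minus base), in percentage points. The demo data is
# an invented placeholder, not a result from the paper.

from collections import defaultdict

def gain_by_bucket(results):
    """results: list of (length_bucket, base_correct, trained_correct),
    with correctness as 0/1 per task."""
    totals = defaultdict(lambda: [0, 0, 0])   # n, base hits, trained hits
    for bucket, base_ok, trained_ok in results:
        t = totals[bucket]
        t[0] += 1
        t[1] += base_ok
        t[2] += trained_ok
    return {b: (hits_t / n - hits_b / n) * 100
            for b, (n, hits_b, hits_t) in totals.items()}

demo = [("short", 0, 1), ("short", 1, 1), ("long", 0, 1), ("long", 0, 0)]
print(gain_by_bucket(demo))
```

If the "long, out-of-distribution" bucket's gain collapses toward zero while in-distribution buckets keep theirs, that would favor the surface-pattern explanation over transferable semantics.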
Original abstract
We introduce an evaluation framework of 500 C verification tasks across five property types (memory safety, overflow, termination, reachability, data races) built on SV-COMP 2025, and evaluate 14 models across six families. We find that high overall accuracy masks a critical weakness: while most models reliably confirm properties hold, violation detection varies widely and degrades sharply with program length. To close this gap, we train on formal verification artifacts: running the Soteria symbolic execution engine on generic open-source C code and using the resulting traces for continued pretraining of Qwen3-8B. Just ~3,000 bug traces combined with chain-of-thought reasoning at inference time improve violation detection by over 17 percentage points, producing one of the most balanced accuracy profiles among evaluated models. On violation detection, the trained 8B model outperforms the 4× larger Qwen3-32B without thinking and approaches it in overall accuracy. The interaction between trace training and chain-of-thought is superadditive: neither alone provides meaningful gains, but their combination does. Improvements transfer across all five property types, including ones the training traces do not target. Our 28 configurations confirm the gains stem from trace semantics, not code volume, and that trace curation and format matter.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a 500-task C verification benchmark across five properties (memory safety, overflow, termination, reachability, data races) derived from SV-COMP 2025. It shows that LLMs generally confirm properties but struggle with violation detection, especially on longer programs. The authors generate ~3,000 symbolic execution traces via Soteria on generic open-source C code and use them for continued pretraining of Qwen3-8B. Combined with chain-of-thought reasoning at inference, this yields >17pp gains in violation detection; the trained 8B model outperforms 4x larger Qwen3-32B on violations and approaches it overall. Gains are superadditive (traces + CoT), transfer to all five properties (including untargeted ones), and are attributed to trace semantics via 28 controlled configurations rather than code volume alone.
Significance. If the central result holds, the work demonstrates a practical way to inject formal program semantics into LLMs via symbolic traces, producing more balanced verification performance and superadditive benefits with CoT. The multi-property benchmark, cross-property transfer, and explicit ablation of 28 configurations (ruling out volume) are notable strengths that support falsifiable claims about semantic learning. This approach could meaningfully advance LLM-assisted program analysis in software engineering.
major comments (2)
- [Experiments / Ablation study (around the 28-configuration results)] The manuscript states that 28 configurations confirm gains stem from trace semantics rather than code volume, yet does not describe an explicit control isolating semantic content (e.g., randomized trace order, format-only inputs without executable semantics, or syntax-shuffled variants). Without such a condition, it remains possible that improvements reflect learning of trace syntax or SV-COMP-style signatures rather than transferable program semantics; this directly bears on the generalization claims to longer programs and untargeted properties.
- [Data / Corpus construction] No explicit check or statement is provided confirming zero overlap between the generic open-source C corpus used for Soteria trace generation and the SV-COMP-derived 500-task test set. Given that both involve C code, any shared programs or similar constructs could inflate the reported transfer and violation-detection gains.
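One concrete form of the semantic-isolation control the first comment asks for (hypothetical; not described in the paper) is a syntax-shuffled training variant: permute the lines of each trace so token and line-format statistics are preserved while the causal, step-by-step semantics are destroyed. If a model trained on shuffled traces shows the same gains, the improvement is more plausibly syntax learning.

```python
# Sketch of a syntax-shuffled control corpus (hypothetical control,
# not from the paper): per-trace line permutation preserves surface
# statistics but destroys temporal/causal order.

import random

def shuffle_trace(trace: str, seed: int = 1) -> str:
    lines = trace.strip().splitlines()
    rng = random.Random(seed)        # deterministic for reproducibility
    rng.shuffle(lines)               # destroys step ordering
    return "\n".join(lines)

trace = ("step 1: x = 0\n"
         "step 2: enter loop\n"
         "step 3: read N (uninitialized)")
shuffled = shuffle_trace(trace)
# same lines, different order
assert sorted(shuffled.splitlines()) == sorted(trace.splitlines())
assert shuffled != trace
```

A format-only variant (keep the trace scaffolding, replace the program states with filler) would isolate the complementary hypothesis that the model is learning the trace template rather than its content.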
minor comments (2)
- [Abstract and Results] The abstract claims transfer 'across all five property types, including ones the training traces do not target,' but the main text should explicitly list which properties appear in the Soteria traces versus which are purely zero-shot transfer.
- [Evaluation framework] Reproducibility would benefit from additional detail on task selection/filtering criteria from SV-COMP 2025 and the exact prompt templates used for CoT inference.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and additional controls.
Point-by-point responses
Referee: [Experiments / Ablation study (around the 28-configuration results)] The manuscript states that 28 configurations confirm gains stem from trace semantics rather than code volume, yet does not describe an explicit control isolating semantic content (e.g., randomized trace order, format-only inputs without executable semantics, or syntax-shuffled variants). Without such a condition, it remains possible that improvements reflect learning of trace syntax or SV-COMP-style signatures rather than transferable program semantics; this directly bears on the generalization claims to longer programs and untargeted properties.
Authors: We agree that the current description of the 28 configurations focuses primarily on volume controls and format variations but does not include explicit semantic-isolation ablations such as randomized trace ordering or syntax-shuffled variants. Our existing results show that equivalent volumes of raw code (without traces) produce no gains and that trace curation matters, but to more rigorously rule out syntax or signature learning, we will add these specific controls in the revised manuscript. This will provide stronger support for the semantic-learning interpretation and the observed transfer to longer programs and untargeted properties. revision: yes
Referee: [Data / Corpus construction] No explicit check or statement is provided confirming zero overlap between the generic open-source C corpus used for Soteria trace generation and the SV-COMP-derived 500-task test set. Given that both involve C code, any shared programs or similar constructs could inflate the reported transfer and violation-detection gains.
Authors: This is a fair concern. We have verified the sources and confirm there is no overlap: the open-source C corpus for trace generation draws from public repositories unrelated to SV-COMP, while the 500-task test set is derived exclusively from SV-COMP 2025 benchmarks with no shared programs or direct constructs. We will add an explicit statement and brief description of this verification to the data-construction section of the revised manuscript. revision: yes
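The overlap verification the authors promise can be sketched as a normalized-fingerprint check (illustrative; the paper does not specify the actual procedure): strip comments and whitespace from each C file, hash the result, and intersect the fingerprint sets of the trace-generation corpus and the test set.

```python
# Sketch of a corpus/test-set overlap check (hypothetical procedure;
# the paper's actual verification method is not described): compare
# hashes of comment- and whitespace-normalized C sources.

import hashlib
import re

def normalize_c(src: str) -> str:
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)   # block comments
    src = re.sub(r"//[^\n]*", "", src)                # line comments
    return re.sub(r"\s+", "", src)                    # all whitespace

def fingerprint(src: str) -> str:
    return hashlib.sha256(normalize_c(src).encode()).hexdigest()

def overlap(corpus, test_set):
    """Return test-set programs whose normalized form appears in corpus."""
    seen = {fingerprint(s) for s in corpus}
    return [t for t in test_set if fingerprint(t) in seen]

a = "int main() { return 0; }  // trivial"
b = "int main(){return 0;}"
assert overlap([a], [b]) == [b]   # same program modulo formatting
```

Note that exact-hash matching only catches verbatim duplicates modulo formatting; the referee's worry about "similar constructs" would additionally need near-duplicate detection (e.g., token-shingle similarity), which this sketch does not attempt.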
Circularity Check
No circularity: empirical training and external benchmark evaluation
Full rationale
The paper reports an empirical study: continued pretraining of Qwen3-8B on ~3,000 Soteria symbolic traces from generic open-source C code, followed by evaluation on a 500-task SV-COMP-derived benchmark across five property classes. No equations, derivations, or self-referential definitions appear in the provided text. The 28 configurations are ablation experiments that compare trace semantics against code-volume controls; they do not reduce any reported gain to a fitted parameter or a self-citation by construction. The central claim (the superadditive lift from traces + CoT) is measured against an external benchmark, and neither overlap checks nor format-only controls are stated as load-bearing. This is a standard self-contained empirical result with no reduction to inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Symbolic execution traces from Soteria capture the relevant semantics for detecting violations across memory safety, overflow, termination, reachability, and data races.
Reference graph
Works this paper leans on
- [1] Achim, T., Best, A., Bietti, A., Der, K., Fédérico, M., Gukov, S., Halpern-Leistner, D., Henningsgard, K., Kudryashov, Y., Meiburg, A., et al. Aristotle: IMO-level automated theorem proving. arXiv preprint arXiv:2510.01346, 2025.
- [2] Amazon Web Services. Kani Rust verifier. https://github.com/model-checking/kani. Accessed February 2026.
- [3] Armengol-Estapé, J., Carbonneaux, Q., Zhang, T., Markosyan, A. H., Seeker, V., Cummins, C., Kambadur, M., O'Boyle, M. F., Wang, S., Synnaeve, G., et al. What I cannot execute, I do not understand: Training and evaluating LLMs on program execution traces. arXiv preprint arXiv:2503.05703, 2025.
- [4] Ayoun, S.-É., Sjöstedt, O., and Raad, A. Soteria: Efficient symbolic execution as a functional library. arXiv preprint arXiv:2511.08729, 2025.
- [5]
- [6] Beyer, D. and Strejček, J. Improvements in software verification and witness validation: SV-COMP 2025. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 151–186. Springer, 2025.
- [7] Bursuc, S., Ehrenborg, T., Lin, S., Astefanoaei, L., Chiosa, I. E., Kukovec, J., Singh, A., Butterley, O., Bizid, A., Dougherty, Q., et al. A benchmark for vericoding: Formally verified program synthesis. arXiv preprint arXiv:2509.22908, 2025.
- [8] Calcagno, C., Distefano, D., Dubreil, J., Gabi, D., Hooimeijer, P., Luca, M., O'Hearn, P., Papakonstantinou, I., Purbrick, J., and Rodriguez, D. Moving fast with software verification. In NASA Formal Methods Symposium, pp. 3–11. Springer, 2015.
- [9] Chen, L., Gu, J., Huang, L., Huang, W., Jiang, Z., Jie, A., Jin, X., Jin, X., Li, C., Ma, K., et al. Seed-Prover: Deep and broad reasoning for automated theorem proving. arXiv preprint arXiv:2507.23726, 2025.
- [10] Chen, N., Li, Z., Bao, K., Lin, J., and Liu, D. Chain of execution supervision promotes general reasoning in large language models. arXiv preprint arXiv:2510.23629, 2025.
- [11]
- [12] CodeParrot. GitHub code dataset. https://huggingface.co/datasets/codeparrot/github-code. Accessed February 2026.
- [13] Copet, J., Carbonneaux, Q., Cohen, G., Gehring, J., Kahn, J., Kossen, J., Kreuk, F., McMilin, E., Meyer, M., Wei, Y., et al. CWM: An open-weights LLM for research on code generation with world models. arXiv preprint arXiv:2510.02387, 2025.
- [14] Diffblue. CBMC regression tests. https://github.com/diffblue/cbmc/tree/develop/regression/cbmc. Accessed February 2026.
- [15] Dubniczky, R. A., Horvát, K. Z., Bisztray, T., Ferrag, M. A., Cordeiro, L. C., and Tihanyi, N. CASTLE: Benchmarking dataset for static code analyzers and LLMs towards CWE detection.
- [16]
- [17] Eguchi, K. and Akiba, T. Making LLMs program interpreters via execution trace chain of thought. In ICML 2025 Workshop on Programmatic Representations for Agent Learning, 2025.
- [18] Fedoseev, T., Dimitrov, D. I., Gehr, T., and Vechev, M. Constraint-based synthetic data generation for LLM mathematical reasoning. In The 4th Workshop on Mathematical Reasoning and AI at NeurIPS'24, 2024.
- [19] Gu, A., Rozière, B., Leather, H., Solar-Lezama, A., Synnaeve, G., and Wang, S. I. CRUXEval: A benchmark for code reasoning, understanding and execution. arXiv preprint arXiv:2401.03065, 2024.
- [20] Heizmann, M., Chen, Y.-F., Dietsch, D., Greitschus, M., Hoenicke, J., Li, Y., Nutz, A., Musa, B., Schilling, C., Schindler, T., et al. Ultimate Automizer and the search for perfect interpolants (competition contribution). In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 447–451. Springer, 2018.
- [21] Hubert, T., Mehta, R., Sartran, L., Horváth, M. Z., Žužić, G., Wieser, E., Huang, A., Schrittwieser, J., Schroecker, Y., Masoom, H., et al. Olympiad-level formal mathematical reasoning with reinforcement learning. Nature, pp. 1–3, 2025.
- [22] Kroening, D. and Tautschnig, M. CBMC – C bounded model checker (competition contribution). In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 389–391. Springer, 2014.
- [23] Li, Y., Li, X., Wu, H., Xu, M., Zhang, Y., Cheng, X., Xu, F., and Zhong, S. Everything you wanted to know about LLM-based vulnerability detection but were afraid to ask. arXiv preprint arXiv:2504.13474, 2025.
- [24] Liu, C., Lu, S., Chen, W., Jiang, D., Svyatkovskiy, A., Fu, S., Sundaresan, N., and Duan, N. Code execution with pre-trained language models. arXiv preprint arXiv:2305.05383, 2023.
- [25] Maaz, M., DeVoe, L., Hatfield-Dodds, Z., and Carlini, N. Agentic property-based testing: Finding bugs across the Python ecosystem. arXiv preprint arXiv:2510.09907, 2025.
- [26] Mächtle, F., Loose, N., Schulz, T., Sieck, F., Serr, J.-N., Möller, R., and Eisenbarth, T. Trace gadgets: Minimizing code context for machine learning-based vulnerability prediction. arXiv preprint arXiv:2504.13676, 2025.
- [27] Mohajer, M. M., Aleithan, R., Harzevili, N. S., Wei, M., Belle, A. B., Pham, H. V., and Wang, S. SkipAnalyzer: A tool for static code analysis with large language models. arXiv preprint arXiv:2310.18532, 2023.
- [28] Moura, L. d. and Ullrich, S. The Lean 4 theorem prover and programming language. In International Conference on Automated Deduction, pp. 625–635. Springer, 2021.
- [29] Ni, A., Allamanis, M., Cohan, A., Deng, Y., Shi, K., Sutton, C., and Yin, P. NExT: Teaching large language models to reason about code execution. arXiv preprint arXiv:2404.14662, 2024.
- [30] Qiu, J., Liu, Z., Liu, Z., Murthy, R., Zhang, J., Chen, H., Wang, S., Zhu, M., Yang, L., Tan, J., et al. LoCoBench: A benchmark for long-context large language models in complex software engineering. arXiv preprint arXiv:2509.09614, 2025.
- [31] Ren, Z., Shao, Z., Song, J., Xin, H., Wang, H., Zhao, W., Zhang, L., Fu, Z., Zhu, Q., Yang, D., et al. DeepSeek-Prover-V2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition. arXiv preprint arXiv:2504.21801, 2025.
- [32] Sakharova, M., Anand, A., and Mezini, M. Integrating symbolic execution into the fine-tuning of code-generating LLMs. arXiv preprint arXiv:2504.15210, 2025.
- [33] Sultan, O., Armengol-Estapé, J., Kesseli, P., Vanegue, J., Shahaf, D., Adi, Y., and O'Hearn, P. LLMs versus the halting problem: Revisiting program termination prediction. arXiv preprint arXiv:2601.18987, 2026.
- [34] Thakur, A., Lee, J., Tsoukalas, G., Sistla, M., Zhao, M., Zetzsche, S., Durrett, G., Yue, Y., and Chaudhuri, S. Clever: A curated benchmark for formally verified code generation. arXiv preprint arXiv:2505.13938, 2025.
- [35] Wang, H., Unsal, M., Lin, X., Baksys, M., Liu, J., Santos, M. D., Sung, F., Vinyes, M., Ying, Z., Zhu, Z., et al. Kimina-Prover Preview: Towards large formal reasoning models with reinforcement learning. arXiv preprint arXiv:2504.11354, 2025.
- [36] Wei, A., Suresh, T., Sun, T., Wu, H., Wang, K., and Aiken, A. InvBench: Can LLMs accelerate program verification with invariant synthesis? arXiv preprint arXiv:2509.21629, 2025.
- [37] Wu, H., Barrett, C., and Narodytska, N. Lemur: Integrating large language models in automated program verification. In International Conference on Learning Representations, 2024.
- [38] Xie, D., Zheng, M., Liu, X., Wang, J., Wang, C., Tan, L., and Zhang, X. CoRe: Benchmarking LLMs' code reasoning capabilities through static analysis tasks. arXiv preprint arXiv:2507.05269, 2025.
- [39] Ye, X., Chen, Q., Dillig, I., and Durrett, G. SatLM: Satisfiability-aided language models using declarative prompting. Advances in Neural Information Processing Systems, 36:45548–45580, 2023.
- [40] Ye, Z., Yan, Z., He, J., Kasriel, T., Yang, K., and Song, D. Verina: Benchmarking verifiable code generation. arXiv preprint arXiv:2505.23135, 2025.

A. CBMC Regression Tests

We initially evaluated models on the CBMC regression tests [14], a collection of approximately 1,000 self-contained C files where ground truth is established by the CBMC verifier. We c...
The appendix illustrates an informalized trace for a small C program (quoted only in part) with an uninitialized variable and a failing assertion:

- `x` is declared as `unsigned int` and initialized to `0`.
- `N` is declared as `unsigned short` but **never initialized** -- it holds an indeterminate value. Then we hit the loop at line 24: `while (x < N) { x += 2; }`. The loop condition compares `x` (which is `0`) against `N`, but `N` has never been assigned any value. Reading `N` here is accessing unin...
- **Uninitialized read of `N`** -- undefined behavior at the loop condition.
- **Assertion violation** -- even ignoring the UB, `x` is always even, so `x % 2` is always `0`, and the assertion will always fail. The reachability of the error is guaranteed regardless of the value of `N`.

The informalized trace presents the analysis in natural language with explicit reaso...
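The trace's key invariant, that `x` stays even no matter what value the uninitialized `N` happens to hold, is easy to confirm concretely. A minimal Python transcription of the quoted loop (the surrounding C program is only partially shown in the trace, so this models just the fragment that is quoted):

```python
# Concrete check of the appendix reasoning: starting from x = 0 and
# incrementing by 2, x is even on loop exit for every value of N, so
# an assertion requiring odd x must always fail. This transcribes
# only the loop quoted in the trace, not the full (unquoted) program.

def run_loop(N: int) -> int:
    x = 0                      # unsigned int x = 0;
    while x < N:               # while (x < N) { x += 2; }
        x += 2
    return x

# "Uninitialized" N can hold any value; exhaustively check a range:
# x is always even after the loop, regardless of N.
assert all(run_loop(N) % 2 == 0 for N in range(0, 1000))
```

This matches the trace's conclusion that the assertion failure is reachable regardless of `N`'s indeterminate value (the uninitialized read remains a separate, independent defect).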