Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation

Chengran Yang; David Lo; Heminghao Deng; Jinfeng Jiang; Ming Wen; Tianyi Wu; Ting Zhang; Zhensu Sun; Zichao Wei

arxiv: 2602.01187 · v2 · submitted 2026-02-01 · 💻 cs.SE · cs.AI

Pith reviewed 2026-05-16 08:51 UTC · model grok-4.3

keywords: secure code generation · LLM decoding revision · autoregressive self-correction · action tokens · vulnerability reduction · internal revision loop · code generation

The pith

LLMs can use special action tokens to backtrack and revise their own code outputs during a single generation pass, reducing vulnerabilities without external tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that code generation by large language models can shift from a rigid, one-way token stream to a self-correcting process that stays inside the model's own autoregressive steps. It introduces action tokens that let the model decide on the fly to revisit and edit earlier parts of the code it has already produced. This matters because prior fixes either add slow external agents or static scanners, while this method keeps everything fast and internal by drawing on the model's existing semantic knowledge. A reader following the argument would expect fewer security flaws in the final code and almost no increase in generation time.

Core claim

Stream of Revision turns monotonic autoregressive decoding into a dynamic trajectory by inserting specific action tokens that let the model backtrack and edit its own prior outputs inside one forward pass, thereby activating latent revision capabilities for secure code without outside dependencies.

What carries the argument

Stream of Revision, which uses action tokens to trigger backtracking and self-editing of the generation history within a single forward pass.
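One way to read this mechanism, consistent with the "deterministic renderer Φ that acts as a stream interpreter" and the buffer update B ← B[: j* − |s|] ⊕ s′ ⊕ B[j* :] quoted from the paper further down this page, is that the model only ever appends to a raw stream while a separate interpreter folds each revision episode into a rendered buffer. The sketch below illustrates that reading and is not the authors' implementation. The tag spellings (`<revise>`, `<scope>`, `<patch>`), the choice of the most recent occurrence of the scope as j*, and the decision to ignore an episode whose scope is not found are all assumptions made for illustration.

```python
import re

# Hypothetical surface forms. The page only gives the abstract shape
# E = tau_trig (+) <scope> s </scope> (+) <patch> s' </patch>.
EPISODE = re.compile(
    r"<revise><scope>(?P<scope>.*?)</scope><patch>(?P<patch>.*?)</patch>",
    re.DOTALL,
)


def render(stream: str) -> str:
    """Fold revision episodes in an append-only stream into a final buffer."""
    buffer = ""
    cursor = 0
    for match in EPISODE.finditer(stream):
        # Ordinary text emitted before this episode is appended unchanged.
        buffer += stream[cursor:match.start()]
        scope, patch = match.group("scope"), match.group("patch")
        # Assumption: j* is the end of the most recent occurrence of the scope.
        start = buffer.rfind(scope)
        if scope and start != -1:
            end = start + len(scope)  # plays the role of j*
            # B <- B[: j* - |s|] (+) s' (+) B[j* :]
            buffer = buffer[: end - len(scope)] + patch + buffer[end:]
        # Assumption: an episode whose scope is not found is dropped.
        cursor = match.end()
    return buffer + stream[cursor:]


if __name__ == "__main__":
    draft = (
        "char buf[256];\n"
        "strcpy(buf, header);\n"
        "<revise><scope>strcpy(buf, header);</scope>"
        '<patch>snprintf(buf, sizeof buf, "%s", header);</patch>'
        "return buf;\n"
    )
    print(render(draft))
```

Under this reading the raw stream is never rewritten, so each new token is still conditioned only on what was emitted before it. Whether the paper's model attends to the raw stream or to the rendered buffer is not stated on this page.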

If this is right

Vulnerability rates in generated code drop substantially on secure coding tasks.
Inference cost stays close to standard autoregressive decoding.
The model activates its own revision abilities without post-hoc agents or external tools.
Generation becomes a self-correcting trajectory rather than a fixed linear sequence.
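The cost claim can be made concrete with simple token accounting. The sketch below counts the extra tokens one in-stream revision episode adds (trigger, scope, patch, and tag markers) relative to the length of the draft. It assumes each control marker costs one token, and the lengths in the usage lines are invented for illustration, not measurements from the paper.

```python
from typing import Iterable, Tuple


def episode_tokens(scope_len: int, patch_len: int, control_tokens: int = 5) -> int:
    """Extra tokens emitted by one revision episode.

    control_tokens is an assumption: one trigger plus four scope/patch tags,
    each counted as a single token.
    """
    return control_tokens + scope_len + patch_len


def relative_overhead(draft_len: int, episodes: Iterable[Tuple[int, int]]) -> float:
    """Extra tokens from all episodes as a fraction of the draft length."""
    if draft_len <= 0:
        raise ValueError("draft_len must be positive")
    extra = sum(episode_tokens(scope, patch) for scope, patch in episodes)
    return extra / draft_len


if __name__ == "__main__":
    # Illustrative numbers only: a 400-token function with one small fix.
    print(relative_overhead(400, [(12, 18)]))
    # No revision triggered: no extra tokens.
    print(relative_overhead(400, []))
```

The overhead grows with the number and size of episodes, not with the size of the function, which is why a localized fix stays cheap while more frequent revisions raise the average inference token count.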

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same token mechanism could be tested on non-code generation tasks where quality improves through mid-stream fixes.
Fewer downstream security scanners would be needed if models routinely apply these internal edits.
Models trained on revision tokens might show better handling of long, interdependent outputs in other domains.
Extending the approach to different programming languages would reveal whether the revision behavior generalizes.

Load-bearing premise

The model can learn to interpret action tokens as instructions to meaningfully revise earlier tokens using only its internal reasoning while preserving the autoregressive property.

What would settle it

Generate code on the same secure coding benchmarks once with the action tokens available and once without them, then measure whether vulnerability rates drop only in the version that actually invokes revision steps.
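A minimal sketch of how that comparison could be tabulated follows. The `Sample` record and its fields are hypothetical, and the usage data is a toy example rather than results from the paper. The point is to separate three groups: generations with action tokens disabled, generations with tokens enabled where no revision fired, and generations where a revision was actually invoked.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Sample:
    """One generated program with its outcome (hypothetical record layout)."""
    tokens_enabled: bool  # were action tokens available during decoding?
    revised: bool         # did the model emit at least one revision episode?
    vulnerable: bool      # did the security check flag the final code?


def vulnerability_rate(samples: List[Sample]) -> Optional[float]:
    """Fraction of flagged samples, or None for an empty group."""
    if not samples:
        return None
    return sum(s.vulnerable for s in samples) / len(samples)


def summarize(samples: Iterable[Sample]) -> Dict[str, Optional[float]]:
    """Vulnerability rate per experimental group."""
    samples = list(samples)
    groups = {
        "disabled": [s for s in samples if not s.tokens_enabled],
        "enabled_no_revision": [
            s for s in samples if s.tokens_enabled and not s.revised
        ],
        "enabled_revised": [s for s in samples if s.tokens_enabled and s.revised],
    }
    return {name: vulnerability_rate(group) for name, group in groups.items()}


if __name__ == "__main__":
    toy = [
        Sample(False, False, True),
        Sample(False, False, False),
        Sample(True, False, True),
        Sample(True, True, False),
        Sample(True, True, False),
    ]
    print(summarize(toy))
```

If the drop appears only in the `enabled_revised` group, that supports the claim that the revision steps cause the improvement. A drop that also appears in `enabled_no_revision` would instead point to the fine-tuning data.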

Figures

Figures reproduced from arXiv: 2602.01187 by Chengran Yang, David Lo, Heminghao Deng, Jinfeng Jiang, Ming Wen, Tianyi Wu, Ting Zhang, Zhensu Sun, Zichao Wei.

**Figure 1.** Stream of Generation vs. Stream of Revision. Conventional code generation treats generation as a linear stream of token appending, lacking the ability to revise earlier tokens. In contrast, the proposed Stream of Revision framework introduces action tokens that enable dynamic backtracking and in-place editing within a single pass.

**Figure 2.** Overview of Stream of Revision for alignment data construction and single-pass inference. Top: from real-world CVE pairs, we filter, extract code diffs, and linearize the change into a revision trajectory with a revision trigger, a localized vulnerable span, and a patch span. Bottom: during autoregressive decoding, the model can emit a trigger token to start a revision episode, localize a vulnerable span …

**Figure 3.** Impact of Training Data Scale. Blue bars (left axis) denote Security Pass Rate (SPR); orange bars (right axis) denote average inference tokens. Comparing the hatched bars to the solid bars shows that adding more data yields negligible security gains but incurs higher inference costs due to more revisions.

**Figure 4.** Per-Category Secure Patch Rate (SPR) on Top-10 CWEs. Comparison between the vanilla base model and Stream of Revision.

**Figure 5.** Case Study of Stream of Revision in Action. Accompanying text (truncated at both ends): … header, a delimiter, and the message body into a heap buffer. In the initial draft, the code allocates a fixed-size buffer of 256 bytes and then uses unbounded string copy and concatenation routines to append the header and body. When either field is longer than the remaining capacity, this pattern can overflow the heap buffer, which is a classic memory corruptio…
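The data-construction path in Figure 2 (top) turns a vulnerable/fixed code pair into a single training sequence. The sketch below shows one plausible way to do that with Python's standard `difflib`, under several assumptions that the page does not confirm: the tag spellings are hypothetical, only pairs that differ in exactly one hunk are kept, pure insertions are skipped because they have no span to localize, and the revision episode is placed immediately after the vulnerable span.

```python
import difflib
from typing import Optional

TRIGGER = "<revise>"  # hypothetical spelling of the revision trigger


def linearize(vulnerable: str, fixed: str) -> Optional[str]:
    """Build a revision trajectory from a vulnerable/fixed pair.

    Returns None when the pair does not reduce to a single replaced or
    deleted hunk (a simplifying filter assumed for this sketch).
    """
    old = vulnerable.splitlines(keepends=True)
    new = fixed.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    hunks = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if len(hunks) != 1:
        return None
    tag, i1, i2, j1, j2 = hunks[0]
    if tag == "insert":
        # Empty scope: there is no vulnerable span to point at.
        return None
    scope = "".join(old[i1:i2])   # localized vulnerable span s
    patch = "".join(new[j1:j2])   # patch span s'
    episode = f"{TRIGGER}<scope>{scope}</scope><patch>{patch}</patch>"
    # Draft up to the end of the vulnerable span, then the episode,
    # then the untouched remainder of the function.
    return "".join(old[:i2]) + episode + "".join(old[i2:])


if __name__ == "__main__":
    before = "char buf[256];\nstrcpy(buf, header);\nreturn buf;\n"
    after = "char buf[256];\nstrncpy(buf, header, 255);\nreturn buf;\n"
    print(linearize(before, after))
```

Placing the episode right after the flawed span mirrors a just-in-time fix. A trajectory that triggers later in the function would be built the same way with a different insertion index.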

Original abstract

Large Language Model (LLM) based code generation is predominantly formulated as a strictly monotonic process, appending tokens linearly to an immutable prefix. This formulation contrasts to the cognitive process of programming, which is inherently interleaved with forward generation and on-the-fly revision. While prior works attempt to introduce revision via post-hoc agents or external static tools, they either suffer from high latency or fail to leverage the model's intrinsic semantic reasoning. In this paper, we propose Stream of Revision, a paradigm shift that elevates code generation from a monotonic stream to a dynamic, self-correcting trajectory by leveraging model's intrinsic capabilities. We introduce specific action tokens that enable the model to seamlessly backtrack and edit its own history within a single forward pass. By internalizing the revision loop, our framework Stream of Revision allows the model to activate its latent capabilities just-in-time without external dependencies. Empirical results on secure code generation show that Stream of Revision significantly reduces vulnerabilities with minimal inference overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Stream of Revision, a new decoding paradigm for LLM-based secure code generation. It introduces special action tokens that purportedly allow the model to backtrack and edit its own generation history inside a single forward pass, internalizing revision to reduce vulnerabilities while preserving autoregressive properties and incurring only minimal overhead.

Significance. If the core mechanism can be made precise and shown to work without violating causality, the approach would be significant: it offers an intrinsic, low-latency alternative to external agent-based or post-hoc revision methods, potentially improving security guarantees in code generation by activating latent model capabilities on the fly.

major comments (2)

[Abstract] The central claim that action tokens enable the model to 'seamlessly backtrack and edit its own history within a single forward pass' while remaining autoregressive is not supported by any described mechanism. Standard causal decoding fixes each token once sampled; any edit to an earlier position requires either discarding the KV cache and re-running from that point, a non-causal attention mask, or an auxiliary buffer the model cannot modify inside the same pass. No such procedure, token semantics, or modified generation loop is specified.
[Abstract] The weakest assumption (action tokens enabling intra-pass history editing without breaking autoregression or requiring external intervention) is load-bearing for the entire contribution. Without a concrete decoding algorithm, attention-mask definition, or proof that the process stays strictly causal, the empirical claim of vulnerability reduction cannot be evaluated as arising from the proposed paradigm rather than from an unstated external revision step.

minor comments (1)

[Abstract] The statement 'significantly reduces vulnerabilities with minimal inference overhead' lacks any quantitative baseline comparison, dataset details, or overhead metric; these must be supplied in the main text with explicit tables or figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive critique of our manuscript. The comments correctly identify that the abstract and high-level description do not provide sufficient technical detail on the decoding procedure. We will revise the paper to include a formal algorithm, token semantics, and causality argument so that the mechanism can be properly evaluated.

Point-by-point responses

Referee: [Abstract] The central claim that action tokens enable the model to 'seamlessly backtrack and edit its own history within a single forward pass' while remaining autoregressive is not supported by any described mechanism. Standard causal decoding fixes each token once sampled; any edit to an earlier position requires either discarding the KV cache and re-running from that point, a non-causal attention mask, or an auxiliary buffer the model cannot modify inside the same pass. No such procedure, token semantics, or modified generation loop is specified.

Authors: We agree that the abstract is too high-level and that the current manuscript text does not supply an explicit decoding algorithm or token semantics. In the revision we will add a new subsection (and Algorithm 1) that defines (i) the vocabulary of action tokens (e.g., [REV_k], which signals a revision at depth k), (ii) the generation loop that maintains a causal revision stack inside the same forward pass, and (iii) the lower-triangular (causal) attention mask, under which no position attends to future tokens. This will make clear that no external buffer or non-causal operation is required. revision: yes

Referee: [Abstract] The weakest assumption (action tokens enabling intra-pass history editing without breaking autoregression or requiring external intervention) is load-bearing for the entire contribution. Without a concrete decoding algorithm, attention-mask definition, or proof that the process stays strictly causal, the empirical claim of vulnerability reduction cannot be evaluated as arising from the proposed paradigm rather than from an unstated external revision step.

Authors: We accept the referee’s point that the current exposition leaves the source of the observed gains ambiguous. The revised manuscript will contain (a) the exact autoregressive decoding procedure, (b) the formal definition of the causal attention mask, and (c) a short proof sketch showing that every token is still generated conditioned only on previously generated tokens. We will also add an explicit statement that no external agent or post-hoc revision is used; all edits occur inside the single model forward pass via the action-token mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed Stream of Revision paradigm

The paper proposes a new decoding paradigm called Stream of Revision that introduces action tokens to enable intra-pass backtracking and editing during autoregressive code generation. No equations, fitted parameters, derivations, or self-citations are present that reduce any claim to its own inputs by construction. The central contribution is framed as a methodological shift internalizing revision without external tools, supported by empirical results on vulnerability reduction, rather than any mathematical or definitional loop that collapses to prior fitted quantities or self-referential assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that LLMs possess latent revision capabilities that special tokens can activate, plus the introduction of action tokens as a new mechanism; no free parameters or external benchmarks are mentioned in the abstract.

axioms (1)

domain assumption: LLMs have intrinsic semantic reasoning capabilities that can be activated for on-the-fly code revision via special tokens
Invoked to justify internalizing the revision loop without external dependencies

invented entities (1)

action tokens (no independent evidence)
purpose: Enable backtracking and editing of generation history within a single forward pass
Newly introduced construct to turn monotonic decoding into a revisable stream

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: We introduce specific action tokens that enable the model to seamlessly backtrack and edit its own history within a single forward pass... deterministic renderer Φ that acts as a stream interpreter... B ← B[: j* − |s|] ⊕ s′ ⊕ B[j* :]

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: revision episode E = τ_trig ⊕ ⟨scope⟩ s ⟨/scope⟩ ⊕ ⟨patch⟩ s′ ⟨/patch⟩

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
