24 Published in Transactions on Machine Learning Research (04/2026) A Appendix Contents A.1 Reward Design and PPO Stabilization Sensitivity

Yuqi Zhu, Ge Li, Xue Jiang, Jia Li, Hong Mei, Zhi Jin, Yihong Dong · 2026 · arXiv 2503.15341

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Uncertainty Propagation in LLM-Based Systems

cs.SE · 2026-04-26 · unverdicted · novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.

Structural Anchors and Reasoning Fragility:Understanding CoT Robustness in LLM4Code

cs.SE · 2026-04-14 · unverdicted · novelty 7.0

CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.

OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

OracleTSC introduces a reward hurdle and uncertainty regularization to stabilize LLM-based reinforcement learning for traffic signal control, delivering 75% lower travel time and 67% lower queue length on benchmarks plus cross-intersection generalization.

Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

cs.SE · 2026-05-01 · unverdicted · novelty 6.0

REC RL improves LLM code generation by automatically assessing and optimizing requirement difficulty with adaptive curriculum sampling, yielding 1.23-5.62% Pass@1 gains over baselines.

AdaDec: A Uncertainty-Guided Lookahead Decoding Framework for LLM-Based Code Generation

cs.SE · 2025-06-10 · unverdicted · novelty 5.0

AdaDec improves Pass@1 accuracy of LLM code generation by up to 20.9% over greedy decoding by triggering lookahead reranking only at high-uncertainty steps on HumanEval+, MBPP+, and DevEval.

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

cs.LG · 2025-05-16 · unverdicted · novelty 5.0

TokUR estimates token-level uncertainty via low-rank weight perturbations in LLMs, aggregates signals to correlate with correctness, and uses them to improve reasoning performance on math tasks.

citing papers explorer

Showing 6 of 6 citing papers.

Uncertainty Propagation in LLM-Based Systems cs.SE · 2026-04-26 · unverdicted · none · ref 73
This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.
Structural Anchors and Reasoning Fragility:Understanding CoT Robustness in LLM4Code cs.SE · 2026-04-14 · unverdicted · none · ref 31
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control cs.AI · 2026-05-08 · unverdicted · none · ref 23
OracleTSC introduces a reward hurdle and uncertainty regularization to stabilize LLM-based reinforcement learning for traffic signal control, delivering 75% lower travel time and 67% lower queue length on benchmarks plus cross-intersection generalization.
Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning cs.SE · 2026-05-01 · unverdicted · none · ref 61
REC RL improves LLM code generation by automatically assessing and optimizing requirement difficulty with adaptive curriculum sampling, yielding 1.23-5.62% Pass@1 gains over baselines.
AdaDec: A Uncertainty-Guided Lookahead Decoding Framework for LLM-Based Code Generation cs.SE · 2025-06-10 · unverdicted · none · ref 45
AdaDec improves Pass@1 accuracy of LLM code generation by up to 20.9% over greedy decoding by triggering lookahead reranking only at high-uncertainty steps on HumanEval+, MBPP+, and DevEval.
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning cs.LG · 2025-05-16 · unverdicted · none · ref 49
TokUR estimates token-level uncertainty via low-rank weight perturbations in LLMs, aggregates signals to correlate with correctness, and uses them to improve reasoning performance on math tasks.

24 Published in Transactions on Machine Learning Research (04/2026) A Appendix Contents A.1 Reward Design and PPO Stabilization Sensitivity

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer