Advancing reasoning in large language models: Promising methods and approaches
4 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED (4)
citing papers explorer
- OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning
  OPPO derives token-level advantages for LLM RL via Bayesian recursion on oracle signals, recovering prior distillation methods as a special case and showing gains on math and code benchmarks.
- Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
  PRJA achieves an 83.6% average success rate at injecting harmful content into LRM reasoning chains on five QA datasets without altering final answers.
- Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction
  A framework encodes observed trajectories and HD maps into tokens for frozen LLMs to perform spatio-temporal reasoning and predict future vehicle paths with a linear decoder.
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
  ARTIST couples agentic reasoning with outcome-based reinforcement learning to let LLMs autonomously invoke tools in multi-turn chains, reporting up to 22% gains on math and function-calling benchmarks.