ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation

Xu, Fangzhi; Yan, Hang; Ma, Chang; Zhao, Haiteng; Liu, Jun; Lin, Qika · 2025 · DOI 10.18653/v1/2025.acl-long.647

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

ThinkBooster supplies a modular library, joint performance-efficiency benchmark, and deployable proxy for test-time compute scaling of LLM reasoning on math and coding tasks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 190
