R2IF improves LLM function-calling accuracy by up to 34.62% on BFCL using a composite reward system with CER and SMV components optimized via GRPO, while increasing interpretability through positive CoT effectiveness.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
A conformal procedure for CoT replaces majority voting with weighted aggregation and calibrates abstention to guarantee low confident-error rates, achieving 90.1% selective accuracy on GSM8K by abstaining on under 5% of cases.
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.
citing papers explorer
-
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling
R2IF improves LLM function-calling accuracy by up to 34.62% on BFCL using a composite reward system with CER and SMV components optimized via GRPO, while increasing interpretability through positive CoT effectiveness.
-
Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
A conformal procedure for CoT replaces majority voting with weighted aggregation and calibrates abstention to guarantee low confident-error rates, achieving 90.1% selective accuracy on GSM8K by abstaining on under 5% of cases.
-
The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.