When to solve, when to verify: Compute-optimal problem solving and generative verification for llm reasoning.arXiv preprint arXiv:2504.01005, 2025

Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach · 2025 · arXiv 2504.01005

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 support 1

representative citing papers

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

cs.CV · 2026-04-28 · conditional · novelty 6.0 · 2 refs

SIEVES improves selective prediction coverage by up to 3x on OOD VQA benchmarks by training a selector to score the quality of visual evidence produced by reasoner models, generalizing across benchmarks and proprietary models without internal access or per-task retraining.

On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization

cs.CL · 2025-09-28 · unverdicted · novelty 6.0

Fine-tuned LLM judges struggle with future-proofing to newer generators but maintain backward-compatibility more easily; DPO training and continual learning improve adaptation while all models degrade on unseen questions.

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

cs.CV · 2025-03-21 · conditional · novelty 6.0

Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.

LLM Reasoning Is Latent, Not the Chain of Thought

cs.AI · 2026-04-17 · unverdicted · novelty 5.0

LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.

citing papers explorer

Showing 4 of 4 citing papers.

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring cs.CV · 2026-04-28 · conditional · none · ref 37 · 2 links
SIEVES improves selective prediction coverage by up to 3x on OOD VQA benchmarks by training a selector to score the quality of visual evidence produced by reasoner models, generalizing across benchmarks and proprietary models without internal access or per-task retraining.
On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization cs.CL · 2025-09-28 · unverdicted · none · ref 33
Fine-tuned LLM judges struggle with future-proofing to newer generators but maintain backward-compatibility more easily; DPO training and continual learning improve adaptation while all models degrade on unseen questions.
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles cs.CV · 2025-03-21 · conditional · none · ref 64
Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.
LLM Reasoning Is Latent, Not the Chain of Thought cs.AI · 2026-04-17 · unverdicted · none · ref 25
LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.

When to solve, when to verify: Compute-optimal problem solving and generative verification for llm reasoning.arXiv preprint arXiv:2504.01005, 2025

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer