ReTreVal: Reasoning Tree with Validation and Cross-Problem Memory for Large Language Models

2026 · cs.AI · arXiv 2601.02880

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Every existing inference-time reasoning framework discards all failure context at problem boundaries, leaving a model solving problem 500 no wiser than it was on problem 1. We present ReTreVal (Reasoning Tree with Validation), a training-free framework that closes this gap through adaptive tree exploration with tool-augmented node refinement, typed-failure backtracking that injects categorized error context into the recovered branch, and a self-rewriting memory that accumulates and revises strategy entries across problems, enabling inference-time cross-problem learning on any fixed, unmodified LLM without fine-tuning. ReTreVal achieves 85.8% pass@1 on MATH-500 (+8.6 pp over Zero-Shot CoT, +8.6 pp over the strongest baseline Self-Refine) and 54.4% on MMLU-Pro (+15.3 pp over Self-Refine), with a 3.4:1 win-to-regression ratio confirming genuine error recovery rather than noise. These capabilities, previously requiring gradient updates, allow a 32B model to compete with much larger single-pass systems.
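The abstract's central mechanism, typed failures feeding a memory that persists across problems, can be illustrated with a toy sketch. This is not the authors' implementation: the names `StrategyMemory`, `solve`, `toy_propose`, and `toy_validate` are hypothetical, a linear retry loop stands in for adaptive tree exploration, and simple callables stand in for the LLM and the validator.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StrategyMemory:
    """Hypothetical cross-problem memory: failure type -> current advice."""
    entries: dict = field(default_factory=dict)

    def revise(self, failure_type: str, advice: str) -> None:
        # Overwrite instead of append, so an entry is rewritten over time.
        self.entries[failure_type] = advice

    def hints(self) -> list:
        return sorted(self.entries.values())


def solve(problem: int,
          propose: Callable[[int, list], int],
          validate: Callable[[int, int], Optional[str]],
          memory: StrategyMemory,
          max_attempts: int = 3):
    """Retry with typed failure context; returns (answer, attempts used)."""
    context = memory.hints()  # start from what earlier problems taught us
    for attempt in range(1, max_attempts + 1):
        candidate = propose(problem, context)
        failure_type = validate(problem, candidate)  # None means valid
        if failure_type is None:
            return candidate, attempt
        advice = f"avoid {failure_type}"
        memory.revise(failure_type, advice)  # persists past this problem
        if advice not in context:
            context.append(advice)  # inject the typed error into the retry
    return None, max_attempts


# Toy stand-ins (assumptions): the "model" makes a sign error unless warned.
def toy_propose(problem: int, context: list) -> int:
    return abs(problem) if "avoid sign_error" in context else -abs(problem)


def toy_validate(problem: int, candidate: int) -> Optional[str]:
    return "sign_error" if candidate < 0 else None


memory = StrategyMemory()
first = solve(3, toy_propose, toy_validate, memory)   # needs one retry
second = solve(5, toy_propose, toy_validate, memory)  # memory already helps
```

In this sketch the first problem succeeds only after the failure is categorized and fed back, while the second succeeds on its first attempt because the memory entry survives the problem boundary. The "model" itself is never modified.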

representative citing papers

GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge

cs.AI · 2026-06-12 · unverdicted · novelty 6.0

GitOfThoughts stores agent reasoning as a git repo and shows memory from past problems improves accuracy only when new problems are nearly identical (cosine similarity >0.8), with self-consistency providing the main gain on novel tasks.
