ReTreVal: Reasoning Tree with Validation and Cross-Problem Memory for Large Language Models

Abhishek HS; Arpit Jain; Ashwanth Krishnan; Pavan C Shekar

arxiv: 2601.02880 · v3 · pith:IA44P4LGnew · submitted 2026-01-06 · 💻 cs.AI · cs.CL




keywords problem, reasoning, retreval, tree, context, cross-problem, error, framework


Abstract

Every existing inference-time reasoning framework discards all failure context at problem boundaries, so a model solving problem 500 is no wiser than it was on problem 1. We present ReTreVal (Reasoning Tree with Validation), a training-free framework that closes this gap with three components: adaptive tree exploration with tool-augmented node refinement; typed-failure backtracking, which injects categorized error context into the recovered branch; and a self-rewriting memory that accumulates and revises strategy entries across problems. Together they enable inference-time cross-problem learning on any fixed LLM without fine-tuning. On MATH-500, ReTreVal achieves 85.8% pass@1 (+8.6 pp over both Zero-Shot CoT and Self-Refine, the strongest baseline); on MMLU-Pro it reaches 54.4% (+15.3 pp over Self-Refine). A 3.4:1 win-to-regression ratio confirms genuine error recovery rather than noise. These capabilities, which previously required gradient updates, allow a 32B model to compete with much larger single-pass systems.
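The abstract names typed-failure backtracking and a self-rewriting cross-problem memory without specifying them. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: `FailureType`, `StrategyMemory`, `solve_with_backtracking`, the failure categories, and the "one rewritable note per failure type" rule are all hypothetical; the LLM call is replaced by a stub; and tree exploration and tool-augmented refinement are reduced to a linear retry loop.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class FailureType(Enum):
    # Hypothetical failure categories; the paper's actual taxonomy may differ.
    ARITHMETIC = "arithmetic"
    LOGIC = "logic"
    TOOL_ERROR = "tool_error"


@dataclass
class StrategyMemory:
    """Hypothetical cross-problem memory: one revisable strategy note per failure type."""
    entries: dict = field(default_factory=dict)

    def revise(self, failure: FailureType, note: str) -> None:
        # Rewrite the entry for this failure type instead of appending a duplicate.
        self.entries[failure] = note

    def as_context(self) -> str:
        # Render stored strategies (sorted by category name) as a prompt prefix.
        items = sorted(self.entries.items(), key=lambda kv: kv[0].value)
        return "\n".join(f"[{kind.value}] {note}" for kind, note in items)


# Stand-in for one LLM branch: takes (problem, injected context) and returns
# (answer, None) on success or (None, (FailureType, detail)) on failure.
Attempt = Callable[[str, str], tuple]


def solve_with_backtracking(problem: str, attempt: Attempt, memory: StrategyMemory,
                            max_branches: int = 3) -> Optional[str]:
    context = memory.as_context()
    for _ in range(max_branches):
        answer, failure = attempt(problem, context)
        if failure is None:
            return answer
        kind, detail = failure
        # Typed-failure backtracking: the next branch sees the categorized error.
        context = f"{memory.as_context()}\nPrevious branch failed ({kind.value}): {detail}"
        # Memory update so that later problems inherit the lesson.
        memory.revise(kind, f"Avoid: {detail}")
    return None


# Toy usage: the fake model succeeds only once the lesson is in its context.
calls = []


def toy_attempt(problem: str, context: str) -> tuple:
    calls.append(problem)
    if "dropped a carry" in context:
        return "42", None
    return None, (FailureType.ARITHMETIC, "dropped a carry")


memory = StrategyMemory()
first = solve_with_backtracking("problem 1", toy_attempt, memory)   # fails once, then recovers
second = solve_with_backtracking("problem 2", toy_attempt, memory)  # succeeds on the first branch
```

Because the same `StrategyMemory` object is passed to both calls, the second problem starts with the lesson recorded while solving the first, which is the cross-problem effect the abstract describes. Keying entries by failure type is one simple way to make the memory revise rather than grow without bound.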



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge
cs.AI 2026-06 unverdicted novelty 6.0

GitOfThoughts stores agent reasoning as a git repo and shows memory from past problems improves accuracy only when new problems are nearly identical (cosine similarity >0.8), with self-consistency providing the main g...
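The similarity gate described in this entry can be illustrated with a minimal sketch. It assumes each past problem is stored with an embedding vector and applies the reported 0.8 cosine-similarity threshold; the function names, toy 2-D vectors, and note strings are hypothetical, and this is not GitOfThoughts' actual implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = dot(a, b) / (|a| * |b|); zero vectors are treated as dissimilar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_memories(query: list[float],
                      memories: list[tuple[list[float], str]],
                      threshold: float = 0.8) -> list[str]:
    # Keep only stored notes whose problem embedding is nearly identical to the query.
    return [note for emb, note in memories if cosine_similarity(query, emb) > threshold]


# Toy 2-D "embeddings"; a real system would use a text-embedding model.
memories = [([1.0, 0.0], "strategy from a near-duplicate problem"),
            ([0.0, 1.0], "strategy from an unrelated problem")]
hits = retrieve_memories([0.9, 0.1], memories)  # ['strategy from a near-duplicate problem']
```

Here the query's similarity to the first stored problem is about 0.99 and to the second about 0.11, so only the near-duplicate's note passes the gate.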