hub Canonical reference

Large language model guided tree-of-thought

Let’s verify step by step · 2024 · arXiv 2305.08291

Canonical reference. 83% of citing Pith papers cite this work as background.

15 Pith papers citing it

Background 83% of classified citations

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 method 1

citation-polarity summary

background 5 use method 1

representative citing papers

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.

LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

cs.CV · 2025-11-17 · unverdicted · novelty 6.0

REVISOR adds multimodal visual-text reflection and a Dual Attribution Decoupled Reward to improve long-form video reasoning in MLLMs without extra supervised fine-tuning.

A Roadmap to Pluralistic Alignment

cs.AI · 2024-02-07 · unverdicted · novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery

q-bio.QM · 2026-05-19 · unverdicted · novelty 5.0

Protein Thoughts uses hypothesis-guided entropy-regularized Tree-of-Thoughts search and embedding flow matching to achieve mean best-binder rank 11.2 and 91.08 Micro-F1 on SHS148k by keeping sequence, structure, interface, and chemical signals transparent.

FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

FSFM is a biologically-inspired selective forgetting framework for LLM agents that claims to boost access efficiency by 8.49%, content quality by 29.2% signal-to-noise, and eliminate security risks entirely through a taxonomy of decay, deletion, safety, and adaptive mechanisms.

Hierarchical Reasoning Model

cs.AI · 2025-06-26 · unverdicted · novelty 5.0

HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples without pre-training or CoT supervision.

From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests

cs.CL · 2025-05-17 · unverdicted · novelty 5.0

The paper presents ESTBook, a multimodal benchmark of 10,576 English standardized test questions augmented with formalized cognitive reasoning trajectories and distractor rationales to support diagnostic evaluation of LLMs as tutors.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

IPS: In-Prompt Process Supervision for Short Video Content Moderation

cs.CL · 2024-12-15 · unverdicted · novelty 4.0

IPS adds sequential reasoning over ancillary questions to MLLM fine-tuning for short video content moderation, outperforming baselines even with model-generated labels.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

cs.CL · 2025-03-27 · accept · novelty 3.0

A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

cs.AI · 2024-02-05 · unverdicted · novelty 3.0

A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.

LLM Multi-Agent Systems: Challenges and Open Problems

cs.MA · 2024-02-05 · unverdicted · novelty 2.0

The paper identifies inadequately addressed challenges in optimizing task allocation, fostering robust reasoning through debates, managing layered context, enhancing memory, and applying multi-agent systems to blockchain.

citing papers explorer

Showing 15 of 15 citing papers.

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization cs.AI · 2026-05-07 · unverdicted · none · ref 26
Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.
LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents cs.AI · 2026-05-12 · unverdicted · none · ref 19
LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding cs.CV · 2025-11-17 · unverdicted · none · ref 26
REVISOR adds multimodal visual-text reflection and a Dual Attribution Decoupled Reward to improve long-form video reasoning in MLLMs without extra supervised fine-tuning.
A Roadmap to Pluralistic Alignment cs.AI · 2024-02-07 · unverdicted · none · ref 72
The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery q-bio.QM · 2026-05-19 · unverdicted · none · ref 17
Protein Thoughts uses hypothesis-guided entropy-regularized Tree-of-Thoughts search and embedding flow matching to achieve mean best-binder rank 11.2 and 91.08 Micro-F1 on SHS148k by keeping sequence, structure, interface, and chemical signals transparent.
FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory cs.AI · 2026-04-22 · unverdicted · none · ref 41
FSFM is a biologically-inspired selective forgetting framework for LLM agents that claims to boost access efficiency by 8.49%, content quality by 29.2% signal-to-noise, and eliminate security risks entirely through a taxonomy of decay, deletion, safety, and adaptive mechanisms.
Hierarchical Reasoning Model cs.AI · 2025-06-26 · unverdicted · none · ref 65
HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples without pre-training or CoT supervision.
From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests cs.CL · 2025-05-17 · unverdicted · none · ref 3
The paper presents ESTBook, a multimodal benchmark of 10,576 English standardized test questions augmented with formalized cognitive reasoning trajectories and distractor rationales to support diagnostic evaluation of LLMs as tutors.
Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 107
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.
IPS: In-Prompt Process Supervision for Short Video Content Moderation cs.CL · 2024-12-15 · unverdicted · none · ref 1
IPS adds sequential reasoning over ancillary questions to MLLM fine-tuning for short video content moderation, outperforming baselines even with model-generated labels.
A Survey of Scaling in Large Language Model Reasoning cs.AI · 2025-04-02 · unverdicted · none · ref 121
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
Large Language Model Agent: A Survey on Methodology, Applications and Challenges cs.CL · 2025-03-27 · accept · none · ref 55
A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications cs.AI · 2024-02-05 · unverdicted · none · ref 11
A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey cs.CV · 2025-03-16 · unverdicted · none · ref 192
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
LLM Multi-Agent Systems: Challenges and Open Problems cs.MA · 2024-02-05 · unverdicted · none · ref 9
The paper identifies inadequately addressed challenges in optimizing task allocation, fostering robust reasoning through debates, managing layered context, enhancing memory, and applying multi-agent systems to blockchain.

Large language model guided tree-of-thought

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer