Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa · 2020

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

browse 9 citing papers

citation-role summary

dataset 3 background 1

citation-polarity summary

use dataset 3 background 1

representative citing papers

Inference-Time Budget Control for LLM Search Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.

Information Gain-based Rollout Policy Optimization: An Adaptive Tree-Structured Rollout Approach for Multi-Turn LLM Agents

cs.AI · 2026-07-07 · conditional · novelty 6.0

IGRPO allocates multi-turn LLM agent rollout budget proportional to intermediate-state information gain, inducing an exponentially tilted teacher distribution for policy optimization.

PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning

cs.AI · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.

SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks

cs.AI · 2026-05-09 · unverdicted · novelty 6.0 · 3 refs

SearchSkill improves LLM query planning on knowledge QA by using explicit skill selection from an evolving SkillBank and a two-stage SFT process that aligns training with inference-time skill-grounded execution.

Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

A learned orchestration policy for LLM agents that jointly optimizes task decomposition and selective routing to (model, primitive) pairs, delivering 77% macro pass@1 at 10x lower cost than strong baselines across 13 benchmarks.

SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents

cs.CL · 2026-05-21 · unverdicted · novelty 5.0

SpecHop accelerates multi-hop LLM tool use via continuous multi-threaded speculation with asynchronous verification, approaching oracle latency gains and reducing latency up to 40% on retrieval tasks.

Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging

cs.AI · 2026-05-13 · unverdicted · novelty 5.0

MultiSearch uses parallel multi-query retrieval plus explicit merging inside a reinforcement-learning loop to improve retrieval-augmented reasoning, outperforming baselines on seven QA benchmarks.

MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

cs.CL · 2026-05-11 · unverdicted · novelty 5.0

MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

cs.CL · 2026-04-03 · unverdicted · novelty 5.0

JoyAI-LLM Flash delivers a 48B MoE LLM with 2.7B active parameters per token via FiberPO RL and dense multi-token prediction, released with checkpoints on Hugging Face.

citing papers explorer

Showing 9 of 9 citing papers.

Inference-Time Budget Control for LLM Search Agents cs.AI · 2026-05-07 · unverdicted · none · ref 13
A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.
Information Gain-based Rollout Policy Optimization: An Adaptive Tree-Structured Rollout Approach for Multi-Turn LLM Agents cs.AI · 2026-07-07 · conditional · none · ref 32
IGRPO allocates multi-turn LLM agent rollout budget proportional to intermediate-state information gain, inducing an exponentially tilted teacher distribution for policy optimization.
PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning cs.AI · 2026-05-10 · unverdicted · none · ref 7 · 2 links
PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks cs.AI · 2026-05-09 · unverdicted · none · ref 5 · 3 links
SearchSkill improves LLM query planning on knowledge QA by using explicit skill selection from an evolving SkillBank and a two-stage SFT process that aligns training with inference-time skill-grounded execution.
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation cs.AI · 2026-05-06 · unverdicted · none · ref 26
A learned orchestration policy for LLM agents that jointly optimizes task decomposition and selective routing to (model, primitive) pairs, delivering 77% macro pass@1 at 10x lower cost than strong baselines across 13 benchmarks.
SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents cs.CL · 2026-05-21 · unverdicted · none · ref 6
SpecHop accelerates multi-hop LLM tool use via continuous multi-threaded speculation with asynchronous verification, approaching oracle latency gains and reducing latency up to 40% on retrieval tasks.
Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging cs.AI · 2026-05-13 · unverdicted · none · ref 22
MultiSearch uses parallel multi-query retrieval plus explicit merging inside a reinforcement-learning loop to improve retrieval-augmented reasoning, outperforming baselines on seven QA benchmarks.
MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading cs.CL · 2026-05-11 · unverdicted · none · ref 19
MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.
JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency cs.CL · 2026-04-03 · unverdicted · none · ref 60
JoyAI-LLM Flash delivers a 48B MoE LLM with 2.7B active parameters per token via FiberPO RL and dense multi-token prediction, released with checkpoints on Hugging Face.

Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer