hub

and Lewis, Mike , editor =

Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah Smith, Mike Lewis · 2023 · Findings of the Association for Computational Linguistics: EMNLP 2023 · DOI 10.18653/v1/2023.findings-emnlp.378

11 Pith papers cite this work, alongside 152 external citations. Polarity classification is still indexing.

11 Pith papers citing it

152 external citations · Crossref

open at publisher browse 11 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

SIOP enables turn-level credit assignment in LLM agents via semantic clustering of final answers as latent outcomes, improving performance on reasoning benchmarks without verifiers.

How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews

cs.IR · 2026-04-30 · unverdicted · novelty 7.0

AI Overviews and Gemini retrieve substantially different sources than traditional Google search (Jaccard similarity <0.2), favor Google-owned content, appear for 51.5% of queries especially controversial ones, and are less consistent across repeated or slightly edited queries.

Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing

cs.CL · 2026-04-17 · unverdicted · novelty 7.0

Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.

When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Dynamic Gradient Gating monitors lm_head gradient norms to safely reuse rollout batches in RLVR, achieving up to 2.93x sample efficiency and 2.14x wall-clock speedup across math, ALFWorld, WebShop, and QA tasks.

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

ICRL uses joint RL training of solver and critic with distribution-calibration re-weighting and role-wise advantage estimation to internalize critique into unassisted LLM performance, yielding 6.4-point gains on agentic tasks and 7.0 on math reasoning with Qwen3 models.

An Information-Theoretic Criterion for Efficient Data Synthesis

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Synthetic data improves models only in information-open generation-training loops with external signals, and coarser signals like binary correctness enable better generalization by converging to the most information-efficient component.

Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.

The Power of Power Law: Asymmetry Enables Compositional Reasoning

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.

Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.

Agentic Reinforced Policy Optimization

cs.LG · 2025-07-26 · unverdicted · novelty 6.0

ARPO adds entropy-based adaptive rollouts and stepwise advantage attribution to RL for LLM agents, outperforming prior trajectory-level methods on 13 benchmarks with half the tool budget.

ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models

cs.CL · 2026-05-11 · 2 refs

citing papers explorer

Showing 11 of 11 citing papers.

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers cs.LG · 2026-05-06 · unverdicted · none · ref 27
SIOP enables turn-level credit assignment in LLM agents via semantic clustering of final answers as latent outcomes, improving performance on reasoning benchmarks without verifiers.
How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews cs.IR · 2026-04-30 · unverdicted · none · ref 46
AI Overviews and Gemini retrieve substantially different sources than traditional Google search (Jaccard similarity <0.2), favor Google-owned content, appear for 51.5% of queries especially controversial ones, and are less consistent across repeated or slightly edited queries.
Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing cs.CL · 2026-04-17 · unverdicted · none · ref 15
Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.
When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR cs.LG · 2026-05-19 · unverdicted · none · ref 19
Dynamic Gradient Gating monitors lm_head gradient norms to safely reuse rollout batches in RLVR, achieving up to 2.93x sample efficiency and 2.14x wall-clock speedup across math, ALFWorld, WebShop, and QA tasks.
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning cs.AI · 2026-05-13 · unverdicted · none · ref 29
ICRL uses joint RL training of solver and critic with distribution-calibration re-weighting and role-wise advantage estimation to internalize critique into unassisted LLM performance, yielding 6.4-point gains on agentic tasks and 7.0 on math reasoning with Qwen3 models.
An Information-Theoretic Criterion for Efficient Data Synthesis cs.LG · 2026-05-11 · unverdicted · none · ref 25
Synthetic data improves models only in information-open generation-training loops with external signals, and coarser signals like binary correctness enable better generalization by converging to the most information-efficient component.
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning cs.CL · 2026-05-02 · unverdicted · none · ref 26
Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.
The Power of Power Law: Asymmetry Enables Compositional Reasoning cs.AI · 2026-04-24 · unverdicted · none · ref 45
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers cs.CL · 2026-04-09 · unverdicted · none · ref 2
Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.
Agentic Reinforced Policy Optimization cs.LG · 2025-07-26 · unverdicted · none · ref 3
ARPO adds entropy-based adaptive rollouts and stepwise advantage attribution to RL for LLM agents, outperforming prior trajectory-level methods on 13 benchmarks with half the tool budget.
ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models cs.CL · 2026-05-11 · unreviewed · ref 11 · 2 links

and Lewis, Mike , editor =

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer