Reinforcement learning: A survey.Journal of artificial intelligence research, 4:237–285

Leslie Pack Kaelbling, Michael L Littman, Andrew W Moore · 1996

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

TabQL: In-Context Q-Learning with Tabular Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.

Video-R1: Reinforcing Video Reasoning in MLLMs

cs.CV · 2025-03-27 · conditional · novelty 7.0

Video-R1 uses temporal-aware RL and mixed datasets to boost video reasoning in MLLMs, with a 7B model reaching 37.1% on VSI-Bench and surpassing GPT-4o.

Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging

cs.AI · 2026-05-13 · unverdicted · novelty 5.0

MultiSearch uses parallel multi-query retrieval plus explicit merging inside a reinforcement-learning loop to improve retrieval-augmented reasoning, outperforming baselines on seven QA benchmarks.

Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

SPEAR enables online federated LLM fine-tuning by using feedback-guided self-play to create contrastive pairs trained with maximum likelihood on correct completions and confidence-weighted unlikelihood on incorrect ones, outperforming baselines without ground-truth contexts.

citing papers explorer

Showing 4 of 4 citing papers.

TabQL: In-Context Q-Learning with Tabular Foundation Models cs.LG · 2026-05-18 · unverdicted · none · ref 1
TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.
Video-R1: Reinforcing Video Reasoning in MLLMs cs.CV · 2025-03-27 · conditional · none · ref 17
Video-R1 uses temporal-aware RL and mixed datasets to boost video reasoning in MLLMs, with a 7B model reaching 37.1% on VSI-Bench and surpassing GPT-4o.
Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging cs.AI · 2026-05-13 · unverdicted · none · ref 46
MultiSearch uses parallel multi-query retrieval plus explicit merging inside a reinforcement-learning loop to improve retrieval-augmented reasoning, outperforming baselines on seven QA benchmarks.
Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback cs.LG · 2026-05-08 · unverdicted · none · ref 16
SPEAR enables online federated LLM fine-tuning by using feedback-guided self-play to create contrastive pairs trained with maximum likelihood on correct completions and confidence-weighted unlikelihood on incorrect ones, outperforming baselines without ground-truth contexts.

Reinforcement learning: A survey.Journal of artificial intelligence research, 4:237–285

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer