Offline reinforcement learning as one big sequence modeling problem. Advances in neural information processing systems, 34:1273–1286, 2021

Michael Janner, Qiyang Li, Sergey Levine · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TabQL: In-Context Q-Learning with Tabular Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.

Goal-Conditioned Supervised Learning for LLM Fine-Tuning

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

GCSL reframes LLM fine-tuning as supervised pursuit of quality thresholds using natural-language goals, outperforming SFT and DPO on toxicity, code, and recommendation tasks.

citing papers explorer

Showing 2 of 2 citing papers.

TabQL: In-Context Q-Learning with Tabular Foundation Models · cs.LG · 2026-05-18 · unverdicted · none · ref 39
Goal-Conditioned Supervised Learning for LLM Fine-Tuning · cs.LG · 2026-05-08 · unverdicted · none · ref 14
