Offline reinforcement learning as one big sequence modeling problem. Advances in Neural Information Processing Systems, 34:1273–1286, 2021.

2 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (2)

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

TabQL: In-Context Q-Learning with Tabular Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.

Goal-Conditioned Supervised Learning for LLM Fine-Tuning

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

GCSL reframes LLM fine-tuning as supervised pursuit of quality thresholds using natural-language goals, outperforming SFT and DPO on toxicity, code, and recommendation tasks.

citing papers explorer

Showing 2 of 2 citing papers.

  • TabQL: In-Context Q-Learning with Tabular Foundation Models · cs.LG · 2026-05-18 · unverdicted · none · ref 39

  • Goal-Conditioned Supervised Learning for LLM Fine-Tuning · cs.LG · 2026-05-08 · unverdicted · none · ref 14