Title resolution pending

Reinforcement Learning: An Introduction · 2018

12 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

PG-DPO is a new variational framework that replaces Bellman recursion with a Pontryagin-guided adjoint-MC projection for RL under non-exponential discounting and shows gains on hyperbolic and survival benchmarks.

Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Creativity is defined as meta-learning where a frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.

Tight Sample Complexity Bounds for Entropic Best Policy Identification

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

New concentration bounds and a stopping rule close the exponential gap, matching the lower bound for entropic best policy identification.

Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

Aggregation distorts parametric behavioral curve peaks by factors of 3-5x via Simpson's paradox and survival bias, shown by individual vs. aggregate comparisons on Goodreads and Amazon datasets with a negative control.

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.

Discovering Reinforcement Learning Interfaces with Large Language Models

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.

AEL: Agent Evolving Learning for Open-Ended Environments

cs.CL · 2026-04-23 · conditional · novelty 7.0

AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.

Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

An infra-Bayesian RL agent is implemented that achieves lower worst-case regret than classical RL agents in environments with Knightian uncertainty and selects the optimal action in Newcomb's problem.

Learning Minimally Rigid Graphs with High Realization Counts

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Reinforcement learning with graph neural networks finds minimally rigid graphs that match known planar realization optima and set new records for spherical realization counts.

Learning to Cut: Reinforcement Learning for Benders Decomposition

math.OC · 2026-05-07 · unverdicted · novelty 6.0

RLBD trains a neural policy with REINFORCE to select cuts adaptively in Benders decomposition, yielding faster convergence and better generalization than standard BD or SVM-based LearnBD on an EV charging problem.

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

cs.LG · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

InvEvolve evolves inventory policies using LLMs with RL and provides statistical safety guarantees, outperforming classical and DL methods on synthetic and real data.

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

stat.ME · 2026-05-18 · unverdicted · novelty 5.0

Develops greedy optimization algorithms for directly learning optimal integer-weighted clinical risk scores, applied to predict post-discharge mortality in a large EHR cohort with a supporting simulation study.
