Title resolution pending

Reinforcement Learning: An Introduction · 2018

12 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

PG-DPO is a new variational framework that replaces Bellman recursion with a Pontryagin-guided adjoint-MC projection for RL under non-exponential discounting and shows gains on hyperbolic and survival benchmarks.

Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Creativity is defined as meta-learning where a frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.

Tight Sample Complexity Bounds for Entropic Best Policy Identification

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

New concentration bounds and a stopping rule close the exponential gap to match the lower bound for entropic best policy identification.

Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

Aggregation distorts parametric behavioral curve peaks by factors of 3-5x via Simpson's paradox and survival bias, shown by individual vs. aggregate comparisons on Goodreads and Amazon datasets with a negative control.

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.

Discovering Reinforcement Learning Interfaces with Large Language Models

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

cs.LG · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.

AEL: Agent Evolving Learning for Open-Ended Environments

cs.CL · 2026-04-23 · conditional · novelty 7.0

AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.

Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Infra-Bayesian RL agents are shown via implementation to have lower worst-case regret than classical RL under model misspecification and to solve Newcomb's problem optimally.

Learning Minimally Rigid Graphs with High Realization Counts

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Reinforcement learning with graph neural networks finds minimally rigid graphs that match known planar realization optima and set new records for spherical realization counts.

Learning to Cut: Reinforcement Learning for Benders Decomposition

math.OC · 2026-05-07 · unverdicted · novelty 6.0

RLBD trains a neural policy with REINFORCE to select cuts adaptively in Benders decomposition, yielding faster convergence and better generalization than standard BD or SVM-based LearnBD on an EV charging problem.

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

stat.ME · 2026-05-18 · unverdicted · novelty 5.0

Develops greedy optimization algorithms for directly learning optimal integer-weighted clinical risk scores, applied to predict post-discharge mortality in a large EHR cohort with a supporting simulation study.

citing papers explorer

Showing 12 of 12 citing papers.

Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting · cs.LG · 2026-05-20 · unverdicted · none · ref 10
Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning · cs.LG · 2026-05-15 · unverdicted · none · ref 7
Tight Sample Complexity Bounds for Entropic Best Policy Identification · cs.LG · 2026-05-13 · unverdicted · none · ref 7
Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics · cs.LG · 2026-05-10 · unverdicted · none · ref 10
Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs · cs.LG · 2026-05-08 · unverdicted · none · ref 179
Discovering Reinforcement Learning Interfaces with Large Language Models · cs.LG · 2026-05-05 · unverdicted · none · ref 37
InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees · cs.LG · 2026-05-01 · unverdicted · none · ref 164 · 2 links
AEL: Agent Evolving Learning for Open-Ended Environments · cs.CL · 2026-04-23 · conditional · none · ref 36
Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness · cs.LG · 2026-05-22 · unverdicted · none · ref 12
Learning Minimally Rigid Graphs with High Realization Counts · cs.LG · 2026-05-12 · unverdicted · none · ref 22
Learning to Cut: Reinforcement Learning for Benders Decomposition · math.OC · 2026-05-07 · unverdicted · none · ref 37
Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization · stat.ME · 2026-05-18 · unverdicted · none · ref 8
