OpenAI Gym, 2016

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba · 2016

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 2 · background: 1 · dataset: 1

citation-polarity summary

use method: 2 · unclear: 1 · use dataset: 1

representative citing papers

QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

quant-ph · 2026-05-12 · unverdicted · novelty 7.0

QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.

Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Develops quotient-categorical representations that render the average-reward distributional Bellman operator well-defined, non-expansive, and convergent under i.i.d. and Markovian sampling.

Training Language Agents to Learn from Experience

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.

CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.

Learning Local Constraints for Reinforcement-Learned Content Generators

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Constraining a PCGRL generator's action space with locally learned WFC constraints yields visually satisfying and playable puzzle-platform levels with desired global properties.

Active Bayesian Inference for Robust Control under Sensor False Data Injection Attacks

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

A Bayesian inference framework with active probing on bipartite graph models of sensor pipelines outperforms baselines for detecting and mitigating sensor attacks in an inverted pendulum system.

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.

When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

Lightweight numerical bandits on text embeddings match or exceed LLM accuracy in contextual bandits at a fraction of the cost, with an embedding-based diagnostic to choose between them.

RAGEN-2: Reasoning Collapse in Agentic RL

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

Template collapse is a distinct failure mode in agentic RL that is invisible to entropy. Mutual-information proxies diagnose it better, and SNR-aware filtering based on reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.

Arena: a toolkit for Multi-Agent Reinforcement Learning

cs.LG · 2019-07-20 · accept · novelty 6.0

Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

cs.LG · 2019-06-25 · unverdicted · novelty 6.0

Decomposes RL policies into information-regularized primitives that compete on how much state information each requests, with the greediest one acting, yielding better generalization than flat or hierarchical baselines.

Convolutional Reservoir Computing for World Models

cs.LG · 2019-07-18 · unverdicted · novelty 4.0

RCRC uses untrained random CNNs and reservoir computing plus evolution strategies to reach claimed state-of-the-art scores in reinforcement learning tasks while avoiding data storage and heavy training.

citing papers explorer

Showing 12 of 12 citing papers.

QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

quant-ph · 2026-05-12 · unverdicted · none · ref 10

Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · none · ref 12

Training Language Agents to Learn from Experience

cs.LG · 2026-05-19 · unverdicted · none · ref 2

CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models

cs.LG · 2026-05-15 · unverdicted · none · ref 8

Learning Local Constraints for Reinforcement-Learned Content Generators

cs.AI · 2026-05-13 · unverdicted · none · ref 3

Active Bayesian Inference for Robust Control under Sensor False Data Injection Attacks

cs.LG · 2026-04-13 · unverdicted · none · ref 16

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

cs.CV · 2026-04-08 · unverdicted · none · ref 12

When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

cs.AI · 2026-04-07 · unverdicted · none · ref 8

RAGEN-2: Reasoning Collapse in Agentic RL

cs.LG · 2026-04-07 · unverdicted · none · ref 2

Arena: a toolkit for Multi-Agent Reinforcement Learning

cs.LG · 2019-07-20 · accept · none · ref 6

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

cs.LG · 2019-06-25 · unverdicted · none · ref 5

Convolutional Reservoir Computing for World Models

cs.LG · 2019-07-18 · unverdicted · none · ref 36
