OpenAI Gym, 2016
12 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning
  QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.
- Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning
  Develops quotient-categorical representations that render the average-reward distributional Bellman operator well-defined, non-expansive, and convergent under i.i.d. and Markovian sampling.
- Training Language Agents to Learn from Experience
  Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.
- CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models
  CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.
- Learning Local Constraints for Reinforcement-Learned Content Generators
  Constraining a PCGRL generator's action space with locally learned WFC constraints yields visually satisfying and playable puzzle-platform levels with desired global properties.
- Active Bayesian Inference for Robust Control under Sensor False Data Injection Attacks
  A Bayesian inference framework with active probing on bipartite graph models of sensor pipelines outperforms baselines for detecting and mitigating sensor attacks in an inverted pendulum system.
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
  GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.
- When Do We Need LLMs? A Diagnostic for Language-Driven Bandits
  Lightweight numerical bandits on text embeddings match or exceed LLM accuracy in contextual bandits at a fraction of the cost, with an embedding-based diagnostic to choose between them.
- RAGEN-2: Reasoning Collapse in Agentic RL
  Template collapse is a distinct failure mode in agentic RL invisible to entropy; mutual information proxies diagnose it better and SNR-aware filtering using reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.
- Arena: a toolkit for Multi-Agent Reinforcement Learning
  Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.
- Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
  RL policies decompose into information-regularized primitives that compete by requesting state information amounts, with the greediest one acting, yielding better generalization than flat or hierarchical baselines.
- Convolutional Reservoir Computing for World Models
  RCRC uses untrained random CNNs and reservoir computing plus evolution strategies to reach claimed state-of-the-art scores in reinforcement learning tasks while avoiding data storage and heavy training.