QAP-Router models qubit routing as a dynamic quadratic assignment problem (QAP) and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.
OpenAI Gym, 2016
12 Pith papers cite this work.
representative citing papers
Develops quotient-categorical representations that render the average-reward distributional Bellman operator well-defined, non-expansive, and convergent under i.i.d. and Markovian sampling.
Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.
CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.
Constraining a PCGRL generator's action space with locally learned WFC constraints yields visually satisfying and playable puzzle-platform levels with desired global properties.
A Bayesian inference framework with active probing on bipartite graph models of sensor pipelines outperforms baselines for detecting and mitigating sensor attacks in an inverted pendulum system.
GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.
Lightweight numerical bandits on text embeddings match or exceed LLM accuracy in contextual bandits at a fraction of the cost, with an embedding-based diagnostic to choose between them.
Template collapse is a distinct failure mode in agentic RL that entropy does not reveal; mutual-information proxies diagnose it better, and SNR-aware filtering based on reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.
Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.
RL policies decompose into information-regularized primitives that compete by how much state information each requests, with the greediest one acting, yielding better generalization than flat or hierarchical baselines.
RCRC uses untrained random CNNs and reservoir computing plus evolution strategies to reach claimed state-of-the-art scores in reinforcement learning tasks while avoiding data storage and heavy training.
citing papers explorer
Arena: a toolkit for Multi-Agent Reinforcement Learning
Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.