NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.
Artificial intelligence , volume=
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
citing papers explorer
-
Gradient-Based Program Synthesis with Neurally Interpreted Languages
NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.
-
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents
ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.
-
Learning Interactive Real-World Simulators
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
-
Goal-Conditioned Agents that Learn Everything All at Once
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
-
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
-
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.