SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
Mixed citations
Text2reward: Automated dense reward function generation for reinforcement learning
Mixed citation behavior. Most common role is background (67%).
citation-role summary
citation-polarity summary
representative citing papers
Iterative search over reward functions with ranked feedback in GRPO training improves LLM math reasoning, achieving F1 of 0.795 on GSM8K versus 0.609 for baseline.
PR-CAD unifies text-to-CAD generation and editing via progressive refinement with LLMs, a new interaction dataset, and RL-enhanced reasoning to achieve better controllability and faithfulness.
Closed-loop prompt-based translation with hierarchical verification and iterative repair produces equivalent high-performance RL environments across five cases including new TCGJax.
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
EvoNav automates the design of reward functions for RL robot navigation by evolving LLM proposals through a three-stage cheap-to-expensive evaluation process and claims better policies than hand-crafted or prior automated rewards.
LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.
citing papers explorer
-
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
-
Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning
Iterative search over reward functions with ranked feedback in GRPO training improves LLM math reasoning, achieving F1 of 0.795 on GSM8K versus 0.609 for baseline.
-
PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models
PR-CAD unifies text-to-CAD generation and editing via progressive refinement with LLMs, a new interaction dataset, and RL-enhanced reasoning to achieve better controllability and faithfulness.
-
Automatic Generation of High-Performance RL Environments
Closed-loop prompt-based translation with hierarchical verification and iterative repair produces equivalent high-performance RL environments across five cases including new TCGJax.
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
-
EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models
EvoNav automates the design of reward functions for RL robot navigation by evolving LLM proposals through a three-stage cheap-to-expensive evaluation process and claims better policies than hand-crafted or prior automated rewards.
-
LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning
LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.