Advances in Neural Information Processing Systems 35, 24824–24837 (2022)
11 Pith papers cite this work.
citing papers explorer
- From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing
  A planner-orchestrator system learns long-horizon image editing by maximizing outcome-based rewards from a vision-language judge and refining plans from successful trajectories.
- LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging
  LLM-Foraging uses off-the-shelf LLMs for decentralized tactical decisions in CPFA-based swarm foraging, collecting more resources than GA-tuned baselines across 36 varied configurations while showing greater consistency.
- SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials
  SciEval is a new benchmark of expert-annotated K-12 science lessons for LLM-based automatic evaluation, where zero-shot models perform poorly but fine-tuning yields up to 11% gains.
- RESP: Reference-guided Sequential Prompting for Visual Glitch Detection in Video Games
  RESP uses reference-guided sequential prompting with VLMs to improve frame-level and video-level visual glitch detection in games by establishing per-video baselines.
- TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
  TAIHRI is the first task-aware VLM for close-range HRI that localizes metric-scale 3D coordinates of critical keypoints by quantizing space and performing 2D keypoint reasoning via next-token prediction.
- Improving Medical VQA through Trajectory-Aware Process Supervision
  A trajectory-aware process reward using DTW on sentence embeddings, combined with exact-match in GRPO after SFT, raises mean medical VQA accuracy from 0.598 to 0.689 across six benchmarks.
- UniMesh: Unifying 3D Mesh Understanding and Generation
  UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
- An Empirical Study of Multi-Agent Collaboration for Automated Research
  Subagent architectures deliver stable high-throughput optimization under tight time limits while agent teams enable deeper refactoring at the cost of higher fragility.
- Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval-Augmented Generation (RAG)
  LC-RAG augments standard RAG by incorporating environment logs to contextualize student discourse, yielding better retrieval and more relevant guidance from the Copa agent in the C2STEM modeling environment.
- Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
- MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution