A planner-orchestrator system learns long-horizon image editing by maximizing outcome-based rewards from a vision-language judge and refining plans from successful trajectories.
Advances in Neural Information Processing Systems 35, 24824–24837 (2022)
11 Pith papers cite this work.
Representative citing papers
LLM-Foraging uses off-the-shelf LLMs for decentralized tactical decisions in CPFA-based swarm foraging, collecting more resources than GA-tuned baselines across 36 varied configurations while showing greater consistency.
SciEval is a new benchmark of expert-annotated K-12 science lessons for LLM-based automatic evaluation, on which zero-shot models perform poorly but fine-tuning yields gains of up to 11%.
RESP uses reference-guided sequential prompting with VLMs to improve frame-level and video-level visual glitch detection in games by establishing per-video baselines.
TAIHRI is the first task-aware VLM for close-range HRI that localizes metric-scale 3D coordinates of critical keypoints by quantizing space and performing 2D keypoint reasoning via next-token prediction.
A trajectory-aware process reward based on DTW over sentence embeddings, combined with an exact-match reward in GRPO training after SFT, raises mean medical VQA accuracy from 0.598 to 0.689 across six benchmarks.
UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
Subagent architectures deliver stable, high-throughput optimization under tight time limits, while agent teams enable deeper refactoring at the cost of greater fragility.
LC-RAG augments standard RAG by incorporating environment logs to contextualize student discourse, yielding better retrieval and more relevant guidance from the Copa agent in the C2STEM modeling environment.