LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.
Chain-of-thought prompting elicits reasoning in large language models
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.
TS-Reasoner is a domain-oriented agent using LLMs, computational tools, and error feedback for multi-step time series inference, showing better performance than general LLMs on understanding and reasoning benchmarks.
NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.
TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.
Chat Modeling is a multi-agent LLM framework with modeling memory and dynamic chat widgets that translates text inputs into interactive 3D modeling operations for literature-grounded biological structures.
GPT-4V processes interleaved image-text inputs generically and supports visual referring prompting for new human-AI interaction.
citing papers explorer
-
Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs
LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.
-
Group-in-Group Policy Optimization for LLM Agent Training
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.
-
TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis
TS-Reasoner is a domain-oriented agent using LLMs, computational tools, and error feedback for multi-step time series inference, showing better performance than general LLMs on understanding and reasoning benchmarks.
-
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.
-
TrustLLM: Trustworthiness in Large Language Models
TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.
-
Chat Modeling: Interaction-Enhanced Agent Framework for Visualizing Literature-Grounded Biological Structures
Chat Modeling is a multi-agent LLM framework with modeling memory and dynamic chat widgets that translates text inputs into interactive 3D modeling operations for literature-grounded biological structures.