PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
Collabllm: From passive responders to active collaborators
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6roles
background 1polarities
background 1representative citing papers
MLReplicate benchmark evaluates six autonomous systems on 45 manuscripts from ICML 2025 papers, finding that automated reviews accept flawed outputs with fabricated claims while human review exposes methodological failures, and that the cheapest system outperforms the most expensive by a wide margin
AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.
CoFi-PGMA derives a unified counterfactual policy gradient objective based on marginal contribution to correct filtered feedback for both routing and collaborative multi-agent LLM training.
Fine-tuned simulators grounded in real human data produce LLM assistants that win more often against real users than those trained against role-playing simulators.
citing papers explorer
-
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent
PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
-
MLReplicate: Benchmarking Autonomous Research Systems for Machine Learning Reproducibility
MLReplicate benchmark evaluates six autonomous systems on 45 manuscripts from ICML 2025 papers, finding that automated reviews accept flawed outputs with fabricated claims while human review exposes methodological failures, and that the cheapest system outperforms the most expensive by a wide margin
-
Alignment has a Fantasia Problem
AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.
-
CoFi-PGMA: Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs
CoFi-PGMA derives a unified counterfactual policy gradient objective based on marginal contribution to correct filtered feedback for both routing and collaborative multi-agent LLM training.
-
Quantifying the Utility of User Simulators for Building Collaborative LLM Assistants
Fine-tuned simulators grounded in real human data produce LLM assistants that win more often against real users than those trained against role-playing simulators.
- Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic