Dual Latent Memory for Visual Multi-agent System

Bo Yin; Chengming Xu; Cheng Tan; Cheng Yang; Jiangning Zhang; Shuicheng Yan; Xiaobin Hu; Xinlei Yu; Yihao Hu; Yongbo He

arxiv: 2602.00471 · v2 · pith:WVAJGKVRnew · submitted 2026-01-31 · cs.AI · cs.CV

Xinlei Yu, Chengming Xu, Zhangquan Chen, Bo Yin, Cheng Yang, Yongbo He, Yihao Hu, Jiangning Zhang, Cheng Tan, Xiaobin Hu, Shuicheng Yan

keywords: while, dual, latent, multi-agent, collaboration, information, inter-agent, memories

While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive "scaling wall": increasing the number of agent turns often degrades performance while exponentially inflating token costs. We attribute this failure to the information bottleneck inherent in text-centric communication, where converting perceptual and thinking trajectories into discrete natural language inevitably induces semantic loss. To address this, we propose L$^{2}$-VMAS, a novel model-agnostic framework that enables inter-agent collaboration through dual latent memories. The framework decouples perception from thinking while dynamically synthesizing the dual latent memories. Additionally, we introduce an entropy-driven proactive triggering mechanism that replaces passive information transmission with efficient, on-demand memory access. Extensive experiments across backbones, model sizes, and multi-agent structures demonstrate that our method effectively breaks the "scaling wall" with strong scalability, improving average accuracy by 2.7-5.4% while reducing token usage by 21.3-44.8%.
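
The abstract does not spell out how the entropy-driven trigger is computed. As a rough illustration only, one plausible reading is that an agent measures the Shannon entropy of its next-token distribution and requests latent memory only when that uncertainty is high. The function names, the threshold value, and the use of next-token probabilities below are all assumptions made for this sketch, not details taken from the paper.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_access_memory(probs: Sequence[float], threshold: float = 1.0) -> bool:
    """Hypothetical trigger: fetch latent memory only when entropy exceeds a threshold.

    The threshold of 1.0 nats is an arbitrary illustrative value.
    """
    return shannon_entropy(probs) > threshold


# A peaked distribution (confident agent) versus a uniform one (uncertain agent).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(should_access_memory(confident))  # low entropy, no memory access
print(should_access_memory(uncertain))  # high entropy, memory access requested
```

Under this reading, memory is pulled on demand rather than pushed at every turn, which is consistent with the abstract's contrast between passive transmission and on-demand access.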

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning
cs.AI 2026-04 unverdicted novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and eva...

SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology
cs.AI 2026-04 unverdicted novelty 6.0

SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

Latent Action Reparameterization for Efficient Agent Inference
cs.AI 2026-05 unverdicted novelty 5.0

LAR learns a compact latent action space from trajectories that shortens the effective decision horizon for LLM agents, reducing token count and inference time while preserving task success.