{"total":17,"items":[{"citing_arxiv_id":"2606.00405","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Talking Words to Sharing Thoughts: Scalable Multi-LLM Aggregation via Structured Message Passing","primary_cat":"cs.GT","submitted_at":"2026-05-29T22:47:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A bipartite factor graph with message-passing protocol and asymmetric damping aggregates multi-LLM predictions, cutting token use by 97% and API calls by 6X while outperforming baselines on MMLU, MMLU-Pro, GPQA, and MedMCQA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22504","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LACO: Adaptive Latent Communication for Collaborative Driving","primary_cat":"cs.AI","submitted_at":"2026-05-21T13:54:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17467","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems","primary_cat":"cs.CL","submitted_at":"2026-05-17T14:09:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VerifyMAS improves failure attribution in LLM multi-agent systems via hypothesis verification on full trajectories, error taxonomy-based data 
construction, and fine-tuned verifier models, outperforming prior direct-prediction methods on Aegis-Bench and Who&When.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16471","ref_index":160,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI","primary_cat":"cs.CR","submitted_at":"2026-05-15T13:53:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15622","ref_index":108,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered","primary_cat":"cs.LG","submitted_at":"2026-05-15T05:11:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14892","ref_index":229,"ref_count":4,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent 
Systems","primary_cat":"cs.AI","submitted_at":"2026-05-14T14:36:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.","context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"the MARE system [229] similarly distributes responsibilities among agents with specialized expertise to support structured requirement analysis and task delegation. Beyond general data processing tasks, static role allocation has also been applied in domain-specific multi-agent systems where reliability and consistency are essential [230]. In healthcare applications, ColaCare [231] employs predefined roles for collaborative health record modeling to ensure stable and accurate task execution. MEDCO [232] adopts a similar design in medical education, where agents assume roles such as tutor and assessor to guide and evaluate learners within a structured training process. 
Several systems employ static role allocation to facilitate structured conversational or collabora-"},{"citing_arxiv_id":"2605.13839","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights","primary_cat":"cs.CL","submitted_at":"2026-05-13T17:58:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4B models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12471","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference","primary_cat":"cs.LG","submitted_at":"2026-05-12T17:53:47+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"KV-Fold turns frozen transformers into stable long-context models by folding the KV cache across sequence chunks in repeated forward passes.","context_count":1,"top_context_role":"extension","top_context_polarity":"extend","context_text":"can itself serve as a recurrent state. In a decoder-only transformer, the KV cache stores layer-wise representations of previous tokens, which later tokens access through attention. Although this cache is usually treated as a serving optimization, it is also a structured record of the model's past computation. Recent work on latent multi-agent communication [8] showed that one transformer pass can attend to another pass's KV cache as a prefix, allowing information transfer directly through latent state. 
We repurpose this primitive for long-context inference within a single pretrained model. We introduce KV-Fold. A long sequence is divided into chunks. When processing chunk t, the model attends to the accumulated KV cache from earlier chunks as a prefix, produces new keys and values"},{"citing_arxiv_id":"2605.09104","ref_index":119,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Token Economics for LLM Agents: A Dual-View Study from Computing and Economics","primary_cat":"cs.AI","submitted_at":"2026-05-09T18:18:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using established economic theories.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Hybrid architectures address the carrying-versus-stockout tradeoff directly: SRMT [117] couples each agent's personal memory vector with a shared recurrent pool via cross-attention, while LEGOMem [118] assigns full task memories to the orchestrator and subtask-scoped memories to executors, containing the carrying cost of shared context to the roles that actually require it. 
LatentMAS [119] pushes the shared-pool paradigm further by dispensing with text-based exchange entirely: agents communicate via layer-wise KV-cache transfers in continuous latent space, bypassing the encoding-decoding cycle that dominates token cost in conventional text-mediated collaboration and achieving 70-84% token reduction relative to text-based MAS while maintaining or improving"},{"citing_arxiv_id":"2605.07315","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification","primary_cat":"cs.CL","submitted_at":"2026-05-08T06:23:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LaTER reduces LLM token usage 16-33% on reasoning benchmarks by exploring in latent space then switching to explicit CoT verification, with gains like 70% to 73.3% on AIME 2025 in the training-free version.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"scaffolding, or discarded solution paths before reaching a stable derivation. Recent work therefore studies reasoning in a continuous latent space [4, 5]. Instead of sampling a visible token at every reasoning step, a model can feed back a hidden state or a soft embedding as the next input, using either an analytic mapping such as a pseudo-inverse projection [6] or a learned projector [7], and only decodes discrete readable tokens in the final answer stage. This can substantially reduce visible token generation and has shown promising efficiency gains [4, 8, 9]. 
However, pure latent reasoning also has a clear weakness: when a problem requires careful"},{"citing_arxiv_id":"2605.06623","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems","primary_cat":"cs.AI","submitted_at":"2026-05-07T17:35:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MASPO jointly optimizes prompts in multi-agent LLM systems via downstream-success evaluation and evolutionary beam search, delivering 2.9 average accuracy gains over prior methods across six tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03884","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-05T15:44:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"QKVShare enables efficient quantized KV-cache handoff for on-device multi-agent LLMs, cutting TTFT versus re-prefill across tested contexts while adaptive quantization stays competitive with uniform baselines on GSM8K.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02801","ref_index":89,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces","primary_cat":"cs.CL","submitted_at":"2026-05-04T16:42:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit 
stopping-decision training in its paper pool, and releases a tagged corpus.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01111","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"When Less is Enough: Efficient Inference via Collaborative Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-01T21:31:59+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27351","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Heterogeneous Scientific Foundation Model Collaboration","primary_cat":"cs.AI","submitted_at":"2026-04-30T03:02:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Heng Ji, Yejin Choi, Dawn Song, Jimeng Sun, and Jiawei Han. Adaptation of agentic AI. CoRR, abs/2512.16301, 2025. doi: 10.48550/ARXIV.2512.16301. URL https://doi.org/10.48550/arXiv.2512.16301. [6] Jiaru Zou, Xiyuan Yang, Ruizhong Qiu, Gaotang Li, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, and Ling Yang. 
Latent collaboration in multi-agent systems. CoRR, abs/2511.20639, 2025. doi: 10.48550/ARXIV.2511.20639. URL https://doi.org/10.48550/arXiv.2511.20639. [7] Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan"},{"citing_arxiv_id":"2604.10815","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MeloTune: On-Device Arousal Learning and Peer-to-Peer Mood Coupling for Proactive Music Curation","primary_cat":"cs.SD","submitted_at":"2026-04-12T20:56:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MeloTune implements learned per-listener Personal Arousal Functions and mesh memory protocols on mobile devices to predict affective trajectories and enable peer-coupled proactive music selection, reporting 96.6% pattern accuracy in deployment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03809","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus","primary_cat":"cs.LG","submitted_at":"2026-04-04T17:30:23+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM agent committees exhibit representational collapse with mean cosine similarity of 0.888, and diversity-aware consensus reaches 87% accuracy on GSM8K versus 84% for self-consistency at lower cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}