hub Canonical reference

Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory

Canonical reference. 100% of citing Pith papers cite this work as background.

20 Pith papers citing it
Background 100% of classified citations

hub tools

citation-role summary: background 6

citation-polarity summary: background 6

years: 2026 (20)

verdicts: unverdicted (20)

representative citing papers

Task-Focused Memorization for Multimodal Agents

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

TaskMem uses RL in two phases to learn a task-focused memorization policy for multimodal agents, yielding 5.3-7.0% VQA accuracy gains on reformulated streaming benchmarks from VideoMME, EgoLife, and EgoTempo.

POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

POINTS-Long is a dual-mode multimodal large language model that uses dynamic visual token scaling to retain 97.7-99.7% accuracy on long-form tasks with 1/40 to 1/10 of the tokens, and supports streaming via a detachable KV-cache.

Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

cs.DC · 2026-04-09 · unverdicted · novelty 6.0

AdecPilot decentralizes administration in edge-cloud multi-agent frameworks by using a UI-agnostic cloud designer and a bimodal edge team with a Hierarchical Implicit Termination protocol, yielding 21.7% higher task success, 37.5% fewer cloud tokens, and 88.9% lower latency.

PersonaVLM: Long-Term Personalized Multimodal LLMs

cs.CL · 2026-03-20 · unverdicted · novelty 6.0

PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

citing papers explorer

Showing 20 of 20 citing papers after filters.