hub Canonical reference

Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory

Canonical reference. 100% of citing Pith papers cite this work as background.

15 Pith papers citing it
Background 100% of classified citations

citation-role summary

background 6

citation-polarity summary

years

2026 15

verdicts

UNVERDICTED 15

polarities

background 6

representative citing papers

POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

POINTS-Long is a dual-mode multimodal large language model that uses dynamic visual token scaling to retain 97.7-99.7% accuracy on long-form tasks with 1/40 to 1/10 of the tokens, and supports streaming via a detachable KV cache.

Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

cs.DC · 2026-04-09 · unverdicted · novelty 6.0

AdecPilot decentralizes administration in edge-cloud multi-agent frameworks by using a UI-agnostic cloud designer and a bimodal edge team with a Hierarchical Implicit Termination protocol, yielding 21.7% higher task success, 37.5% fewer cloud tokens, and 88.9% lower latency.

PersonaVLM: Long-Term Personalized Multimodal LLMs

cs.CL · 2026-03-20 · unverdicted · novelty 6.0

PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.
