pith. sign in

hub Canonical reference

Flamingo: a visual language model for few-shot learning.Advances in neural information processing systems, 35:23716–23736

Canonical reference. 100% of citing Pith papers cite this work as background.

16 Pith papers citing it
Background 100% of classified citations

hub tools

citation-role summary

background 5

citation-polarity summary

fields

cs.CV 14 cs.AI 2

years

2026 13 2025 3

roles

background 5

polarities

background 5

representative citing papers

Mema: Memory-Augmented Adapter for Enhanced Vision-Language Understanding

cs.CV · 2026-02-28 · unverdicted · novelty 7.0

Mema adds a stateful memory module to vision encoders that accumulates hierarchical visual features across layers and selectively injects portions back via feedback to preserve fine-grained cues, yielding consistent gains on multimodal benchmarks.

Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs

cs.AI · 2025-12-09 · unverdicted · novelty 6.0

State-of-the-art MLLMs show substantial inconsistency when reasoning over the same information presented in image, text, or mixed modalities, even after accounting for OCR errors, with inconsistency linked to visual factors and modality gap.

Character-Centered Dialogue Generation from Scene-Level Prompts

cs.CV · 2025-05-22 · unverdicted · novelty 4.0

A training-free framework generates expressive, character-grounded dialogue and speech from scene prompts using vision-language encoders, LLMs, and a recursive narrative memory bank for cross-scene consistency.

citing papers explorer

Showing 16 of 16 citing papers.