hub Canonical reference

Glyph: Scaling context windows via visual-text compression

Canonical reference. 83% of citing Pith papers cite this work as background.

13 Pith papers citing it

citation-role summary

background: 5 · baseline: 1

years

2026: 12 · 2025: 1

representative citing papers

Visual Text Compression as Measure Transport

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Framing visual text compression as measure transport decomposes encoding loss into precision and coverage costs, enabling a label-free routing rule that matches oracle performance on 17 of 24 NLP datasets while using 10% fewer tokens.

Memory Shot for Long-Term Dialogue

cs.IR · 2026-05-30 · unverdicted · novelty 6.0

MemShot renders local dialogue spans as structured visual memory units to improve long-term dialogue modeling in LLMs, achieving competitive benchmark performance with 70x faster memory construction.

LoMo: Local Modality Substitution for Deeper Vision-Language Fusion

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

LoMo is a lightweight data curation technique that locally substitutes text with images in prompts to enforce cross-modal invariance, yielding 2.67-2.82 point gains over standard SFT on two VLMs across 13 benchmarks.

The Verbose Context Problem in Medical Records

cs.CL · 2026-06-28 · unverdicted · novelty 5.0

Presents the PopMedQA benchmark and shows that domain-independent LLM methods fail on token-inefficient longitudinal medical records, leaving room for domain-specific approaches.

citing papers explorer

Showing 8 of 8 citing papers after filters.

  • PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation cs.IR · 2026-06-01 · unverdicted · none · ref 35

    PixelRAG shows that operating RAG entirely over web screenshots outperforms text-based retrieval on NQ, SimpleQA, MMSearch, LiveVQA, and MoNaCo, with up to 18.1% accuracy gains and 3x token savings via image compression.

  • Visual Text Compression as Measure Transport cs.CV · 2026-05-06 · unverdicted · none · ref 9

    Framing visual text compression as measure transport decomposes encoding loss into precision and coverage costs, enabling a label-free routing rule that matches oracle performance on 17 of 24 NLP datasets while using 10% fewer tokens.

  • Memory Shot for Long-Term Dialogue cs.IR · 2026-05-30 · unverdicted · none · ref 3

    MemShot renders local dialogue spans as structured visual memory units to improve long-term dialogue modeling in LLMs, achieving competitive benchmark performance with 70x faster memory construction.

  • LoMo: Local Modality Substitution for Deeper Vision-Language Fusion cs.CV · 2026-05-28 · unverdicted · none · ref 5

    LoMo is a lightweight data curation technique that locally substitutes text with images in prompts to enforce cross-modal invariance, yielding 2.67-2.82 point gains over standard SFT on two VLMs across 13 benchmarks.

  • POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 4

    POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

  • The Verbose Context Problem in Medical Records cs.CL · 2026-06-28 · unverdicted · none · ref 3

    Presents the PopMedQA benchmark and shows that domain-independent LLM methods fail on token-inefficient longitudinal medical records, leaving room for domain-specific approaches.

  • LensVLM: Selective Context Expansion for Compressed Visual Representation of Text cs.CV · 2026-05-07 · unverdicted · none · ref 3

    LensVLM trains VLMs to scan compressed rendered text images and selectively expand task-relevant regions, achieving 4.3x compression with near full-text accuracy and outperforming baselines up to 10.1x on text QA benchmarks.

  • MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning cs.AI · 2026-01-29 · unverdicted · none · ref 4

    MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.