
Canonical reference

Glyph: Scaling context windows via visual-text compression

Canonical reference. 83% of citing Pith papers cite this work as background.

11 Pith papers citing it
Background 83% of classified citations

citation-role summary

background: 5 · baseline: 1

citation-polarity summary

years

2026: 10 · 2025: 1


representative citing papers

Visual Text Compression as Measure Transport

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Framing visual text compression as measure transport decomposes encoding loss into precision and coverage costs, enabling a label-free routing rule that matches oracle performance on 17 of 24 NLP datasets while using 10% fewer tokens.

LoMo: Local Modality Substitution for Deeper Vision-Language Fusion

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

LoMo is a lightweight data curation technique that locally substitutes text with images in prompts to enforce cross-modal invariance, yielding 2.67-2.82 point gains over standard SFT on two VLMs across 13 benchmarks.

The Verbose Context Problem in Medical Records

cs.CL · 2026-06-28 · unverdicted · novelty 5.0

Presents the PopMedQA benchmark and shows that domain-independent LLM methods fail on token-inefficient longitudinal medical records, leaving room for domain-specific approaches.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Visual Text Compression as Measure Transport cs.CV · 2026-05-06 · unverdicted · none · ref 9

    Framing visual text compression as measure transport decomposes encoding loss into precision and coverage costs, enabling a label-free routing rule that matches oracle performance on 17 of 24 NLP datasets while using 10% fewer tokens.

  • POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 4

    POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

  • LensVLM: Selective Context Expansion for Compressed Visual Representation of Text cs.CV · 2026-05-07 · unverdicted · none · ref 3

    LensVLM trains VLMs to scan compressed rendered text images and selectively expand task-relevant regions, achieving 4.3x compression with near full-text accuracy and outperforming baselines up to 10.1x on text QA benchmarks.