A diagram is worth a dozen images

10 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (8), 2025 (2)

verdicts: unverdicted (10)

roles: dataset (2)

polarities: use dataset (2)

representative citing papers

Process Rewards with Learned Reliability

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

BetaPRM learns distributional step rewards with explicit reliability via Beta-Binomial modeling, enabling ACA that cuts token use by up to 33.57% while raising final-answer accuracy on reasoning benchmarks.

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% for high-resolution images in MLLMs via slice-based encoding plus intra-ViT early compression while matching or exceeding baseline performance on document, OCR, and VQA benchmarks.

Perceptual Flow Network for Visually Grounded Reasoning

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).

Let ViT Speak: Generative Language-Image Pre-training

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

GenLIP pretrains ViTs to generate language tokens from visual tokens via autoregressive language modeling, matching strong baselines on multimodal tasks with less data.

Context Unrolling in Omni Models

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

Omni is a multimodal model whose native training on diverse data types enables context unrolling, allowing explicit reasoning across modalities to better approximate shared knowledge and improve downstream performance.

OneThinker: All-in-one Reasoning Model for Image and Video

cs.CV · 2025-12-02 · unverdicted · novelty 5.0

OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.

citing papers explorer

Showing 10 of 10 citing papers.

  • Visual-Advantage On-Policy Distillation for Vision-Language Models · cs.CV · 2026-05-21 · unverdicted · none · ref 15

    VA-OPD improves VLM performance over standard on-policy distillation by reweighting rollouts and separating KL terms according to token-level visual advantage on math and visual benchmarks.

  • Process Rewards with Learned Reliability · cs.CL · 2026-05-15 · unverdicted · none · ref 28

    BetaPRM learns distributional step rewards with explicit reliability via Beta-Binomial modeling, enabling ACA that cuts token use by up to 33.57% while raising final-answer accuracy on reasoning benchmarks.

  • GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models · cs.CV · 2026-05-13 · unverdicted · none · ref 39

    GRIP-VLM applies group-relative policy optimization via reinforcement learning to prune visual tokens in VLMs, yielding up to 15% inference speedup at matched accuracy over prior methods.

  • LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? · cs.CV · 2026-05-09 · unverdicted · none · ref 19

    LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% for high-resolution images in MLLMs via slice-based encoding plus intra-ViT early compression while matching or exceeding baseline performance on document, OCR, and VQA benchmarks.

  • Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models · cs.CR · 2025-12-02 · unverdicted · none · ref 14

    Logit-based MIAs perform comparably on MLLMs with or without visual inputs in in-distribution settings, but visual inputs mask membership signals in out-of-distribution settings.

  • Perceptual Flow Network for Visually Grounded Reasoning · cs.CV · 2026-05-04 · unverdicted · none · ref 17

    PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).

  • Let ViT Speak: Generative Language-Image Pre-training · cs.CV · 2026-05-01 · unverdicted · none · ref 32

    GenLIP pretrains ViTs to generate language tokens from visual tokens via autoregressive language modeling, matching strong baselines on multimodal tasks with less data.

  • Context Unrolling in Omni Models · cs.CV · 2026-04-23 · unverdicted · none · ref 21

    Omni is a multimodal model whose native training on diverse data types enables context unrolling, allowing explicit reasoning across modalities to better approximate shared knowledge and improve downstream performance.

  • Seed1.8 Model Card: Towards Generalized Real-World Agency · cs.AI · 2026-03-21 · unverdicted · none · ref 33

    Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

  • OneThinker: All-in-one Reasoning Model for Image and Video · cs.CV · 2025-12-02 · unverdicted · none · ref 50

    OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.