Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

cs.CV · 2026-05-20 · conditional · novelty 7.0

WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.

OProver: A Unified Framework for Agentic Formal Theorem Proving

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

RetentiveKV uses entropy to drive state-space model transitions that retain and reactivate low-attention visual tokens in a continuous memory instead of pruning them, delivering 5x KV cache compression and 1.5x faster decoding.

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.

citing papers explorer

Showing 6 of 6 citing papers.

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata · cs.CV · 2026-05-20 · conditional · none · ref 116
OProver: A Unified Framework for Agentic Formal Theorem Proving · cs.CL · 2026-05-17 · unverdicted · none · ref 12
Deep Pre-Alignment for VLMs · cs.CV · 2026-05-14 · unverdicted · none · ref 96
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction · cs.LG · 2026-04-14 · unverdicted · none · ref 3
Muon is Scalable for LLM Training · cs.LG · 2025-02-24 · unverdicted · none · ref 44
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding · cs.LG · 2026-04-23 · unverdicted · none · ref 44
