Videomathqa: Benchmarking mathematical reasoning via multimodal understanding in videos

Hanoona Rasheed, Abdelrahman Shaker, Anqi Tang, Muhammad Maaz, Ming-Hsuan Yang, Salman Khan, Fahad Shahbaz Khan · 2025 · arXiv 2506.05349

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

dataset 1 method 1

citation-polarity summary

use dataset 1 use method 1

representative citing papers

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.

Co-Evolving Policy Distillation

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

CoPD integrates multiple expert capabilities by running parallel RLVR training with bidirectional online policy distillation among experts, outperforming mixed RLVR and sequential OPD while surpassing domain-specific experts on text-image-video reasoning.

OneThinker: All-in-one Reasoning Model for Image and Video

cs.CV · 2025-12-02 · unverdicted · novelty 5.0

OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.

EasyVideoR1: Easier RL for Video Understanding

cs.CV · 2026-04-18 · unverdicted · novelty 4.0

EasyVideoR1 delivers an optimized RL pipeline for video understanding in large vision-language models, achieving 1.47x throughput gains and aligned results on 22 benchmarks.

citing papers explorer

Showing 4 of 4 citing papers.

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction cs.LG · 2026-05-10 · unverdicted · none · ref 21
A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.
Co-Evolving Policy Distillation cs.LG · 2026-04-29 · unverdicted · none · ref 40
CoPD integrates multiple expert capabilities by running parallel RLVR training with bidirectional online policy distillation among experts, outperforming mixed RLVR and sequential OPD while surpassing domain-specific experts on text-image-video reasoning.
OneThinker: All-in-one Reasoning Model for Image and Video cs.CV · 2025-12-02 · unverdicted · none · ref 62
OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.
EasyVideoR1: Easier RL for Video Understanding cs.CV · 2026-04-18 · unverdicted · none · ref 29
EasyVideoR1 delivers an optimized RL pipeline for video understanding in large vision-language models, achieving 1.47x throughput gains and aligned results on 22 benchmarks.

Videomathqa: Benchmarking mathematical reasoning via multimodal understanding in videos

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer