VideoP2R separates perception and reasoning in a process-aware RFT pipeline with a new CoT dataset and PA-GRPO rewards, reaching SOTA on six of seven video benchmarks.
Moment sampling in video LLMs for long-form video QA
4 Pith papers cite this work.
citing papers explorer
- Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
  Introduces the V-RAGBench benchmark and the CARVE method, which selects per-chunk retrieval configurations via parallel retrievers and adaptive reranking and outperforms eight VideoRAG baselines.
- MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering
  MemoryCard organizes long videos into self-contained topic-aware Memory Cards that improve long-video QA accuracy by up to 21.8% relative under fixed visual-token budgets.
- Answer Self-Consistency with Margin-Triggered Question Re-Arbitration for the CVPR 2026 VidLLMs Challenge
  ASC-MQRA applies answer self-consistency across stochastic video QA runs and optional margin-triggered re-arbitration to achieve 81.16% average accuracy on the CVPR 2026 VidLLMs Challenge Track 2 test set.