Videocogqa: A controllable benchmark for evalu- ating cognitive abilities in video-language models

Chenglin Li, Qianglong Chen, Zhi Li, Feng Tao, Yin Zhang · 2024 · arXiv 2411.09105

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 3

citation-polarity summary

background 2 unclear 1

representative citing papers

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.

VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning

cs.CV · 2026-01-22 · unverdicted · novelty 7.0

VideoThinker uses LLM-generated synthetic tool trajectories in caption space grounded to video frames to train agentic VideoLLMs that outperform baselines on long-video benchmarks.

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.

citing papers explorer

Showing 4 of 4 citing papers.

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis cs.CV · 2026-05-21 · unverdicted · none · ref 42
VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.
ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval cs.CV · 2026-04-22 · unverdicted · none · ref 85
ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.
VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning cs.CV · 2026-01-22 · unverdicted · none · ref 14
VideoThinker uses LLM-generated synthetic tool trajectories in caption space grounded to video frames to train agentic VideoLLMs that outperform baselines on long-video benchmarks.
Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval cs.CV · 2026-04-21 · unverdicted · none · ref 37
Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.

Videocogqa: A controllable benchmark for evalu- ating cognitive abilities in video-language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer