Livevqa: Live visual knowledge seeking

Mingyang Fu, Yuyang Peng, Benlin Liu, Yao Wan, Dongping Chen · 2025 · arXiv 2504.05288

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

dataset 3 background 1

citation-polarity summary

use dataset 2 background 1 unclear 1

representative citing papers

MMSearch-R1: Incentivizing LMMs to Search

cs.CV · 2025-06-25 · unverdicted · novelty 7.0

MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

HyperEyes presents a parallel multimodal search agent using dual-grained efficiency-aware RL with a new TRACE reward and IMEB benchmark, claiming 9.9% higher accuracy and 5.3x fewer tool calls than prior open-source agents.

DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

cs.IR · 2025-08-07 · unverdicted · novelty 6.0

WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.

ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards

cs.CV · 2026-04-22 · unverdicted · novelty 5.0

A sandbox-trained multimodal search agent with process-oriented rewards transfers zero-shot to real Google Search and outperforms prior methods on FVQA, InfoSeek, and MMSearch.

citing papers explorer

Showing 6 of 6 citing papers.

MMSearch-R1: Incentivizing LMMs to Search cs.CV · 2025-06-25 · unverdicted · none · ref 18
MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.
HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents cs.LG · 2026-05-08 · unverdicted · none · ref 7 · 2 links
HyperEyes presents a parallel multimodal search agent using dual-grained efficiency-aware RL with a new TRACE reward and IMEB benchmark, claiming 9.9% higher accuracy and 5.3x fewer tool calls than prior open-source agents.
DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents cs.CV · 2026-04-21 · unverdicted · none · ref 18
DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 11
POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent cs.IR · 2025-08-07 · unverdicted · none · ref 10
WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.
ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards cs.CV · 2026-04-22 · unverdicted · none · ref 7
A sandbox-trained multimodal search agent with process-oriented rewards transfers zero-shot to real Google Search and outperforms prior methods on FVQA, InfoSeek, and MMSearch.

Livevqa: Live visual knowledge seeking

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer