Livevqa: Live visual knowledge seeking

LiveVQA: Live Visual Knowledge Seeking , author= · 2025 · arXiv 2504.05288

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

dataset 3 background 1

citation-polarity summary

use dataset 2 background 1 unclear 1

representative citing papers

PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation

cs.IR · 2026-06-01 · unverdicted · novelty 7.0

PixelRAG shows that operating RAG entirely over web screenshots outperforms text-based retrieval on NQ, SimpleQA, MMSearch, LiveVQA, and MoNaCo, with up to 18.1% accuracy gains and 3x token savings via image compression.

MMSearch-R1: Incentivizing LMMs to Search

cs.CV · 2025-06-25 · unverdicted · novelty 7.0

MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

HyperEyes presents a parallel multimodal search agent using dual-grained efficiency-aware RL with a new TRACE reward and IMEB benchmark, claiming 9.9% higher accuracy and 5.3x fewer tool calls than prior open-source agents.

DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

cs.IR · 2025-08-07 · unverdicted · novelty 6.0

WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.

ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards

cs.CV · 2026-04-22 · unverdicted · novelty 5.0

A sandbox-trained multimodal search agent with process-oriented rewards transfers zero-shot to real Google Search and outperforms prior methods on FVQA, InfoSeek, and MMSearch.

SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search

cs.CV · 2026-06-30 · unverdicted · novelty 4.0

SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.

citing papers explorer

Showing 8 of 8 citing papers.

PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation cs.IR · 2026-06-01 · unverdicted · none · ref 32
PixelRAG shows that operating RAG entirely over web screenshots outperforms text-based retrieval on NQ, SimpleQA, MMSearch, LiveVQA, and MoNaCo, with up to 18.1% accuracy gains and 3x token savings via image compression.
MMSearch-R1: Incentivizing LMMs to Search cs.CV · 2025-06-25 · unverdicted · none · ref 18
MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.
HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents cs.LG · 2026-05-08 · unverdicted · none · ref 7 · 2 links
HyperEyes presents a parallel multimodal search agent using dual-grained efficiency-aware RL with a new TRACE reward and IMEB benchmark, claiming 9.9% higher accuracy and 5.3x fewer tool calls than prior open-source agents.
DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents cs.CV · 2026-04-21 · unverdicted · none · ref 18
DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 11
POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent cs.IR · 2025-08-07 · unverdicted · none · ref 10
WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.
ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards cs.CV · 2026-04-22 · unverdicted · none · ref 7
A sandbox-trained multimodal search agent with process-oriented rewards transfers zero-shot to real Google Search and outperforms prior methods on FVQA, InfoSeek, and MMSearch.
SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search cs.CV · 2026-06-30 · unverdicted · none · ref 22
SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.

Livevqa: Live visual knowledge seeking

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer