WebEye benchmark and Pixel-Searcher agent enable visual perception tasks by using web search to resolve object identities before precise localization or answering.
Sam 3: Segment anything with concepts
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5representative citing papers
MaSC is a masked similarity metric that decomposes concept-driven image generation evaluation into subject-specific preservation and background-based prompt following using SigLIP2 embeddings, outperforming global baselines on human correlation and identity benchmarks.
A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.
A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.
HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.
citing papers explorer
-
From Web to Pixels: Bringing Agentic Search into Visual Perception
WebEye benchmark and Pixel-Searcher agent enable visual perception tasks by using web search to resolve object identities before precise localization or answering.
-
MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation
MaSC is a masked similarity metric that decomposes concept-driven image generation evaluation into subject-specific preservation and background-based prompt following using SigLIP2 embeddings, outperforming global baselines on human correlation and identity benchmarks.
-
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.
-
SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection
A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.
-
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.