Visual program distillation: Distilling tools and programmatic reasoning into vision-language models, 2024
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
Citing papers
- Grounded Reinforcement Learning for Visual Reasoning
  ViGoRL introduces visually grounded RL that anchors reasoning steps to image coordinates and uses multi-turn zooming to outperform standard RL and supervised baselines on spatial and GUI reasoning benchmarks.
- Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities
  A PRISMA-based survey of 158 computational works on toxic meme detection introduces a new toxicity taxonomy and a framework linking target, intent, and conveyance tactics while noting trends in LLMs and cross-modal methods.
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.
- IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools
  IndusAgent achieves state-of-the-art zero-shot performance on industrial anomaly benchmarks by using a custom Indus-CoT dataset, dynamic tool orchestration, and gated RL to optimize anomaly classification, localization, and reasoning.