Visualsphinx: Large-scale synthetic vision logic puzzles for rl

Yichen Feng, Zhangchen Xu, Fengqing Jiang, Yuetai Li, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran · 2025 · arXiv 2505.23977

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

ZPPO improves distillation to small vision-language models by using binary and negative candidate prompts plus a replay buffer for hard questions, outperforming standard distillation and GRPO on a 31-benchmark suite with largest gains at the 0.8B scale.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients cs.CL · 2026-06-16 · unverdicted · none · ref 127
ZPPO improves distillation to small vision-language models by using binary and negative candidate prompts plus a replay buffer for hard questions, outperforming standard distillation and GRPO on a 31-benchmark suite with largest gains at the 0.8B scale.
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models cs.AI · 2025-03-12 · unverdicted · none · ref 186
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

Visualsphinx: Large-scale synthetic vision logic puzzles for rl

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer