Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.
Associated with the WaltonFuture GeoQA-8K-direct-synthesizing dataset release
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4representative citing papers
G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.
DDRL reduces spurious reward noise in test-time RL for math by excluding ambiguous samples, using fixed advantages, and adding consensus-based updates, outperforming prior TTRL methods on math benchmarks.
PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.
citing papers explorer
-
Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation
Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.
-
G-Zero: Self-Play for Open-Ended Generation from Zero Data
G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.
-
Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning
DDRL reduces spurious reward noise in test-time RL for math by excluding ambiguous samples, using fixed advantages, and adding consensus-based updates, outperforming prior TTRL methods on math benchmarks.
-
Mitigating Multimodal Hallucination via Phase-wise Self-reward
PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.