Associated with the WaltonFuture GeoQA-8K-direct-synthesizing dataset release

URL https://arxiv · 2025 · arXiv 2505.22453

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

cs.CV · 2026-05-11 · accept · novelty 8.0

Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.

G-Zero: Self-Play for Open-Ended Generation from Zero Data

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

DDRL reduces spurious reward noise in test-time RL for math by excluding ambiguous samples, using fixed advantages, and adding consensus-based updates, outperforming prior TTRL methods on math benchmarks.

Mitigating Multimodal Hallucination via Phase-wise Self-reward

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting hallucination rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.

citing papers explorer

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

cs.CV · 2026-05-11 · accept · none · ref 22

G-Zero: Self-Play for Open-Ended Generation from Zero Data

cs.LG · 2026-05-11 · unverdicted · none · ref 30

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

cs.LG · 2026-04-23 · unverdicted · none · ref 10

Mitigating Multimodal Hallucination via Phase-wise Self-reward

cs.CV · 2026-04-20 · unverdicted · none · ref 50
