arXiv preprint arXiv:2507.10548

Mingxian Lin, Wei Huang, Yitang Li, Chengjie Jiang, Kui Wu, Fangwei Zhong, Shengju Qian, Xin Wang, Xiaojuan Qi · 2025 · arXiv 2507.10548

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

OmniGameArena is a unified UE5 benchmark with 12 games and the IDC harness for cold-start scores and improvement dynamics of VLM agents.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

citing papers explorer

Showing 3 of 3 citing papers after filters.

PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · none · ref 35

PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

cs.CV · 2026-06-08 · unverdicted · none · ref 49

OmniGameArena is a unified UE5 benchmark with 12 games and the IDC harness for cold-start scores and improvement dynamics of VLM agents.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · none · ref 291

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.
