OVOW reconstructs instance-level, simulation-ready 4D mesh scenes from monocular video via a four-stage training-free pipeline and introduces a new benchmark for structured Video-to-4D evaluation.
LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
PairCoder is a two-agent pair-programming method that leverages toolchain verification oracles to improve LLM generation of verifiable structured artifacts on 17 benchmarks across seven models.
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
citing papers explorer
- PairCoder++: Pair Programming as a Universal Paradigm for Verified Code-Driven Multimodal and Structured-Artifact Generation