SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
Beyond outcomes: Transparent assessment of LLM reasoning in games.CoRR, abs/2412.13602
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
baseline 1
dataset 1
citation-polarity summary
verdicts
UNVERDICTED 3representative citing papers
InternBootcamp supplies 1000+ verifiable, auto-generated task environments across domains that enable task scaling to improve LLM reasoning, producing a 32B model with state-of-the-art results on the new Bootcamp-EVAL benchmark.
This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.
citing papers explorer
No citing papers match the current filters.