AIME: AI System Optimization via Multiple LLM Evaluators. arXiv preprint arXiv:2410.03131

URL https://arxiv · 2024 · arXiv 2410.03131

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

HORA adaptively allocates rollouts using hit utility to improve Pass@K over compute-matched GRPO on math reasoning benchmarks while preserving Pass@1.

ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods

cs.CE · 2026-01-08 · unverdicted · novelty 6.0

ALL-FEM fine-tunes LLMs on a corpus of verified FEniCS scripts and uses multi-agent workflows to automate finite element code generation, achieving 71.79% success on 39 benchmarks across elasticity, flow, and coupled problems.

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

cs.CL · 2025-04-02 · unverdicted · novelty 4.0

A new open SFT dataset for reasoning distillation lets coding models hit state-of-the-art scores on LiveCodeBench and CodeContests with supervised fine-tuning alone, outperforming RL-trained baselines.

Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

cs.CV · 2026-04-21 · unverdicted · novelty 3.0

Wan-Image is a unified multi-modal system that integrates LLMs and diffusion transformers to deliver professional-grade image generation features including complex typography, multi-subject consistency, and precise editing, outperforming several prior models in human tests.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer

Showing 5 of 5 citing papers.

Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR

cs.LG · 2026-05-08 · unverdicted · none · ref 9

ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods

cs.CE · 2026-01-08 · unverdicted · none · ref 50

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

cs.CL · 2025-04-02 · unverdicted · none · ref 16

Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

cs.CV · 2026-04-21 · unverdicted · none · ref 23

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · none · ref 180
