Since AIME24 and AIME25 each contain only 30 problems, we report pass@1 using 32 samples per problem (avg@32).
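To make the avg@32 figure concrete: with 30 problems, a single-sample pass@1 can only move in steps of 1/30 (about 3.3 points), so averaging correctness over many samples per problem gives a steadier estimate. The sketch below is a minimal illustration, not the citing paper's evaluation code. The function name `avg_at_k` and the nested-list layout of graded samples are assumptions made for the example.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Estimate pass@1 as the mean per-problem accuracy over k samples.

    correct[i][j] is True when sample j for problem i was graded correct.
    The layout is hypothetical; any grader producing booleans would work.
    """
    if not correct:
        raise ValueError("need at least one problem")
    # Fraction of correct samples for each problem.
    per_problem = [mean(1.0 if c else 0.0 for c in samples) for samples in correct]
    # Average those fractions across problems.
    return mean(per_problem)


# Toy data: 2 problems with 4 samples each (the setup above would be 30 x 32).
toy = [
    [True, True, False, False],
    [True, False, False, False],
]
score = avg_at_k(toy)
```

When every problem has the same number of samples, as in the setup described above, this equals the overall fraction of correct samples.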

Mathematical reasoning: We evaluate on AIME24, AIME25, BeyondAIME (ByteDance-Seed · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CL · 2026-02-12 · conditional · novelty 6.0

Composition-RL improves LLM reasoning by composing multiple verifiable prompts into new training questions for RL.

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
cs.CL · 2026-02-12 · conditional · none · ref 1