We use Fully Sharded Data Parallel (FSDP) with full parameter sharding, plus optional CPU offloading of parameters and optimizer states, to reduce and balance memory usage across GPUs.
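
Under full sharding, each rank persistently holds only a 1/N slice of the flattened parameters (and of the matching gradients and optimizer state). It gathers the full parameters just before they are needed and reduce-scatters gradients afterwards, so each rank updates only its own slice. The following is a minimal single-process simulation of that bookkeeping. It assumes plain Python lists in place of GPU tensors and collective operations. The function names are hypothetical, and the zero-padding scheme is a simplifying assumption rather than any library's internal layout. In PyTorch's FSDP, for example, this mode corresponds to `ShardingStrategy.FULL_SHARD`, and CPU offloading is configured with `CPUOffload(offload_params=True)`.

```python
import math
from typing import List


def shard_flat_params(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flattened parameter vector into equal-size shards, one per rank.

    The vector is zero-padded so its length divides evenly by world_size,
    so every rank stores a shard of the same size (hypothetical helper).
    """
    shard_size = math.ceil(len(flat) / world_size)
    padded = flat + [0.0] * (shard_size * world_size - len(flat))
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Reassemble the full parameters (as done before forward/backward) and drop padding."""
    full = [x for shard in shards for x in shard]
    return full[:numel]


def reduce_scatter(per_rank_grads: List[List[float]], world_size: int) -> List[List[float]]:
    """Average full-size gradients across ranks, then keep only each rank's own shard."""
    numel = len(per_rank_grads[0])
    averaged = [sum(g[i] for g in per_rank_grads) / world_size for i in range(numel)]
    return shard_flat_params(averaged, world_size)


# Usage: 10 parameters sharded over 4 simulated ranks.
params = [float(i) for i in range(10)]
shards = shard_flat_params(params, world_size=4)
print(shards)  # each rank holds 3 values; the last shard is zero-padded
print(all_gather(shards, numel=len(params)) == params)
```

With 10 parameters and 4 ranks, each rank stores 3 values instead of all 10. The full vector exists only transiently after the all-gather.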

Gradients are clipped to a maximum norm of 1.
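
Clipping to a maximum norm usually means global-norm clipping. The norm is computed over all gradients together, and if it exceeds the threshold, every gradient is scaled by the same factor, which preserves the update direction. The norm type is not specified here. This sketch assumes the L2 norm over all parameters, mirroring the behavior of PyTorch's `torch.nn.utils.clip_grad_norm_`, which also adds a small epsilon to the denominator. The function name and the list-of-lists gradient representation are hypothetical. With sharded parameters, the norm has to be computed across all shards rather than per rank. This is why PyTorch's FSDP provides its own `clip_grad_norm_` method.

```python
import math
from typing import List


def clip_grad_norm(grads: List[List[float]], max_norm: float = 1.0, eps: float = 1e-6) -> float:
    """Rescale gradients in place so their combined L2 norm is at most max_norm.

    Returns the total (global) norm measured before clipping.
    """
    # Global L2 norm over every element of every parameter's gradient.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    # Only scale down; gradients already within the limit are left unchanged.
    if clip_coef < 1.0:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


# Usage: two single-element "parameters" with a global norm of 5.0.
grads = [[3.0], [4.0]]
norm_before = clip_grad_norm(grads, max_norm=1.0)
print(norm_before, grads)  # gradients scaled to roughly 0.6 and 0.8
```

Because all gradients share one scaling factor, their relative magnitudes are unchanged. Only the overall step size is capped.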
