Enigmata: Scaling logical reasoning in large language models with synthetic verifiable puzzles
8 Pith papers cite this work.
Citing papers
-
MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs
MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
-
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
The paper delivers the first systematic taxonomy and hierarchical framework for data-efficient reinforcement learning post-training of large language models across data-centric, training-centric, and framework-centric views.
-
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
NExt accelerates reinforcement learning with verifiable rewards (RLVR) training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.
-
Scaling Implicit Fields via Hypernetwork-Driven Multiscale Coordinate Transformations
HC-INR uses a hierarchical hypernetwork to warp input coordinates into a disentangled space, raising the representable frequency bound while cutting parameters by 30-60% and boosting fidelity up to 4x over prior INRs.
-
NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields
NSTR models space-varying frequency fields in implicit neural representations by learning a frequency transport PDE that modulates global sinusoids, achieving better accuracy-parameter trade-offs than SIREN or Instant-NGP on images, audio, and 3D tasks.
-
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
RL training compute for logical reasoning follows a power law in horizon depth, with an exponent that rises with logical expressiveness; models trained on richer logics also show better downstream transfer.
-
SPHINX: A Synthetic Environment for Visual Perception and Reasoning
SPHINX generates synthetic visual puzzles for benchmarking LVLMs, where GPT-5 scores 51.1% and RLVR training improves both in-domain and external visual reasoning performance.
-
A Survey of Reinforcement Learning for Large Reasoning Models
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.