Training verifiers to solve math word problems
Tool reference. 100% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
citation-role summary
roles: dataset 7
citation-polarity summary
polarities: use dataset 7
citing papers explorer
- Dystruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference
  Dystruct formulates flexible-length generation in diffusion language models as a dynamic structural inference problem solved via Bayesian integration of local uncertainty and structural signals.
- Rethinking Adapter Placement: A Dominant Adaptation Module Perspective
  A single LoRA adapter placed at the gradient-energy-dominant shallow FFN module outperforms distributed LoRA across instruction, math, code, and conversation tasks.
- GAIA: a benchmark for General AI Assistants
  The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems at 15%, and proposes closing this gap as a key milestone for general AI.
- Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
  Learning-Zone Energy is a new online data selection framework for RL post-training that retains 40% of data per step yet matches or exceeds full-data baselines on math tasks with 36% lower FLOPs.
- CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models
  CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.
- Agentic Systems as Boosting Weak Reasoning Models
  Verifier-backed committee search boosts a weak reasoning model from 67% to 76.4% on SWE-bench Verified, matching stronger models by using local soundness signals to select among proposals.
- Language Modeling with Hyperspherical Flows
  S-FLM is a hyperspherical latent flow language model that learns velocity fields on the unit sphere to generate token sequences via deterministic ODE integration without materializing one-hot vectors.
- EMO: Pretraining Mixture of Experts for Emergent Modularity
  EMO pretrains MoEs using document boundaries to induce semantic expert specialization, enabling modular subset deployment with minimal accuracy loss, unlike standard MoEs.
- RAGEN-2: Reasoning Collapse in Agentic RL
  Template collapse is a distinct failure mode in agentic RL that is invisible to entropy. Mutual information proxies diagnose it better, and SNR-aware filtering using reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
  Gradual fine-tuning that removes explicit CoT steps lets GPT-2 Small reach 99% accuracy on 9x9 multiplication and Mistral 7B exceed 50% on GSM8K with no intermediate outputs.
- Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
  DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods, outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.
- Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution
  Squeeze Evolve is a multi-model orchestration framework that improves efficiency and performance in verifier-free evolutionary inference, cutting costs up to 3x while matching verifier-based methods on several benchmarks.
- Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
  Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.
- STELLA: A Multimodal LLM for Protein Functional Annotation via Unified Sequence-Structure Encoding
  STELLA aligns ESM3 bimodal sequence-structure encodings with Llama-3.1-8B text modeling to claim state-of-the-art results on protein functional description prediction and enzyme-catalyzed reaction prediction.
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
  Rule-based RL on 5K logic puzzles induces advanced reasoning in a 7B model that transfers to AIME and AMC.
- Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence
  Safactory integrates three platforms for simulation, data management, and agent evolution to create a unified pipeline for training trustworthy autonomous AI.
- Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
  Agentic RAG embeds agents with reflection, planning, tool use, and collaboration into retrieval pipelines to overcome static RAG limitations. The survey offers a taxonomy by agent count, control, autonomy, and knowledge representation, plus applications and open challenges.
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
  The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.