Dystruct formulates flexible-length generation in diffusion language models as a dynamic structural inference problem solved via Bayesian integration of local uncertainty and structural signals.
Training verifiers to solve math word problems
Tool reference. 100% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
citation-role summary: dataset 7
citation-polarity summary: use dataset 7
representative citing papers
A single LoRA adapter placed at the gradient-energy-dominant shallow FFN module outperforms distributed LoRA across instruction, math, code, and conversation tasks.
The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems at 15%, and proposes closing this gap as a key milestone for general AI.
Learning-Zone Energy is a new online data selection framework for RL post-training that retains 40% of data per step yet matches or exceeds full-data baselines on math tasks with 36% lower FLOPs.
CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.
Verifier-backed committee search boosts a weak reasoning model from 67% to 76.4% on SWE-bench Verified, matching stronger models by using local soundness signals to select among proposals.
S-FLM is a hyperspherical latent flow language model that learns velocity fields on the unit sphere to generate token sequences via deterministic ODE integration without materializing one-hot vectors.
EMO pretrains MoEs using document boundaries to induce semantic expert specialization, enabling modular subset deployment with minimal accuracy loss unlike standard MoEs.
Template collapse is a distinct failure mode in agentic RL invisible to entropy; mutual information proxies diagnose it better and SNR-aware filtering using reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
Gradual fine-tuning that removes explicit CoT steps lets GPT-2 Small reach 99% accuracy on 9x9 multiplication and Mistral 7B exceed 50% on GSM8K with no intermediate outputs.
DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods; it outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.
Squeeze Evolve is a multi-model orchestration framework that improves efficiency and performance in verifier-free evolutionary inference, cutting costs up to 3x while matching verifier-based methods on several benchmarks.
Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.
STELLA aligns ESM3 bimodal sequence-structure encodings with Llama-3.1-8B text modeling to claim state-of-the-art results on protein functional description prediction and enzyme-catalyzed reaction prediction.
Rule-based RL on 5K logic puzzles induces advanced reasoning in a 7B model that transfers to AIME and AMC.
Safactory integrates three platforms for simulation, data management, and agent evolution to create a unified pipeline for training trustworthy autonomous AI.
Agentic RAG embeds agents with reflection, planning, tool use, and collaboration into retrieval pipelines to overcome static RAG limitations, and the survey offers a taxonomy by agent count, control, autonomy, and knowledge representation plus applications and open challenges.
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
citing papers explorer
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
  Gradual fine-tuning that removes explicit CoT steps lets GPT-2 Small reach 99% accuracy on 9x9 multiplication and Mistral 7B exceed 50% on GSM8K with no intermediate outputs.
- Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
  DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods; it outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.