arXiv preprint arXiv:2602.10604

Step 3 · 2026 · arXiv 2602.10604

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1 · method 1

citation-polarity summary

background 2 · baseline 1 · use method 1

representative citing papers

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

FORT synthesizes shortcut-resistant search tasks by controlling four identified shortcut risks across entity selection, graph construction, question formulation, and refinement, producing training data that yields agents with longer search trajectories and top performance among open-source models on

Knowledge Index of Noah's Ark

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

Introduces KINA benchmark with 899 items over 261 disciplines, formal (1-1/e) coverage guarantee and bonus-on-bar tournament theorem, plus evaluations of 42 models with top score 53.17%.

Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness

cs.CL · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

ProofRank benchmark shows substantial differences in LLM proof quality not captured by correctness, with trade-offs between quality metrics and accuracy.

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.

The Cancellation Hypothesis in Critic-Free RL: From Outcome Rewards to Token Credits

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

The cancellation hypothesis shows how rollout-level rewards produce token-level credit assignment in critic-free RL through cancellation of opposing signals on shared tokens, with empirical support and batching interventions that enhance performance.

MathDuels: Evaluating LLMs as Problem Posers and Solvers

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

Self-play between LLMs for problem authoring and solving, scored via Rasch modeling, shows that authoring and solving skills are partially decoupled and that the benchmark difficulty evolves with new models.

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

MIR improves validation loss in repeated-data pretraining and SoftQ fits data-constrained scaling experiments better than additive laws, equating MIR gains to roughly 1.3 times more unique data.

DFlare: Scaling Up Draft Capacity for Block Diffusion Speculative Decoding

cs.CL · 2026-06-01 · unverdicted · novelty 6.0

DFlare replaces DFlash's shared fused representation with per-draft-layer attention to distinct target-layer combinations, enabling deeper drafts and 2.4M training samples for 5-11% higher speedups than DFlash on Qwen3 and GPT-OSS models.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

cs.DC · 2026-04-16 · unverdicted · novelty 6.0

PrfaaS enables practical cross-datacenter prefill-decode disaggregation for hybrid-attention models via selective offloading, bandwidth-aware scheduling, and cache-aware placement, yielding 54% higher throughput and 64% lower P90 TTFT than homogeneous baselines in a 1T-parameter case study.

Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion

cs.CL · 2026-04-07 · conditional · novelty 6.0

Attention Editing converts pre-trained LLMs to new attention architectures through layer-wise teacher-forced optimization and model-level distillation, preserving performance with efficiency gains.

CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning

cs.CL · 2026-03-09 · unverdicted · novelty 6.0

CODA uses rollout-based difficulty signals to drive two gates that penalize verbosity on easy instances and promote deliberation on hard ones, cutting token use over 60% on simple tasks while maintaining accuracy.

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

cs.CV · 2026-06-10 · unverdicted · novelty 4.0

InternVideo3 introduces Multimodal Contextual Reasoning and M^2LA attention to enable closed-loop evidence accumulation in long-video understanding and agentic tool use, reporting strong benchmark results.

citing papers explorer

Showing 6 of 6 citing papers after filters.

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

cs.CL · 2026-06-10 · unverdicted · none · ref 36

FORT synthesizes shortcut-resistant search tasks by controlling four identified shortcut risks across entity selection, graph construction, question formulation, and refinement, producing training data that yields agents with longer search trajectories and top performance among open-source models on

Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness

cs.CL · 2026-05-11 · unverdicted · none · ref 60 · 2 links

ProofRank benchmark shows substantial differences in LLM proof quality not captured by correctness, with trade-offs between quality metrics and accuracy.

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning

cs.CL · 2026-05-09 · unverdicted · none · ref 15

OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.

MathDuels: Evaluating LLMs as Problem Posers and Solvers

cs.CL · 2026-04-23 · unverdicted · none · ref 16

Self-play between LLMs for problem authoring and solving, scored via Rasch modeling, shows that authoring and solving skills are partially decoupled and that the benchmark difficulty evolves with new models.

DFlare: Scaling Up Draft Capacity for Block Diffusion Speculative Decoding

cs.CL · 2026-06-01 · unverdicted · none · ref 26

DFlare replaces DFlash's shared fused representation with per-draft-layer attention to distinct target-layer combinations, enabling deeper drafts and 2.4M training samples for 5-11% higher speedups than DFlash on Qwen3 and GPT-OSS models.

CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning

cs.CL · 2026-03-09 · unverdicted · none · ref 16

CODA uses rollout-based difficulty signals to drive two gates that penalize verbosity on easy instances and promote deliberation on hard ones, cutting token use over 60% on simple tasks while maintaining accuracy.
