{"total":336,"items":[{"citing_arxiv_id":"2606.17289","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Nothing from Something: Can a Language Model Discover 0?","primary_cat":"cs.AI","submitted_at":"2026-06-15T20:54:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Language models require explicit examples to learn zero in arithmetic but language pretraining halves the examples needed.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07006","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning","primary_cat":"cs.LG","submitted_at":"2026-06-05T07:52:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RASFT is an adaptive SFT method that strengthens or relaxes expert imitation per problem based on on-policy rollout solvability and adds clipped reference-policy ratio to limit drift, reporting better results than standard SFT and RL on math and code benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01476","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification","primary_cat":"cs.LG","submitted_at":"2026-05-31T22:31:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OmniOPD replaces token-level logit matching in on-policy distillation with Monte Carlo chunk-level semantic verification and a peak-entropy scheduler.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01400","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs","primary_cat":"cs.CL","submitted_at":"2026-05-31T18:45:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A graph-based MIS prompt selection method on embedding similarity graphs yields reduced benchmark subsets with highly consistent LLM rankings (Kendall's W ≥ 0.90 in 99.2% of cases) and 25-48% size reduction at higher thresholds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02643","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Inference Cost Attacks for Retrieval-Augmented Large Language Models","primary_cat":"cs.CR","submitted_at":"2026-05-31T15:11:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Poisoning external knowledge bases with LLM-agent-crafted documents can increase RAG inference token consumption by up to 13.12 times at over 90% success rate while preserving answer quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01168","ref_index":47,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Thinking Economically: A Hierarchical Framework for Adaptive-Complexity Reasoning in LLMs","primary_cat":"cs.CL","submitted_at":"2026-05-31T11:20:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HAB applies coarse-to-fine budgeting to LLM reasoning, predicting per-problem depth and learning intra-step token budgets via PPL comparisons and adaptive Pareto optimization, yielding higher accuracy and lower token use than standard CoT on GSM8K and MATH500.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01075","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"On the Generalization Gap in Self-Evolving Language Model Reasoning","primary_cat":"cs.CL","submitted_at":"2026-05-31T07:43:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Closed-loop self-evolution on LLMs improves reasoning on Knights and Knaves tasks but plateaus short of oracle-supervised levels, with multi-turn revision nearly matching it for large models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00869","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Enhancing LLM Metacognition via Cognitive Pairwise Training","primary_cat":"cs.LG","submitted_at":"2026-05-30T19:53:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CPT is introduced as a pairwise reasoning-trace comparison stage that improves the reasoning-metacognition trade-off over standard SFT+RL pipelines across model scales.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00726","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-30T13:38:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LRS trains a latent reward model on final-answer correctness to steer SAE states during inference, improving reasoning performance and implicitly encouraging better cognitive behaviors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00628","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation","primary_cat":"cs.CL","submitted_at":"2026-05-30T09:03:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"DASD dynamically selects tokens in self-distillation to keep logical corrections while suppressing stylistic noise, improving robustness on math, code, and commonsense benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00570","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence","primary_cat":"cs.CL","submitted_at":"2026-05-30T06:44:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Parameter-based knowledge editing in LLMs induces reasoning collapse via dimensional collapse and is consistently outperformed by a retrieval baseline across varied edit counts, knowledge complexity, and evaluation metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31455","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization","primary_cat":"cs.LG","submitted_at":"2026-05-29T15:49:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DRIFT achieves multi-turn RL performance via offline importance-weighted SFT by leveraging the equivalence of KL-regularized RL to weighted supervised learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31268","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mellum2 Technical Report","primary_cat":"cs.CL","submitted_at":"2026-05-29T13:01:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Mellum 2 is a 12B MoE model with 2.5B active parameters, trained on 10.6T tokens with MoE, GQA, SWA, and MTP, then post-trained into Instruct and Thinking variants, claimed competitive with 4B-14B models at 2.5B compute.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31164","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training","primary_cat":"cs.CL","submitted_at":"2026-05-29T11:13:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"D³ introduces a dynamic directional graph-constrained framework that models sample interactions via loss dependencies to derive an optimized training sequence for LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07603","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution","primary_cat":"cs.LG","submitted_at":"2026-05-29T09:31:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MetaEvo is a two-stage framework using preference optimization for principle abstraction followed by modular reuse to enable continual improvement of LLM agents on reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30832","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-29T04:37:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SLAT applies segment-level adaptive trimming in RL to reduce CoT reasoning length by 50% while maintaining competitive accuracy on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30789","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO","primary_cat":"cs.LG","submitted_at":"2026-05-29T03:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Smaller models provide temporally correlated policy-level diversity that serves as structured exploration for training larger models in GRPO, yielding accuracy gains such as +8.8% on AIME 24 with reduced compute via the S2L-PO framework.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30753","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation","primary_cat":"cs.CL","submitted_at":"2026-05-29T02:29:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces TSPD with a trajectory-feature controller and training-free CE to reduce denoising steps in dLLMs while aiming to preserve quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00132","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization","primary_cat":"cs.LG","submitted_at":"2026-05-28T21:22:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"FoLoRA applies generalized Rayleigh-quotient optimization to LoRA updates so that directions are gated by downstream utility divided by a pretraining-proxy forgetting penalty.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30451","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"VeriGate: Verifier-Gated Step-Level Supervision for GRPO","primary_cat":"cs.LG","submitted_at":"2026-05-28T18:20:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VeriGate adds verifier-gated step-level supervision to GRPO via cumulated PRM rewards and group-normalized token advantages, raising accuracy 20% and 12% on 1.5B and 7B models on MATH and six benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29727","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting","primary_cat":"cs.LG","submitted_at":"2026-05-28T10:21:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29707","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding","primary_cat":"cs.CL","submitted_at":"2026-05-28T10:07:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Domino decouples causal dependency modeling from autoregressive draft execution via a parallel backbone plus lightweight causal head and a base-anchored training curriculum, reporting up to 5.49x speedup.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29343","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Draft-OPD: On-Policy Distillation for Speculative Draft Models","primary_cat":"cs.CL","submitted_at":"2026-05-28T04:30:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Draft-OPD applies on-policy distillation via target-assisted generation and error replay to train speculative draft models, yielding over 5x lossless acceleration and gains over EAGLE-3 and DFlash.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29247","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DenseSteer: Steering Small Language Models towards Dense Math Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-28T02:07:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DenseSteer is an inference-time steering framework that improves small LLMs' accuracy on math reasoning by modulating representations toward dense reasoning patterns with fewer but higher-density steps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28421","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes","primary_cat":"cs.AI","submitted_at":"2026-05-27T12:52:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"DenoiseRL optimizes recovery from noisy prefixes in weak-model reasoning failures to improve performance and self-correction on math and general reasoning benchmarks without external supervision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28388","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-27T12:25:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sample difficulty in RLVR shows non-monotonic effects on LLM reasoning, with easy/medium problems strengthening computation and reasoning features while hard problems often yield weak or harmful signals.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23200","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adaptive Mass-Segmented KV Compression for Long-Context Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-22T03:32:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AMS KV compression adaptively partitions the cache by attention mass regions and assigns quotas to protect contiguous reasoning blocks during long-context LLM inference.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23078","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-21T22:23:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GEMQ applies global LP-based expert importance estimation and router fine-tuning within progressive quantization to cut memory and speed inference in MoE LLMs with little accuracy loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22211","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CLORE: Content-Level Optimization for Reasoning Efficiency","primary_cat":"cs.AI","submitted_at":"2026-05-21T09:16:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22106","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-21T07:40:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ArborKV uses search-structure awareness to evict low-reuse KV states in Tree-of-Thoughts inference, delivering up to 4x memory savings with near-full accuracy retention.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21856","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation","primary_cat":"cs.LG","submitted_at":"2026-05-21T01:06:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ZCP detects direct and evasive data contamination in LLMs by truncating CoT reasoning and contrasting zero-CoT accuracy on original versus perturbed isomorphic datasets, plus a Contamination Confidence metric.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21801","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Why Semantic Entropy Fails: Geometry-Aware and Calibrated Uncertainty for Policy Optimization","primary_cat":"cs.LG","submitted_at":"2026-05-20T22:59:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Identifies two gaps in entropy-based uncertainty for LLM post-training and proposes GCPO to align geometry-aware disagreement measures with reward-based calibration for better gradient regulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21699","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation","primary_cat":"cs.LG","submitted_at":"2026-05-20T19:59:31+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"X-Token proposes projection-guided P-KL and H-KL losses to fix uncommon-token suppression and over-conservative matching in logit-based cross-tokenizer distillation, yielding gains over GOLD on Llama-3.2-1B.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21606","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-20T18:14:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Position-Weighted On-Policy Self-Distillation (PW-OPSD) weights later tokens more heavily after a diagnostic shows position predicts teacher reliability better than entropy, yielding +1.0 and +1.1 Avg@12 gains on AIME 2024/2025.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21467","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards","primary_cat":"cs.LG","submitted_at":"2026-05-20T17:53:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DelTA estimates token coefficients to amplify discriminative directions in token-gradient vectors, reweighting the RLVR surrogate to produce more contrastive side-wise centroids and yielding 3.26 and 2.62 point gains on math benchmarks for 8B and 14B Qwen3 models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21177","ref_index":37,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning","primary_cat":"cs.LG","submitted_at":"2026-05-20T13:44:44+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ChunkFT enables full-parameter fine-tuning of Llama 3-8B on one 24 GB GPU and Llama 3-70B on two 80 GB GPUs by streaming gradients over dynamically activated sub-tensors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20722","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback","primary_cat":"cs.LG","submitted_at":"2026-05-20T05:20:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AGPO adaptively sets trust-region size and exploration temperature from group reward dispersion, entropy, and KL drift, yielding higher scores than PPO and GRPO on nine math benchmarks under fixed token budget.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20654","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak","primary_cat":"cs.LG","submitted_at":"2026-05-20T03:16:15+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20075","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning","primary_cat":"cs.CL","submitted_at":"2026-05-19T16:28:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CopT reverses CoT by eliciting a draft answer first then using continuous-embedding contrastive verification and on-policy thinking to reflect and correct, yielding up to 23% higher accuracy and 57% fewer tokens without training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19461","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Mode Collapse: Distribution Matching for Diverse Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-19T07:13:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DMPO approximates forward KL minimization in on-policy RL by aligning the policy to a group-level reward-proportional target distribution, yielding 9-12% relative gains over GRPO on NP-Bench and smaller gains on math reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19444","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting","primary_cat":"cs.LG","submitted_at":"2026-05-19T06:58:44+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18753","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention","primary_cat":"cs.CL","submitted_at":"2026-05-18T17:59:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DashAttention introduces differentiable adaptive sparse hierarchical attention via α-entmax block selection, achieving full-attention accuracy at 75% sparsity with improved Pareto performance over NSA and InfLLMv2.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17958","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-05-18T07:12:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CodeThinker improves LLM code reasoning via consistency-based RL with stepwise training data, dynamic beam sampling, and consistency rewards, reaching SOTA on benchmarks with 4.3% gains on Qwen2.5-Coder-7B.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16893","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"NGM: A Plug-and-Play Training-Free Memory Module for LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-16T09:12:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16865","ref_index":4,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MixSD: Mixed Contextual Self-Distillation for Knowledge Injection","primary_cat":"cs.CL","submitted_at":"2026-05-16T07:57:09+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16632","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning How to Cube","primary_cat":"cs.LG","submitted_at":"2026-05-15T21:03:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A neuro-symbolic post-training pipeline lets a 4B transformer learn cubing heuristics that reach pass@5 of 53 on 100 SAT competition instances, matching the strongest symbolic baseline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18871","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-15T17:08:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A 149M-parameter distributional energy-based verifier with low-rank adapter ensemble reduces constraint violations in structured LLM reasoning and outperforms or matches much larger models on five benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15604","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"VSPO: Vector-Steered Policy Optimization for Behavioral Control","primary_cat":"cs.LG","submitted_at":"2026-05-15T04:31:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VSPO samples rollouts at varying steering intensities to improve behavioral control in LLMs while preserving task accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14738","ref_index":62,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TAPIOCA: Why Task- Aware Pruning Improves OOD model Capability","primary_cat":"cs.LG","submitted_at":"2026-05-14T12:01:05+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14503","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks","primary_cat":"cs.SE","submitted_at":"2026-05-14T07:47:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}