{"total":13,"items":[{"citing_arxiv_id":"2606.10142","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation","primary_cat":"cs.CV","submitted_at":"2026-06-08T20:17:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DB-3DME supplies a human-rated 3D mesh dataset and shows that fine-tuning the visual encoder of Qwen-2.5-VL-7B produces automatic evaluations that align better with humans than prior VLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05315","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LoRi: Low-Rank Distillation for Implicit Reasoning","primary_cat":"cs.CL","submitted_at":"2026-06-03T18:05:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LoRi distills implicit chain-of-thought by matching low-rank structures in hidden states, raising math-reasoning accuracy toward explicit CoT levels on LLaMA and Qwen models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22873","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Do LLMs Reason? 
A Dynamical Systems View via Entropy Phase Transitions","primary_cat":"cs.LG","submitted_at":"2026-05-20T03:15:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14344","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation","primary_cat":"cs.AI","submitted_at":"2026-05-14T04:08:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CrystalReasoner combines LLM reasoning traces with physical priors and multi-objective RL to generate valid, stable, and property-conditioned crystal structures.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11260","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Curriculum Learning-Guided Progressive Distillation in Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-11T21:37:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CLPD improves LLM distillation for reasoning by combining explicit data curriculum with progressive teacher scheduling of increasing capacity.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Angles don't lie: Unlocking training-efficient rl through the model's own signals. arXiv preprint arXiv:2506.02281, 2025. 
[37] Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, and Greg Durrett. To cot or not to cot? chain-of-thought helps mainly on math and symbolic reasoning. arXiv preprint arXiv:2409.12183, 2024. [38] Leandro von Werra, Younes Belkada, Sajjad Bayat, and Thomas Wolf. Trl: Transformer reinforcement learning library. https://github.com/huggingface/trl, 2020. Hugging Face. [39] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022."},{"citing_arxiv_id":"2605.05715","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes","primary_cat":"cs.AI","submitted_at":"2026-05-07T05:58:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27143","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Enhancing Linux Privilege Escalation Attack Capabilities of Local LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-04-29T19:54:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Targeted prompting and system interventions enable local LLMs such as Llama 3.1 70B to exploit 83% of tested Linux privilege escalation 
vulnerabilities.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Despite their general capabilities, LLMs often exhibit deficiencies in structured reasoning, long-horizon planning, and domain-specific knowledge. A variety of enhancement techniques have been proposed to mitigate these limitations. Chain-of-Thought (CoT) prompting encourages models to generate intermediate reasoning steps before producing an answer, improving performance in multi-step reasoning tasks [25,30,25,17,3]. CoT can be elicited via few-shot exemplar traces [30], zero-shot cues such as \"Let's think step by step\" [13], or Plan-and-Solve prompting [28] which separates planning from execution. Retrieval Augmented Generation (RAG) augments LLMs retrieving relevant documents from external knowledge sources and incorporating them into the prompt [5,14], extending the model's knowledge without modifying its"},{"citing_arxiv_id":"2604.25155","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Wireless Communications through Formal Mathematical AI Reasoning","primary_cat":"eess.SP","submitted_at":"2026-04-28T02:57:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"fication of coverage probability expressions, and closed-form approximation discovery for bit error rate analysis in complex fading environments. D. LLM-based Reasoning LLMs have demonstrated emerging capabilities in multi-step mathematical reasoning through chain-of-thought prompting, which encourages models to articulate intermediate derivation steps before producing a final answer [76], [77]. 
Numerical reasoning capabilities have been further strengthened through targeted pretraining strategies and skill injection techniques that endow models with stronger numeracy and arithmetic grounding [78], [79]. Self-consistency and process reward models improve reliability by sampling multiple reasoning paths and selecting results that are consistent across"},{"citing_arxiv_id":"2604.15994","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams","primary_cat":"cs.AI","submitted_at":"2026-04-17T12:16:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ReactBench benchmark shows MLLMs suffer over 30% performance drop on complex topological reasoning tasks versus basic ones when evaluated on chemical reaction diagrams.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09237","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery","primary_cat":"cs.CL","submitted_at":"2026-04-10T11:51:23+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08232","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation","primary_cat":"cs.AI","submitted_at":"2026-04-09T13:22:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better 
success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024. 1 [28] Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768, 2020. 8 [29] Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, and Greg Durrett. To cot or not to cot? chain-of-thought helps mainly on math and symbolic reasoning. arXiv preprint arXiv:2409.12183, 2024. 2, 3, 1 [30] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk,"},{"citing_arxiv_id":"2601.06993","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification?","primary_cat":"cs.CV","submitted_at":"2026-01-11T17:07:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Longer textual reasoning chains degrade MLLM accuracy on fine-grained visual tasks; a new normalization and constrained-reward training framework mitigates the effect and sets new SOTA numbers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.05605","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in 
LLMs","primary_cat":"cs.CR","submitted_at":"2025-04-08T01:36:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ShadowCoT introduces a reasoning-level backdoor attack on LLMs achieving 94.4% attack success rate and 88.4% hijacking success rate with 0.15% parameter updates via internal state conditioning and reasoning chain pollution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}