{"total":12,"items":[{"citing_arxiv_id":"2606.01168","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Thinking Economically: A Hierarchical Framework for Adaptive-Complexity Reasoning in LLMs","primary_cat":"cs.CL","submitted_at":"2026-05-31T11:20:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HAB applies coarse-to-fine budgeting to LLM reasoning, predicting per-problem depth and learning intra-step token budgets via PPL comparisons and adaptive Pareto optimization, yielding higher accuracy and lower token use than standard CoT on GSM8K and MATH500.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10195","ref_index":41,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration","primary_cat":"cs.LG","submitted_at":"2026-05-11T08:45:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"arXiv preprint arXiv:2505.22662, 2025. [39] Mathematical Association of America. American Invitational Mathematics Examination (AIME) 2024. https://www.maa.org/math-competitions/aime, 2024. [40] Mathematical Association of America. American Invitational Mathematics Examination (AIME) 2025. https://www.maa.org/math-competitions/aime, 2025. 14 [41] Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, and Yu Wang. 
Skeleton-of-thought: Prompting llms for efficient parallel generation. arXiv preprint arXiv:2307.15337, 2023. [42] OpenAI. Learning to reason with llms. Technical report, OpenAI, 2024. Technical Report. [43] OpenAI. OpenAI O3-mini System Card. Technical report, OpenAI, January 2025."},{"citing_arxiv_id":"2605.06914","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Regulating Branch Parallelism in LLM Serving","primary_cat":"cs.DC","submitted_at":"2026-05-07T20:23:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TAPER regulates LLM branch parallelism by admitting extra branches opportunistically when predicted externality fits slack, delivering 1.48-1.77x higher goodput than eager or fixed-cap baselines on Qwen3-32B while keeping over 95% SLO attainment.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Figure 1: Intra-request parallelism across workloads. Proportion of decomposable requests (PDR), parallel token share (PTS), and average branch fanout (ABF) for three datasets. 2.1 Intra-Request parallelism in the wild Several recent methods shorten the critical path of LLM decoding by exposing independent branches within a single response. Skeleton-of-Thought [4] expands outline points concurrently; APAR [6] emits explicit branch tokens; PASTA [5] introduces asynchronous promises; ASPD [8] trains branch-invisible attention masks for native parallel decoding; and Multiverse [9] exposes Map/Process/Reduce control flow. 
These methods differ in how branches are discovered and represented, but they produce the same serving-visible structure [8]."},{"citing_arxiv_id":"2604.20685","ref_index":87,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment","primary_cat":"cs.LG","submitted_at":"2026-04-22T15:33:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.21619","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency","primary_cat":"cs.LG","submitted_at":"2026-01-29T12:22:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Parallel thinking in LLMs suffers from overscaling where fixed global budgets waste samples; LanBo predicts per-sample budgets from latent states to raise utilization without hurting accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.14044","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving","primary_cat":"cs.CV","submitted_at":"2025-12-16T03:19:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OmniDrive-R1 boosts VLM reasoning score 
from 51.77% to 80.35% and answer accuracy from 37.81% to 73.62% on DriveLMM-o1 via reinforcement-driven interleaved multi-modal chain-of-thought with annotation-free grounding.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.23322","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework","primary_cat":"cs.CV","submitted_at":"2025-09-27T14:13:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DRP decouples reasoning from perception in LMMs by using an LLM reasoner to query an LMM observer for visual details as needed, reducing visual grounding loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.21035","ref_index":78,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis","primary_cat":"cs.AI","submitted_at":"2025-07-28T17:55:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"sumed approximately 0.3GB CPU RAM during execution, remaining consistent across different LLM 8 backbones. 
For LLM inference, we employed a hybrid deployment strategy: proprietary models (e.g., OpenAI o3, Claude Sonnet 4) were accessed via their official APIs, while open-source models (DeepSeek R1, Qwen 3 235B) were served through Novita AI's infrastructure [78], offering reduced latency compared to their respective official endpoints. Metrics Our end-to-end evaluation assesses the ability of GenoMAS to identify significant genes by analyzing raw input data in GTA tasks. We adopt AUROC and F1 scores, as detailed in Section 3.3, as primary indicators of analytical performance. To further validate the regression models produced"},{"citing_arxiv_id":"2504.02181","ref_index":144,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey of Scaling in Large Language Model Reasoning","primary_cat":"cs.AI","submitted_at":"2025-04-02T23:51:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Studies indicate that fine-tuned large-scale LLMs substantially outperform smaller alternatives [70], with performance scaling with model size [90, 144] across financial decision-making tasks. The multi-step reasoning capabilities of scaled LLMs prove particularly valuable for complex financial analysis, significantly outperforming direct approaches [144, 243]. Financial sentiment analysis benefits from increased numbers of examples in many-shot ICL scenarios [2]. RAG-based approaches incorporating banking webpages and policy guides improve question-answering performance, with results scaling with the number of retrieved documents [234]. 
Multi-agent debate frameworks yield promising results in investment and trading decision scenarios [209, 225, 226],"},{"citing_arxiv_id":"2412.13171","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Compressed Chain of Thought: Efficient Reasoning Through Dense Representations","primary_cat":"cs.CL","submitted_at":"2024-12-17T18:50:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CCoT generates variable-length continuous contemplation tokens that compress explicit reasoning chains, enabling additional dense reasoning and accuracy gains in off-the-shelf language models while allowing adaptive control of token count.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.14294","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey on Efficient Inference for Large Language Models","primary_cat":"cs.CL","submitted_at":"2024-04-22T15:53:08+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"duces reflection tokens to make the LLM controllable during the inference phase. 4.2 Output Organization The traditional generation process of LLMs is entirely sequential, leading to significant time consumption. Output organization techniques aim to (partially) parallelize generation via organizing the structure of output content. Skeleton-of-Thought (SoT) [47] is pioneering in this direction. 
The core idea behind SoT is to leverage the emerging ability of LLMs to plan the output content's structure. Specifically, SoT consists of two main phases. In the first phase (i.e., skeleton phase), SoT instructs the LLM to generate a concise skeleton of the answer using a predefined \"skeleton prompt.\" For instance, given a question like"},{"citing_arxiv_id":"2309.01219","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-09-03T16:56:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}