{"total":16,"items":[{"citing_arxiv_id":"2606.29928","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Latent-CURE for Breast Cancer Diagnosis","primary_cat":"cs.CV","submitted_at":"2026-06-29T08:05:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Latent-CURE introduces latent-space chain-of-thought reasoning and dual-asymmetric optimization to produce transparent, robust breast cancer diagnoses in imbalanced cohorts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25354","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing","primary_cat":"cs.CL","submitted_at":"2026-06-24T03:42:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LBR performs token-level test-time scaling via local branch routing on hidden states, enabling end-to-end RL training and improving Pass@1 and Pass@32 on math benchmarks over CoT and RLVR baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10184","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning","primary_cat":"cs.LG","submitted_at":"2026-06-08T21:21:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01168","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Thinking Economically: A Hierarchical Framework for Adaptive-Complexity Reasoning in LLMs","primary_cat":"cs.CL","submitted_at":"2026-05-31T11:20:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HAB applies coarse-to-fine budgeting to LLM reasoning, predicting per-problem depth and learning intra-step token budgets via PPL comparisons and adaptive Pareto optimization, yielding higher accuracy and lower token use than standard CoT on GSM8K and MATH500.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19376","ref_index":20,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative Recursive Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-19T05:20:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17187","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media","primary_cat":"cs.CL","submitted_at":"2026-05-16T22:52:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16961","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Latent Action Control for Reasoning-Guided Unified Image Generation","primary_cat":"cs.CV","submitted_at":"2026-05-16T12:23:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Latent Action Control learns unobserved action trajectories via variational alignment and GRPO to inject reasoning into flow-based image generation, yielding gains on compositional benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11856","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs","primary_cat":"cs.CV","submitted_at":"2026-05-12T09:40:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UniVLR unifies textual and visual reasoning in multimodal LLMs by compressing reasoning traces and auxiliary images into visual latent tokens for direct inference without interleaved text CoT.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[15] Yiming Qin, Bomin Wei, Jiaxin Ge, Konstantinos Kallidromitis, Stephanie Fu, Trevor Darrell, and Xudong Wang. Chain-of-visual-thought: Teaching vlms to see and think better with continuous visual tokens.CoRR, abs/2511.19418, 2025. 10 [16] Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuan- dong Tian. Training large language models to reason in a continuous latent space.CoRR, abs/2412.06769, 2024. [17] Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, and Xin Eric Wang. Soft thinking: Unlocking the reasoning potential of llms in continuous concept space.CoRR, abs/2505.15778, 2025. [18] Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, et al. The latent space: Foundation, evolution, mechanism, ability,"},{"citing_arxiv_id":"2604.22709","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought","primary_cat":"cs.CL","submitted_at":"2026-04-24T16:45:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Abstract-CoT lets models reason with short discrete latent token sequences from a reserved vocabulary, using warm-up training and RL to match verbal CoT performance with up to 11.6x fewer tokens.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17892","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LEPO: Latent Reasoning Policy Optimization for Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-20T07:05:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08299","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SeLaR: Selective Latent Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-09T14:32:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03679","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LightThinker++: From Reasoning Compression to Memory Management","primary_cat":"cs.CL","submitted_at":"2026-04-04T10:46:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.12529","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning","primary_cat":"cs.LG","submitted_at":"2026-03-13T00:07:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Terminator learns to predict optimal early-exit points in chain-of-thought reasoning by training on the first positions where the model emits its final answer, yielding 14-55% shorter outputs with no accuracy loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.09850","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Explainable Industrial Anomaly Detection via Knowledge-Guided Latent Reasoning","primary_cat":"cs.CV","submitted_at":"2026-02-10T14:54:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Reason-IAD improves explainable industrial anomaly detection by combining retrieval-augmented category knowledge with entropy-guided latent reasoning and dynamic visual patch injection in MLLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.12623","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space","primary_cat":"cs.CV","submitted_at":"2025-12-14T10:07:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DMLR performs dynamic visual-textual interleaving in latent space using confidence-guided latent policy gradient optimization and a dynamic visual injection strategy, yielding improved multimodal reasoning on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02547","ref_index":205,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:46:26+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"extended deliberation depending on task complexity. Developing such cognitively-aligned mechanisms is a key step toward building reasoning agents that are both efficient and reliable. 21 3.6. Perception By bridging visual perception with linguistic abstraction, Large Vision-Language Models (LVLMs) have demonstrated unprecedented capabilities for perceiving and understanding multimodal content [200, 201, 202, 203, 204, 205, 206, 207]. Central to this progress is the incorporation of explicit reasoning mechanisms into multimodal learning frameworks [208, 209], moving beyond passive perception toward active visual cognition [210]. RL has emerged as a powerful paradigm for this purpose, enabling the alignment of vision-language-action models with complex, multi-step reasoning objectives that go beyond the constraints"}],"limit":50,"offset":0}