To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
8 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (8)
verdicts: UNVERDICTED (8)
roles: background (2)
polarities: background (2)
citing papers explorer
- ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams
  The ReactBench benchmark shows that MLLMs suffer a performance drop of over 30% on complex topological reasoning tasks versus basic ones when evaluated on chemical reaction diagrams.
- ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery
  ScheMatiQ uses LLMs to automatically generate schemas and extract structured data from text corpora based on natural language questions, supported by interactive user steering.
- HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation
  HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.
- Curriculum Learning-Guided Progressive Distillation in Large Language Models
  CLPD improves LLM distillation for reasoning by combining an explicit data curriculum with progressive scheduling of teachers of increasing capacity.
- Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
  Overthinking in medical QA is linearly decodable at 71.6% accuracy, yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.
- Enhancing Linux Privilege Escalation Attack Capabilities of Local LLM Agents
  Targeted prompting and system interventions enable local LLMs such as Llama 3.1 70B to exploit 83% of tested Linux privilege escalation vulnerabilities.
- Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification?
  Longer textual reasoning chains degrade MLLM accuracy on fine-grained visual tasks; a new normalization and constrained-reward training framework mitigates the effect and sets new SOTA numbers.
- Rethinking Wireless Communications through Formal Mathematical AI Reasoning
  Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.