Chain-of-thought prompting elicits reasoning in large language models

· 2022

36 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

LatentOmni proposes a latent-space cross-modal reasoning framework that uses feature-level supervision and Omni-Sync Position Embedding to align and synchronize audio-visual latents, supported by a new 35K interleaved reasoning dataset and showing gains over text CoT baselines.

CAN-QA: A Question-Answering Benchmark for Reasoning over In-Vehicle CAN Traffic

cs.CR · 2026-04-27 · accept · novelty 7.0

CAN-QA creates 33,128 QA pairs from CAN traffic logs in 10 categories to test LLMs, which capture patterns but struggle with temporal reasoning and multi-condition inference.

Weak-Link Optimization for Multi-Agent Reasoning and Collaboration

cs.AI · 2026-04-17 · unverdicted · novelty 7.0

WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.

Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code

cs.SE · 2026-04-14 · unverdicted · novelty 7.0

CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.

Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

cs.CV · 2025-12-23 · unverdicted · novelty 7.0

D³ETOR combines debate-enhanced pseudo labeling from SAM with frequency-aware progressive debiasing in FADeNet to achieve state-of-the-art weakly-supervised camouflaged object detection using scribbles.

ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

cs.MA · 2026-05-20 · unverdicted · novelty 6.0

ProCrit proposes a Proposal-Critic framework that synthesizes process-level annotations via agentic rollout and uses draft-critique-revise with mutual-refinement RL to improve multimodal sarcasm detection.

When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation

cs.SE · 2026-04-29 · unverdicted · novelty 6.0

EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% over fine-tuning in dynamic scenarios.

Decoupled Travel Planning with Behavior Forest

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.

CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

CoDA aligns cross-domain latent reasoning representations in LLMs via CoT distillation and MMD to enable effective knowledge transfer without in-domain demonstrations.

Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

cs.SE · 2026-04-21 · unverdicted · novelty 6.0

Proactive multi-window state triggering plus Set-of-Mark alignment and multimodal LLM reasoning detects GUI defects in Android apps, reporting 184% more text truncation, 87.2% F1 on occlusion, and 40 defect-prone apps at 10% FPR.

PARM: Pipeline-Adapted Reward Model

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

PARM adapts reward models to multi-stage LLM pipelines via pipeline data and direct preference optimization, improving execution rate and solving accuracy on optimization benchmarks and showing transfer to GSM8K.

AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization

cs.AR · 2026-04-20 · unverdicted · novelty 6.0

AQPIM performs in-memory product quantization of activations for LLMs on PIM hardware, reducing GPU-CPU communication by 90-98.5% and delivering 3.4x speedup over prior PIM methods.

LoReC: Rethinking Large Language Models for Graph Data Analysis

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

LoReC enhances LLMs for graph tasks via attention redistribution, graph re-injection into FFN, and logit rectification, yielding improvements over GraphLLM and GNN baselines on diverse datasets.

Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM

cs.CV · 2026-03-29 · unverdicted · novelty 6.0

Chat-Scene++ improves 3D scene understanding in multimodal LLMs by representing scenes as context-rich object sequences with identifier tokens and grounded chain-of-thought reasoning, reaching state-of-the-art on five benchmarks using pre-trained encoders.

KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning

cs.CL · 2026-03-22 · unverdicted · novelty 6.0

KG-Hopper uses RL to embed full multi-hop KG traversal and backtracking into a single LLM inference round, enabling a 7B model to outperform larger multi-step systems and compete with GPT-3.5/GPT-4o-mini on eight benchmarks.

TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models

cs.CL · 2026-03-13 · unverdicted · novelty 6.0

TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.

C2F-Thinker: Coarse-to-Fine Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis

cs.CL · 2026-03-10 · unverdicted · novelty 6.0

C2F-Thinker combines structured coarse-to-fine chain-of-thought reasoning with hint-guided GRPO reinforcement learning to achieve competitive fine-grained sentiment regression and superior cross-domain generalization in multimodal analysis.

VERDI: VLM-Embedded Reasoning for Autonomous Driving

cs.RO · 2025-05-21 · conditional · novelty 6.0

VERDI aligns perception, prediction, and planning outputs of end-to-end AD models with VLM-generated text features at training time to embed structured reasoning, yielding up to 11% better L2 distance and a 10% higher non-collision rate in closed-loop tests.

General Hazard Detection

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

Introduces the CompliVision dataset and an active learning framework for rule-based hazard compliance assessment using vision-language models grounded in safety standards.

Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

Argues that agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.

AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation

cs.CV · 2026-04-19 · unverdicted · novelty 5.0

AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

cs.IR · 2026-04-17 · unverdicted · novelty 5.0

AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.

Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standard BCE loss.

Can LLMs Make (Personalized) Access Control Decisions?

cs.CR · 2025-11-25 · unverdicted · novelty 5.0

LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades a closer individual match for potentially less secure results when users grant excessive permissions.

citing papers explorer

Showing 36 of 36 citing papers.

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning cs.CL · 2026-05-21 · unverdicted · none · ref 15
LatentOmni proposes a latent-space cross-modal reasoning framework that uses feature-level supervision and Omni-Sync Position Embedding to align and synchronize audio-visual latents, supported by a new 35K interleaved reasoning dataset and showing gains over text CoT baselines.
CAN-QA: A Question-Answering Benchmark for Reasoning over In-Vehicle CAN Traffic cs.CR · 2026-04-27 · accept · none · ref 12
CAN-QA creates 33,128 QA pairs from CAN traffic logs in 10 categories to test LLMs, which capture patterns but struggle with temporal reasoning and multi-condition inference.
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration cs.AI · 2026-04-17 · unverdicted · none · ref 3
WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code cs.SE · 2026-04-14 · unverdicted · none · ref 13
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations cs.CV · 2025-12-23 · unverdicted · none · ref 31
D³ETOR combines debate-enhanced pseudo labeling from SAM with frequency-aware progressive debiasing in FADeNet to achieve state-of-the-art weakly-supervised camouflaged object detection using scribbles.
ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection cs.MA · 2026-05-20 · unverdicted · none · ref 29
ProCrit proposes a Proposal-Critic framework that synthesizes process-level annotations via agentic rollout and uses draft-critique-revise with mutual-refinement RL to improve multimodal sarcasm detection.
When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation cs.SE · 2026-04-29 · unverdicted · none · ref 17
EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% over fine-tuning in dynamic scenarios.
Decoupled Travel Planning with Behavior Forest cs.LG · 2026-04-23 · unverdicted · none · ref 64
Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation cs.AI · 2026-04-21 · unverdicted · none · ref 4
CoDA aligns cross-domain latent reasoning representations in LLMs via CoT distillation and MMD to enable effective knowledge transfer without in-domain demonstrations.
Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning cs.SE · 2026-04-21 · unverdicted · none · ref 9
Proactive multi-window state triggering plus Set-of-Mark alignment and multimodal LLM reasoning detects GUI defects in Android apps, reporting 184% more text truncation, 87.2% F1 on occlusion, and 40 defect-prone apps at 10% FPR.
PARM: Pipeline-Adapted Reward Model cs.AI · 2026-04-20 · unverdicted · none · ref 30
PARM adapts reward models to multi-stage LLM pipelines via pipeline data and direct preference optimization, improving execution rate and solving accuracy on optimization benchmarks and showing transfer to GSM8K.
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization cs.AR · 2026-04-20 · unverdicted · none · ref 66
AQPIM performs in-memory product quantization of activations for LLMs on PIM hardware, reducing GPU-CPU communication by 90-98.5% and delivering 3.4x speedup over prior PIM methods.
LoReC: Rethinking Large Language Models for Graph Data Analysis cs.LG · 2026-04-20 · unverdicted · none · ref 13
LoReC enhances LLMs for graph tasks via attention redistribution, graph re-injection into FFN, and logit rectification, yielding improvements over GraphLLM and GNN baselines on diverse datasets.
Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM cs.CV · 2026-03-29 · unverdicted · none · ref 27
Chat-Scene++ improves 3D scene understanding in multimodal LLMs by representing scenes as context-rich object sequences with identifier tokens and grounded chain-of-thought reasoning, reaching state-of-the-art on five benchmarks using pre-trained encoders.
KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning cs.CL · 2026-03-22 · unverdicted · none · ref 28
KG-Hopper uses RL to embed full multi-hop KG traversal and backtracking into a single LLM inference round, enabling a 7B model to outperform larger multi-step systems and compete with GPT-3.5/GPT-4o-mini on eight benchmarks.
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models cs.CL · 2026-03-13 · unverdicted · none · ref 4
TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.
C2F-Thinker: Coarse-to-Fine Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis cs.CL · 2026-03-10 · unverdicted · none · ref 10
C2F-Thinker combines structured coarse-to-fine chain-of-thought reasoning with hint-guided GRPO reinforcement learning to achieve competitive fine-grained sentiment regression and superior cross-domain generalization in multimodal analysis.
VERDI: VLM-Embedded Reasoning for Autonomous Driving cs.RO · 2025-05-21 · conditional · none · ref 26
VERDI aligns perception, prediction, and planning outputs of end-to-end AD models with VLM-generated text features at training time to embed structured reasoning, yielding up to 11% better L2 distance and a 10% higher non-collision rate in closed-loop tests.
General Hazard Detection cs.CV · 2026-05-22 · unverdicted · none · ref 37
Introduces the CompliVision dataset and an active learning framework for rule-based hazard compliance assessment using vision-language models grounded in safety standards.
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models cs.LG · 2026-05-07 · unverdicted · none · ref 44
Argues that agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation cs.CV · 2026-04-19 · unverdicted · none · ref 23
AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking cs.IR · 2026-04-17 · unverdicted · none · ref 40
AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.
Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition cs.CL · 2026-04-10 · unverdicted · none · ref 17
ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standard BCE loss.
Can LLMs Make (Personalized) Access Control Decisions? cs.CR · 2025-11-25 · unverdicted · none · ref 19
LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades a closer individual match for potentially less secure results when users grant excessive permissions.
OpsAgent: An Evolving Multi-agent System for Incident Management in Microservices cs.AI · 2025-10-28 · unverdicted · none · ref 39
OpsAgent presents a training-free multi-agent framework with dual self-evolution for automated incident management in microservices, claiming SOTA results on the OPENRCA benchmark and successful production deployment at Lenovo.
RF Instrument Agent (RFIA): Empowering RF Instruments with Natural Language Understanding, Scheduling and Execution of Complex Tasks eess.SY · 2026-05-22 · unverdicted · none · ref 8
RFIA presents a decoupled LLM-agent architecture for natural-language RF instrument control, with a structured knowledge base and hybrid execution graphs, evaluated on a 16-task VNA benchmark.
Hierarchical Prompting with Dual LLM Modules for Robotic Task and Motion Planning cs.RO · 2026-05-08 · unverdicted · none · ref 15
A dual-LLM hierarchical framework for robotic task and motion planning, integrating object detection, achieves 86% success across 24 test scenarios ranging from simple spatial commands to infeasible requests.
CyberAId: AI-Driven Cybersecurity for Financial Service Providers cs.AI · 2026-05-03 · unverdicted · none · ref 11
CyberAId is a proposed on-premise multi-agent system that coordinates LLM subagents with classical security tools to improve threat response and regulatory alignment in financial services.
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows cs.AI · 2026-04-28 · unverdicted · none · ref 2
CMBAgent achieves high accuracy on well-specified astrophysical tasks with context but generates silent, plausible-yet-incorrect outputs on reasoning-challenging problems, with no self-diagnosis of inconsistencies.
Rethinking Wireless Communications through Formal Mathematical AI Reasoning eess.SP · 2026-04-28 · unverdicted · none · ref 76
Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.
Automated Auditing of Hospital Discharge Summaries for Care Transitions cs.AI · 2026-04-07 · unverdicted · none · ref 16
An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.
AICCE: AI Driven Compliance Checker Engine cs.CR · 2026-04-03 · unverdicted · none · ref 14
AICCE combines RAG-based retrieval of protocol specs with dual LLM pipelines for debate-driven explanations or fast script execution, reporting up to 99% accuracy on IPv6 samples.
From Prompts to Pavement: LMMs-based Agentic Behavior-Tree Generation Framework for Autonomous Vehicles cs.CV · 2026-01-18 · unverdicted · none · ref 9
An agentic LLM/LVM framework generates adaptive behavior trees on-the-fly for AV navigation in CARLA+Nav2 simulation, succeeding in obstacle avoidance where static BTs fail.
LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator cs.RO · 2025-12-11 · unverdicted · none · ref 24
LEO-RobotAgent is a general-purpose framework that enables LLMs to independently plan, use tools, and collaborate with humans while operating multiple robot types for unpredictable tasks.
DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval cs.CV · 2026-05-21 · unreviewed · ref 46
Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models cs.AI · 2026-04-11 · unreviewed · ref 7