LatentOmni proposes a latent-space cross-modal reasoning framework that uses feature-level supervision and Omni-Sync Position Embedding to align and synchronize audio-visual latents, supported by a new 35K interleaved reasoning dataset and showing gains over text CoT baselines.
hub
Chain-of-thought prompting elicits reasoning in large language models
36 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
CAN-QA creates 33,128 QA pairs from CAN traffic logs in 10 categories to test LLMs, which capture patterns but struggle with temporal reasoning and multi-condition inference.
WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
D³ETOR combines debate-enhanced pseudo labeling from SAM with frequency-aware progressive debiasing in FADeNet to achieve state-of-the-art weakly-supervised camouflaged object detection using scribbles.
ProCrit proposes a Proposal-Critic framework that synthesizes process-level annotations via agentic rollout and uses draft-critique-revise with mutual-refinement RL to improve multimodal sarcasm detection.
EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% over fine-tuning in dynamic scenarios.
Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.
CoDA aligns cross-domain latent reasoning representations in LLMs via CoT distillation and MMD to enable effective knowledge transfer without in-domain demonstrations.
Proactive multi-window state triggering plus Set-of-Mark alignment and multimodal LLM reasoning detects GUI defects in Android apps, reporting 184% more text truncation, 87.2% F1 on occlusion, and 40 defect-prone apps at 10% FPR.
PARM adapts reward models to multi-stage LLM pipelines via pipeline data and direct preference optimization, improving execution rate and solving accuracy on optimization benchmarks and showing transfer to GSM8K.
AQPIM performs in-memory product quantization of activations for LLMs on PIM hardware, reducing GPU-CPU communication by 90-98.5% and delivering 3.4x speedup over prior PIM methods.
LoReC enhances LLMs for graph tasks via attention redistribution, graph re-injection into FFN, and logit rectification, yielding improvements over GraphLLM and GNN baselines on diverse datasets.
Chat-Scene++ improves 3D scene understanding in multimodal LLMs by representing scenes as context-rich object sequences with identifier tokens and grounded chain-of-thought reasoning, reaching state-of-the-art on five benchmarks using pre-trained encoders.
KG-Hopper uses RL to embed full multi-hop KG traversal and backtracking into a single LLM inference round, enabling a 7B model to outperform larger multi-step systems and compete with GPT-3.5/GPT-4o-mini on eight benchmarks.
TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.
C2F-Thinker combines structured coarse-to-fine chain-of-thought reasoning with hint-guided GRPO reinforcement learning to achieve competitive fine-grained sentiment regression and superior cross-domain generalization in multimodal analysis.
VERDI aligns perception, prediction, and planning outputs of end-to-end AD models with VLM-generated text features at training time to embed structured reasoning, yielding up to 11% better l2 distance and 10% higher non-collision rate in closed-loop tests.
Introduces CompliVision dataset and active learning framework for rule-based hazard compliance assessment using vision-language models grounded in safety standards.
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.
AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.
ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standard BCE loss.
LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades off higher individual match for potentially less secure results when users over-permission.
citing papers explorer
-
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
LatentOmni proposes a latent-space cross-modal reasoning framework that uses feature-level supervision and Omni-Sync Position Embedding to align and synchronize audio-visual latents, supported by a new 35K interleaved reasoning dataset and showing gains over text CoT baselines.
-
CAN-QA: A Question-Answering Benchmark for Reasoning over In-Vehicle CAN Traffic
CAN-QA creates 33,128 QA pairs from CAN traffic logs in 10 categories to test LLMs, which capture patterns but struggle with temporal reasoning and multi-condition inference.
-
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
-
Structural Anchors and Reasoning Fragility:Understanding CoT Robustness in LLM4Code
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
-
Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
D³ETOR combines debate-enhanced pseudo labeling from SAM with frequency-aware progressive debiasing in FADeNet to achieve state-of-the-art weakly-supervised camouflaged object detection using scribbles.
-
ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection
ProCrit proposes a Proposal-Critic framework that synthesizes process-level annotations via agentic rollout and uses draft-critique-revise with mutual-refinement RL to improve multimodal sarcasm detection.
-
When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation
EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% over fine-tuning in dynamic scenarios.
-
Decoupled Travel Planning with Behavior Forest
Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.
-
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation
CoDA aligns cross-domain latent reasoning representations in LLMs via CoT distillation and MMD to enable effective knowledge transfer without in-domain demonstrations.
-
Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning
Proactive multi-window state triggering plus Set-of-Mark alignment and multimodal LLM reasoning detects GUI defects in Android apps, reporting 184% more text truncation, 87.2% F1 on occlusion, and 40 defect-prone apps at 10% FPR.
-
PARM: Pipeline-Adapted Reward Model
PARM adapts reward models to multi-stage LLM pipelines via pipeline data and direct preference optimization, improving execution rate and solving accuracy on optimization benchmarks and showing transfer to GSM8K.
-
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
AQPIM performs in-memory product quantization of activations for LLMs on PIM hardware, reducing GPU-CPU communication by 90-98.5% and delivering 3.4x speedup over prior PIM methods.
-
LoReC: Rethinking Large Language Models for Graph Data Analysis
LoReC enhances LLMs for graph tasks via attention redistribution, graph re-injection into FFN, and logit rectification, yielding improvements over GraphLLM and GNN baselines on diverse datasets.
-
Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM
Chat-Scene++ improves 3D scene understanding in multimodal LLMs by representing scenes as context-rich object sequences with identifier tokens and grounded chain-of-thought reasoning, reaching state-of-the-art on five benchmarks using pre-trained encoders.
-
KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning
KG-Hopper uses RL to embed full multi-hop KG traversal and backtracking into a single LLM inference round, enabling a 7B model to outperform larger multi-step systems and compete with GPT-3.5/GPT-4o-mini on eight benchmarks.
-
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models
TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.
-
C2F-Thinker: Coarse-to-Fine Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis
C2F-Thinker combines structured coarse-to-fine chain-of-thought reasoning with hint-guided GRPO reinforcement learning to achieve competitive fine-grained sentiment regression and superior cross-domain generalization in multimodal analysis.
-
VERDI: VLM-Embedded Reasoning for Autonomous Driving
VERDI aligns perception, prediction, and planning outputs of end-to-end AD models with VLM-generated text features at training time to embed structured reasoning, yielding up to 11% better l2 distance and 10% higher non-collision rate in closed-loop tests.
-
General Hazard Detection
Introduces CompliVision dataset and active learning framework for rule-based hazard compliance assessment using vision-language models grounded in safety standards.
-
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
-
AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation
AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.
-
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking
AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.
-
Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition
ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standard BCE loss.
-
Can LLMs Make (Personalized) Access Control Decisions?
LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades off higher individual match for potentially less secure results when users over-permission.
-
OpsAgent: An Evolving Multi-agent System for Incident Management in Microservices
OpsAgent presents a training-free multi-agent framework with dual self-evolution for automated incident management in microservices, claiming SOTA results on OPENRCA benchmark and successful production deployment at Lenovo.
-
RF Instrument Agent (RFIA): Empowering RF Instruments with Natural Language Understanding, Scheduling and Execution of Complex Tasks
RFIA presents a decoupled LLM-agent architecture for natural-language RF instrument control, with a structured knowledge base and hybrid execution graphs, evaluated on a 16-task VNA benchmark.
-
Hierarchical Prompting with Dual LLM Modules for Robotic Task and Motion Planning
A dual-LLM hierarchical framework for robotic task and motion planning, integrating object detection, achieves 86% success across 24 test scenarios ranging from simple spatial commands to infeasible requests.
-
CyberAId: AI-Driven Cybersecurity for Financial Service Providers
CyberAId is a proposed on-premise multi-agent system that coordinates LLM subagents with classical security tools to improve threat response and regulatory alignment in financial services.
-
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows
CMBAgent achieves high accuracy on well-specified astrophysical tasks with context but generates silent, plausible-yet-incorrect outputs on reasoning-challenging problems, with no self-diagnosis of inconsistencies.
-
Rethinking Wireless Communications through Formal Mathematical AI Reasoning
Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.
-
Automated Auditing of Hospital Discharge Summaries for Care Transitions
An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.
-
AICCE: AI Driven Compliance Checker Engine
AICCE combines RAG-based retrieval of protocol specs with dual LLM pipelines for debate-driven explanations or fast script execution, reporting up to 99% accuracy on IPv6 samples.
-
From Prompts to Pavement: LMMs-based Agentic Behavior-Tree Generation Framework for Autonomous Vehicles
An agentic LLM/LVM framework generates adaptive behavior trees on-the-fly for AV navigation in CARLA+Nav2 simulation, succeeding in obstacle avoidance where static BTs fail.
-
LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
LEO-RobotAgent is a general-purpose framework that enables LLMs to independently plan, use tools, and collaborate with humans while operating multiple robot types for unpredictable tasks.
- DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval
- Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models