MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation

Dacheng Tao; Hao Guan; Ying Zhang; Yongcheng Jing; Yu Zhao

arxiv: 2602.07905 · v2 · pith:B7UNA7AInew · submitted 2026-02-08 · 💻 cs.AI


Yu Zhao, Hao Guan, Yongcheng Jing, Ying Zhang, Dacheng Tao


keywords: inference, knowledge, density, medical, reasoning, scaling, llms, medcog


Large Language Models (LLMs) have shown strong potential in complex medical reasoning, yet they face diminishing gains under inference scaling laws. While existing studies augment LLMs with various knowledge types, it remains unclear how effectively the additional costs translate into accuracy. In this paper, we explore how the meta-cognition of LLMs, i.e., their self-assessment of their own cognitive states, can regulate the reasoning process. Specifically, we propose MedCoG, a Medical Meta-Cognition Agent with Knowledge Graph, in which meta-cognitive assessments of task complexity, familiarity, and knowledge density dynamically regulate the utilization of procedural, episodic, and factual knowledge. This LLM-centric on-demand reasoning aims to mitigate the diminishing returns under scaling laws by (1) reducing costs by avoiding indiscriminate scaling and (2) improving accuracy by filtering out distracting knowledge. To validate this, we empirically characterize the scaling curve and introduce inference density to quantify inference efficiency. Experiments on five hard sets of medical benchmarks demonstrate the effectiveness and efficiency of MedCoG, yielding 6.2x inference density. Furthermore, the Oracle study highlights the significant potential of meta-cognitive regulation.
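The regulation idea in the abstract can be pictured with a small sketch. It is not the authors' implementation: the abstract does not say how the three assessments are scored or how they map onto the three knowledge types, so the `MetaCognitiveState` class, the `select_knowledge` function, the [0, 1] score range, the one-to-one pairing of scores with knowledge types, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaCognitiveState:
    """Hypothetical self-assessment scores, each assumed to lie in [0, 1]."""
    complexity: float         # how hard the model judges the task to be
    familiarity: float        # how familiar the model judges the task to be
    knowledge_density: float  # assumed reading: how much relevant knowledge it already holds


def select_knowledge(state: MetaCognitiveState, threshold: float = 0.5) -> list[str]:
    """Return the knowledge types to retrieve for one question.

    The pairing of each score with a single knowledge type and the shared
    threshold are illustrative assumptions, not rules taken from the paper.
    """
    selected = []
    if state.complexity >= threshold:        # complex task: fetch procedural guidance
        selected.append("procedural")
    if state.familiarity < threshold:        # unfamiliar task: fetch similar past cases
        selected.append("episodic")
    if state.knowledge_density < threshold:  # sparse knowledge: fetch facts
        selected.append("factual")
    return selected


easy = MetaCognitiveState(complexity=0.2, familiarity=0.9, knowledge_density=0.8)
hard = MetaCognitiveState(complexity=0.9, familiarity=0.3, knowledge_density=0.4)
print(select_knowledge(easy))
print(select_knowledge(hard))
```

Under these assumptions, a question the model rates as simple and familiar triggers no retrieval at all, which is where the cost saving would come from, while a hard and unfamiliar one draws on all three knowledge types.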


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models
cs.AI · 2026-04 · unverdicted · novelty 7.0

MIRROR benchmark shows LLMs universally fail at compositional self-prediction and cannot translate partial self-knowledge into better agentic actions, with external metacognitive control reducing confident failures by...