Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning
Pith reviewed 2026-05-10 08:25 UTC · model grok-4.3
The pith
CPO++ adapts reinforcement fine-tuning of MLLMs to endogenous multi-modal concept drift through counterfactual reasoning and preference optimization, yielding better coherence and cross-domain robustness in safety-critical settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MLLMs are highly susceptible to endogenous reasoning drift... CPO++ achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference with exceptional zero-shot cross-domain generalization.
Load-bearing premise
That controlled counterfactual perturbations combined with preference optimization can reliably disentangle spurious correlations caused by endogenous drift without introducing new instabilities or domain-specific biases.
Original abstract
Reinforcement Fine-Tuning (RFT) has established itself as a critical paradigm for aligning Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. However, current research focuses primarily on mitigating exogenous distribution shifts arising from data-centric factors, while the non-stationarity inherent in endogenous reasoning remains largely unexplored. This work reveals a critical vulnerability in MLLMs: they are highly susceptible to endogenous reasoning drift, in both thinking and perception. The drift manifests as unpredictable distribution changes that emerge spontaneously during autoregressive generation, independent of external environmental perturbations. To adapt to it, we first theoretically define endogenous reasoning drift within the RFT of MLLMs as multi-modal concept drift. We then propose Counterfactual Preference Optimization ++ (CPO++), a comprehensive and autonomous framework for adapting to multi-modal concept drift. It integrates counterfactual reasoning with domain knowledge to execute controlled perturbations across thinking and perception, and employs preference optimization to disentangle spurious correlations. Extensive empirical evaluations across two highly dynamic and safety-critical domains, medical diagnosis and autonomous driving, demonstrate that the proposed framework achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference. The methodology also exhibits exceptional zero-shot cross-domain generalization, providing a principled foundation for reliable multi-modal reasoning in safety-critical applications.
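The abstract does not give the CPO++ objective, so the following is only a minimal sketch of the generic idea it names: preference optimization over a pair of responses. It uses the standard DPO loss (Rafailov et al., 2023) as a stand-in. Treating the unperturbed response as "chosen" and the response produced under a counterfactual perturbation as "rejected" is an assumption made here for illustration, as are the function names and the value of beta.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_pair_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being tuned
    and under a frozen reference model. Assumption for this sketch:
    "chosen" is the unperturbed response and "rejected" is the response
    obtained under a counterfactual perturbation.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response than the reference model does, scaled by beta.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy already prefers the chosen response more than the reference does.
    print(dpo_pair_loss(-1.0, -2.0, -1.5, -1.5))
```

When the policy and the reference model agree, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the chosen response relative to the reference.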
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No significant circularity; derivation chain is conceptual with no equations or reductions shown
The abstract and available text present a high-level framework proposal (CPO++) and a conceptual definition of endogenous reasoning drift as multi-modal concept drift, but contain no mathematical derivations, equations, parameter fits, or self-citations. No load-bearing steps reduce to inputs by construction, and the description does not invoke uniqueness theorems or rename known results. This is the expected honest non-finding when the paper's chain is not mathematically specified.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Endogenous reasoning drift manifests as unpredictable distribution changes during autoregressive generation, independent of external perturbations.
invented entities (1)
- endogenous reasoning drift: no independent evidence
Forward citations
Cited by 1 Pith paper
- Autonomous Drift Learning in Data Streams: A Unified Perspective
  A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.