Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning
Pith reviewed 2026-05-10 08:25 UTC · model grok-4.3
The pith
CPO++ adapts reinforcement fine-tuning of MLLMs to endogenous multi-modal concept drift through counterfactual reasoning and preference optimization, yielding better coherence and cross-domain robustness in safety-critical settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MLLMs are highly susceptible to endogenous reasoning drift... CPO++ achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference with exceptional zero-shot cross-domain generalization.
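The failure mode claimed here, distribution shift that arises during generation itself rather than from the input, can be made concrete with a simple monitor. The sketch below is illustrative and not from the paper: the sliding-window KL statistic, window size, and threshold are all assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete next-token distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def detect_drift(token_dists, window=4, threshold=0.5):
    """Flag generation steps where the mean next-token distribution of the
    current window diverges sharply from the preceding window -- a crude
    proxy for 'endogenous' drift, since no external input changes."""
    flags = []
    for t in range(2 * window, len(token_dists) + 1):
        prev = np.mean(token_dists[t - 2 * window:t - window], axis=0)
        curr = np.mean(token_dists[t - window:t], axis=0)
        flags.append((t - 1, kl_divergence(curr, prev) > threshold))
    return flags
```

A monitor of this kind would flag the step at which the model's own sampling dynamics, not the prompt, shifted the token distribution.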
Load-bearing premise
That controlled counterfactual perturbations combined with preference optimization can reliably disentangle spurious correlations caused by endogenous drift without introducing new instabilities or domain-specific biases.
Figures
read the original abstract
Reinforcement Fine-Tuning (RFT) has established itself as a critical paradigm for aligning Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. Nevertheless, current research focuses primarily on mitigating exogenous distribution shifts arising from data-centric factors, while the non-stationarity inherent in endogenous reasoning remains largely unexplored. In this work, a critical vulnerability is revealed within MLLMs: they are highly susceptible to endogenous reasoning drift across both the thinking and perception perspectives. This drift manifests as unpredictable distribution changes that emerge spontaneously during the autoregressive generation process, independent of external environmental perturbations. To address it, endogenous reasoning drift within the RFT of MLLMs is first theoretically defined as multi-modal concept drift. In this context, this paper proposes Counterfactual Preference Optimization++ (CPO++), a comprehensive and autonomous framework adapted to multi-modal concept drift. It integrates counterfactual reasoning with domain knowledge to execute controlled perturbations across thinking and perception, and employs preference optimization to disentangle spurious correlations. Extensive empirical evaluations across two highly dynamic, safety-critical domains, medical diagnosis and autonomous driving, demonstrate that the proposed framework achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference. The methodology also exhibits exceptional zero-shot cross-domain generalization, providing a principled foundation for reliable multi-modal reasoning in safety-critical applications.
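The mechanism the abstract describes, preferring a faithful reasoning trace over a counterfactually perturbed one, can be grounded in the standard DPO objective of reference [16]. The sketch below assumes sequence-level log-probabilities are already computed; pairing an original trace with a perturbed one is this sketch's assumption, not the paper's exact loss.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (Rafailov et al., 2023).
    In a CPO-style setup, 'chosen' would be the faithful reasoning trace
    and 'rejected' a counterfactually perturbed one."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # trace than the frozen reference model does.
    margin = beta * ((logp_chosen - logp_rejected)
                     - (ref_logp_chosen - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy already separates the
    # pair, large when it prefers the perturbed trace.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this over pairs pushes probability mass away from perturbed traces relative to the reference model, which is one way the claimed disentangling of spurious correlations could be operationalized.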
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No significant circularity; derivation chain is conceptual with no equations or reductions shown
full rationale
The abstract and available text present a high-level framework proposal (CPO++) and a conceptual definition of endogenous reasoning drift as multi-modal concept drift, but contain no mathematical derivations, equations, parameter fits, or self-citations. No load-bearing steps reduce to inputs by construction, and the description does not invoke uniqueness theorems or rename known results. This is the expected honest non-finding when the paper's chain is not mathematically specified.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Endogenous reasoning drift manifests as unpredictable distribution changes during autoregressive generation independent of external perturbations
invented entities (1)
- endogenous reasoning drift (no independent evidence)
Forward citations
Cited by 1 Pith paper
-
Autonomous Drift Learning in Data Streams: A Unified Perspective
A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.
Reference graph
Works this paper leans on
-
[1]
ReFT: Reasoning with Reinforced Fine-Tuning,
L. Trung, X. Zhang, Z. Jie, P. Sun, X. Jin, and H. Li, “ReFT: Reasoning with Reinforced Fine-Tuning,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), L.-W. Ku, A. Martins, and V. Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, 2024, pp. 7601–7614
2024
-
[2]
Visual-RFT: Visual Reinforcement Fine-Tuning,
Z. Liu, Z. Sun, Y. Zang, X. Dong, Y. Cao, H. Duan, D. Lin, and J. Wang, “Visual-RFT: Visual reinforcement fine-tuning,” arXiv preprint arXiv:2503.01785, 2025
2025
-
[3]
Reason-RFT: Reinforcement fine-tuning for visual reasoning of vision language models,
H. Tan, Y. Ji, X. Hao, X. Chen, P. Wang, Z. Wang, and S. Zhang, “Reason-RFT: Reinforcement fine-tuning for visual reasoning of vision language models,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
2025
-
[4]
SFT memorizes, RL generalizes: A comparative study of foundation model post-training,
T. Chu, Y. Zhai, J. Yang, S. Tong, S. Xie, D. Schuurmans, Q. V. Le, S. Levine, and Y. Ma, “SFT memorizes, RL generalizes: A comparative study of foundation model post-training,” in International Conference on Machine Learning. PMLR, 2025, pp. 10818–10838
2025
-
[5]
RL fine-tuning heals the OOD forgetting in SFT,
H. Jin, S. Luan, S. Lyu, G. Rabusseau, D. Precup, and M. Hamdaqa, “RL fine-tuning heals the OOD forgetting in SFT,” in First Workshop on Foundations of Reasoning in Language Models, 2025. [Online]. Available: https://openreview.net/forum?id=SN1PCQ0ApV
2025
-
[6]
Reinforcement learning for out-of-distribution reasoning in LLMs: An empirical study on diagnosis-related group coding,
H. Wang, Z. Wu, G. J. Kolar, H. R. Korsapati, B. Bartlett, B. Hull, and J. Sun, “Reinforcement learning for out-of-distribution reasoning in LLMs: An empirical study on diagnosis-related group coding,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=0jvnfH0WYV
2025
-
[7]
Information-theoretic reward modeling for stable RLHF: Detecting and mitigating reward hacking,
Y. Miao, L. Ding, S. Zhang, R. Bao, L. Zhang, and D. Tao, “Information-theoretic reward modeling for stable RLHF: Detecting and mitigating reward hacking,” arXiv preprint arXiv:2510.13694, 2025
2025
-
[8]
Reward shaping to mitigate reward hacking in RLHF,
J. Fu, X. Zhao, C. Yao, H. Wang, Q. Han, and Y. Xiao, “Reward shaping to mitigate reward hacking in RLHF,” in ICML 2025 Workshop on Reliable and Responsible Foundation Models, 2025. [Online]. Available: https://openreview.net/forum?id=62A4d5Mokc
2025
-
[9]
RRM: Robust reward model training mitigates reward hacking,
T. Liu, W. Xiong, J. Ren, L. Chen, J. Wu, R. Joshi, Y. Gao, J. Shen, Z. Qin, T. Yu, D. Sohn, A. Makarova, J. Z. Liu, Y. Liu, B. Piot, A. Ittycheriah, A. Kumar, and M. Saleh, “RRM: Robust reward model training mitigates reward hacking,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.ne...
2025
-
[10]
DAPO: An open-source LLM reinforcement learning system at scale,
Q. Yu, Z. Zhang, R. Zhu, Y. Yuan, X. Zuo, Yu Yue, W. Dai, T. Fan, G. Liu, J. Liu, L. Liu, X. Liu, H. Lin, Z. Lin, B. Ma, G. Sheng, Y. Tong, C. Zhang, M. Zhang, R. Zhang, W. Zhang, H. Zhu, J. Zhu, J. Chen, J. Chen, C. Wang, H. Yu, Y. Song, X. Wei, H. Zhou, J. Liu, W.-Y. Ma, Y.-Q. Zhang, L. Yan, Y. Wu, and M. Wang, “DAPO: An open-source LLM reinforceme...
2025
-
[11]
Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints,
C. Wang, Y. Jiang, C. Yang, H. Liu, and Y. Chen, “Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=2cRzmWXK9N
2024
-
[12]
Is DPO superior to PPO for LLM alignment? A comprehensive study,
S. Xu, W. Fu, J. Gao, W. Ye, W. Liu, Z. Mei, G. Wang, C. Yu, and Y. Wu, “Is DPO superior to PPO for LLM alignment? A comprehensive study,” in Forty-first International Conference on Machine Learning, 2024. [Online]. Available: https://openreview.net/forum?id=6XH8R7YrSk
2024
-
[13]
Learning dynamics of LLM finetuning,
Y. Ren and D. J. Sutherland, “Learning dynamics of LLM finetuning,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=tPNHOoZFl9
2025
-
[14]
LLM post-training: A deep dive into reasoning large language models,
K. Kumar, T. Ashraf, O. Thawakar, R. M. Anwer, H. Cholakkal, M. Shah, M.-H. Yang, P. H. Torr, F. S. Khan, and S. Khan, “LLM post-training: A deep dive into reasoning large language models,” arXiv preprint arXiv:2502.21321, 2025
2025
-
[15]
Qwen2.5-VL,
Q. Team, “Qwen2.5-VL,” January 2025. [Online]. Available: https://qwenlm.github.io/blog/qwen2.5-vl/
2025
-
[16]
Direct Preference Optimization: Your Language Model is Secretly a Reward Model,
R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct Preference Optimization: Your Language Model is Secretly a Reward Model,” vol. 36, pp. 53728–53741, 2023. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html
2023
-
[17]
MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,
A. E. Johnson, T. J. Pollard, S. J. Berkowitz, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, R. G. Mark, and S. Horng, “MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,” Scientific Data, vol. 6, no. 1, p. 317, 2019
2019
-
[18]
Efficient streaming language models with attention sinks,
G. Xiao, Y. Tian, B. Chen, S. Han, and M. Lewis, “Efficient streaming language models with attention sinks,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=NG7sS51zVF
2024
-
[19]
Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning,
X. Yang, J. Lu, and E. Yu, “Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=1BAiQmAFsx
2025
-
[20]
Causality: Models, Reasoning and Inference,
J. Pearl, Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press, 2000
2000
-
[21]
Interpretation and identification of causal mediation,
J. Pearl, “Interpretation and identification of causal mediation,” Psychological Methods, vol. 19, no. 4, p. 459, 2014
2014
-
[22]
Reinforced Self-Training (ReST) for Language Modeling,
C. Gulcehre, T. L. Paine, S. Srinivasan, K. Konyushkova, L. Weerts, A. Sharma, A. Siddhant, A. Ahern, M. Wang, C. Gu, W. Macherey, A. Doucet, O. Firat, and N. de Freitas, “Reinforced Self-Training (ReST) for language modeling,” 2023. [Online]. Available: https://arxiv.org/abs/2308.08998
2023
-
[23]
Scaling relationship on learning mathematical reasoning with large language models,
Z. Yuan, H. Yuan, C. Li, G. Dong, K. Lu, C. Tan, C. Zhou, and J. Zhou, “Scaling relationship on learning mathematical reasoning with large language models,” 2024. [Online]. Available: https://openreview.net/forum?id=cijO0f8u35
2024
-
[24]
B-STar: Monitoring and balancing exploration and exploitation in self-taught reasoners,
W. Zeng, Y. Huang, L. Zhao, Y. Wang, Z. Shan, and J. He, “B-STar: Monitoring and balancing exploration and exploitation in self-taught reasoners,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=P6dwZJpJ4m
2025
-
[25]
The lessons of developing process reward models in mathematical reasoning,
Z. Zhang, C. Zheng, Y. Wu, B. Zhang, R. Lin, B. Yu, D. Liu, J. Zhou, and J. Lin, “The lessons of developing process reward models in mathematical reasoning,” arXiv preprint arXiv:2501.07301, 2025
2025
-
[26]
Scalar: Spatial-concept alignment for robust vision in harsh open world,
X. Yang, L. Xu, X. Zeng, X. Wang, H. Li, and S. Zhang, “Scalar: Spatial-concept alignment for robust vision in harsh open world,” Pattern Recognition, p. 113203, 2026
2026
-
[27]
Fewer tokens, greater scaling: Self-adaptive visual bases for efficient and expansive representation learning,
S. Young, X. Zeng, and L. Xu, “Fewer tokens, greater scaling: Self-adaptive visual bases for efficient and expansive representation learning,” arXiv preprint arXiv:2511.19515, 2025
2025
-
[28]
Learning from all: Concept alignment for autonomous distillation from multiple drifting MLLMs,
X. Yang, J. Lu, and E. Yu, “Learning from all: Concept alignment for autonomous distillation from multiple drifting MLLMs,” arXiv preprint arXiv:2510.04142, 2025. [Online]. Available: https://arxiv.org/abs/2510.04142
2025
-
[29]
Deep reinforcement learning from human preferences,
P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in Neural Information Processing Systems, vol. 30, 2017
2017
-
[30]
Training language models to follow instructions with human feedback,
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022
2022
-
[31]
Proximal Policy Optimization Algorithms,
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017
2017
-
[32]
Constitutional AI: Harmlessness from AI Feedback,
Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon et al., “Constitutional AI: Harmlessness from AI feedback,” arXiv preprint arXiv:2212.08073, 2022
2022
-
[33]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning,
D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025
2025
-
[34]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models,
Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. Li et al., “DeepSeekMath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024
2024
-
[35]
From System 1 to System 2: A Survey of Reasoning Large Language Models,
Z.-Z. Li, D. Zhang, M.-L. Zhang, J. Zhang, Z. Liu, Y. Yao, H. Xu, J. Zheng, P.-J. Wang, X. Chen et al., “From System 1 to System 2: A survey of reasoning large language models,” arXiv preprint arXiv:2502.17419, 2025
2025
-
[36]
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models,
Q. Chen, L. Qin, J. Liu, D. Peng, J. Guan, P. Wang, M. Hu, Y. Zhou, T. Gao, and W. Che, “Towards reasoning era: A survey of long chain-of-thought for reasoning large language models,” arXiv preprint arXiv:2503.09567, 2025
2025
-
[37]
Unleashing the potential of diffusion models towards diversified sequential recommendations,
Z. Cai, S. Wang, V. W. Chu, U. Naseem, Y. Wang, and F. Chen, “Unleashing the potential of diffusion models towards diversified sequential recommendations,” in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025, pp. 1476–1486
2025
-
[38]
From query to counsel: Structured reasoning with a multi-agent framework and dataset for legal consultation,
M. Lu, Y. Zhang, M. Wu, and Y. Feng, “From query to counsel: Structured reasoning with a multi-agent framework and dataset for legal consultation,” 2026. [Online]. Available: https://arxiv.org/abs/2604.10470
2026
-
[39]
Multimodal Chain-of-Thought Reasoning in Language Models,
Z. Zhang, A. Zhang, M. Li, H. Zhao, G. Karypis, and A. Smola, “Multimodal chain-of-thought reasoning in language models,” arXiv preprint arXiv:2302.00923, 2023
2023
-
[40]
M3CoT: A novel benchmark for multi-domain multi-step multi-modal chain-of-thought,
Q. Chen, L. Qin, J. Zhang, Z. Chen, X. Xu, and W. Che, “M3CoT: A novel benchmark for multi-domain multi-step multi-modal chain-of-thought,” arXiv preprint arXiv:2405.16473, 2024
2024
-
[41]
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey,
Y. Wang, S. Wu, Y. Zhang, S. Yan, Z. Liu, J. Luo, and H. Fei, “Multimodal chain-of-thought reasoning: A comprehensive survey,” arXiv preprint arXiv:2503.12605, 2025
2025
-
[42]
DDCoT: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models,
G. Zheng, B. Yang, J. Tang, H.-Y. Zhou, and S. Yang, “DDCoT: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models,” Advances in Neural Information Processing Systems, vol. 36, pp. 5168–5191, 2023
2023
-
[43]
Steering diffusion models towards credible content recommendation,
Z. Cai, S. Wang, J. Li, P. Zhou, V. W. Chu, F. Chen, T. Zhu, and C. C. Aggarwal, “Steering diffusion models towards credible content recommendation,” in The Fourteenth International Conference on Learning Representations, 2026
2026
-
[44]
From newborn to impact: Bias-aware citation prediction,
M. Lu, M. Wu, J. Xu, W. Li, F. Liu, Y. Ding, Y. Sun, J. Lu, and Y. Zhang, “From newborn to impact: Bias-aware citation prediction,” arXiv preprint arXiv:2510.19246, 2025
2025
-
[45]
Choosing how to remember: Adaptive memory structures for LLM agents,
M. Lu, M. Wu, F. Liu, J. Xu, W. Li, H. Wang, Z. Hu, Y. Ding, Y. Sun, J. Lu et al., “Choosing how to remember: Adaptive memory structures for LLM agents,” arXiv preprint arXiv:2602.14038, 2026
2026
-
[46]
Revealing multimodal causality with large language models,
J. Li, S. Wang, Q. Zhang, F. Liu, T. Liu, L. Cao, S. Yu, and F. Chen, “Revealing multimodal causality with large language models,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=nufqobhME7
2025
-
[47]
Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality,
G. Zhou, Y. Yan, X. Zou, K. Wang, A. Liu, and X. Hu, “Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=AV7OXVlAyi
2025
-
[48]
Causal-CoG: A causal-effect look at context generation for boosting multi-modal language models,
S. Zhao, Z. Li, Y. Lu, A. Yuille, and Y. Wang, “Causal-CoG: A causal-effect look at context generation for boosting multi-modal language models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 13342–13351
2024
-
[49]
Ensemble learning for data stream analysis: A survey,
B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak, “Ensemble learning for data stream analysis: A survey,” Information Fusion, vol. 37, pp. 132–156, 2017
2017
-
[50]
Learning under Concept Drift: A Review,
J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under Concept Drift: A Review,” vol. 31, no. 12, pp. 2346–2363, 2019. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8496795
2019
-
[51]
Recent Advances in Concept Drift Adaptation Methods for Deep Learning,
L. Yuan, H. Li, B. Xia, C. Gao, M. Liu, W. Yuan, and X. You, “Recent Advances in Concept Drift Adaptation Methods for Deep Learning,” in IJCAI, 2022, pp. 5654–5661
2022
-
[52]
Concept Neural Network Based on Time-Delay Regret for Dynamic Stream Learning,
Y.-L. Mi, “Concept Neural Network Based on Time-Delay Regret for Dynamic Stream Learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3796–3814, May 2025
2025
-
[53]
Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning,
E. Yu, J. Lu, K. Wang, X. Yang, and G. Zhang, “Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 19, 2026, pp. 16199–16207
2026
-
[54]
Generalized incremental learning under concept drift across evolving data streams,
E. Yu, J. Lu, and G. Zhang, “Generalized incremental learning under concept drift across evolving data streams,” arXiv preprint arXiv:2506.05736, 2025
2025
[55]
Automated Concept Drift Handling for Fault Prediction in Edge Clouds Using Reinforcement Learning,
B. Shayesteh, C. Fu, A. Ebrahimzadeh, and R. H. Glitho, “Automated Concept Drift Handling for Fault Prediction in Edge Clouds Using Reinforcement Learning,”IEEE Transactions on Network and Service Management, vol. 19, no. 2, pp. 1321–1335, Jun. 2022
2022
-
[56]
DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift,
S. McFadden, M. Foley, M. D’Onghia, C. Hicks, V . Mavroudis, N. Paoletti, and F. Pierazzi, “DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift,” inThe Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), Nov. 2025
2025
-
[57]
Adapting multi-modal large language model to concept drift from pre-training onwards,
X. Yang, J. Lu, and E. Yu, “Adapting multi-modal large language model to concept drift from pre-training onwards,” inThe Thirteenth International Conference on Learning Representations, Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, Eds., vol. 2025, 2025, pp. 90 869–90 891. [Online]. Available: https://proceedings.iclr.cc/paper files/paper/2025/ file/e25d8...
2025
-
[58]
T-distributed Spherical Feature Representation for Imbalanced Classification,
X. Yang, Y . Chen, X. Yue, S. Xu, and C. Ma, “T-distributed Spherical Feature Representation for Imbalanced Classification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, pp. 10 825–10 833, 2023. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/26284
2023
-
[59]
arXiv preprint arXiv:2502.07620 (2025)
X. Yang, J. Lu, E. Yu, and W. Duan, “Resilient contrastive pre-training under non-stationary drift,”arXiv preprint arXiv:2502.07620, 2025. [Online]. Available: https://arxiv.org/abs/2502.07620
-
[60]
One leaf reveals the season: Occlusion-based contrastive learning with semantic- aware views for efficient visual representation,
X. Yang, L. Xu, H. Li, and S. Zhang, “One leaf reveals the season: Occlusion-based contrastive learning with semantic- aware views for efficient visual representation,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=toZOqONu9x
2025
-
[61]
Causal diagrams for empirical research,
J. Pearl, “Causal diagrams for empirical research,”Biometrika, vol. 82, no. 4, pp. 669–688, 1995
1995
-
[62]
John Wiley & Sons, 2016
——,Causal inference in statistics: a primer. John Wiley & Sons, 2016
2016
-
[63]
Direct and indirect effects,
——, “Direct and indirect effects,” in Probabilistic and Causal Inference: The Works of Judea Pearl, 2022, pp. 373–392
2022
-
[64]
Segmentation and vascular vectorization for coronary artery by geometry-based cascaded neural network,
X. Yang, L. Xu, S. Yu, Q. Xia, H. Li, and S. Zhang, “Segmentation and vascular vectorization for coronary artery by geometry-based cascaded neural network,” IEEE Transactions on Medical Imaging, vol. 44, no. 1, pp. 259–269, 2024
2024
-
[65]
Local linear embedding based interpolation neural network in pancreatic tumor segmentation,
X. Yang, Y. Chen, X. Yue, C. Ma, and P. Yang, “Local linear embedding based interpolation neural network in pancreatic tumor segmentation,” Applied Intelligence, vol. 52, no. 8, pp. 8746–8756, 2022
2022
-
[66]
TC-SSA: Token compression via semantic slot aggregation for gigapixel pathology reasoning,
Z. Chen, S. Young, and L. Xu, “TC-SSA: Token compression via semantic slot aggregation for gigapixel pathology reasoning,” arXiv preprint arXiv:2603.01143, 2026
2026
-
[67]
Knowledge matters: Chest radiology report generation with general and specific knowledge,
S. Yang, X. Wu, S. Ge, S. K. Zhou, and L. Xiao, “Knowledge matters: Chest radiology report generation with general and specific knowledge,” Medical Image Analysis, vol. 80, p. 102510, 2022
2022
-
[68]
Clinical-BERT: Vision-language pre-training for radiograph diagnosis and reports generation,
B. Yan and M. Pei, “Clinical-BERT: Vision-language pre-training for radiograph diagnosis and reports generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, 2022, pp. 2982–2990
2022
-
[69]
METransformer: Radiology report generation by transformer with multiple learnable expert tokens,
Z. Wang, L. Liu, L. Wang, and L. Zhou, “METransformer: Radiology report generation by transformer with multiple learnable expert tokens,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11558–11567
2023
-
[70]
Spatio-temporal and retrieval-augmented modelling for chest x-ray report generation,
Y. Yang, X. You, K. Zhang, Z. Fu, X. Wang, J. Ding, J. Sun, Z. Yu, Q. Huang, W. Han et al., “Spatio-temporal and retrieval-augmented modelling for chest x-ray report generation,” IEEE Transactions on Medical Imaging, 2025
2025
-
[71]
Diagnostic Captioning by Cooperative Task Interactions and Sample-Graph Consistency,
Z. Wang, L. Wang, X. Li, and L. Zhou, “Diagnostic Captioning by Cooperative Task Interactions and Sample-Graph Consistency,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 8, pp. 6585–6598, Aug. 2025
2025
-
[72]
R2GenGPT: Radiology report generation with frozen LLMs,
Z. Wang, L. Liu, L. Wang, and L. Zhou, “R2GenGPT: Radiology report generation with frozen LLMs,” Meta-Radiology, vol. 1, no. 3, p. 100033, 2023
2023
-
[73]
PromptMRG: Diagnosis-driven prompts for medical report generation,
H. Jin, H. Che, Y. Lin, and H. Chen, “PromptMRG: Diagnosis-driven prompts for medical report generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 3, 2024, pp. 2607–2615
2024
-
[74]
Bootstrapping large language models for radiology report generation,
C. Liu, Y. Tian, W. Chen, Y. Song, and Y. Zhang, “Bootstrapping large language models for radiology report generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, 2024, pp. 18635–18643
2024
-
[75]
CXPMRG-Bench: Pre-training and benchmarking for x-ray medical report generation on CheXpert Plus dataset,
X. Wang, F. Wang, Y. Li, Q. Ma, S. Wang, B. Jiang, and J. Tang, “CXPMRG-Bench: Pre-training and benchmarking for x-ray medical report generation on CheXpert Plus dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 5123–5133
2025
-
[76]
Reason like a radiologist: Chain-of-thought and reinforcement learning for verifiable report generation,
P. Jing, K. Lee, Z. Zhang, H. Zhou, Z. Yuan, Z. Gao, L. Zhu, G. Papanastasiou, Y. Fang, and G. Yang, “Reason like a radiologist: Chain-of-thought and reinforcement learning for verifiable report generation,” Medical Image Analysis, vol. 109, p. 103910, Mar. 2026
2026
-
[77]
Radiology report generation via multi-objective preference optimization,
T. Xiao, L. Shi, P. Liu, Z. Wang, and C. Bai, “Radiology report generation via multi-objective preference optimization,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 8, 2025, pp. 8664–8672
2025
-
[78]
FIR-Rad: Fine-grained reinforcement with structured reasoning for chest x-ray report generation,
X. Mei, L. Yang, D. Gao, X. Cai, J. Han, and T. Liu, “FIR-Rad: Fine-grained reinforcement with structured reasoning for chest x-ray report generation,” IEEE Transactions on Medical Imaging, 2026
2026
-
[79]
Textual explanations for self-driving vehicles,
J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, “Textual explanations for self-driving vehicles,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 563–578
2018
-
[80]
DriveGPT4: Interpretable end-to-end autonomous driving via large language model,
Z. Xu, Y. Zhang, E. Xie, Z. Zhao, Y. Guo, K.-Y. K. Wong, Z. Li, and H. Zhao, “DriveGPT4: Interpretable end-to-end autonomous driving via large language model,” IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8186–8193, 2024
2024