pith. machine review for the scientific record.

arxiv: 2604.15705 · v1 · submitted 2026-04-17 · 💻 cs.LG

Recognition: unknown

Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:25 UTC · model grok-4.3

classification 💻 cs.LG
keywords reasoning · drift · endogenous · multi-modal · across · mllms · autonomous · concept

The pith

CPO++ adapts reinforcement fine-tuning of MLLMs to endogenous multi-modal concept drift through counterfactual reasoning and preference optimization, yielding better coherence and cross-domain robustness in safety-critical settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multi-modal large language models process images, text, and other inputs together. When they are fine-tuned using reinforcement methods, their step-by-step reasoning can shift unpredictably on its own, even if the outside world stays the same. The authors call this endogenous reasoning drift and treat it as a form of concept drift across modalities. Their CPO++ method creates controlled changes in both thinking and perception steps, then uses preference optimization to break unwanted links between inputs and outputs. Tests in medical and driving scenarios show improved consistency and the ability to handle new situations without retraining.
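The excerpt does not spell out the CPO++ objective, but the preference-optimization step it builds on is standard DPO (Rafailov et al., cited as [16] below). A minimal sketch, assuming each pair contrasts the original chain-of-thought (chosen) with a counterfactually perturbed one (rejected); the pairing scheme and all numbers are illustrative, not the paper's:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss on one preference pair
    (Rafailov et al., 2023). The logp_* arguments are summed token
    log-probabilities of the chosen (w) and rejected (l) responses under
    the policy being trained and under a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written stably as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Hypothetical pair: original CoT (chosen) vs. counterfactually
# perturbed CoT (rejected); the log-probabilities are made up.
loss = dpo_loss(logp_w=-12.0, logp_l=-15.0,
                ref_logp_w=-12.5, ref_logp_l=-14.0)
```

Because the frozen reference anchors both terms, the loss only rewards margins the policy creates beyond the reference, which is what lets preference optimization suppress a spurious input-output link without wholesale retraining.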

Core claim

MLLMs are highly susceptible to endogenous reasoning drift... CPO++ achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference with exceptional zero-shot cross-domain generalization.

Load-bearing premise

That controlled counterfactual perturbations combined with preference optimization can reliably disentangle spurious correlations caused by endogenous drift without introducing new instabilities or domain-specific biases.

Figures

Figures reproduced from arXiv: 2604.15705 by En Yu, Jie Lu, Wei Duan, Xiaoyu Yang.

Figure 1. Endogenous Reasoning Drift in RFT. …probability and semantic differentiation during the thinking process, the resulting predictions for distinct pathologies can become diametrically opposed. This instability highlights a critical vulnerability where the reasoning trajectory of the model undergoes a systemic divergence, unmooring the final decision from its original logical premises. We further extend the an…
Figure 2. The proposed Counterfactual Preference Optimization ++ (CPO++) framework. To mitigate endogenous reasoning drift, the methodology theoretically characterizes it as multi-modal concept drift, and incorporates counterfactual inference to disentangle spurious correlations from genuine causal logic within the original outputs. By leveraging hierarchical domain knowledge and perception-thinking consistency prot…
Figure 3. Structural Causal Graph. X: Inputs, Z: Prediction Results, T: Chain-of-Thought, and D: Latent Concept Drift within Non-Stationary CoT. Fortunately, counterfactual causes provide an explicit manner to decouple these two competing goals. We construct a structural causal graph [61], [62] to formalize the causal…
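The caption names the nodes, but the excerpt does not enumerate the arrows, so the edge set below is an assumption: drift D and inputs X are treated as parents of both the chain-of-thought T and the prediction Z. A toy sketch of why an explicit intervention is needed:

```python
# Hypothetical encoding of the Figure 3 graph as parent sets.
# Node names come from the caption (X: inputs, T: chain-of-thought,
# Z: prediction, D: latent concept drift); the edges are assumed.
parents = {
    "X": [],
    "D": [],
    "T": ["X", "D"],       # the CoT is driven by inputs and latent drift
    "Z": ["X", "T", "D"],  # the prediction depends on inputs, CoT, drift
}

def shared_parents(cause, effect):
    """Nodes that parent both `cause` and `effect`: candidate confounders
    that a counterfactual intervention would need to control for."""
    return sorted(set(parents[cause]) & set(parents[effect]))

confounders = shared_parents("T", "Z")  # both X and D confound T -> Z
```

Under these assumed edges, conditioning on T alone cannot separate genuine reasoning from drift, since D opens a backdoor path into Z; hence the counterfactual decoupling.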
Figure 4. Example Case of Hierarchical Domain Knowledge Graph in Medical Domain. To disentangle detrimental drift, we introduce the graph that generates plausible counterfactual CoTs through controlled attribute perturbations. Green lines represent attributes that are positively associated with the disease, while the red denotes that they are exclusive. • Entities (E): The core objects of interest (e.g., Pneumonia …
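A minimal sketch of how such a graph could drive controlled perturbations, assuming attributes are tagged with a polarity as in the caption; the entity, attribute names, and polarity labels are hypothetical stand-ins, not the paper's schema:

```python
# Toy knowledge-graph fragment in the spirit of Figure 4: attributes of
# an entity tagged as positively associated ("positive", green in the
# figure) or exclusive ("exclusive", red). Illustrative only.
kg = {
    "Pneumonia": {
        "lung opacity": "positive",
        "consolidation": "positive",
        "pleural line shift": "exclusive",
    }
}

def perturb_cot(cot, entity, replacement):
    """Generate a counterfactual CoT by swapping the first positively
    associated attribute mentioned in the chain for an exclusive one."""
    for attr, polarity in kg[entity].items():
        if polarity == "positive" and attr in cot:
            return cot.replace(attr, replacement)
    return cot  # no positive attribute mentioned; chain left unchanged

cf = perturb_cot("The image shows lung opacity, suggesting pneumonia.",
                 "Pneumonia", "pleural line shift")
```

The point of grounding the swap in the graph, rather than perturbing tokens at random, is that the resulting counterfactual stays clinically plausible while breaking the attribute-diagnosis shortcut.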
Figure 5. Qualitative evaluation of the attention scores over visual tokens during CoT decoding. When the term 'lung opacity' is subtly altered to 'opacity', the model still produces high responses in key areas, such as the visual tokens at the right side of ① and the pneumonia ②. To qualitatively evaluate the effectiveness of the proposed framework in mitigating endogenous reasoning drift, a visualization of cross…
Figure 6. Quantitative evaluation of diagnostic robustness against counterfactual interference on the medical MS-CXR-T [84] dataset. It reports the Top-1 accuracy across five pulmonary pathologies, including consolidation (Con.), pleural effusion (PE), pneumonia (Pna.), pneumothorax (Pnx.), and edema (Ede.), and their overall average (Avg.). To simulate non-stationary and complex reasoning, varying ratios of counterf…
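The evaluation protocol, as the caption describes it, can be sketched as mixing a varying ratio of counterfactual samples into the test stream and reporting Top-1 accuracy; the dataset, the ratio, and the toy "predictor" below are illustrative assumptions, not the paper's setup:

```python
# Sketch of the Figure 6 protocol under stated assumptions: perturb a
# given fraction of the stream and measure Top-1 accuracy degradation.
def top1_accuracy(preds, labels):
    """Fraction of predictions that exactly match their label."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def mix_counterfactuals(clean, perturbed, ratio):
    """Replace the leading `ratio` fraction of the stream with perturbed
    samples, simulating increasing counterfactual interference."""
    k = round(len(clean) * ratio)
    return perturbed[:k] + clean[k:]

labels = ["Pna.", "PE", "Con.", "Pnx.", "Ede."]
clean_preds = list(labels)                # a perfect predictor on clean data
perturbed_preds = ["Con."] * len(labels)  # degraded predictions under perturbation
stream = mix_counterfactuals(clean_preds, perturbed_preds, ratio=0.4)
acc = top1_accuracy(stream, labels)
```

Sweeping `ratio` from 0 to 1 and plotting `acc` per pathology reproduces the shape of the robustness curves the caption describes.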
Figure 7. Ablation study of the counterfactual decoupling mechanism. The evaluation systematically compares the DPO baseline against isolated interventions, specifically integrating only reasoning counterfactuals or only visual counterfactuals, alongside the complete CPO++ framework featuring dual alignment. Performance is rigorously measured across four critical dimensions: 1) reasoning capability, evaluated via th…
read the original abstract

Reinforcement Fine-Tuning (RFT) has established itself as a critical paradigm for aligning Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. Nevertheless, current research primarily focuses on mitigating exogenous distribution shifts arising from data-centric factors, while the non-stationarity inherent in endogenous reasoning remains largely unexplored. In this work, a critical vulnerability is revealed within MLLMs: they are highly susceptible to endogenous reasoning drift across both thinking and perception perspectives. It manifests as unpredictable distribution changes that emerge spontaneously during the autoregressive generation process, independent of external environmental perturbations. To adapt to it, we first theoretically define endogenous reasoning drift within the RFT of MLLMs as multi-modal concept drift. In this context, this paper proposes Counterfactual Preference Optimization ++ (CPO++), a comprehensive and autonomous framework adapted to multi-modal concept drift. It integrates counterfactual reasoning with domain knowledge to execute controlled perturbations across thinking and perception, employing preference optimization to disentangle spurious correlations. Extensive empirical evaluations across two highly dynamic and safety-critical domains, medical diagnosis and autonomous driving, demonstrate that the proposed framework achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference. The methodology also exhibits exceptional zero-shot cross-domain generalization, providing a principled foundation for reliable multi-modal reasoning in safety-critical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Circularity Check

0 steps flagged

No significant circularity; derivation chain is conceptual with no equations or reductions shown

full rationale

The abstract and available text present a high-level framework proposal (CPO++) and a conceptual definition of endogenous reasoning drift as multi-modal concept drift, but contain no mathematical derivations, equations, parameter fits, or self-citations. No load-bearing steps reduce to inputs by construction, and the description does not invoke uniqueness theorems or rename known results. This is the expected honest non-finding when the paper's chain is not mathematically specified.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract introduces endogenous reasoning drift as a new theoretical construct and relies on the assumption that preference optimization can separate spurious correlations; no explicit free parameters or external benchmarks are stated.

axioms (1)
  • domain assumption Endogenous reasoning drift manifests as unpredictable distribution changes during autoregressive generation independent of external perturbations
    Directly stated in the abstract as the core vulnerability to be addressed.
invented entities (1)
  • endogenous reasoning drift no independent evidence
    purpose: To capture spontaneous multi-modal distribution shifts inside MLLM generation
    Newly defined in the paper as distinct from exogenous shifts

pith-pipeline@v0.9.0 · 5546 in / 1189 out tokens · 41247 ms · 2026-05-10T08:25:28.813688+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Autonomous Drift Learning in Data Streams: A Unified Perspective

    cs.LG 2026-05 unverdicted novelty 7.0

    A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.

Reference graph

Works this paper leans on

114 extracted references · 27 canonical work pages · cited by 1 Pith paper · 14 internal anchors

  1. [1]

    ReFT: Reasoning with Reinforced Fine-Tuning,

    L. Trung, X. Zhang, Z. Jie, P. Sun, X. Jin, and H. Li, “ReFT: Reasoning with Reinforced Fine-Tuning,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), L.-W. Ku, A. Martins, and V. Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, 2024, pp. 7601–7614

  2. [2]

    Visual-RFT: Visual Reinforcement Fine-Tuning

    Z. Liu, Z. Sun, Y . Zang, X. Dong, Y . Cao, H. Duan, D. Lin, and J. Wang, “Visual-rft: Visual reinforcement fine-tuning,”arXiv preprint arXiv:2503.01785, 2025

  3. [3]

    Reason-rft: Reinforcement fine-tuning for visual reasoning of vision language models,

    H. Tan, Y . Ji, X. Hao, X. Chen, P. Wang, Z. Wang, and S. Zhang, “Reason-rft: Reinforcement fine-tuning for visual reasoning of vision language models,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  4. [4]

    Sft memorizes, rl generalizes: A comparative study of foundation model post-training,

    T. Chu, Y . Zhai, J. Yang, S. Tong, S. Xie, D. Schuurmans, Q. V . Le, S. Levine, and Y . Ma, “Sft memorizes, rl generalizes: A comparative study of foundation model post-training,” inInternational Conference on Machine Learning. PMLR, 2025, pp. 10 818–10 838

  5. [5]

    RL fine-tuning heals the OOD forgetting in SFT,

    H. Jin, S. Luan, S. Lyu, G. Rabusseau, D. Precup, and M. Hamdaqa, “RL fine-tuning heals the OOD forgetting in SFT,” inFirst Workshop on Foundations of Reasoning in Language Models, 2025. [Online]. Available: https://openreview.net/forum?id=SN1PCQ0ApV

  6. [6]

    Reinforcement learning for out-of-distribution reasoning in LLMs: An empirical study on diagnosis-related group coding,

    H. Wang, Z. Wu, G. J. Kolar, H. R. Korsapati, B. Bartlett, B. Hull, and J. Sun, “Reinforcement learning for out-of-distribution reasoning in LLMs: An empirical study on diagnosis-related group coding,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https: //openreview.net/forum?id=0jvnfH0WYV

  7. [7]

    Information-theoretic reward modeling for stable rlhf: Detecting and mitigating reward hacking,

    Y . Miao, L. Ding, S. Zhang, R. Bao, L. Zhang, and D. Tao, “Information-theoretic reward modeling for stable rlhf: Detecting and mitigating reward hacking,”arXiv preprint arXiv:2510.13694, 2025

  8. [8]

    Reward shaping to mitigate reward hacking in RLHF,

    J. Fu, X. Zhao, C. Yao, H. Wang, Q. Han, and Y . Xiao, “Reward shaping to mitigate reward hacking in RLHF,” inICML 2025 Workshop on Reliable and Responsible Foundation Models, 2025. [Online]. Available: https://openreview.net/forum?id=62A4d5Mokc

  9. [9]

    RRM: Robust reward model training mitigates reward hacking,

    T. Liu, W. Xiong, J. Ren, L. Chen, J. Wu, R. Joshi, Y . Gao, J. Shen, Z. Qin, T. Yu, D. Sohn, A. Makarova, J. Z. Liu, Y . Liu, B. Piot, A. Ittycheriah, A. Kumar, and M. Saleh, “RRM: Robust reward model training mitigates reward hacking,” inThe Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.ne...

  10. [10]

    DAPO: An open-source LLM reinforcement learning system at scale,

    Q. Yu, Z. Zhang, R. Zhu, Y . Yuan, X. Zuo, YuYue, W. Dai, T. Fan, G. Liu, J. Liu, L. Liu, X. Liu, H. Lin, Z. Lin, B. Ma, G. Sheng, Y . Tong, C. Zhang, M. Zhang, R. Zhang, W. Zhang, H. Zhu, J. Zhu, J. Chen, J. Chen, C. Wang, H. Yu, Y . Song, X. Wei, H. Zhou, J. Liu, W.-Y . Ma, Y .-Q. Zhang, L. Yan, Y . Wu, and M. Wang, “DAPO: An open-source LLM reinforceme...

  11. [11]

    Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints,

    C. Wang, Y . Jiang, C. Yang, H. Liu, and Y . Chen, “Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints,” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https: //openreview.net/forum?id=2cRzmWXK9N

  12. [12]

    Is DPO superior to PPO for LLM alignment? a comprehensive study,

    S. Xu, W. Fu, J. Gao, W. Ye, W. Liu, Z. Mei, G. Wang, C. Yu, and Y. Wu, “Is DPO superior to PPO for LLM alignment? a comprehensive study,” in Forty-first International Conference on Machine Learning, 2024. [Online]. Available: https://openreview.net/forum?id=6XH8R7YrSk

  13. [13]

    Learning dynamics of LLM finetuning,

    Y . Ren and D. J. Sutherland, “Learning dynamics of LLM finetuning,” inThe Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/ forum?id=tPNHOoZFl9

  14. [14]

    LLM Post-Training: A Deep Dive into Reasoning Large Language Models

    K. Kumar, T. Ashraf, O. Thawakar, R. M. Anwer, H. Cholakkal, M. Shah, M.-H. Yang, P. H. Torr, F. S. Khan, and S. Khan, “LLM post-training: A deep dive into reasoning large language models,” arXiv preprint arXiv:2502.21321, 2025

  15. [15]

    Qwen2.5-vl,

    Q. Team, “Qwen2.5-vl,” January 2025. [Online]. Available: https: //qwenlm.github.io/blog/qwen2.5-vl/

  16. [16]

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model,

    R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct Preference Optimization: Your Language Model is Secretly a Reward Model,” vol. 36, pp. 53 728–53 741, 2023. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/2023/hash/ a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html

  17. [17]

    MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,

    A. E. Johnson, T. J. Pollard, S. J. Berkowitz, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, R. G. Mark, and S. Horng, “MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,” Scientific Data, vol. 6, no. 1, p. 317, 2019

  18. [18]

    Efficient streaming language models with attention sinks,

    G. Xiao, Y . Tian, B. Chen, S. Han, and M. Lewis, “Efficient streaming language models with attention sinks,” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=NG7sS51zVF

  19. [19]

    Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning,

    X. Yang, J. Lu, and E. Yu, “Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https: //openreview.net/forum?id=1BAiQmAFsx

  20. [20]

    Models, reasoning and inference,

    J. Pearlet al., “Models, reasoning and inference,”Cambridge, UK: CambridgeUniversityPress, vol. 19, no. 2, p. 3, 2000

  21. [21]

    Interpretation and identification of causal mediation

    J. Pearl, “Interpretation and identification of causal mediation.”Psy- chological methods, vol. 19, no. 4, p. 459, 2014

  22. [22]

    Reinforced Self-Training (ReST) for Language Modeling

    C. Gulcehre, T. L. Paine, S. Srinivasan, K. Konyushkova, L. Weerts, A. Sharma, A. Siddhant, A. Ahern, M. Wang, C. Gu, W. Macherey, A. Doucet, O. Firat, and N. de Freitas, “Reinforced self- training (rest) for language modeling,” 2023. [Online]. Available: https://arxiv.org/abs/2308.08998

  23. [23]

    Scaling relationship on learning mathematical reasoning with large language models,

    Z. Yuan, H. Yuan, C. Li, G. Dong, K. Lu, C. Tan, C. Zhou, and J. Zhou, “Scaling relationship on learning mathematical reasoning with large language models,” 2024. [Online]. Available: https://openreview.net/forum?id=cijO0f8u35

  24. [24]

    B-STar: Monitoring and balancing exploration and exploitation in self-taught reasoners,

    W. Zeng, Y . Huang, L. Zhao, Y . Wang, Z. Shan, and J. He, “B-STar: Monitoring and balancing exploration and exploitation in self-taught reasoners,” inThe Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https: //openreview.net/forum?id=P6dwZJpJ4m

  25. [25]

    The Lessons of Developing Process Reward Models in Mathematical Reasoning,

    Z. Zhang, C. Zheng, Y. Wu, B. Zhang, R. Lin, B. Yu, D. Liu, J. Zhou, and J. Lin, “The lessons of developing process reward models in mathematical reasoning,” arXiv preprint arXiv:2501.07301, 2025

  26. [26]

    Scalar: Spatial-concept alignment for robust vision in harsh open world,

    X. Yang, L. Xu, X. Zeng, X. Wang, H. Li, and S. Zhang, “Scalar: Spatial-concept alignment for robust vision in harsh open world,” Pattern Recognition, p. 113203, 2026

  27. [27]

    Fewer tokens, greater scaling: Self-adaptive visual bases for efficient and expansive representation learning,

    S. Young, X. Zeng, and L. Xu, “Fewer tokens, greater scaling: Self-adaptive visual bases for efficient and expansive representation learning,”arXiv preprint arXiv:2511.19515, 2025

  28. [28]

    Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs,

    X. Yang, J. Lu, and E. Yu, “Learning from all: Concept alignment for autonomous distillation from multiple drifting mllms,” arXiv preprint arXiv:2510.04142, 2025. [Online]. Available: https: //arxiv.org/abs/2510.04142

  29. [29]

    Deep reinforcement learning from human preferences,

    P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in neural information processing systems, vol. 30, 2017

  30. [30]

    Training language models to follow instructions with human feedback,

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27 730–27 744, 2022

  31. [31]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017

  32. [32]

    Constitutional AI: Harmlessness from AI Feedback

    Y . Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnonet al., “Constitutional ai: Harmlessness from ai feedback,”arXiv preprint arXiv:2212.08073, 2022

  33. [33]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,”arXiv preprint arXiv:2501.12948, 2025

  34. [34]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . Liet al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,”arXiv preprint arXiv:2402.03300, 2024

  35. [35]

    From System 1 to System 2: A Survey of Reasoning Large Language Models

    Z.-Z. Li, D. Zhang, M.-L. Zhang, J. Zhang, Z. Liu, Y . Yao, H. Xu, J. Zheng, P.-J. Wang, X. Chenet al., “From system 1 to system 2: A survey of reasoning large language models,”arXiv preprint arXiv:2502.17419, 2025

  36. [36]

    Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

    Q. Chen, L. Qin, J. Liu, D. Peng, J. Guan, P. Wang, M. Hu, Y . Zhou, T. Gao, and W. Che, “Towards reasoning era: A survey of long chain-of-thought for reasoning large language models,”arXiv preprint arXiv:2503.09567, 2025

  37. [37]

    Unleashing the potential of diffusion models towards diversified sequential recommendations,

    Z. Cai, S. Wang, V . W. Chu, U. Naseem, Y . Wang, and F. Chen, “Unleashing the potential of diffusion models towards diversified sequential recommendations,” inProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025, pp. 1476–1486

  38. [38]

    From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation

    M. Lu, Y . Zhang, M. Wu, and Y . Feng, “From query to counsel: Structured reasoning with a multi-agent framework and dataset for legal consultation,” 2026. [Online]. Available: [https: //arxiv.org/abs/2604.10470](https://arxiv.org/abs/2604.10470)

  39. [39]

    Multimodal Chain-of-Thought Reasoning in Language Models

    Z. Zhang, A. Zhang, M. Li, H. Zhao, G. Karypis, and A. Smola, “Multimodal chain-of-thought reasoning in language models,”arXiv preprint arXiv:2302.00923, 2023

  40. [40]

    m3 cot: A novel benchmark for multi-domain multi-step multi-modal chain-of-thought

    Q. Chen, L. Qin, J. Zhang, Z. Chen, X. Xu, and W. Che, “m 3 cot: A novel benchmark for multi-domain multi-step multi-modal chain-of- thought,”arXiv preprint arXiv:2405.16473, 2024

  41. [41]

    Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

    Y . Wang, S. Wu, Y . Zhang, S. Yan, Z. Liu, J. Luo, and H. Fei, “Multimodal chain-of-thought reasoning: A comprehensive survey,” arXiv preprint arXiv:2503.12605, 2025

  42. [42]

    Ddcot: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models,

    G. Zheng, B. Yang, J. Tang, H.-Y . Zhou, and S. Yang, “Ddcot: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models,”Advances in Neural Information Processing Systems, vol. 36, pp. 5168–5191, 2023

  43. [43]

    Steering diffusion models towards credible content recommendation,

    Z. Cai, S. Wang, J. Li, P. Zhou, V . W. Chu, F. Chen, T. Zhu, and C. C. Aggarwal, “Steering diffusion models towards credible content recommendation,” inThe Fourteenth International Conference on Learning Representations, 2026

  44. [44]

    From newborn to impact: Bias-aware citation prediction,

    M. Lu, M. Wu, J. Xu, W. Li, F. Liu, Y . Ding, Y . Sun, J. Lu, and Y . Zhang, “From newborn to impact: Bias-aware citation prediction,” arXiv preprint arXiv:2510.19246, 2025

  45. [45]

    Choosing How to Remember: Adaptive Memory Structures for LLM Agents,

    M. Lu, M. Wu, F. Liu, J. Xu, W. Li, H. Wang, Z. Hu, Y . Ding, Y . Sun, J. Luet al., “Choosing how to remember: Adaptive memory structures for llm agents,”arXiv preprint arXiv:2602.14038, 2026

  46. [46]

    Revealing multimodal causality with large language models,

    J. Li, S. Wang, Q. Zhang, F. Liu, T. Liu, L. Cao, S. Yu, and F. Chen, “Revealing multimodal causality with large language models,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id= nufqobhME7

  47. [47]

    Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality,

    G. Zhou, Y . Yan, X. Zou, K. Wang, A. Liu, and X. Hu, “Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality,” inThe Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=A V7OXVlAyi

  48. [48]

    Causal-CoG: A causal-effect look at context generation for boosting multi-modal language models,

    S. Zhao, Z. Li, Y. Lu, A. Yuille, and Y. Wang, “Causal-CoG: A causal-effect look at context generation for boosting multi-modal language models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 13 342–13 351

  49. [49]

    Ensemble learning for data stream analysis: A survey,

    B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak, “Ensemble learning for data stream analysis: A survey,” Information Fusion, vol. 37, pp. 132–156, 2017

  50. [50]

    Learning under Concept Drift: A Review,

    J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under Concept Drift: A Review,” vol. 31, no. 12, pp. 2346– 2363, 2019. [Online]. Available: https://ieeexplore.ieee.org/abstract/ document/8496795

  51. [51]

    Recent Advances in Concept Drift Adaptation Methods for Deep Learning

    L. Yuan, H. Li, B. Xia, C. Gao, M. Liu, W. Yuan, and X. You, “Recent Advances in Concept Drift Adaptation Methods for Deep Learning.” inIJCAI, 2022, pp. 5654–5661

  52. [52]

    Concept Neural Network Based on Time-Delay Regret for Dynamic Stream Learning,

    Y .-L. Mi, “Concept Neural Network Based on Time-Delay Regret for Dynamic Stream Learning,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3796–3814, May 2025

  53. [53]

    Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning,

    E. Yu, J. Lu, K. Wang, X. Yang, and G. Zhang, “Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 19, 2026, pp. 16 199–16 207

  54. [54]

    Generalized incremental learning under concept drift across evolving data streams,

    E. Yu, J. Lu, and G. Zhang, “Generalized incremental learning under concept drift across evolving data streams,”arXiv preprint arXiv:2506.05736, 2025

  55. [55]

    Automated Concept Drift Handling for Fault Prediction in Edge Clouds Using Reinforcement Learning,

    B. Shayesteh, C. Fu, A. Ebrahimzadeh, and R. H. Glitho, “Automated Concept Drift Handling for Fault Prediction in Edge Clouds Using Reinforcement Learning,”IEEE Transactions on Network and Service Management, vol. 19, no. 2, pp. 1321–1335, Jun. 2022

  56. [56]

    DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift,

    S. McFadden, M. Foley, M. D’Onghia, C. Hicks, V . Mavroudis, N. Paoletti, and F. Pierazzi, “DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift,” inThe Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), Nov. 2025

  57. [57]

    Adapting multi-modal large language model to concept drift from pre-training onwards,

    X. Yang, J. Lu, and E. Yu, “Adapting multi-modal large language model to concept drift from pre-training onwards,” inThe Thirteenth International Conference on Learning Representations, Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, Eds., vol. 2025, 2025, pp. 90 869–90 891. [Online]. Available: https://proceedings.iclr.cc/paper files/paper/2025/ file/e25d8...

  58. [58]

    T-distributed Spherical Feature Representation for Imbalanced Classification,

    X. Yang, Y . Chen, X. Yue, S. Xu, and C. Ma, “T-distributed Spherical Feature Representation for Imbalanced Classification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, pp. 10 825–10 833, 2023. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/26284

  59. [59]

    Resilient Contrastive Pre-Training under Non-Stationary Drift,

    X. Yang, J. Lu, E. Yu, and W. Duan, “Resilient contrastive pre-training under non-stationary drift,” arXiv preprint arXiv:2502.07620, 2025. [Online]. Available: https://arxiv.org/abs/2502.07620

  60. [60]

    One leaf reveals the season: Occlusion-based contrastive learning with semantic- aware views for efficient visual representation,

    X. Yang, L. Xu, H. Li, and S. Zhang, “One leaf reveals the season: Occlusion-based contrastive learning with semantic- aware views for efficient visual representation,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=toZOqONu9x

  61. [61]

    Causal diagrams for empirical research,

    J. Pearl, “Causal diagrams for empirical research,”Biometrika, vol. 82, no. 4, pp. 669–688, 1995

  62. [62]

    Causal Inference in Statistics: A Primer

    ——, Causal inference in statistics: a primer. John Wiley & Sons, 2016

  63. [63]

    Direct and indirect effects,

    ——, “Direct and indirect effects,” in Probabilistic and Causal Inference: The Works of Judea Pearl, 2022, pp. 373–392

  64. [64]

    Segmentation and vascular vectorization for coronary artery by geometry-based cascaded neural network,

    X. Yang, L. Xu, S. Yu, Q. Xia, H. Li, and S. Zhang, “Segmentation and vascular vectorization for coronary artery by geometry-based cascaded neural network,”IEEE Transactions on Medical Imaging, vol. 44, no. 1, pp. 259–269, 2024

  65. [65]

    Local linear embedding based interpolation neural network in pancreatic tumor segmentation,

    X. Yang, Y . Chen, X. Yue, C. Ma, and P. Yang, “Local linear embedding based interpolation neural network in pancreatic tumor segmentation,” Applied Intelligence, vol. 52, no. 8, pp. 8746–8756, 2022

  66. [66]

    TC-SSA: Token Compression via Semantic Slot Aggregation for Gigapixel Pathology Reasoning,

    Z. Chen, S. Young, and L. Xu, “Tc-ssa: Token compression via semantic slot aggregation for gigapixel pathology reasoning,” arXiv preprint arXiv:2603.01143, 2026

  67. [67]

    Knowledge matters: Chest radiology report generation with general and specific knowledge,

    S. Yang, X. Wu, S. Ge, S. K. Zhou, and L. Xiao, “Knowledge matters: Chest radiology report generation with general and specific knowledge,”Medical image analysis, vol. 80, p. 102510, 2022

  68. [68]

    Clinical-bert: Vision-language pre-training for radiograph diagnosis and reports generation,

    B. Yan and M. Pei, “Clinical-bert: Vision-language pre-training for radiograph diagnosis and reports generation,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, 2022, pp. 2982–2990

  69. [69]

    Metransformer: Radiology report generation by transformer with multiple learnable expert tokens,

    Z. Wang, L. Liu, L. Wang, and L. Zhou, “Metransformer: Radiology report generation by transformer with multiple learnable expert tokens,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11 558–11 567

  70. [70]

    Spatio-temporal and retrieval-augmented modelling for chest x-ray report generation,

    Y . Yang, X. You, K. Zhang, Z. Fu, X. Wang, J. Ding, J. Sun, Z. Yu, Q. Huang, W. Hanet al., “Spatio-temporal and retrieval-augmented modelling for chest x-ray report generation,”IEEE Transactions on Medical Imaging, 2025

  71. [71]

    Diagnostic Captioning by Cooperative Task Interactions and Sample-Graph Consistency,

    Z. Wang, L. Wang, X. Li, and L. Zhou, “Diagnostic Captioning by Cooperative Task Interactions and Sample-Graph Consistency,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 8, pp. 6585–6598, Aug. 2025

  72. [72]

    R2gengpt: Radiology report generation with frozen llms,

    Z. Wang, L. Liu, L. Wang, and L. Zhou, “R2gengpt: Radiology report generation with frozen llms,”Meta-Radiology, vol. 1, no. 3, p. 100033, 2023

  73. [73]

    Promptmrg: Diagnosis-driven prompts for medical report generation,

    H. Jin, H. Che, Y . Lin, and H. Chen, “Promptmrg: Diagnosis-driven prompts for medical report generation,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 3, 2024, pp. 2607– 2615

  74. [74]

    Bootstrapping large language models for radiology report generation,

    C. Liu, Y . Tian, W. Chen, Y . Song, and Y . Zhang, “Bootstrapping large language models for radiology report generation,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, 2024, pp. 18 635–18 643

  75. [75]

    Cxpmrg-bench: Pre-training and benchmarking for x-ray medical report generation on chexpert plus dataset,

    X. Wang, F. Wang, Y . Li, Q. Ma, S. Wang, B. Jiang, and J. Tang, “Cxpmrg-bench: Pre-training and benchmarking for x-ray medical report generation on chexpert plus dataset,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 5123–5133

  76. [76]

    Reason like a radiologist: Chain-of-thought and reinforcement learning for verifiable report generation,

    P. Jing, K. Lee, Z. Zhang, H. Zhou, Z. Yuan, Z. Gao, L. Zhu, G. Papanastasiou, Y. Fang, and G. Yang, “Reason like a radiologist: Chain-of-thought and reinforcement learning for verifiable report generation,” Medical Image Analysis, vol. 109, p. 103910, Mar. 2026

  77. [77]

    Radiology report generation via multi-objective preference optimization,

    T. Xiao, L. Shi, P. Liu, Z. Wang, and C. Bai, “Radiology report generation via multi-objective preference optimization,” inProceedings of the AAAI conference on artificial intelligence, vol. 39, no. 8, 2025, pp. 8664–8672

  78. [78]

    Fir-rad: Fine-grained reinforcement with structured reasoning for chest x-ray report generation,

    X. Mei, L. Yang, D. Gao, X. Cai, J. Han, and T. Liu, “Fir-rad: Fine-grained reinforcement with structured reasoning for chest x-ray report generation,” IEEE Transactions on Medical Imaging, 2026

  79. [79]

    Textual explanations for self-driving vehicles,

    J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, “Textual explanations for self-driving vehicles,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 563–578

  80. [80]

    Drivegpt4: Interpretable end-to-end autonomous driving via large language model,

    Z. Xu, Y . Zhang, E. Xie, Z. Zhao, Y . Guo, K.-Y . K. Wong, Z. Li, and H. Zhao, “Drivegpt4: Interpretable end-to-end autonomous driving via large language model,”IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8186–8193, 2024

Showing first 80 references.