
arXiv: 2510.04142 · v3 · submitted 2025-10-05 · 💻 cs.CV · cs.AI · cs.LG

Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Multi-Stream Environments

Pith reviewed 2026-05-18 10:40 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords reasoning alignment · non-stationary environments · multi-modal large language models · concept drift · preference optimization · chest X-ray interpretation · constraint satisfaction · model consensus

The pith

Treating reasoning disagreements between models as dynamic negative constraints allows a 7B model to outperform larger proprietary sources on chest X-ray tasks when environments change over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how multiple multi-modal language models produce drifting and biased reasoning outputs in non-stationary settings, which then get passed to a target model. It reframes the alignment task as a constraint satisfaction problem drawn from concept drift theory and introduces Autonomous Preference Optimization to convert those divergences into usable negative signals. The method follows a two-stage sequence of supervised bootstrapping to reach the combined capability level of the sources, followed by optimization that suppresses inconsistent trajectories through a multi-negative Plackett-Luce objective. Experiments on chest X-ray interpretation show the resulting 7B model reaching higher average accuracy than the proprietary source models under the same drifting conditions. The authors also release a benchmark of nearly 171,000 reasoning trajectories collected from seven models to support further work on this form of alignment.

Core claim

Multi-source reasoning alignment under non-stationary conditions can be solved by treating inter-model divergences as dynamic negative constraints rather than noise. After a supervised bootstrapping stage that places the target model inside the capability union of the sources, a multi-negative Plackett-Luce objective explicitly penalizes drifting trajectories and synthesizes a consistent consensus manifold. This produces a target model whose reasoning remains stable even as source distributions evolve.

What carries the argument

Autonomous Preference Optimization (APO), a two-stage protocol that first performs supervised bootstrapping and then applies a multi-negative Plackett-Luce objective to suppress drifting trajectories while forming a consensus manifold.
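
A minimal sketch of the multi-negative Plackett-Luce idea, under stated assumptions. The paper's Eq. (7)–(9) are not reproduced on this page, so the code below uses the generic top-1 Plackett-Luce likelihood (a softmax over one preferred trajectory and K rejected ones) with DPO-style implicit rewards. The names implicit_reward and multi_negative_pl_loss, and the beta default, are illustrative and are not taken from the authors' code.

```python
import math


def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """DPO-style implicit reward: scaled log-ratio of policy to reference.

    Assumption: rewards are defined this way; the paper may differ.
    """
    return beta * (logp_policy - logp_ref)


def multi_negative_pl_loss(pos_reward, neg_rewards):
    """Negative log-probability that the preferred trajectory ranks first.

    Top-1 Plackett-Luce: P(pos first) = exp(r_pos) / sum_j exp(r_j),
    where j runs over the preferred trajectory and all negatives.
    """
    scores = [pos_reward] + list(neg_rewards)
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_reward
```

With equal rewards and two negatives the loss equals log 3, and it shrinks as the preferred trajectory's reward rises above the negatives. Each additional negative adds a term to the normalizer, which is how several drifting trajectories can be penalized in a single update.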

If this is right

  • A 7B target model reaches higher average accuracy than the proprietary source models on chest X-ray interpretation under non-stationary conditions.
  • The two-stage protocol first projects the target into the union of source capabilities and then enforces consistency through explicit negative constraints.
  • A new benchmark called CXR-MAX supplies 170,982 reasoning trajectories from seven MLLMs for studying alignment under drift.
  • Drift is converted from a source of transmitted bias into an explicit supervisory signal that guides the target toward a stable manifold.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constraint formulation could be tested on non-stationary streams outside medical imaging, such as evolving visual question answering or autonomous navigation data.
  • If the negative constraints prove stable across different drift speeds, the approach might reduce the need for frequent full retraining of alignment targets.
  • The released trajectory dataset opens direct comparisons between APO and other drift-correction techniques that do not use multi-negative ranking losses.

Load-bearing premise

Inter-model divergences in non-stationary environments can be reliably modeled and suppressed as dynamic negative constraints in a multi-negative Plackett-Luce objective without discarding useful reasoning signals or introducing new systematic biases.

What would settle it

If a 7B model trained with the full APO pipeline shows lower average accuracy than the same model after only the bootstrapping stage on a fresh non-stationary chest X-ray stream, the value of the constraint suppression step would be falsified.
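
The falsification test above reduces to a comparison of average accuracies. A minimal sketch, assuming accuracy is recorded per time window of the stream and averaged with equal weight (the page does not state the paper's averaging scheme; all names are hypothetical):

```python
def mean_accuracy(window_accuracies):
    """Equal-weight average of per-window accuracies on a drifting stream."""
    if not window_accuracies:
        raise ValueError("need at least one window")
    return sum(window_accuracies) / len(window_accuracies)


def constraint_stage_gain(full_apo, bootstrap_only):
    """Average-accuracy gain of the full pipeline over stage one alone.

    A negative value on a fresh stream is the outcome that would count
    against the constraint suppression step.
    """
    return mean_accuracy(full_apo) - mean_accuracy(bootstrap_only)
```

A gain close to zero would still need an uncertainty estimate before it could be read either way.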

Figures

Figures reproduced from arXiv: 2510.04142 by En Yu, Jie Lu, Wei Duan, Xiaoyu Yang.

Figure 1
Figure 1: Transmission of Concept Drift behind Distillation of MLLMs. Within the concept drift framework, our analysis uncovers fundamental limitations in distilling knowledge from multiple drifting MLLMs in customized domain-specific scenarios. Figure 1a presents the diagnostic reasoning outcomes of various teacher MLLMs on MIMIC-CXR, featuring seven leading publicly accessible MLLMs with precision, recall, and sem…
Figure 2
Figure 2: The main contributions of our methods. (a) By formalizing the autoregressive inference of …

Original abstract

This paper identifies a critical yet underexplored challenge in reasoning alignment from multiple multi-modal large language models (MLLMs): In non-stationary environments, the diverse reasoning distributions of source models often evolve unpredictably, transmitting systematic biases and drift to the target model. To address this, we formulate multi-source reasoning alignment as a constraint satisfaction problem under concept drift theory. We propose Autonomous Preference Optimization (APO), a novel framework that treats inter-model divergences not as noise, but as dynamic negative constraints. APO operates via a two-stage protocol: first, supervised bootstrapping projects the target model into the capability union of source models; second, constraint-aware optimization synthesizes a consistent consensus manifold by explicitly suppressing drifting trajectories via a multi-negative Plackett-Luce objective. Extensive experiments on chest X-ray interpretation demonstrate that our 7B model achieves superior robustness, outperforming even proprietary source models in average accuracy. Furthermore, we release CXR-MAX, a large-scale benchmark comprising 170,982 reasoning trajectories from seven large-scale MLLMs to facilitate research on reasoning alignment under drift. Code and data are available at: https://github.com/XiaoyuYoung/APO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Autonomous Preference Optimization (APO), a two-stage framework for multi-source reasoning alignment in non-stationary environments. It formulates inter-model divergences from multiple MLLMs as dynamic negative constraints under concept drift theory. The first stage performs supervised bootstrapping to embed the target model in the union of source capabilities; the second stage applies constraint-aware optimization via a multi-negative Plackett-Luce objective to suppress drifting trajectories and synthesize a consensus manifold. Experiments on chest X-ray interpretation tasks claim that the resulting 7B target model achieves superior average accuracy and robustness compared to proprietary source models. The authors also release the CXR-MAX benchmark containing 170,982 reasoning trajectories from seven MLLMs.

Significance. If the central experimental claim holds and the optimization step demonstrably preserves valid reasoning paths while eliminating only non-stationary drift, the work would offer a principled way to achieve robust alignment across heterogeneous models without explicit access to source internals. The release of CXR-MAX constitutes a concrete, reusable resource that could accelerate research on drift-aware preference optimization. The explicit grounding in concept drift theory and the parameter-light treatment of divergences as suppressible constraints are strengths that distinguish the approach from standard distillation or ensemble methods.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Experiments): the central claim that the 7B APO model 'outperforms even proprietary source models in average accuracy' is presented without any reported numerical values, standard deviations, baseline comparisons, or statistical significance tests. Because this quantitative result is the primary evidence for superior robustness under non-stationary conditions, its absence prevents assessment of whether the improvement is meaningful or merely within noise.
  2. [§4.2] §4.2, multi-negative Plackett-Luce formulation (Eq. (7)–(9)): the objective treats inter-model divergences as dynamic negative constraints to be suppressed. The paper does not specify how negative samples are selected or weighted so that task-relevant chest X-ray reasoning features (e.g., lesion localization logic) are not inadvertently down-weighted; without an ablation or correlation analysis, it remains possible that the optimization step discards useful signals rather than only drift.
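
The first major comment asks for uncertainty estimates. One common way to supply them, sketched here as an assumption and not as the authors' protocol, is a paired bootstrap over per-example correctness for two models scored on the same test items. The function name and defaults are hypothetical.

```python
import random


def paired_bootstrap_diff(correct_a, correct_b, n_resamples=2000, seed=0):
    """Accuracy difference (A minus B) with a 95% percentile interval.

    correct_a and correct_b are 0/1 lists aligned on the same examples.
    Returns (observed_difference, lower_bound, upper_bound).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(correct_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    observed = (sum(correct_a) - sum(correct_b)) / n
    return observed, lower, upper
```

If the interval excludes zero, the reported improvement is unlikely to be resampling noise. Under drift, resampling whole time windows instead of single examples may be more appropriate, since examples within a window are not independent.
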
minor comments (2)
  1. [§3.1] Notation for the capability union and consensus manifold is introduced without a compact mathematical definition; a single-line set notation would improve readability.
  2. [§5.1] The CXR-MAX benchmark description would benefit from an explicit statement of the train/validation/test split sizes and the exact protocol used to generate the 170,982 trajectories.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will make the necessary revisions to improve clarity and completeness.

point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Experiments): the central claim that the 7B APO model 'outperforms even proprietary source models in average accuracy' is presented without any reported numerical values, standard deviations, baseline comparisons, or statistical significance tests. Because this quantitative result is the primary evidence for superior robustness under non-stationary conditions, its absence prevents assessment of whether the improvement is meaningful or merely within noise.

    Authors: We agree that the abstract and the summary in §5 would benefit from explicit quantitative reporting to allow readers to evaluate the strength of the central claim. While detailed results and comparisons appear in the tables of §5, we acknowledge that key numbers, standard deviations, baseline details, and statistical tests are not summarized in the abstract or the section overview. In the revised version, we will update the abstract with the primary accuracy metrics and add a short paragraph in §5 that reports the numerical values, standard deviations, baseline comparisons, and significance test results.
    revision: yes

  2. Referee: [§4.2] §4.2, multi-negative Plackett-Luce formulation (Eq. (7)–(9)): the objective treats inter-model divergences as dynamic negative constraints to be suppressed. The paper does not specify how negative samples are selected or weighted so that task-relevant chest X-ray reasoning features (e.g., lesion localization logic) are not inadvertently down-weighted; without an ablation or correlation analysis, it remains possible that the optimization step discards useful signals rather than only drift.

    Authors: We thank the referee for this important observation. The current description of the multi-negative Plackett-Luce objective does not provide sufficient detail on negative sample selection criteria or weighting, nor does it include supporting ablations or correlation analyses. We will revise §4.2 to explicitly describe how negative samples are identified using divergence thresholds linked to concept drift detection and how they are weighted in the objective. We will also add an ablation study and a correlation analysis examining the relationship between suppressed trajectories and task-relevant features such as lesion localization to demonstrate that only non-stationary drift signals are targeted.
    revision: yes
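
The second response mentions selecting negatives by a divergence threshold without giving the rule. The sketch below is purely illustrative of what such a rule could look like: it assumes each source model's trajectory has been reduced to a set of reported findings, takes a strict majority vote as the consensus, and flags models whose Jaccard distance from that consensus exceeds a threshold. The consensus rule, the distance measure, the 0.5 threshold, and all names are assumptions, not the paper's method.

```python
from collections import Counter


def majority_consensus(findings_per_model):
    """Findings reported by a strict majority of the source models."""
    n = len(findings_per_model)
    counts = Counter(
        f for findings in findings_per_model.values() for f in set(findings)
    )
    return {f for f, c in counts.items() if c > n / 2}


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets have distance 0."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_negatives(findings_per_model, threshold=0.5):
    """Names of models whose findings diverge from the consensus."""
    consensus = majority_consensus(findings_per_model)
    return sorted(
        name
        for name, findings in findings_per_model.items()
        if jaccard_distance(findings, consensus) > threshold
    )
```

For example, if two hypothetical sources report {edema, effusion} and a third reports only {pneumothorax}, the third is flagged as the negative. The referee's concern is visible even in this toy rule: a lone model that correctly spots a finding the majority missed would also be suppressed, which is why the promised ablation matters.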

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained

The paper motivates the APO framework from concept drift theory as an external foundation and presents the two-stage protocol (supervised bootstrapping followed by multi-negative Plackett-Luce optimization) as a novel construction for treating divergences as dynamic negative constraints. No equation or step reduces by construction to quantities defined solely from fitted parameters on the target data, nor does the central claim rely on self-citation chains or imported uniqueness theorems. The CXR-MAX benchmark and reported experiments on chest X-ray tasks provide independent external validation points outside the optimization loop itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the applicability of concept drift theory to reasoning distributions and the validity of extending Plackett-Luce to multi-negative constraints for consensus synthesis.

free parameters (1)
  • optimization hyperparameters for constraint-aware stage
    Tuning parameters for the multi-negative Plackett-Luce objective are required but not detailed in the abstract.
axioms (1)
  • [domain assumption] Inter-model divergences can be treated as dynamic negative constraints without loss of critical information
    Invoked in the formulation of APO as constraint satisfaction under concept drift.
invented entities (1)
  • Autonomous Preference Optimization (APO) framework · no independent evidence
    purpose: To synthesize a consistent consensus manifold by suppressing drifting trajectories
    Newly proposed two-stage protocol not present in cited prior work.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Autonomous Drift Learning in Data Streams: A Unified Perspective

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.

  2. XrayClaw: Cooperative-Competitive Multi-Agent Alignment for Trustworthy Chest X-ray Diagnosis

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    XrayClaw deploys cooperative-competitive multi-agent alignment and Competitive Preference Optimization to raise diagnostic accuracy, reasoning fidelity, and generalization on chest X-ray benchmarks.

  3. Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

    cs.LG · 2026-04 · unverdicted · novelty 5.0

    CPO++ adapts reinforcement fine-tuning of MLLMs to endogenous multi-modal concept drift through counterfactual reasoning and preference optimization, yielding better coherence and cross-domain robustness in safety-cri...
