pith. machine review for the scientific record.

arxiv: 2605.02200 · v1 · submitted 2026-05-04 · 💻 cs.CL

Recognition: 2 theorem links

· Lean Theorem

ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3

classification 💻 cs.CL
keywords argus, adversarial, evolving, governance, policy, policy-adaptive, reinforcement, data

The pith

ARGUS uses a Prosecutor-Defender-Umpire multi-agent setup plus RAG and chain-of-thought rewards to adapt ad policy enforcement to new regulations using minimal fresh labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Online ads must follow shifting rules, such as new bans on certain education-related content. Old training data quickly becomes inconsistent with these rules. ARGUS tries to handle this by first seeding basic policy understanding, then running a three-way debate among AI agents (one acting as prosecutor, one as defender, one as umpire) to fix mismatched labels. A third stage uses group discussion to spot subtle violations. The system feeds policy documents through retrieval-augmented generation and step-by-step reasoning to create rewards that guide reinforcement learning. Experiments on industry and public datasets reportedly show better adaptation than standard fine-tuning when only small amounts of new labeled examples are available.
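The debate stage described above can be sketched as a single rectification round. This is a minimal illustration, not the paper's implementation: the `Verdict` type and the agent callables are hypothetical stand-ins for the LLM roles, under the assumption that the umpire's ruling overwrites a stale historical label when they disagree.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    label: str       # e.g. "violation" or "compliant"
    rationale: str   # the agent's stated reasoning

# Each role is an LLM call in the paper; here they are plain callables.
Agent = Callable[[str], Verdict]

def rectify_label(ad_text: str, old_label: str,
                  prosecutor: Agent, defender: Agent,
                  umpire: Callable[[str, Verdict, Verdict], Verdict]) -> str:
    """One adversarial rectification round: the prosecutor argues the ad
    violates the new mandate, the defender argues compliance, and the
    umpire weighs both arguments to issue the rectified label."""
    charge = prosecutor(ad_text)
    defense = defender(ad_text)
    ruling = umpire(ad_text, charge, defense)
    if ruling.label == old_label:
        return old_label      # historical label survives the stress test
    return ruling.label       # conflict resolved in favor of the umpire
```

In practice each callable would wrap a prompted model with the new policy text in context; the point here is only the control flow of the three-way debate.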

Core claim

ARGUS significantly outperforms traditional fine-tuning baselines, achieving superior policy-adaptive learning with minimal gold data.

Load-bearing premise

That the Prosecutor-Defender-Umpire architecture and tripartite dialectical discussion reliably resolve label conflicts and discover gray-area violations without introducing new systematic errors.

Figures

Figures reproduced from arXiv: 2605.02200 by Deyi Ji, Hailong Zhang, Huan Yu, Jie Jiang, Junyu Lu, Lanyun Zhu, Liqun Liu, Peng Shu, Tianru Chen, Xuanyi Liu.

Figure 1
Figure 1: Overview of ARGUS. ARGUS transitions through three stages.
Figure 2
Figure 2: The online deployment of ARGUS.
Original abstract

Online advertising governance faces significant challenges due to the non-stationary nature of regulatory policies, where emerging mandates (e.g., restrictions on education or aesthetic anxiety) create severe label inconsistencies and reasoning ambiguities in historical datasets. In this paper, we propose ARGUS, a policy-adaptive governance system that enables evolving reinforcement through multi-agent adversarial umpiring. ARGUS addresses the sparsity of new policy data by employing a three-stage framework: (1) Policy Seeding for initial perception; (2) Adversarial Label Rectification, which utilizes a ``Prosecutor-Defender-Umpire'' architecture to resolve conflicts between stale labels and new mandates; and (3) Latent Knowledge Discovery, which employs a tripartite dialectical discussion to unearth sophisticated, ``gray-area'' violations. By leveraging RAG-enhanced policy knowledge and Chain-of-Thought synthesis as dynamic rewards for reinforcement learning, ARGUS synchronizes its reasoning pathways with evolving regulations. Extensive experiments on both industrial and public datasets demonstrate that ARGUS significantly outperforms traditional fine-tuning baselines, achieving superior policy-adaptive learning with minimal gold data.
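The abstract's "RAG-enhanced policy knowledge and Chain-of-Thought synthesis as dynamic rewards" could be shaped roughly as follows. This is a toy sketch under our own assumptions: the token-overlap grounding score is an illustrative stand-in for whatever reward model the paper actually uses, and the weights are arbitrary.

```python
def policy_reward(reasoning_steps, retrieved_clauses, predicted, gold):
    """Toy dynamic reward: label correctness plus a bonus for reasoning
    steps that mention words from retrieved policy clauses (a crude
    token-overlap proxy for RAG grounding of the chain of thought)."""
    correct = 1.0 if predicted == gold else -1.0
    clause_tokens = {w for c in retrieved_clauses for w in c.lower().split()}
    grounded = sum(1 for s in reasoning_steps
                   if set(s.lower().split()) & clause_tokens)
    grounding_bonus = 0.5 * grounded / max(len(reasoning_steps), 1)
    return correct + grounding_bonus
```

A scalar of this shape would then feed a standard policy-gradient update, so that reasoning chains aligned with the retrieved mandates are reinforced over merely correct guesses.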

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ARGUS, a three-stage policy-adaptive ad governance system using Policy Seeding, Adversarial Label Rectification via a Prosecutor-Defender-Umpire multi-agent architecture, and Latent Knowledge Discovery through tripartite dialectical discussion. It integrates RAG-enhanced policy knowledge and Chain-of-Thought synthesis as dynamic rewards in reinforcement learning to handle non-stationary regulations and gray-area violations, claiming significant outperformance over traditional fine-tuning baselines on industrial and public datasets while requiring minimal gold data.

Significance. If the empirical claims hold, the work could contribute to multi-agent frameworks for dynamic compliance in regulated domains like advertising, where policies evolve rapidly. The combination of adversarial umpiring with RAG/CoT rewards offers a potentially scalable approach to label rectification under data sparsity. However, the absence of any quantitative validation, baselines, or ablations in the manuscript as described substantially limits its assessed significance at present.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (Adversarial Label Rectification): The central claim that the Prosecutor-Defender-Umpire architecture reliably resolves label conflicts and discovers gray-area violations without introducing new systematic errors is load-bearing for the outperformance assertion, yet no inter-annotator agreement scores, conflict-resolution error rates, or ablation isolating the umpiring component are supplied.
  2. [Abstract and Experiments] Abstract and Experiments section: The assertion of significant outperformance over fine-tuning baselines with minimal gold data is made without any reported metrics (e.g., accuracy, F1, or policy-adaptation deltas), baselines, error bars, dataset statistics, or statistical significance tests, rendering the primary empirical contribution unevaluable.
  3. [§4] §4 (Latent Knowledge Discovery): The tripartite dialectical discussion is presented as unearthing sophisticated violations, but no quantitative validation (e.g., human evaluation of discovered violations or comparison against standard CoT/RAG baselines) is provided to confirm it contributes beyond the RAG and reinforcement components.
minor comments (2)
  1. [§2 and §3] The notation for the three-stage framework and reward formulation is introduced without explicit equations or pseudocode, making the precise integration of RAG-enhanced rewards into the RL objective difficult to reconstruct.
  2. [Experiments] Dataset descriptions in the experiments section lack details on size, policy evolution timelines, and how 'minimal gold data' is operationalized (e.g., number of labeled examples per new mandate).
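The metrics and error bars the referee asks for could be produced with a standard bootstrap harness along these lines; the label names are illustrative and the positive class is assumed to be "violation".

```python
import random

def f1(preds, gold, positive="violation"):
    """Binary F1 for the assumed positive class."""
    tp = sum(p == g == positive for p, g in zip(preds, gold))
    fp = sum(p == positive != g for p, g in zip(preds, gold))
    fn = sum(g == positive != p for p, g in zip(preds, gold))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def bootstrap_f1_ci(preds, gold, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for F1, resampling
    examples with replacement; useful when gold data is scarce."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1([preds[i] for i in idx], [gold[i] for i in idx]))
    scores.sort()
    lo = scores[int(alpha / 2 * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Reporting overlapping versus disjoint intervals for ARGUS and the fine-tuning baseline would make the "significant outperformance" claim directly checkable.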

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review. The comments identify important gaps in empirical validation that we will address to strengthen the manuscript. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Adversarial Label Rectification): The central claim that the Prosecutor-Defender-Umpire architecture reliably resolves label conflicts and discovers gray-area violations without introducing new systematic errors is load-bearing for the outperformance assertion, yet no inter-annotator agreement scores, conflict-resolution error rates, or ablation isolating the umpiring component are supplied.

    Authors: We agree that explicit quantitative validation of the Adversarial Label Rectification stage is required. In the revised manuscript we will report inter-annotator agreement scores obtained during human evaluation of rectified labels, provide conflict-resolution error rates, and include an ablation that isolates the Umpire's contribution. These additions will directly substantiate the claim that the architecture resolves conflicts without introducing new systematic errors. revision: yes

  2. Referee: [Abstract and Experiments] Abstract and Experiments section: The assertion of significant outperformance over fine-tuning baselines with minimal gold data is made without any reported metrics (e.g., accuracy, F1, or policy-adaptation deltas), baselines, error bars, dataset statistics, or statistical significance tests, rendering the primary empirical contribution unevaluable.

    Authors: We acknowledge that the current version does not present the requested quantitative details. We will expand the Experiments section to include accuracy, F1, and policy-adaptation metrics, full baseline descriptions, error bars, dataset statistics, and statistical significance tests. These revisions will make the outperformance claims with minimal gold data fully evaluable. revision: yes

  3. Referee: [§4] §4 (Latent Knowledge Discovery): The tripartite dialectical discussion is presented as unearthing sophisticated violations, but no quantitative validation (e.g., human evaluation of discovered violations or comparison against standard CoT/RAG baselines) is provided to confirm it contributes beyond the RAG and reinforcement components.

    Authors: We accept that quantitative evidence for the added value of the tripartite dialectical discussion is currently missing. In the revision we will add human evaluation scores for the discovered violations and direct comparisons against standard CoT and RAG baselines, thereby demonstrating the incremental contribution of this component beyond RAG and reinforcement learning. revision: yes

Circularity Check

0 steps flagged

No equations, derivations, or self-citations present; experimental claims are not internally forced

Full rationale

The abstract and available text describe a three-stage framework (Policy Seeding, Adversarial Label Rectification via Prosecutor-Defender-Umpire, Latent Knowledge Discovery) and RAG/CoT rewards for RL, but supply no mathematical derivations, equations, fitted parameters, or citations. Performance claims rest on external comparisons to fine-tuning baselines on industrial/public datasets. No load-bearing step reduces by construction to the inputs; the architecture is presented as a proposed system without self-referential definitions or uniqueness theorems. This is the common case of a methods paper whose validity hinges on reproducible experiments rather than internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit axioms, free parameters, or invented entities are stated in the abstract; the system implicitly assumes that multi-agent debate can produce ground-truth corrections and that RAG/CoT rewards are faithful to evolving policy.

pith-pipeline@v0.9.0 · 5518 in / 1091 out tokens · 26291 ms · 2026-05-08T19:33:30.343128+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

107 extracted references · 28 canonical work pages · 7 internal anchors
