arxiv: 2605.18653 · v1 · pith:CMFNOUVNnew · submitted 2026-05-18 · 💻 cs.MM

Will It Go Viral? Grounding Micro-Video Popularity Prediction on the Open Web

Pith reviewed 2026-05-20 00:50 UTC · model grok-4.3

classification 💻 cs.MM
keywords micro-video popularity prediction · open-web grounding · virality forecasting · evidence-card · online adaptation · short-form video dataset · trend shifts · popularity regression

The pith

Structured open-web context and trend-aware adaptation are required for accurate micro-video popularity prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that micro-video popularity cannot be reliably predicted from content or internal platform data alone because virality hinges on external trends visible on the open web. To address this, it presents the WEBSHORTS dataset of 14K videos with real-time web evidence-cards and tracked views, and the SHORTS-CAST model that reasons over web dimensions and adapts to delayed labels indicating trend shifts. Experiments confirm superior performance in realistic offline and online settings. This matters for recommendation and advertising in short-video platforms where timing and context determine success.

Core claim

Micro-video popularity prediction is reformulated as open-web grounded prediction. The WEBSHORTS dataset couples 14K videos with real-time open-web context organized as three-dimensional evidence-cards and daily view counts over 7 days. SHORTS-CAST generates dimension-wise rationales from the evidence-card to guide popularity regression and adapts selectively when delayed labels reveal genuine trend shifts. It outperforms content-only, retrieval-augmented, and other online adaptation baselines under offline and delayed-label online protocols.
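
As a rough illustration of the data described above, the sketch below models one WEBSHORTS record: three evidence dimensions plus seven daily view counts. It is a minimal sketch under stated assumptions, not the authors' schema. The class names, field names, and the saliency normalization are hypothetical; only the three-dimension structure, the 7-day tracking, and the 1–10 saliency range (mentioned in the text beside Figure 4) come from the paper's description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DimensionEvidence:
    """One web-context dimension of an evidence-card (names are hypothetical)."""
    name: str            # placeholder label; the paper's dimension names are not given here
    snippets: List[str]  # web snippets collected for this dimension
    saliency: int = 5    # 1-10 score, the range described for the generated rationale

    def __post_init__(self):
        if not 1 <= self.saliency <= 10:
            raise ValueError("saliency must be in the range 1..10")


@dataclass
class EvidenceCard:
    """A video paired with three evidence dimensions and 7 daily view counts."""
    video_id: str
    dimensions: List[DimensionEvidence]
    daily_views: List[int]

    def __post_init__(self):
        if len(self.dimensions) != 3:
            raise ValueError("an evidence-card has exactly three dimensions")
        if len(self.daily_views) != 7:
            raise ValueError("view counts are tracked over 7 days")

    def saliency_weights(self) -> List[float]:
        """ASSUMPTION: normalize saliency scores so they sum to 1."""
        total = sum(d.saliency for d in self.dimensions)
        return [d.saliency / total for d in self.dimensions]
```

Normalizing the saliency scores is only one way a predictor could weigh the dimensions against each other; the paper may use the scores differently.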

What carries the argument

The three-dimensional evidence-card capturing external attention along complementary web-context dimensions, which serves as the basis for rationale generation and popularity prediction in the SHORTS-CAST framework.

If this is right

  • Improved accuracy in popularity forecasting supports better recommendation and advertising decisions.
  • Trend-aware adaptation enables handling of fast-evolving short-form video ecosystems.
  • Use of delayed labels allows detection of genuine trend shifts for model updates.
  • Structured web context reduces reliance on historical internal video corpora.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to predicting engagement for other time-sensitive content like live streams or social posts.
  • Real-time web data collection could be combined with privacy-preserving techniques for broader adoption.
  • Comparing performance across different web search providers might show robustness or sensitivity to data sources.

Load-bearing premise

Open-web context collected at upload time supplies predictive signal for popularity that is not already present in the video content or in retrieval from platform-internal video corpora.

What would settle it

A controlled ablation in which a model without open-web context matches SHORTS-CAST on the delayed-label online protocol would falsify the claim that structured web context is necessary alongside trend-aware adaptation.

Figures

Figures reproduced from arXiv: 2605.18653 by Dongha Lee, Ryang Heo.

Figure 1
Conventional scenario (Left) retrieves similar videos from a static in-platform corpus, missing the external trends behind virality. In contrast, the open-web grounded scenario (Right) captures real-time web signals, closely anticipating the viral outcome.
However, extending MVPP from internal-corpus retrieval to open-web grounded prediction is not straightforward, as this shift surfaces two challenges tha…
Figure 2
The overview of our WEBSHORTS construction pipeline.
Candidate retrieval. To support diversity in video topics and categories, we adopt the hierarchical topic categorization from prior video datasets [44, 45], comprising 17 main topics and 10 sub-topics per main topic [46, 47]. We further introduce a trend feature axis (e.g., Hot, Latest, Viral) with 10 variants, and construct seed queries by combining all …
Figure 3
WEBSHORTS statistics: (a) day-7 view count distribution by popularity tier, (b) per-tier view growth curves, and (c) web source distribution across evidence-card dimensions.
$E_i^{(t)} = \mathrm{LLM}_{\mathrm{search}}(X_i, t)$ aligned with observation day $t$. Motivated by the importance of temporal alignment and early popularity in social media popularity prediction [14], we instantiate $t$ as the first three relative observation day…
Figure 4
Overview of SHORTS-CAST Step 1, open-web grounded training.
evidence dimension to the target video’s popularity tier. The rationale additionally assigns each dimension a saliency score (1–10) quantifying how strongly that dimension’s web signals contribute to the video’s predicted popularity, giving the predictor an explicit signal for weighing which evidence dimensions matter more for a given video. Learn…
Figure 5
Overview of SHORTS-CAST Step 2, online trend adaptation.
the growth-curve taxonomy of [64], we calibrate percentile thresholds $\gamma_{\mathrm{low}}$ and $\gamma_{\mathrm{high}}$ on $D_{\mathrm{val}}$ so that $\gamma_i > \gamma_{\mathrm{high}}$ captures initial viral growth (rapid early surge then plateau) and $\gamma_i < \gamma_{\mathrm{low}}$ captures delayed viral growth (low initial views followed by a late burst) [8]. Videos within the normal range ($\gamma_i \in (\gamma_{\mathrm{low}}, \gamma_{\mathrm{high}})$) are excluded regardless of erro…
Figure 6
Case study of SHORTS-CAST on initial viral (Left) and delayed viral (Right) videos, showing predicted view counts against the ground truth across online baselines.
test nMSE from 0.701 to 0.885). Parametric methods update the same LoRA architecture as SHORTS-CAST and improve test nMSE (0.673, 0.624), yet treating every delayed label equally destabilizes the mapping mid-stream, producing the weakest prequen…
Figure 7
nMSE across evidence-card snapshots by growth type.
Effect of evidence-card refresh. We fix SHORTS-CAST and vary only the observation snapshot t ∈ {0, 1, 2} at which the evidence-card is collected
Figure 8
Distribution of cited-source publication dates relative to …
Figure 9
Search freshness across observation snapshots.
Figure 10
Subscriber-count distribution of the source channels in …
Figure 11
Case study of SHORTS-CAST on a Large-tier video (a) and a Micro-tier video (b), pairing the input evidence-card (left) with the generated rationale and per-dimension saliency (right).
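
The percentile gating described in the text accompanying Figure 5 can be sketched as follows. This is a hedged illustration, not the authors' implementation. The excerpt does not define the growth ratio γ_i, so the code assumes cumulative daily view counts and takes γ_i as the day-1 share of final-day views. The 10th/90th percentile levels and all function names are likewise assumptions. The excerpt also mentions an error-related condition that is truncated, so it is not modeled here.

```python
def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def growth_ratio(cumulative_views):
    """ASSUMED definition of gamma_i: day-1 views as a share of final-day views."""
    final = cumulative_views[-1]
    if final <= 0:
        raise ValueError("final view count must be positive")
    return cumulative_views[0] / final


def calibrate_thresholds(val_ratios, low_q=10.0, high_q=90.0):
    """Calibrate (gamma_low, gamma_high) on validation ratios.

    The percentile levels are placeholders, not values from the paper.
    """
    return percentile(val_ratios, low_q), percentile(val_ratios, high_q)


def growth_type(gamma, gamma_low, gamma_high):
    """Label a video by where its ratio falls relative to the thresholds."""
    if gamma > gamma_high:
        return "initial_viral"   # rapid early surge, then plateau
    if gamma < gamma_low:
        return "delayed_viral"   # low initial views, then a late burst
    return "normal"              # boundary values are treated as normal here


def eligible_for_update(gamma, gamma_low, gamma_high):
    """Videos in the normal range are excluded from adaptation."""
    return growth_type(gamma, gamma_low, gamma_high) != "normal"
```

Under these assumptions, a video that collects most of its views on the first day lands above the upper threshold and becomes a candidate for an update, while a video with a typical growth curve is skipped.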
Original abstract

Micro-video popularity prediction (MVPP) forecasts the popularity a newly uploaded short-form video will attract within a fixed number of days after upload. This task supports downstream applications in recommendation, advertising, and creator analytics, yet the problem is hard since virality depends on external trends rather than video content alone. Prior MVPP methods incorporate context by retrieving similar videos from platform-internal corpora, however historical neighbors cannot reveal whether a topic is currently trending, controversial, or already saturated across the open web. To this end, we reformulate MVPP as open-web grounded prediction and introduce WEBSHORTS, the first micro-video dataset that couples 14K videos with real-time open-web context collected at upload time, alongside daily view counts tracked over 7 days. The context for each video is organized as a structured evidence-card that captures the external attention landscape along three complementary web-context dimensions. We further propose SHORTS-CAST, a framework that generates dimension-wise rationales from the evidence-card to guide popularity regression, then adapts at deployment by selectively updating the context-to-popularity mapping when delayed labels reveal genuine trend shifts. In our experiments, SHORTS-CAST consistently outperforms content-only, video corpus retrieval-augmented, and online adaptation baselines under both offline and delayed-label online protocols, confirming that structured web context and trend-aware adaptation are jointly necessary for popularity forecasting under realistic deployment constraints in fast-evolving short-form video ecosystems.
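
The delayed-label online protocol mentioned in the abstract can be pictured as a prequential loop (predict first, update once the label arrives) scored with nMSE, the metric quoted in the text around Figure 6. The sketch below is illustrative only. It assumes nMSE is the mean squared error divided by the variance of the ground truth, a common convention that the excerpt does not spell out, and it substitutes a trivial running-mean predictor for SHORTS-CAST. All names are hypothetical.

```python
from collections import deque


def nmse(y_true, y_pred):
    """ASSUMED definition: MSE normalized by the variance of the targets."""
    n = len(y_true)
    mean = sum(y_true) / n
    variance = sum((y - mean) ** 2 for y in y_true) / n
    if variance == 0:
        raise ValueError("nMSE is undefined for constant targets")
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    return mse / variance


class RunningMean:
    """Stand-in predictor: always predicts the mean of the labels seen so far."""

    def __init__(self):
        self.mean = 0.0
        self.count = 0

    def predict(self, features):
        return self.mean

    def update(self, features, label):
        self.count += 1
        self.mean += (label - self.mean) / self.count


def prequential_eval(stream, model, delay):
    """Predict each item on arrival; reveal its label `delay` steps later."""
    pending = deque()  # (step at which the label becomes available, features, label)
    preds, targets = [], []
    for step, (features, label) in enumerate(stream):
        # Apply every update whose delayed label has arrived by now.
        while pending and pending[0][0] <= step:
            _, past_features, past_label = pending.popleft()
            model.update(past_features, past_label)
        preds.append(model.predict(features))
        targets.append(label)
        pending.append((step + delay, features, label))
    return nmse(targets, preds)
```

With a 7-day popularity target, `delay` would correspond to however many stream steps elapse before the day-7 view count is known; the loop never lets the model see a label before that point.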

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the WEBSHORTS dataset of 14K micro-videos paired with real-time open-web context collected at upload time and daily view counts over 7 days. Context is organized into structured three-dimensional evidence-cards. It proposes SHORTS-CAST, which generates dimension-wise rationales from the evidence-card for popularity regression and selectively adapts the context-to-popularity mapping at deployment when delayed labels indicate genuine trend shifts. Experiments report that SHORTS-CAST consistently outperforms content-only, video corpus retrieval-augmented, and online adaptation baselines under both offline and delayed-label online protocols, concluding that structured web context and trend-aware adaptation are jointly necessary for micro-video popularity prediction.

Significance. If the results hold under rigorous controls, the work advances micro-video popularity prediction by demonstrating the value of grounding forecasts in contemporaneous open-web signals rather than historical internal corpora alone. The new WEBSHORTS dataset and the evidence-card representation provide a concrete resource for future research on external context in dynamic media ecosystems. The selective adaptation mechanism directly targets the challenge of evolving trends.

major comments (2)
  1. [Experiments / online protocol description] The claim that trend-aware adaptation is jointly necessary for the online protocol rests on delayed labels reliably indicating genuine external trend shifts rather than platform noise or random fluctuations. The manuscript provides no quantitative check (e.g., correlation of label deltas with independent signals such as search-volume spikes or external mention counts) at the moments adaptation is triggered. This validation is load-bearing for the 'genuine trend shifts' premise and the resulting conclusion.
  2. [Method / SHORTS-CAST framework] Details on evidence-card construction (exact sources, aggregation rules, and temporal alignment for the three dimensions) and on the rationale-generation process (models, prompts, or training) are insufficient for replication. These choices directly affect whether the reported gains can be attributed to the structured web context rather than implementation specifics.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the magnitude of improvements (e.g., relative gains or absolute metrics) to allow readers to gauge practical significance without reading the full experimental tables.
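
The quantitative check requested in major comment 1 could be as simple as correlating label deltas at adaptation triggers with an independent external signal. The sketch below computes a Pearson correlation in plain Python. The example numbers are made up for illustration and are not results from the paper, and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical inputs: one entry per adaptation trigger.
label_deltas = [0.8, 1.5, 0.2, 2.1, 0.9]           # change in observed popularity
search_volume_change = [0.5, 1.2, 0.1, 1.9, 1.0]   # change in an external signal

r = pearson(label_deltas, search_volume_change)
print(f"correlation at trigger points: {r:.3f}")
```

A coefficient near zero would suggest that triggers respond to noise rather than external trend shifts; a rank correlation could be swapped in if the signals are heavy-tailed.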

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. The comments highlight important aspects of validation and reproducibility that we address point by point below. We will revise the manuscript to incorporate clarifications and additional details where feasible.

  1. Referee: [Experiments / online protocol description] The claim that trend-aware adaptation is jointly necessary for the online protocol rests on delayed labels reliably indicating genuine external trend shifts rather than platform noise or random fluctuations. The manuscript provides no quantitative check (e.g., correlation of label deltas with independent signals such as search-volume spikes or external mention counts) at the moments adaptation is triggered. This validation is load-bearing for the 'genuine trend shifts' premise and the resulting conclusion.

    Authors: We acknowledge that an explicit quantitative validation linking label deltas to independent external signals would further support the interpretation of genuine trend shifts. Our online protocol is intentionally designed around the realistic constraint of delayed labels only, with the selective adaptation mechanism intended to respond to significant deviations that may reflect external changes. To address this, we will add a supplementary analysis in the revised manuscript that examines correlations between adaptation triggers and spikes in web mentions or related signals already present in the evidence-cards. We will also clarify the assumptions underlying the protocol and discuss potential noise sources as a limitation if the correlations prove modest. revision: partial

  2. Referee: [Method / SHORTS-CAST framework] Details on evidence-card construction (exact sources, aggregation rules, and temporal alignment for the three dimensions) and on the rationale-generation process (models, prompts, or training) are insufficient for replication. These choices directly affect whether the reported gains can be attributed to the structured web context rather than implementation specifics.

    Authors: We agree that the current level of detail is insufficient for replication and that this affects attribution of gains to the structured context. In the revised manuscript we will expand the Methods section (and add an appendix if needed) to specify: the exact sources and collection methods for each of the three evidence-card dimensions; the aggregation rules, counting procedures, and normalization steps; the temporal alignment logic across sources; and the precise models, prompting templates, and any training or fine-tuning procedures used for rationale generation. These additions will enable readers to reproduce the evidence-card construction and SHORTS-CAST pipeline. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external data and empirical baselines

The paper introduces a new dataset (WEBSHORTS) coupling videos with real-time open-web context collected at upload time and proposes SHORTS-CAST for generating rationales and selective adaptation using delayed labels. Performance is evaluated against content-only, retrieval-augmented, and online adaptation baselines under offline and delayed-label protocols. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided abstract or description. The central claim of joint necessity for web context and trend-aware adaptation follows from comparative outperformance rather than by construction from the inputs themselves. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The evidence-card structure and three web-context dimensions appear to be introduced constructs whose construction rules are not detailed.

invented entities (1)
  • structured evidence-card (no independent evidence)
    purpose: Organize external attention landscape along three complementary web-context dimensions for popularity regression
    New data structure introduced to capture open-web context at upload time

pith-pipeline@v0.9.0 · 5784 in / 1299 out tokens · 40547 ms · 2026-05-20T00:50:25.047513+00:00 · methodology

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 3 internal anchors

  1. [1]

    Micro tells macro: Predicting the popularity of micro-videos via a transductive model

    Jingyuan Chen, Xuemeng Song, Liqiang Nie, Xiang Wang, Hanwang Zhang, and Tat-Seng Chua. Micro tells macro: Predicting the popularity of micro-videos via a transductive model. In Proceedings of the 24th ACM International Conference on Multimedia, pages 898–907, 2016. doi: 10.1145/2964284.2964314

  2. [2]

    Smp challenge: An overview of social media prediction challenge 2019

    Bo Wu, Wen-Huang Cheng, Peiye Liu, Bei Liu, Zhaoyang Zeng, and Jiebo Luo. Smp challenge: An overview of social media prediction challenge 2019. In Proceedings of the 27th ACM International Conference on Multimedia, pages 2667–2671, 2019

  3. [3]

    Mvp: Winning solution to smp challenge 2025 video track

    Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, and Zikai Song. Mvp: Winning solution to smp challenge 2025 video track. In Proceedings of the ACM International Conference on Multimedia, pages 14079–14085, 2025. doi: 10.1145/3746027.3763761

  4. [4]

    A multimodal variational encoder-decoder framework for micro-video popularity prediction

    Jiayi Xie, Yaochen Zhu, Zhibin Zhang, Jian Peng, Jing Yi, Yaosi Hu, Hongyi Liu, and Zhenzhong Chen. A multimodal variational encoder-decoder framework for micro-video popularity prediction. In Proceedings of The Web Conference 2020, WWW ’20, page 2542–2548, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450370233. doi: 10.1145/336...

  5. [5]

    Predicting micro-video popularity via multi-modal retrieval augmentation

    Ting Zhong, Jian Lang, Yifan Zhang, Zhangtao Cheng, Kunpeng Zhang, and Fan Zhou. Predicting micro-video popularity via multi-modal retrieval augmentation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2579–2583, 2024. doi: 10.1145/3626772.3657929

  6. [6]

    Seeing the unseen in micro-video popularity prediction: Self-correlation retrieval for missing modality generation

    Zhangtao Cheng, Jian Lang, Ting Zhong, and Fan Zhou. Seeing the unseen in micro-video popularity prediction: Self-correlation retrieval for missing modality generation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 142–152, 2025. doi: 10.1145/3690624.3709308

  7. [7]

    Cats and captions vs. creators and the clock: Comparing multimodal content to context in predicting relative popularity

    Jack Hessel, Lillian Lee, and David Mimno. Cats and captions vs. creators and the clock: Comparing multimodal content to context in predicting relative popularity. In Proceedings of the 26th international conference on world wide web, pages 927–936, 2017

  8. [8]

    Expecting to be hip: Hawkes intensity processes for social media popularity

    Marian-Andrei Rizoiu, Lexing Xie, Scott Sanner, Manuel Cebrian, Honglin Yu, and Pascal Van Hentenryck. Expecting to be hip: Hawkes intensity processes for social media popularity. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, page 735–744, Republic and Canton of Geneva, CHE, 2017. International World Wide Web Conferences S...

  9. [9]

    Retrieval-augmented hypergraph for multimodal social media popularity prediction

    Zhangtao Cheng, Jienan Zhang, Xovee Xu, Goce Trajcevski, Ting Zhong, and Fan Zhou. Retrieval-augmented hypergraph for multimodal social media popularity prediction. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 445–455, 2024. doi: 10.1145/3637528.3672041

  10. [10]

    Echoes in the feed: Evolution-aware prompt-augmented micro-video popularity prediction

    Wei Chen, Jiao Li, Jian Lang, Zhangtao Cheng, Yong Wang, and Fan Zhou. Echoes in the feed: Evolution-aware prompt-augmented micro-video popularity prediction. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2744–2748, 2025. doi: 10.1145/3726302.3730184

  11. [11]

    In-context prompt-augmented micro-video popularity prediction

    Zhangtao Cheng, Jiao Li, Jian Lang, Ting Zhong, and Fan Zhou. In-context prompt-augmented micro-video popularity prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11527–11535, 2025. doi: 10.1609/aaai.v39i11.33254

  12. [12]

    Improving multimodal social media popularity prediction via selective retrieval knowledge augmentation

    Xovee Xu, Yifan Zhang, Fan Zhou, and Jingkuan Song. Improving multimodal social media popularity prediction via selective retrieval knowledge augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 932–940, 2025. doi: 10.1609/aaai.v39i1.32078

  13. [13]

    A content-driven micro-video recommendation dataset at scale

    Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, and Fajie Yuan. A content-driven micro-video recommendation dataset at scale. arXiv preprint arXiv:2309.15379, 2023

  14. [14]

    Smtpd: A new benchmark for temporal prediction of social media popularity

    Yijie Xu, Bolun Zheng, Wei Zhu, Hangjia Pan, Yuchen Yao, Ning Xu, Anan Liu, Quan Zhang, and Chenggang Yan. Smtpd: A new benchmark for temporal prediction of social media popularity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18847–18857, 2025. doi: 10.1109/CVPR52734.2025.01756

  15. [15]

    Real-time short video recommendation on mobile devices

    Xudong Gong, Qinlin Feng, Yuan Zhang, Jiangling Qin, Weijie Ding, Biao Li, Peng Jiang, and Kun Gai. Real-time short video recommendation on mobile devices. In Proceedings of the 31st ACM international conference on information & knowledge management, pages 3103–3112, 2022

  16. [16]

    Smp challenge: An overview and analysis of social media prediction challenge

    Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, and Jiebo Luo. Smp challenge: An overview and analysis of social media prediction challenge. In Proceedings of the 31st ACM International Conference on Multimedia, pages 9651–9655, 2023

  17. [17]

    What makes an image popular?

    Aditya Khosla, Atish Das Sarma, and Raffay Hamid. What makes an image popular? In Proceedings of the 23rd International Conference on World Wide Web (WWW), pages 867–876, 2014. doi: 10.1145/2566486.2567996

  18. [18]

    Low-rank multi-view embedding learning for micro-video popularity prediction

    Peiguang Jing, Yuting Su, Liqiang Nie, Xu Bai, Jing Liu, and Meng Wang. Low-rank multi-view embedding learning for micro-video popularity prediction. IEEE Transactions on Knowledge and Data Engineering (TKDE), 30(8):1519–1532, 2018. doi: 10.1109/TKDE.2017.2785784

  19. [19]

    Social media popularity prediction based on visual-textual features with xgboost

    Junhong Chen, Dayong Liang, Zhanmo Zhu, Xiaojing Zhou, Zihan Ye, and Xiuyun Mo. Social media popularity prediction based on visual-textual features with xgboost. In Proceedings of the 27th ACM International Conference on Multimedia, pages 2692–2696, 2019

  20. [20]

    HyFea: Winning solution to social media popularity prediction for multimedia grand challenge 2020

    Xin Lai, Yihong Zhang, and Wei Zhang. HyFea: Winning solution to social media popularity prediction for multimedia grand challenge 2020. In Proceedings of the 28th ACM International Conference on Multimedia (MM), pages 4565–4569, 2020. doi: 10.1145/3394171.3416275

  21. [21]

    Micro-video popularity prediction via multimodal variational information bottleneck

    Jiayi Xie, Yaochen Zhu, and Zhenzhong Chen. Micro-video popularity prediction via multimodal variational information bottleneck. IEEE Transactions on Multimedia, 25:24–37, 2021

  22. [22]

    Crossmodal bipolar attention for multimodal classification on social media

    Tsun-hin Cheung and Kin-man Lam. Crossmodal bipolar attention for multimodal classification on social media. Neurocomputing, 514:1–12, 2022

  23. [23]

    Multi-modal variational auto-encoder model for micro-video popularity prediction

    Zhuoran Zhang, Shibiao Xu, Li Guo, and Wenke Lian. Multi-modal variational auto-encoder model for micro-video popularity prediction. In Proceedings of the 8th International Conference on Communication and Information Processing (ICCIP), pages 9–16, 2022. doi: 10.1145/3571662.3571664

  24. [24]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

  25. [25]

    Multi-queue momentum contrast for microvideo-product retrieval

    Yali Du, Yinwei Wei, Wei Ji, Fan Liu, Xin Luo, and Liqiang Nie. Multi-queue momentum contrast for microvideo-product retrieval. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 1003–1011, 2023

  26. [26]

    Dual-stream pre-training transformer to enhance multimodal learning for social media prediction

    Wenhao Hu, Weilong Chen, Weimin Yuan, Yan Wang, Shimin Cai, and Yanru Zhang. Dual-stream pre-training transformer to enhance multimodal learning for social media prediction. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 11450–11456, 2024

  27. [27]

    Higher-order vision-language alignment for social media prediction

    Mingsheng Tu, Tianjiao Wan, Qisheng Xu, Xinhao Jiang, Kele Xu, and Cheng Yang. Higher-order vision-language alignment for social media prediction. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 11457–11463, 2024

  28. [28]

    Efficient test-time adaptation of vision-language models

    Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, and Eric Xing. Efficient test-time adaptation of vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14162–14171, 2024

  29. [29]

    Realistic test-time adaptation of vision-language models

    Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer, and Ismail Ben Ayed. Realistic test-time adaptation of vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25103–25112, 2025

  30. [30]

    Noisy test-time adaptation in vision-language models

    Chentao Cao, Zhun Zhong, Zhanke Zhou, Tongliang Liu, Yang Liu, Kun Zhang, and Bo Han. Noisy test-time adaptation in vision-language models. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=iylpeTI0Ql

  31. [31]

    Dota: Distributional test-time adaptation of vision-language models

    Zongbo Han, Jialong Yang, Guangyu Wang, Junfan Li, Qianli Xu, Mike Zheng Shou, and Changqing Zhang. Dota: Distributional test-time adaptation of vision-language models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems

  32. [32]

    Lightweight online adaption for time series foundation model forecasts

    Thomas L Lee, William Toner, Rajkarn Singh, Artjom Joosen, and Martin Asenov. Lightweight online adaption for time series foundation model forecasts. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=gAxYbvoOQz

  33. [33]

    Proactive model adaptation against concept drift for online time series forecasting

    Lifan Zhao and Yanyan Shen. Proactive model adaptation against concept drift for online time series forecasting. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2020–2031, 2025. doi: 10.1145/3690624.3709210

  34. [34]

    Fast and slow streams for online time series forecasting without information leakage

    Ying yee Ava Lau, Zhiwen Shao, and Dit-Yan Yeung. Fast and slow streams for online time series forecasting without information leakage. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=I0n3EyogMi

  36. [36]

    Continual collaborative distillation for recommender system

    Gyuseok Lee, SeongKu Kang, Wonbin Kweon, and Hwanjo Yu. Continual collaborative distillation for recommender system. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’24, page 1495–1505, New York, NY, USA, 2024. Association for Computing Machinery. ISBN 9798400704901. doi: 10.1145/3637528.3671924. URL https://do...

  37. [37]

    Mitigating distribution shifts in sequential recommendation: An invariance perspective

    Yuxin Liao, Yonghui Yang, Min Hou, Le Wu, Hefei Xu, and Hao Liu. Mitigating distribution shifts in sequential recommendation: An invariance perspective. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1603–1613, 2025

  38. [38]

    Online drift detection with maximum concept discrepancy

    Ke Wan, Yi Liang, and Susik Yoon. Online drift detection with maximum concept discrepancy. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2924–2935, 2024. doi: 10.1145/3637528.3672016

  39. [39]

    Inflora: Interference-free low-rank adaptation for continual learning

    Yan-Shuo Liang and Wu-Jun Li. Inflora: Interference-free low-rank adaptation for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23638–23647, 2024

  40. [40]

    Online-lora: Task-free online continual learning via low rank adaptation

    Xiwen Wei, Guihong Li, and Radu Marculescu. Online-lora: Task-free online continual learning via low rank adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

  41. [41]

    Gated integration of low-rank adaptation for continual learning of large language models

    Yan-Shuo Liang, Jiarui Chen, and Wu-Jun Li. Gated integration of low-rank adaptation for continual learning of large language models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

  42. [42]

    Hierarchical knowledge prompt tuning for multi-task test-time adaptation

    Qiang Zhang, Mengsheng Zhao, Jiawei Liu, Fanrui Zhang, Yongchao Xu, and Zheng-Jun Zha. Hierarchical knowledge prompt tuning for multi-task test-time adaptation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 30524–30533, 2025

  43. [43]

    Dpcore: Dynamic prompt coreset for continual test-time adaptation

    Yunbei Zhang, Akshay Mehra, Shuaicheng Niu, and Jihun Hamm. Dpcore: Dynamic prompt coreset for continual test-time adaptation. In Forty-second International Conference on Machine Learning

  44. [44]

    Forecasting the buzz: Enriching hashtag popularity prediction with llm reasoning

    Yifei Xu, Jiaying Wu, Herun Wan, Yang Li, Zhen Hou, and Min-Yen Kan. Forecasting the buzz: Enriching hashtag popularity prediction with llm reasoning. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 5396–5400, 2025. doi: 10.1145/3746252.3760970

  45. [45]

    Mmsum: A dataset for multimodal summarization and thumbnail generation of videos

    Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, et al. Mmsum: A dataset for multimodal summarization and thumbnail generation of videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21909–21921, 2024

  46. [46]

    Hippo-video: Simulating watch histories with large language models for personalized video highlighting

    Jeongeun Lee, Youngjae Yu, and Dongha Lee. Hippo-video: Simulating watch histories with large language models for personalized video highlighting. In Conference on Language Modeling, 2025. URL https://arxiv.org/abs/2507.16873. Published as a conference paper at COLM 2025

  47. [47]

    Towards automatic learning of procedures from web instructional videos

    Luowei Zhou, Chenliang Xu, and Jason Corso. Towards automatic learning of procedures from web instructional videos. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

  48. [48]

    Howto100m: Learning a text-video embedding by watching hundred million narrated video clips

    Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, and Josef Sivic. Howto100m: Learning a text-video embedding by watching hundred million narrated video clips. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2630–2640, 2019

  49. [49]

    Robust speech recognition via large-scale weak supervision

    Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. Robust speech recognition via large-scale weak supervision. In International conference on machine learning, pages 28492–28518. PMLR, 2023

  50. [50]

    Perplexity AI

    Perplexity AI. Perplexity AI. https://www.perplexity.ai/, 2024. Accessed: 2025-05-08

  51. [51]

    Introducing ChatGPT search

    OpenAI. Introducing ChatGPT search, 2024. URL https://openai.com/index/introducing-chatgpt-search/

  52. [52]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025

  53. [53]

    Search-o1: Agentic search-enhanced large reasoning models

    Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou. Search-o1: Agentic search-enhanced large reasoning models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 5420–5438, 2025

  54. [54]

    Search arena: Analyzing search-augmented LLMs

    Mihran Miroyan, Tsung-Han Wu, Logan King, Tianle Li, Jiayi Pan, Xinyan Hu, Wei-Lin Chiang, Anastasios Nikolas Angelopoulos, Trevor Darrell, Narges Norouzi, and Joseph E. Gonzalez. Search arena: Analyzing search-augmented LLMs. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=MMGRlDnhtI

  55. [55]

    Video-browsecomp: Benchmarking agentic video research on open web

    Zhengyang Liang, Yan Shu, Xiangrui Liu, Minghao Qin, Kaixin Liang, Paolo Rota, Nicu Sebe, Zheng Liu, and Lizi Liao. Video-browsecomp: Benchmarking agentic video research on open web. arXiv preprint arXiv:2512.23044, 2025

  56. [56]

    Agenticshop: Benchmarking agentic product curation for personalized web shopping

    Sunghwan Kim, Ryang Heo, Yongsik Seo, Jinyoung Yeo, and Dongha Lee. Agenticshop: Benchmarking agentic product curation for personalized web shopping. In Proceedings of the ACM Web Conference 2026, pages 2489–2500, 2026

  57. [57]

    grok-4.1-fast-reasoning

    xAI. grok-4.1-fast-reasoning, 2025. URL https://docs.x.ai/developers/models/grok-4-1-fast-reasoning

  58. [58]

    GPT-4 Technical Report

    OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023. URL https://api.semanticscholar.org/CorpusID:257532815

  59. [59]

    Critique-out-loud reward models

    Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan D Chang, and Prithviraj Ammanabrolu. Critique-out-loud reward models. arXiv preprint arXiv:2408.11791, 2024

  60. [60]

    MM-RLHF: The next step forward in multimodal LLM alignment

    YiFan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen, Tingting Gao, Zhang Zhang, Fan Yang, Di ZHANG, Liang Wang, and Rong Jin. MM-RLHF: The next step forward in multimodal LLM alignment. In Forty-second International Conference on Machine Learning, 2025. URL https...

  61. [61]

    Personalized reward modeling for text-to-image generation

    Jeongeun Lee, Ryang Heo, and Dongha Lee. Personalized reward modeling for text-to-image generation. arXiv preprint arXiv:2511.19458, 2025

  62. [62]

    Joint reward modeling: Internalizing chain-of-thought for efficient visual reward models

    Yankai Yang, Yancheng Long, Hongyang Wei, Wei Chen, Tianke Zhang, Kaiyu Jiang, Haonan Fan, Changyi Liu, Jiankang Chen, Kaiyu Tang, et al. Joint reward modeling: Internalizing chain-of-thought for efficient visual reward models. arXiv preprint arXiv:2602.07533, 2026

  63. [63]

    Multimodal llms as customized reward models for text-to-image generation

    Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu, Branislav Kveton, Yufan Zhou, Jiuxiang Gu, Jian Chen, and Changyou Chen. Multimodal llms as customized reward models for text-to-image generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19638–19648, 2025

  64. [64]

    Why cannot long-term cascade be predicted? exploring temporal dynamics in information diffusion processes

    Ren-Meng Cao, Xiao Fan Liu, and Xiao-Ke Xu. Why cannot long-term cascade be predicted? exploring temporal dynamics in information diffusion processes. Royal Society Open Science, 8(9), 2021

  65. [65]

    Characterizing viral videos: Methodology and applications

    Stephen L France, Mahyar Sharif Vaghefi, and Huimin Zhao. Characterizing viral videos: Methodology and applications. Electronic Commerce Research and Applications, 19:19–32, 2016

  66. [66]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

  67. [67]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

  68. [68]

    Generalization through memorization: Nearest neighbor language models

    Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. In International Conference on Learning Representations

  69. [69]

    Adaptation approaches for nearest neighbor language models

    Rishabh Bhardwaj, George Polovets, and Monica Sunkara. Adaptation approaches for nearest neighbor language models. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 1135–1146, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-acl.73. URL https://aclanthology.org/2023.findings-acl.73/

  71. [71]

    Adanpc: Exploring non-parametric classifier for test-time adaptation

    Yifan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, and Tieniu Tan. Adanpc: Exploring non-parametric classifier for test-time adaptation. In International conference on machine learning, pages 41647–41676. PMLR, 2023

  72. [72]

    Ts-memory: Plug-and-play memory for time series foundation models

    Sisuo Lyu, Siru Zhong, Tiegang Chen, Weilin Ruan, Qingxiang Liu, Taiqiang Lv, Qingsong Wen, Raymond Chi-Wing Wong, and Yuxuan Liang. Ts-memory: Plug-and-play memory for time series foundation models. arXiv preprint arXiv:2602.11550, 2026

  73. [73]

    Orthogonal subspace learning for language model continual learning

    Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, and Xuan-Jing Huang. Orthogonal subspace learning for language model continual learning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10658–10671, 2023

  74. [74]

    A simple but strong baseline for online continual learning: Repeated augmented rehearsal

    Yaqian Zhang, Bernhard Pfahringer, Eibe Frank, Albert Bifet, Nick Jin Sean Lim, and Alvin Jia. A simple but strong baseline for online continual learning: Repeated augmented rehearsal. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=bhvUOhnsgZ

  75. [75]

    Self-consistent reasoning-based aspect-sentiment quad prediction with extract-then-assign strategy

    Jieyong Kim, Ryang Heo, Yongsik Seo, SeongKu Kang, Jinyoung Yeo, and Dongha Lee. Self-consistent reasoning-based aspect-sentiment quad prediction with extract-then-assign strategy. In Findings of the Association for Computational Linguistics: ACL 2024, pages 7295–7303, 2024

  76. [76]

    Make compound sentences simple to analyze: Learning to split sentences for aspect-based sentiment analysis

    Yongsik Seo, Sungwon Song, Ryang Heo, Jieyong Kim, and Dongha Lee. Make compound sentences simple to analyze: Learning to split sentences for aspect-based sentiment analysis. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11171–11184, 2024

  77. [77]

    Imagine all the relevance: Scenario-profiled indexing with knowledge expansion for dense retrieval

    Sangam Lee, Ryang Heo, SeongKu Kang, and Dongha Lee. Imagine all the relevance: Scenario-profiled indexing with knowledge expansion for dense retrieval. In Second Conference on Language Modeling

  78. [78]

    Angle-optimized text embeddings

    Xianming Li and Jing Li. Angle-optimized text embeddings. arXiv preprint arXiv:2309.12871, 2023

  79. [79]

    Can large language models be effective online opinion miners?

    Ryang Heo, Yongsik Seo, Junseong Lee, and Dongha Lee. Can large language models be effective online opinion miners? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23108–23147, 2025.

    A Limitations and Broader Impacts

    Limitations. While our results confirm the value of open-web grounding and trend-aware adap...