arxiv: 2604.26252 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

OmniTrend: Content-Context Modeling for Scalable Social Popularity Prediction

Liliang Ye , Guiyi Zeng , Yunyao Zhang , Yi-Ping Phoebe Chen , Junqing Yu , Zikai Song

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:54 UTC · model grok-4.3

classification 💻 cs.CV

keywords social popularity predictioncontent-context modelingcross-modal learningcontextual exposurecross-platform transfermultimodal featurespopularity forecastingsocial media analysis

0 comments

The pith

OmniTrend predicts social popularity by learning separate models for content attractiveness and contextual exposure before combining them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Social media popularity stems from the intrinsic appeal of the content and the context that controls its exposure to users. Earlier methods combine these signals, allowing platform-specific visibility patterns to influence the learned content representations and limiting transfer to other platforms. OmniTrend addresses this by using one module to extract attractiveness from visual, audio, and textual features and another to estimate exposure from time, author activity, trends, and neighborhood data. These two predictions are then integrated to produce the final popularity score. The separation makes it possible to understand each factor's role and to apply the model more reliably when moving between image-focused and video-focused platforms.

Core claim

OmniTrend models popularity as the joint outcome of content attractiveness and contextual exposure. The content module learns cross-modal representations from visual, audio, and textual cues to quantify intrinsic appeal. The context module estimates exposure from exogenous signals such as posting time, author activity, topical trends, and retrieval-based neighborhood statistics. Separate predictors are learned for each component and combined in the final estimate.

What carries the argument

Dual-predictor architecture with a cross-modal content module for intrinsic appeal and an exogenous context module for exposure signals, whose outputs are integrated for the final popularity score.

Load-bearing premise

Content attractiveness and contextual exposure can be cleanly separated using the chosen signals without residual entanglement or platform-specific leakage in the learned representations.

What would settle it

Training the content module on one platform then evaluating its standalone predictions on content from a second platform shows accuracy no higher than an entangled baseline that mixes both factors.

Figures

Figures reproduced from arXiv: 2604.26252 by Guiyi Zeng, Junqing Yu, Liliang Ye, Yi-Ping Phoebe Chen, Yunyao Zhang, Zikai Song.

**Figure 1.** Figure 1: Examples of cross-platform social media posts and view at source ↗

**Figure 2.** Figure 2: Overall architecture of the proposed con view at source ↗

**Figure 3.** Figure 3: Architecture of the cross-platform content model view at source ↗

**Figure 4.** Figure 4: Architecture of the platform-specific context mod view at source ↗

**Figure 5.** Figure 5: Distribution comparison of true labels, content view at source ↗

**Figure 6.** Figure 6: Rank–rank heatmap between the predicted content view at source ↗

read the original abstract

Predicting social media popularity requires understanding both the intrinsic appeal of content and the external context that determines how it is exposed to users. Existing methods focus on content signals but do not separate them from exposure-related patterns, which causes the learned representations to absorb platform-specific visibility effects and weakens both interpretability and cross-platform transfer. This paper introduces OmniTrend, a unified framework that models popularity as the joint outcome of content attractiveness and contextual exposure. The content module learns cross-modal representations from visual, audio, and textual cues to quantify intrinsic appeal, while the context module estimates exposure from exogenous signals such as posting time, author activity, topical trends, and retrieval-based neighborhood statistics. OmniTrend learns separate predictors for content attractiveness and contextual exposure and integrates them in the final popularity estimate, which makes the role of each factor explicit and supports robust transfer across image and video platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OmniTrend splits popularity prediction into separate content and context modules for better interpretability and transfer, but the abstract supplies no results to show the split holds up.

read the letter

The new piece is the explicit two-module design: one cross-modal module for intrinsic content appeal from visuals, audio, and text, and another for exposure using time, author activity, trends, and retrieval neighborhoods. They train separate predictors and combine them at the end. This directly targets the mixing problem in prior work and could support moving models between image and video platforms without retraining everything from scratch. The motivation is clear and the architecture choice makes sense on paper for keeping factors distinct.

Referee Report

2 major / 0 minor

Summary. The paper introduces OmniTrend, a unified framework for social media popularity prediction that models popularity as the joint outcome of content attractiveness (learned via cross-modal representations from visual, audio, and textual cues) and contextual exposure (estimated from exogenous signals including posting time, author activity, topical trends, and retrieval-based neighborhood statistics). Separate predictors are learned for each factor and integrated in the final estimate to make their roles explicit and to support improved interpretability and cross-platform transfer between image and video domains.

Significance. If the separation is achieved and empirically validated, the approach could meaningfully advance popularity prediction by yielding more interpretable models that avoid absorbing platform-specific visibility effects, potentially enabling stronger generalization across content types and platforms than existing content-centric methods.

major comments (2)

[Abstract] Abstract: The manuscript outlines the intended architecture, motivation, and claimed benefits but supplies no experimental results, ablation studies, baselines, error bars, or validation details whatsoever, leaving the central claims of clean separation, improved interpretability, and robust cross-platform transfer unsupported by evidence.
[Abstract] Abstract (context module description): Topical trends and retrieval-based neighborhood statistics are assigned to the context module, yet these signals are inherently dependent on content features. No disentanglement losses, orthogonality constraints, mutual-information minimization, or similar mechanisms are described to enforce separation between the content and context representation streams; without such safeguards, residual entanglement is likely and would undermine the claimed interpretability and transfer properties.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to improve clarity and support for the central claims.

read point-by-point responses

Referee: [Abstract] Abstract: The manuscript outlines the intended architecture, motivation, and claimed benefits but supplies no experimental results, ablation studies, baselines, error bars, or validation details whatsoever, leaving the central claims of clean separation, improved interpretability, and robust cross-platform transfer unsupported by evidence.

Authors: The abstract provides a concise overview of the framework and motivations, as is conventional to keep it brief. The full manuscript contains a dedicated Experiments section with baseline comparisons, ablation studies on the content and context modules, cross-platform transfer results between image and video domains, and error bars from multiple runs that empirically support the separation benefits and improved generalization. To better align the abstract with these results, we will revise it to include a short summary of the key quantitative findings. revision: yes
Referee: [Abstract] Abstract (context module description): Topical trends and retrieval-based neighborhood statistics are assigned to the context module, yet these signals are inherently dependent on content features. No disentanglement losses, orthogonality constraints, mutual-information minimization, or similar mechanisms are described to enforce separation between the content and context representation streams; without such safeguards, residual entanglement is likely and would undermine the claimed interpretability and transfer properties.

Authors: We agree that topical trends and retrieval-based neighborhood statistics can carry indirect content dependencies, which risks some entanglement. The current design relies on distinct input modalities (cross-modal content features versus exogenous context signals) and separate predictor heads to promote separation. However, to more rigorously enforce disentanglement and bolster the interpretability and transfer claims, we will add an explicit regularization term, such as a feature orthogonality constraint or mutual-information minimization loss between the content and context streams, in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity in claimed derivation chain

full rationale

The paper introduces OmniTrend as a modeling framework that separates content attractiveness (via cross-modal cues) from contextual exposure (via exogenous signals) and integrates separate predictors. No equations, derivations, first-principles results, or predictions are described that reduce by construction to fitted inputs or self-definitions. The separation is motivated by domain reasoning about interpretability and transfer, with no self-citation load-bearing steps, uniqueness theorems, or ansatzes invoked in the provided text. This is a standard non-circular case of architectural design rather than a mathematical chain that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that popularity factors are separable and that the listed exogenous signals are sufficient to isolate exposure without leakage into content representations.

axioms (1)

domain assumption Content attractiveness and contextual exposure are separable and can be modeled independently using the chosen signals
This separability is the foundational premise stated in the abstract for avoiding absorption of visibility effects.

pith-pipeline@v0.9.0 · 5458 in / 1181 out tokens · 52820 ms · 2026-05-07T13:54:25.646079+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

85 extracted references · 25 canonical work pages · 11 internal anchors

[1]

Fatma S Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, and Yu Tsao. 2020. Multi- modal deep learning framework for image popularity prediction on social media. IEEE Transactions on Cognitive and Developmental Systems13, 3 (2020), 679–692

2020
[2]

Anonymous. 2025. EvoPro: An Evolution-aware Prompt-augmented Framework for Micro-video Popularity Prediction. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

2025
[3]

Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. 2023. Bias and debias in recommender system: A survey and future directions. ACM Transactions on Information Systems41, 3 (2023), 1–39

2023
[4]

Jingyuan Chen, Xuemeng Song, Liqiang Nie, Xiang Wang, Hanwang Zhang, and Tat-Seng Chua. 2016. Micro tells macro: Predicting the popularity of micro-videos via a transductive model. InProceedings of the 24th ACM international conference on Multimedia. 898–907

2016
[5]

Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. 2026. INTENT: Invariance and Discrimination-aware Noise Mitiga- tion for Robust Composed Image Retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 20463–20471

2026
[6]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie
[7]

InProceedings of the ACM International Conference on Multimedia

OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval. InProceedings of the ACM International Conference on Multimedia. 6113–6122
[8]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan
[9]

InProceedings of the ACM International Conference on Multimedia

HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Com- posed Video Retrieval. InProceedings of the ACM International Conference on Multimedia. 6143–6152
[10]

Justin Cheng, Lada Adamic, P Alex Dow, Jon Michael Kleinberg, and Jure Leskovec. 2014. Can cascades be predicted?. InProceedings of the 23rd inter- national conference on World wide web. 925–936

2014
[11]

Zhangtao Cheng, Jienan Zhang, Xovee Xu, Goce Trajcevski, Ting Zhong, and Fan Zhou. 2024. Retrieval-augmented hypergraph for multimodal social media popularity prediction. InProceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining. 445–455

2024
[12]

Zhangtao Cheng, Fan Zhou, Xovee Xu, Kunpeng Zhang, Goce Trajcevski, Ting Zhong, and Philip S. Yu. 2024. Information Cascade Popularity Prediction via Probabilistic Diffusion.IEEE Transactions on Knowledge and Data Engineering (2024)

2024
[13]

Tsun-hin Cheung and Kin-man Lam. 2022. Crossmodal bipolar attention for multimodal classification on social media.Neurocomputing514 (2022), 1–12

2022
[14]

Jiaxin Deng, Dong Shen, Shiyao Wang, Xiangyu Wu, Fan Yang, Guorui Zhou, and Gaofeng Meng. 2023. ContentCTR: Frame-level live streaming click-through rate prediction with multimodal transformer.arXiv preprint arXiv:2306.14392 (2023)

work page arXiv 2023
[15]

Zhiheng Fu, Yupeng Hu, Qianyun Yang, Shiqi Zhang, Zhiwei Chen, and Zixu Li
[16]

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval. arXiv:2604.19386 [cs.CV] https://arxiv.org/abs/ 2604.19386

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Francesco Gelli, Tiberio Uricchio, Marco Bertini, Alberto Del Bimbo, and Shih- Fu Chang. 2015. Image popularity prediction in social media using sentiment and context features. InProceedings of the 23rd ACM international conference on Multimedia. 907–910

2015
[18]

Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, and Larry Heck
[19]

Contextual lstm (clstm) models for large scale nlp tasks.arXiv preprint arXiv:1602.06291(2016)

work page arXiv 2016
[20]

Matan Haimovich, Elad Elisha, et al. 2022. Popularity Prediction for Social Media over Arbitrary Time Horizons.Proceedings of the VLDB Endowment(2022)

2022
[21]

Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Yu Jian, and Chi-Han Tsai. 2024. Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction. InProceedings of the 32nd ACM International Conference on Multimedia. 11464–11469

2024
[22]

Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. 2026. REFINE: Composed Video Retrieval via Shared and Differ- ential Semantics Enhancement.ACM Transactions on Multimedia Computing, Communications and Applications(2026)

2026
[23]

Yangliu Hu, Zikai Song, Na Feng, Yawei Luo, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. 2025. SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding.arXiv preprint arXiv:2504.07745(2025)

work page arXiv 2025
[24]

Peter J Huber. 1992. Robust estimation of a location parameter. InBreakthroughs in statistics: Methodology and distribution. Springer, 492–518

1992
[25]

Shuo Ji, Xiaodong Lu, Mingzhe Liu, Leilei Sun, Chuanren Liu, Bowen Du, and Hui Xiong. 2023. Community-based dynamic graph learning for popularity prediction. InProceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining. 930–940

2023
[26]

Ruidong Jin, Xin Liu, and Tsuyoshi Murata. 2024. Predicting popularity trend in social media networks with multi-layer temporal graph neural networks. Complex & Intelligent Systems10, 4 (2024), 4713–4729

2024
[27]

Peiguang Jing, Yuting Su, Liqiang Nie, Xu Bai, Jing Liu, and Meng Wang. 2017. Low-rank multi-view embedding learning for micro-video popularity prediction. IEEE Transactions on Knowledge and Data Engineering30, 8 (2017), 1519–1532

2017
[28]

Xin Jing, Zeyu Shi, Zhangtao Cheng, Yichen Jing, Yuhuan Lu, Bangchao Deng, and Dingqi Yang. 2026. Modeling Multimodal Information Cascade on Social Media with Interpretable Mixture of Experts. InProceedings of the ACM Web Conference

2026
[29]

Pratik Kayal, Pascal Mettes, Nima Dehmamy, and Minsu Park. 2025. Large Language Models Are Natural Video Popularity Predictors. InFindings of the Association for Computational Linguistics: ACL 2025. 11432–11464

2025
[30]

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree.Advances in neural information processing systems30 (2017)

2017
[31]

Sami Khenissi and Olfa Nasraoui. 2020. Modeling and counteracting exposure bias in recommender systems.arXiv preprint arXiv:2001.04832(2020)

work page arXiv 2020
[32]

Aditya Khosla, Atish Das Sarma, and Raffay Hamid. 2014. What makes an image popular?. InProceedings of the 23rd international conference on World wide web. 867–876

2014
[33]

Thorsten Krause, Alina Deriyeva, Jan H Beinke, Gerrit Y Bartels, and Oliver Thomas. 2024. Mitigating Exposure Bias in Recommender Systems—A Compar- ative Analysis of Discrete Choice Models.ACM Transactions on Recommender Systems3, 2 (2024), 1–37

2024
[34]

Xin Lai, Yihong Zhang, and Wei Zhang. 2020. HyFea: Winning solution to social media popularity prediction for multimedia grand challenge 2020. InProceedings of the 28th ACM International Conference on Multimedia. 4565–4569

2020
[35]

Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. InInternational conference on machine learning. PMLR, 12888–12900

2022
[36]

Wenbing Li, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Yiran Wang, and Wei Yang. 2026. Large Language Model as Token Compressor and Decompressor. arXiv:2603.25340 [cs.CL]

work page internal anchor Pith review arXiv 2026
[37]

Wenbing Li, Zikai Song, Hang Zhou, Yunyao Zhang, Junqing Yu, and Wei Yang
[38]

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing.arXiv preprint arXiv:2507.00029(2025)

work page internal anchor Pith review arXiv 2025
[39]

Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song, and Wei Yang. 2024. Coupled mamba: Enhanced multimodal fusion with coupled state space model.Advances in Neural Information Processing Systems37 (2024), 59808–59832

2024
[40]

Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan
[41]

InProceedings of the AAAI Conference on Artificial Intelligence, Vol

Encoder: Entity mining and modification relation binding for composed image retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 5101–5109
[42]

Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. 2026. ReTrack: Evidence-Driven Dual-Stream Directional Anchor MM ’26, November 10–14, 2026, Rio de Janeiro, Brazil First Author and Second Author Calibration Network for Composed Video Retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 23...

2026
[43]

Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. 2026. HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 6762–6770

2026
[44]

Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, and Liqiang Nie. 2026. TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval. arXiv:2604.21806 [cs.CV] https://arxiv.org/abs/2604.21806

work page internal anchor Pith review Pith/arXiv arXiv 2026
[45]

Yijun Liu, Wu Liu, Xiaoyan Gu, and Yongdong Zhang. 2025. PopSim: Social Net- work Simulation for Social Media Popularity Prediction. arXiv:2512.02533 [cs.SI]

work page arXiv 2025
[46]

Yunbo Long, Yuhan Liu, and Liming Xu. 2026. EmoMAS: Emotion-Aware Multi- Agent System for High-Stakes Edge-Deployable Negotiation with Bayesian Or- chestration. arXiv:2604.07003 [cs.AI] https://arxiv.org/abs/2604.07003

work page internal anchor Pith review Pith/arXiv arXiv 2026
[47]

Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, and Rongrong Ji. 2022. X-clip: End-to-end multi-grained contrastive learning for video-text retrieval. In Proceedings of the 30th ACM international conference on multimedia. 638–647

2022
[48]

Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, Rajiv Ratn Shah, and Roger Zimmermann. 2018. A multimodal approach to predict social media popularity. In2018 IEEE conference on multimedia information processing and retrieval (MIPR). IEEE, 190–195

2018
[49]

Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, and Fajie Yuan. 2023. A Content-Driven Micro-Video Recom- mendation Dataset at Scale.arXiv preprint arXiv:2309.15379(2023)

work page arXiv 2023
[50]

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748(2018)

work page internal anchor Pith review arXiv 2018
[51]

Alessandro Ortis, Giovanni Maria Farinella, and Sebastiano Battiato. 2019. Pre- diction of social image popularity dynamics. InInternational Conference on Image Analysis and Processing. Springer, 572–582

2019
[52]

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Doro- gush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features.Advances in neural information processing systems31 (2018)

2018
[53]

Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhiheng Fu, Xuemeng Song, and Yupeng Hu. 2026. MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network.arXiv preprint arXiv:2603.29291(2026)

work page arXiv 2026
[54]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InInternational conference on machine learning. PmLR, 8748–8763

2021
[55]

Zikai Song, Run Luo, Lintao Ma, Ying Tang, Yi-Ping Phoebe Chen, Junqing Yu, and Wei Yang. 2025. Temporal Coherent Object Flow for Multi-Object Tracking. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 6978–6986

2025
[56]

Zikai Song, Run Luo, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. 2023. Compact transformer tracker with correlative masked modeling. InProceedings of the AAAI conference on artificial intelligence, Vol. 37. 2321–2329

2023
[57]

Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. 2024. Autogenic language embedding for coherent point tracking. In Proceedings of the 32nd ACM International Conference on Multimedia. 2021–2030

2024
[58]

Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. 2022. Transformer tracking with cyclic shifting window attention. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8791–8800

2022
[59]

Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang, and Xinchao Wang
[60]

Hypergraph-State Collaborative Reasoning for Multi-Object Tracking

Hypergraph-State Collaborative Reasoning for Multi-Object Tracking. arXiv:2604.12665 [cs.CV] https://arxiv.org/abs/2604.12665

work page internal anchor Pith review Pith/arXiv arXiv
[61]

Shisong Tang, Qing Li, Xiaoteng Ma, Ci Gao, Dingmin Wang, Yong Jiang, Qian Ma, Aoyang Zhang, and Hechang Chen. 2022. Knowledge-based temporal fusion network for interpretable online video popularity prediction. InProceedings of the ACM Web Conference 2022. 2879–2887

2022
[62]

Tomasz Trzciński and Przemysław Rokita. 2017. Predicting popularity of online videos using support vector regression.IEEE Transactions on Multimedia19, 11 (2017), 2561–2570

2017
[63]

Massimiliano Viola. 2021. Instagram images and videos popularity prediction: a deep learning-based approach. (2021)

2021
[64]

Jie Wang, Zitong Wang, Yan Peng, and Bowen Hao. 2024. Research on multimodal social media information popularity prediction based on large language model. Tongxin Xuebao/Journal on Communication45, 11 (2024)

2024
[65]

Bo Wu et al. 2024. SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge.arXiv preprint arXiv:2405.10497(2024)

work page arXiv 2024
[66]

Jiayi Xie, Yaochen Zhu, and Zhenzhong Chen. 2021. Micro-video popularity prediction via multimodal variational information bottleneck.IEEE Transactions on Multimedia25 (2021), 24–37

2021
[67]

Jie Xu, Mihaela Van Der Schaar, Jiangchuan Liu, and Haitao Li. 2014. Forecasting popularity of videos using social media.IEEE Journal of Selected Topics in Signal Processing9, 2 (2014), 330–343

2014
[68]

Xovee Xu, Shuojun Lin, Fan Zhou, and Jingkuan Song. 2026. Learning to Curate Context: Jointly Optimizing Retrieval and Prediction for Multimodal Social Media Popularity. InProceedings of the AAAI Conference on Artificial Intelligence

2026
[69]

Xovee Xu, Yifan Zhang, Fan Zhou, and Jingkuan Song. 2025. Improving Mul- timodal Social Media Popularity Prediction via Selective Retrieval Knowledge Augmentation. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 932–940

2025
[70]

Xu et al

Y. Xu et al. 2025. SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2025
[71]

Qianyun Yang, Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, and Liqiang Nie. 2026. STABLE: Efficient Hybrid Nearest Neighbor Search via Magnitude- Uniformity and Cardinality-Robustness.arXiv preprint arXiv:2604.01617(2026)

work page arXiv 2026
[72]

Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, and Zikai Song. 2025. MVP: Winning Solution to SMP Challenge 2025 Video Track.arXiv preprint arXiv:2507.00950(2025)

work page arXiv 2025
[73]

Haohan Yuan, Sukhwa Hong, and Haopeng Zhang. 2026. Strucsum: Graph- structured reasoning for long document extractive summarization with llms. In Findings of the Association for Computational Linguistics: EACL 2026. 3708–3721

2026
[74]

Haohan Yuan and Haopeng Zhang. 2025. Domainsum: A hierarchical benchmark for fine-grained domain shift in abstractive text summarization. InFindings of the Association for Computational Linguistics: NAACL 2025. 2219–2231

2025
[75]

Haohan Yuan and Haopeng Zhang. 2025. Understanding LLM Reasoning for Abstractive Summarization.arXiv preprint arXiv:2512.03503(2025)

work page arXiv 2025
[76]

Mingyu Zhang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Jiajia Nie, Yinwei Wei, and Yupeng Hu. 2026. Hint: Composed image retrieval with dual- path compositional contextualized network.arXiv preprint arXiv:2603.26341 (2026)

work page arXiv 2026
[77]

Xinglang Zhang, Yunyao Zhang, ZeLiang Chen, Junqing Yu, Wei Yang, and Zikai Song. 2026. Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning. arXiv:2601.02902 [cs.AI] https://arxiv.org/abs/2601.02902

work page internal anchor Pith review Pith/arXiv arXiv 2026
[78]

Yunyao Zhang, Yihao Ai, Zuocheng Ying, Qirui Mi, Junqing Yu, Wei Yang, and Zikai Song. 2026. Coupling Macro Dynamics and Micro States for Long-Horizon Social Simulation. arXiv:2604.05516 [cs.SI] https://arxiv.org/abs/2604.05516

work page internal anchor Pith review Pith/arXiv arXiv 2026
[79]

Yunyao Zhang, Zikai Song, Hang Zhou, Wenfeng Ren, Yi-Ping Phoebe Chen, Junqing Yu, and Wei Yang. 2025. 𝐺𝐴−𝑆 3: Comprehensive Social Network Simulation with Group Agents. InFindings of the Association for Computational Linguistics: ACL 2025. 8950–8970

2025
[80]

Yunyao Zhang, Zuocheng Ying, Xinglang Zhang, Junqing Yu, Peng Fang, Xu Chen, Wei Yang, and Zikai Song. 2026. IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics. arXiv:2604.06600 [cs.SI] https: //arxiv.org/abs/2604.06600

work page internal anchor Pith review Pith/arXiv arXiv 2026

Showing first 80 references.