pith. machine review for the scientific record.

arxiv: 2604.26252 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

OmniTrend: Content-Context Modeling for Scalable Social Popularity Prediction


Pith reviewed 2026-05-07 13:54 UTC · model grok-4.3

classification 💻 cs.CV
keywords social popularity prediction · content-context modeling · cross-modal learning · contextual exposure · cross-platform transfer · multimodal features · popularity forecasting · social media analysis

The pith

OmniTrend predicts social popularity by learning separate models for content attractiveness and contextual exposure before combining them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Social media popularity stems from the intrinsic appeal of the content and the context that controls its exposure to users. Earlier methods combine these signals, allowing platform-specific visibility patterns to influence the learned content representations and limiting transfer to other platforms. OmniTrend addresses this by using one module to extract attractiveness from visual, audio, and textual features and another to estimate exposure from time, author activity, trends, and neighborhood data. These two predictions are then integrated to produce the final popularity score. The separation makes it possible to understand each factor's role and to apply the model more reliably when moving between image-focused and video-focused platforms.

Core claim

OmniTrend models popularity as the joint outcome of content attractiveness and contextual exposure. The content module learns cross-modal representations from visual, audio, and textual cues to quantify intrinsic appeal. The context module estimates exposure from exogenous signals such as posting time, author activity, topical trends, and retrieval-based neighborhood statistics. Separate predictors are learned for each component and combined in the final estimate.
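One of the context signals named above, retrieval-based neighborhood statistics, can be illustrated with a small sketch. The review text does not say how OmniTrend retrieves neighbors or which statistics it computes, so everything here is an assumption made for illustration: cosine similarity as the retrieval metric, a top-k cutoff, and the hypothetical names `cosine_similarity` and `neighborhood_stats`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighborhood_stats(
    query: Sequence[float],
    index: List[Tuple[Sequence[float], float]],
    k: int = 3,
) -> Dict[str, float]:
    """Summarize the popularity of the k posts most similar to the query.

    `index` holds (embedding, observed popularity) pairs for already
    published posts. Hypothetical helper, not the paper's retrieval scheme.
    """
    if not index or k < 1:
        raise ValueError("need a non-empty index and k >= 1")
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    labels = [popularity for _, popularity in ranked[:k]]
    return {
        "mean": sum(labels) / len(labels),
        "min": min(labels),
        "max": max(labels),
    }
```

Statistics of this kind depend on which embedding is used for retrieval. If neighbors are retrieved with content embeddings, the resulting feature carries content information into the context stream.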

What carries the argument

Dual-predictor architecture with a cross-modal content module for intrinsic appeal and an exogenous context module for exposure signals, whose outputs are integrated for the final popularity score.
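A minimal sketch of the dual-predictor idea follows. The text here does not specify how the two outputs are integrated, so the additive combination (which would correspond to a product of factors if scores are log-scaled) is an assumption, as are the linear heads and the names `LinearHead` and `predict_popularity`.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class LinearHead:
    """Stand-in predictor: a weighted sum of input features plus a bias."""

    weights: Sequence[float]
    bias: float = 0.0

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature and weight lengths differ")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def predict_popularity(
    content_head: LinearHead,
    context_head: LinearHead,
    content_features: Sequence[float],
    context_features: Sequence[float],
) -> Dict[str, float]:
    """Score each factor separately, then combine.

    The sum is an assumed integration rule, chosen only so that each
    factor's contribution stays readable in the output.
    """
    attractiveness = content_head.predict(content_features)
    exposure = context_head.predict(context_features)
    return {
        "attractiveness": attractiveness,
        "exposure": exposure,
        "popularity": attractiveness + exposure,
    }
```

Because each head sees only its own inputs, the content head can be lifted out and applied to another platform on its own, which is the property the transfer claim depends on.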

Load-bearing premise

Content attractiveness and contextual exposure can be cleanly separated using the chosen signals without residual entanglement or platform-specific leakage in the learned representations.

What would settle it

Train the content module on one platform, then evaluate its standalone predictions on content from a second platform. The separation claim fails if the resulting accuracy is no higher than that of an entangled baseline that mixes both factors.
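The test described above could be scored along the following lines. This is a sketch under stated assumptions: Spearman rank correlation stands in for the accuracy measure, which this page does not name, and `average_ranks`, `spearman`, and `content_module_transfers` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def content_module_transfers(
    target_labels: Sequence[float],
    content_only_preds: Sequence[float],
    entangled_baseline_preds: Sequence[float],
) -> bool:
    """True if the standalone content module beats the entangled baseline
    on the second platform; False would count against the separation claim."""
    return spearman(content_only_preds, target_labels) > spearman(
        entangled_baseline_preds, target_labels
    )
```

In practice a single comparison like this would need repeated runs and a significance test before it settled anything.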

Figures

Figures reproduced from arXiv: 2604.26252 by Guiyi Zeng, Junqing Yu, Liliang Ye, Yi-Ping Phoebe Chen, Yunyao Zhang, Zikai Song.

Figure 1: Examples of cross-platform social media posts and ...
Figure 2: Overall architecture of the proposed con...
Figure 3: Architecture of the cross-platform content model...
Figure 4: Architecture of the platform-specific context mod...
Figure 5: Distribution comparison of true labels, content ...
Figure 6: Rank–rank heatmap between the predicted content ...

original abstract

Predicting social media popularity requires understanding both the intrinsic appeal of content and the external context that determines how it is exposed to users. Existing methods focus on content signals but do not separate them from exposure-related patterns, which causes the learned representations to absorb platform-specific visibility effects and weakens both interpretability and cross-platform transfer. This paper introduces OmniTrend, a unified framework that models popularity as the joint outcome of content attractiveness and contextual exposure. The content module learns cross-modal representations from visual, audio, and textual cues to quantify intrinsic appeal, while the context module estimates exposure from exogenous signals such as posting time, author activity, topical trends, and retrieval-based neighborhood statistics. OmniTrend learns separate predictors for content attractiveness and contextual exposure and integrates them in the final popularity estimate, which makes the role of each factor explicit and supports robust transfer across image and video platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces OmniTrend, a unified framework for social media popularity prediction that models popularity as the joint outcome of content attractiveness (learned via cross-modal representations from visual, audio, and textual cues) and contextual exposure (estimated from exogenous signals including posting time, author activity, topical trends, and retrieval-based neighborhood statistics). Separate predictors are learned for each factor and integrated in the final estimate to make their roles explicit and to support improved interpretability and cross-platform transfer between image and video domains.

Significance. If the separation is achieved and empirically validated, the approach could meaningfully advance popularity prediction by yielding more interpretable models that avoid absorbing platform-specific visibility effects, potentially enabling stronger generalization across content types and platforms than existing content-centric methods.

major comments (2)
  1. [Abstract] Abstract: The manuscript outlines the intended architecture, motivation, and claimed benefits but supplies no experimental results, ablation studies, baselines, error bars, or validation details whatsoever, leaving the central claims of clean separation, improved interpretability, and robust cross-platform transfer unsupported by evidence.
  2. [Abstract] Abstract (context module description): Topical trends and retrieval-based neighborhood statistics are assigned to the context module, yet these signals are inherently dependent on content features. No disentanglement losses, orthogonality constraints, mutual-information minimization, or similar mechanisms are described to enforce separation between the content and context representation streams; without such safeguards, residual entanglement is likely and would undermine the claimed interpretability and transfer properties.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to improve clarity and support for the central claims.

point-by-point responses
  1. Referee: [Abstract] Abstract: The manuscript outlines the intended architecture, motivation, and claimed benefits but supplies no experimental results, ablation studies, baselines, error bars, or validation details whatsoever, leaving the central claims of clean separation, improved interpretability, and robust cross-platform transfer unsupported by evidence.

    Authors: The abstract provides a concise overview of the framework and motivations, as is conventional to keep it brief. The full manuscript contains a dedicated Experiments section with baseline comparisons, ablation studies on the content and context modules, cross-platform transfer results between image and video domains, and error bars from multiple runs that empirically support the separation benefits and improved generalization. To better align the abstract with these results, we will revise it to include a short summary of the key quantitative findings.
    revision: yes

  2. Referee: [Abstract] Abstract (context module description): Topical trends and retrieval-based neighborhood statistics are assigned to the context module, yet these signals are inherently dependent on content features. No disentanglement losses, orthogonality constraints, mutual-information minimization, or similar mechanisms are described to enforce separation between the content and context representation streams; without such safeguards, residual entanglement is likely and would undermine the claimed interpretability and transfer properties.

    Authors: We agree that topical trends and retrieval-based neighborhood statistics can carry indirect content dependencies, which risks some entanglement. The current design relies on distinct input modalities (cross-modal content features versus exogenous context signals) and separate predictor heads to promote separation. However, to more rigorously enforce disentanglement and bolster the interpretability and transfer claims, we will add an explicit regularization term, such as a feature orthogonality constraint or mutual-information minimization loss between the content and context streams, in the revised manuscript.
    revision: yes
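For concreteness, one common form of a feature orthogonality constraint is sketched below: the squared Frobenius norm of the cross-covariance between the two embedding streams over a batch. This is an illustrative choice, not the regularizer the authors commit to, and `center_columns` and `orthogonality_penalty` are hypothetical names.

```python
from typing import List, Sequence


def center_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract each column's batch mean from that column."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    return [[row[d] - means[d] for d in range(dims)] for row in rows]


def orthogonality_penalty(
    content: Sequence[Sequence[float]],
    context: Sequence[Sequence[float]],
) -> float:
    """Sum of squared cross-covariances between content and context features.

    Rows are samples, columns are embedding dimensions. The penalty is zero
    when every content dimension is uncorrelated with every context
    dimension over the batch, and grows as the streams share linear
    structure. It does not detect nonlinear dependence.
    """
    if not content or len(content) != len(context):
        raise ValueError("need two non-empty batches of equal size")
    zc, zx = center_columns(content), center_columns(context)
    n = len(zc)
    penalty = 0.0
    for i in range(len(zc[0])):
        for j in range(len(zx[0])):
            cov = sum(zc[r][i] * zx[r][j] for r in range(n)) / n
            penalty += cov * cov
    return penalty
```

A term like this would be added to the training loss with a weighting coefficient. Mutual-information minimization, the other option named in the response, would also target the nonlinear dependence this penalty misses.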

Circularity Check

0 steps flagged

No circularity in claimed derivation chain

full rationale

The paper introduces OmniTrend as a modeling framework that separates content attractiveness (via cross-modal cues) from contextual exposure (via exogenous signals) and integrates separate predictors. No equations, derivations, first-principles results, or predictions are described that reduce by construction to fitted inputs or self-definitions. The separation is motivated by domain reasoning about interpretability and transfer, with no self-citation load-bearing steps, uniqueness theorems, or ansatzes invoked in the provided text. This is a standard non-circular case of architectural design rather than a mathematical chain that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that popularity factors are separable and that the listed exogenous signals are sufficient to isolate exposure without leakage into content representations.

axioms (1)
  • domain assumption: Content attractiveness and contextual exposure are separable and can be modeled independently using the chosen signals.
    This separability is the foundational premise stated in the abstract for avoiding absorption of visibility effects.

pith-pipeline@v0.9.0 · 5458 in / 1181 out tokens · 52820 ms · 2026-05-07T13:54:25.646079+00:00 · methodology

