arxiv: 2602.10445 · v3 · pith:HWZRLOEUnew · submitted 2026-02-11 · 💻 cs.IR · cs.LG

End-to-End Semantic ID Generation for Generative Advertisement Recommendation

Pith reviewed 2026-05-22 11:47 UTC · model grok-4.3

classification 💻 cs.IR cs.LG
keywords semantic ID · generative recommendation · end-to-end optimization · contrastive learning · advertisement recommendation · residual quantization · discrete tokenization

The pith

UniSID generates semantic IDs for ads by jointly optimizing embeddings and discrete IDs end-to-end from raw data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UniSID, a framework that creates Semantic IDs (SIDs) for generative recommendation in advertising by learning embeddings and discrete tokens together instead of in separate stages. Existing residual quantization methods first turn items into embeddings and then compress those embeddings into IDs. This cascade misaligns the final tokens with the original semantics and lets errors accumulate from one stage to the next. UniSID removes the separation by training both representations directly on advertising data, using multi-granularity contrastive learning to keep fine distinctions across ID levels and a summary-based reconstruction task to capture high-level ad meaning. A sympathetic reader would care because better SIDs should produce more accurate next-token predictions in generative ad systems, directly raising metrics such as hit rate.
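
To make the two-stage cascade concrete, here is a minimal sketch of plain residual quantization in pure Python. It is not the paper's implementation: the function names, the tiny 2-D codebooks, and the example embedding are all hypothetical. The embedding is taken as already fixed (stage one) and each level then greedily encodes what the previous level left over (stage two).

```python
def nearest(codebook, vec):
    """Return (index, codeword) of the codeword closest to vec in squared L2 distance."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    idx = min(range(len(codebook)), key=lambda i: dist(codebook[i]))
    return idx, codebook[idx]


def residual_quantize(embedding, codebooks):
    """Quantize level by level; each level encodes the residual of the previous one."""
    residual = list(embedding)
    sid = []
    for codebook in codebooks:
        idx, code = nearest(codebook, residual)
        sid.append(idx)
        # Whatever this level fails to capture is handed to the next level.
        residual = [r - c for r, c in zip(residual, code)]
    return sid, residual


# Hypothetical codebooks for two SID levels (coarse, then fine).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
]
sid, leftover = residual_quantize([0.9, 0.3], codebooks)
print("SID:", sid, "unexplained residual:", leftover)
```

Each level commits to its choice before later levels run, so an unlucky early assignment is inherited by every subsequent token. The embedding itself is never updated by the quantizer, which is the separation the paper argues against.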

Core claim

By jointly optimizing embeddings and SIDs in an end-to-end manner from raw advertising data, together with multi-granularity contrastive learning and summary-based ad reconstruction, semantic information flows directly into the discrete SID space. This removes the objective misalignment and error accumulation that arise in the conventional two-stage residual quantization pipeline, allowing the generated SIDs to better preserve item semantics for downstream generative recommendation.

What carries the argument

UniSID, the unified end-to-end SID generation framework that optimizes embeddings and discrete SIDs jointly from raw data using multi-granularity contrastive alignment and summary-based reconstruction.

If this is right

  • SIDs preserve finer semantic details without degradation from separate embedding and quantization stages.
  • Multi-granularity contrastive learning produces consistent alignment across different levels of SID granularity.
  • Summary-based reconstruction encourages SIDs to encode high-level semantic information absent from explicit ad features.
  • Downstream generative ad models achieve higher hit rates, with reported gains of up to 4.62% in Hit Rate over the strongest SID baseline.
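
The contrastive point above can be illustrated with a generic InfoNCE loss applied once per SID level. This is a sketch under stated assumptions, not the paper's loss: the abstract does not give the exact formulation, so the per-level positive/negative sampling, the `temperature` value, and the level weights are all hypothetical. Vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor: negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


def multi_granularity_loss(anchor, samples_per_level, weights=None):
    """Weighted sum of InfoNCE terms, one (positive, negatives) pair per SID level."""
    if weights is None:
        weights = [1.0] * len(samples_per_level)
    return sum(
        w * info_nce(anchor, pos, negs)
        for w, (pos, negs) in zip(weights, samples_per_level)
    )


anchor = [1.0, 0.0]
coarse = ([0.7, 0.7], [[-1.0, 0.0]])             # hypothetical coarse-level samples
fine = ([0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])   # hypothetical fine-level samples
print(multi_granularity_loss(anchor, [coarse, fine]))
```

The loss shrinks as the anchor moves closer to its positive than to its negatives at every level, which is one way to read "consistent alignment across different levels of SID granularity".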

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The joint-optimization pattern could reduce reliance on separate pre-training pipelines in other large-scale recommendation systems.
  • Similar unified training might be tested for semantic tokenization tasks outside advertising, such as product or content recommendation.
  • If the contrastive and reconstruction objectives generalize, they could serve as drop-in components for improving discrete representations in generative models.

Load-bearing premise

Joint end-to-end optimization of embeddings and SIDs from raw advertising data will resolve misalignment and error accumulation without creating new instabilities or overfitting to the training distribution.

What would settle it

Training and evaluating UniSID on a new advertising dataset with shifted user behavior or item distribution and finding no improvement or a performance drop versus the strongest residual-quantization baseline in hit-rate metrics.
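
For readers who want to run that comparison, a common definition of Hit Rate@K is the fraction of test cases whose target item appears in the top K recommendations. The sketch below assumes that definition; the paper's exact evaluation protocol is not given on this page, and the function name and toy data are hypothetical.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of cases whose target item appears among the top-k ranked items."""
    if len(ranked_lists) != len(targets) or not targets:
        raise ValueError("need one non-empty ranked list per target")
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Toy rankings for three users (illustrative item IDs only).
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]
print(hit_rate_at_k(ranked, targets, k=2))
print(hit_rate_at_k(ranked, targets, k=3))
```

Running the same function on UniSID and on the strongest residual-quantization baseline, over the shifted dataset, gives the two numbers the test above asks for.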

Figures

Figures reproduced from arXiv: 2602.10445 by Enming Zhang, Hao Wang, Huan Yu, Jiawei Jiang, Jie Jiang, Jingwen Wang, Jun Zhang, Xiao Yan, Xinxun Zhang, Yuling Xiong, Yuxiang Wang.

Figure 1: Two-stage cascaded compression of current meth…

Figure 2: The framework of UniSID.
    Adjacent paper text: "… heads to generate the SID token and item embedding, respectively. The generated SIDs are then optimized through multi-granularity contrastive learning to enforce semantic consistency at different SID granularities. In addition, a summary-based ad reconstruction mechanism further compels the SIDs to capture high-level semantic information. In the following sections, we introduce each…"

Figure 3: Comparison between joint training and task…

Figure 4: The impact of hyperparameter 𝜆 on SID quality in the Ad-60W dataset.
    Adjacent paper text: "… loss 𝜆 is set to 0.1. All baselines employ the same setting for a fair comparison. B More Discussion. B.1 Efficiency Analysis. We analyze the training efficiency of UniSID in comparison with traditional two-stage RQ-based SID generation methods. In two-stage approaches, item embeddings are first learned and then discretized into SIDs throu…"

Figure 5: Prompt example of the ad attributes summary.

Figure 6: Samples for multi-granularity contrastive learning

Original abstract

Generative Recommendation (GR) has excelled by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale items into discrete sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ), where items are encoded into embeddings and then quantized to discrete SIDs. However, this paradigm suffers from inherent limitations: 1) Objective misalignment and semantic degradation stemming from the two-stage compression; 2) Error accumulation inherent in the structure of RQ. To address these limitations, we propose UniSID, a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs in an end-to-end manner from raw advertising data, enabling semantic information to flow directly into the SID space and thus addressing the inherent limitations of the two-stage cascading compression paradigm. To capture fine-grained semantics, a multi-granularity contrastive learning strategy is introduced to align distinct items across SID levels. Finally, a summary-based ad reconstruction mechanism is proposed to encourage SIDs to capture high-level semantic information that is not explicitly present in advertising contexts. Experiments demonstrate that UniSID consistently outperforms state-of-the-art SID generation methods, yielding up to a 4.62% improvement in Hit Rate metrics across downstream advertising scenarios compared to the strongest baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes UniSID, a unified end-to-end framework for generating Semantic IDs (SIDs) in generative advertisement recommendation. It identifies limitations in existing Residual Quantization (RQ) approaches—objective misalignment, semantic degradation, and error accumulation—and addresses them by jointly optimizing embeddings and discrete SIDs directly from raw advertising data, augmented by multi-granularity contrastive learning for fine-grained semantics and summary-based ad reconstruction for high-level semantics. Experiments claim consistent outperformance over state-of-the-art SID methods, with up to 4.62% Hit Rate gains in downstream advertising tasks.

Significance. If the end-to-end construction demonstrably eliminates two-stage misalignment and RQ error accumulation without new instabilities, the work would represent a substantive advance in generative recommendation by enabling more semantically faithful discrete tokenization. The multi-granularity contrastive and reconstruction objectives are well-motivated extensions that could generalize beyond advertising.

major comments (2)
  1. The central claim that joint optimization resolves RQ error accumulation rests on the differentiability of the discrete SID generation step. The manuscript provides no description of the gradient approximation employed (straight-through estimator, Gumbel-softmax, etc.), its temperature schedule, or any ablation isolating its contribution versus the contrastive and reconstruction losses. Without this, it is impossible to confirm that quantization bias is not simply reintroduced, directly undermining the stated advantage over two-stage RQ.
  2. The abstract reports a 4.62% Hit Rate lift but supplies no experimental details on datasets, baseline implementations, statistical significance tests, ablation results, or hyper-parameter controls. These omissions are load-bearing because the claimed superiority of the end-to-end paradigm cannot be evaluated without evidence that improvements are not attributable to auxiliary losses or tuning differences alone.
minor comments (2)
  1. Notation for SID levels and granularity in the contrastive loss could be clarified with an explicit equation or diagram.
  2. The summary-based reconstruction mechanism would benefit from a concrete example of the summary generation process.
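
Major comment 1 turns on how gradients pass through the discrete SID step. The manuscript, per the referee, does not say which estimator is used, so the following is only an illustration of the two options the referee names, written in pure Python with hypothetical function names. It shows the forward pass of a Gumbel-softmax relaxation and the hard one-hot that a straight-through estimator would emit. The backward pass needs an autograd framework and is described in a comment rather than implemented.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def straight_through(soft):
    """Forward pass of a straight-through estimator: the hard one-hot argmax.

    In an autograd framework the backward pass would reuse the gradient of
    `soft`, which is where estimator bias can enter.
    """
    best = max(range(len(soft)), key=lambda i: soft[i])
    return [1.0 if i == best else 0.0 for i in range(len(soft))]


logits = [2.0, 1.0, 0.5]  # hypothetical scores over three codebook entries
for temperature in (5.0, 1.0, 0.1):
    soft = gumbel_softmax(logits, temperature, random.Random(0))
    print(temperature, [round(p, 3) for p in soft], straight_through(soft))
```

With the noise held fixed by the seed, lowering the temperature sharpens the relaxed sample toward the hard one-hot. That is why the referee asks for the temperature schedule: it controls how closely training-time tokens match the discrete SIDs used at inference.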

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and completeness.

Point-by-point responses
  1. Referee: The central claim that joint optimization resolves RQ error accumulation rests on the differentiability of the discrete SID generation step. The manuscript provides no description of the gradient approximation employed (straight-through estimator, Gumbel-softmax, etc.), its temperature schedule, or any ablation isolating its contribution versus the contrastive and reconstruction losses. Without this, it is impossible to confirm that quantization bias is not simply reintroduced, directly undermining the stated advantage over two-stage RQ.

    Authors: We appreciate this observation on the need for explicit technical details. The manuscript will be revised to include a full description of the gradient approximation used to enable end-to-end differentiability through discrete SID generation, the associated temperature schedule, and a dedicated ablation that isolates the estimator's contribution from the multi-granularity contrastive and summary-based reconstruction losses. These additions will directly substantiate that the joint optimization mitigates RQ-style error accumulation. revision: yes

  2. Referee: The abstract reports a 4.62% Hit Rate lift but supplies no experimental details on datasets, baseline implementations, statistical significance tests, ablation results, or hyper-parameter controls. These omissions are load-bearing because the claimed superiority of the end-to-end paradigm cannot be evaluated without evidence that improvements are not attributable to auxiliary losses or tuning differences alone.

    Authors: We agree that the abstract is necessarily concise. The full manuscript already contains the requested experimental details in the Experiments section (datasets, baseline re-implementations, hyper-parameter tables, and ablation studies). To address the concern directly, we will revise the paper to add statistical significance testing (e.g., paired t-tests with p-values) and a short experimental protocol summary near the results, ensuring readers can clearly attribute gains to the end-to-end framework rather than auxiliary components. revision: partial
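
    The paired t-test mentioned in this response can be sketched as follows. The per-run Hit Rate values are made up for illustration and are not results from the paper. The function name is hypothetical, and the sketch stops at the t statistic and degrees of freedom, because a p-value needs a t-distribution CDF that the Python standard library does not provide.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples scores_a vs. scores_b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = statistics.stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (spread / math.sqrt(n))
    return t, n - 1


# Made-up Hit Rate values from five matched runs (same seeds and data splits).
new_method = [0.52, 0.55, 0.53, 0.56, 0.54]
baseline = [0.50, 0.52, 0.52, 0.53, 0.51]
t_stat, dof = paired_t_statistic(new_method, baseline)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

    Pairing runs by seed and split matters here: it removes run-to-run variance shared by both methods, so a small but consistent gain can still be detected.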

Circularity Check

0 steps flagged

UniSID end-to-end optimization and auxiliary losses form an independent empirical proposal

The paper defines UniSID via explicit new components (joint embedding-SID optimization from raw data, multi-granularity contrastive alignment, summary-based reconstruction) that are not algebraically equivalent to prior RQ stages or to any fitted quantity the authors later call a prediction. Performance gains are reported from downstream experiments rather than derived by construction from the method equations themselves. No self-citation chain, uniqueness theorem, or ansatz smuggling appears in the load-bearing steps of the abstract or method description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted; the approach appears to rely on standard neural-network training components rather than new postulated entities or unstated mathematical axioms.

pith-pipeline@v0.9.0 · 5787 in / 1146 out tokens · 44719 ms · 2026-05-22T11:47:19.082344+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MLPs are Efficient Distilled Generative Recommenders

    cs.IR 2026-05 unverdicted novelty 7.0

    SID-MLP distills autoregressive generative recommenders into efficient position-specific MLP heads for Semantic ID tasks, achieving 8.74x faster inference with matching accuracy.

  2. Unified Value Alignment for Generative Recommendation in Industrial Advertising

    cs.IR 2026-05 unverdicted novelty 5.0

    UniVA unifies value alignment in generative recommendation via a Commercial SID tokenizer, eCPM-aware RL decoder, and personalized beam search, reporting 37% offline Hit Rate gains and 1.5% online GMV lift on Tencent ...
