{"paper":{"title":"TriAlignGR: Triangular Multitask Alignment with Multimodal Deep Interest Mining for Generative Recommendation","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"TriAlignGR embeds visual semantics directly into Semantic IDs to fix content loss and opacity in generative recommendation.","cross_cats":[],"primary_cat":"cs.IR","authors_text":"Hao Peng, Jinze Wang, Rongfeng Guo, Yangchen Zeng, Zhenyu Yu, Zhiyuan Hu","submitted_at":"2026-05-05T11:42:14Z","abstract_excerpt":"We introduce TriAlignGR, a unified multitask-multimodal framework for generative recommendation that establishes two-stage multimodal semantic propagation: (i) encoding visual semantics directly into SIDs via multimodal embeddings, and (ii) enabling the model to decode these semantics through visual description tasks. Existing Semantic ID (SID) pipelines suffer from two fundamental but underexplored problems: \\textbf{SID Content Degradation (SCD)}, where cascaded encoding and residual quantization discard critical multimodal and interest-level semantics; and \\textbf{SID Semantic Opacity (SSO)}"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"TriAlignGR resolves both SID Content Degradation (SCD) and SID Semantic Opacity (SSO) through three tightly integrated components: Cross-Modal Semantic Alignment (CMSA), Multimodal Deep Interest Mining (MDIM), and Triangular Multitask (TMT) that jointly trains on eight complementary generation tasks including two novel visual-semantic tasks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That VLM-generated descriptions and multimodal embeddings preserve critical semantics without introducing new noise or bias, and that joint training on the eight tasks improves rather than interferes with the core generative recommendation performance.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"TriAlignGR integrates visual content and latent user interests into Semantic IDs via cross-modal alignment, CoT-based interest mining, and triangular multitask training to address content degradation and semantic opacity in generative recommenders.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"TriAlignGR embeds visual semantics directly into Semantic IDs to fix content loss and opacity in generative recommendation.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a0f0e5342cf2dd936ab96987201513292e77b38740152a09d115493e8009c219"},"source":{"id":"2605.05249","kind":"arxiv","version":2},"verdict":{"id":"39bae9e3-1f22-4544-ad0c-874867b523c1","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T18:41:19.812981Z","strongest_claim":"TriAlignGR resolves both SID Content Degradation (SCD) and SID Semantic Opacity (SSO) through three tightly integrated components: Cross-Modal Semantic Alignment (CMSA), Multimodal Deep Interest Mining (MDIM), and Triangular Multitask (TMT) that jointly trains on eight complementary generation tasks including two novel visual-semantic tasks.","one_line_summary":"TriAlignGR integrates visual content and latent user interests into Semantic IDs via cross-modal alignment, CoT-based interest mining, and triangular multitask training to address content degradation and semantic opacity in generative recommenders.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That VLM-generated descriptions and multimodal embeddings preserve critical semantics without introducing new noise or bias, and that joint training on the eight tasks improves rather than interferes with the core generative recommendation performance.","pith_extraction_headline":"TriAlignGR embeds visual semantics directly into Semantic IDs to fix content loss and opacity in generative recommendation."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.05249/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_compliance","ran_at":"2026-05-19T15:07:36.769715Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"597cd0ecbac65541c2627ee93f2b1aaa578dbcbba62ff1aa22052c75b3176472"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":3,"snapshot_sha256":"a5e3e683e988dbbbd906f926081cb2c4040433dff572a180d1c437cd1baf1c3e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}