Imperfect Response
5 Pith papers cite this work.
Citing papers by year: 2026 (5).
Citing papers
- RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems
  RecRM-Bench is a new large-scale benchmark dataset and framework for multi-dimensional reward modeling in agentic recommender systems, spanning instruction following, factual consistency, query-item relevance, and user behavior prediction.
- TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery
  A Llama-based model trained on serialized user stories unifies item, carousel, and search ranking and outperforms specialist baselines offline while improving some online metrics and reducing latency.
- Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
  Uni-OPD unifies on-policy distillation across LLMs and MLLMs with dual-perspective strategies that promote student exploration and enforce order-consistent teacher supervision based on outcome rewards.
- TriAlignGR: Triangular Multitask Alignment with Multimodal Deep Interest Mining for Generative Recommendation
  TriAlignGR proposes a triangular multitask alignment framework with cross-modal semantic alignment, deep interest mining via chain-of-thought, and joint training on eight tasks to address content degradation and semantic opacity in Semantic ID-based generative recommendation.
- Echoes in Filter Bubble: Diagnosing and Curing Popularity Bias in Generative Recommenders