8 Pith papers cite this work.
citing papers explorer
- RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems
  RecRM-Bench is a new large-scale benchmark dataset and framework for multi-dimensional reward modeling in agentic recommender systems, spanning instruction following, factual consistency, query-item relevance, and user behavior prediction.
- When More Reformulations Hurt: Avoiding Drift using Ranker Feedback
  ReformIR adaptively prioritizes reformulations and documents with a surrogate model guided by ranker feedback to boost recall while suppressing drift under fixed reranking budgets.
- Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
  Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.
- Aspect-Aware Content-Based Recommendations for Mathematical Research Papers
  The authors introduce aspect-aware datasets GoldRiM and SilverRiM for math papers and AchGNN, a heterogeneous GNN that outperforms prior methods by jointly modeling textual semantics, citations, and author lineage across aspects.
- Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation
  A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models, deployed at Meta with additional performance from test-time training on index nodes.
- Learning from Natural Language Feedback for Personalized Question Answering
  VAC replaces scalar rewards with natural language feedback in an alternating training loop between a feedback model and a policy model, yielding better personalized QA on the LaMP-QA benchmark.
- Visual Analysis of Multi-outcome Causal Graphs
  Introduces progressive visualization for comparing causal discovery algorithms and comparative graph layouts for analyzing multi-outcome causal graphs in healthcare.
- Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
  Structured negative mining with taxonomy and LLM judges improves offline category accuracy by 2.6% in IKEA search but yields no significant online engagement gains due to prevalent zero-click user behavior.