Mixed citations

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

2024 · cs.IR · arXiv 2403.03952

Mixed citation behavior. Most common role is background (44%).

37 Pith papers citing it

Background 44% of classified citations


abstract

Feature engineering has long been central to recommender systems, yet effectively leveraging textual item features remains challenging. Recent advances in large language models (LLMs) have enabled their use as semantic encoders for recommendation, but their roles and behaviors in this setting are still not well understood. Prior studies often rely on general-purpose embedding benchmarks (e.g., MTEB) when selecting LLMs, overlooking the unique characteristics of recommendation tasks. To address this gap, we introduce BLaIR, a comprehensive benchmark for evaluating LLMs as semantic encoders in recommendation scenarios. We contribute (1) a new large-scale Amazon Reviews 2023 dataset with over 570 million reviews and 48 million items, (2) a unified benchmark covering sequential recommendation, collaborative filtering, and product search, and (3) a new complex-query product search task featuring both semi-synthetic and real-world evaluation datasets. Experiments with 11 leading LLMs reveal that model rankings on BLaIR correlate little with MTEB rankings, highlighting the unique challenges of semantic encoding in recommendation.
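The headline finding compares how two benchmarks order the same set of models. One common way to quantify that agreement is Spearman's rank correlation (rho); the abstract does not say which statistic the authors used, so treat the sketch below as an assumption rather than the paper's method. The function names `ranks` and `spearman` are hypothetical, and the scores are made-up placeholders, not results from BLaIR or MTEB.

```python
# Hypothetical sketch: Spearman's rank correlation between two benchmark leaderboards.
# Assumption: the source does not name the statistic; Spearman's rho is one common
# choice for asking "do two benchmarks rank the same models the same way?".


def ranks(scores):
    """Return 1-based ranks (rank 1 = highest score); ties get their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = avg_rank
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least 2 models")
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when every model has the same score")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five hypothetical encoders (NOT results from the paper).
mteb_scores = [70.1, 68.4, 66.0, 64.2, 60.5]
blair_scores = [0.31, 0.35, 0.29, 0.36, 0.33]
print(round(spearman(mteb_scores, blair_scores), 2))  # → -0.3
```

A rho near +1 would mean the two leaderboards order models almost identically; values near zero or below mean a model's rank on a general embedding benchmark says little about its rank on recommendation tasks. With only a handful of models (the abstract reports 11), a single rho is a coarse summary, so per-task rankings are worth inspecting as well.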


citation-role summary

dataset: 4 · background: 3 · method: 2

citation-polarity summary

background: 4 · use dataset: 4 · use method: 1

representative citing papers

Towards Robust Federated Multimodal Graph Learning under Modality Heterogeneity

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

FedMPO recovers missing modalities via topology-aware generation, filters noisy recoveries with missing-aware routing, and uses reliability-aware aggregation to achieve up to 5.65% gains over baselines in high-missing and non-IID federated graph settings.

RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents

cs.IR · 2026-05-11 · unverdicted · novelty 7.0

RecoAtlas is a benchmark that evaluates LLM recommendation agents on behavior-grounded metrics for relevance, complementarity, and diversity in addition to semantic coherence.

fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery

cs.LG · 2026-05-10 · conditional · novelty 7.0

fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent latents than standard crosscoders on GPT2-Small, Pythia, and Gemma2 models.

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

FraudBench shows that current multimodal LLMs and specialized AI-image detectors often fail to spot AI-generated fake damage in refund evidence, with true positive rates frequently below 50% on synthetic subsets while producing false positives on real damage.

The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

On-policy distillation has an extrapolation cliff at closed-form lambda*(p,b,c) set by teacher modal probability, warm-start mass, and clip strength, past which training shifts from format-preserving to format-collapsing.

Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation

cs.IR · 2026-05-07 · unverdicted · novelty 7.0

Autoregressive semantic ID generation creates tree-induced probability correlations that prevent generative recommenders from capturing simple patterns; Latte adds latent tokens to relax these correlations.

One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation

cs.IR · 2026-04-30 · conditional · novelty 7.0

InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.

Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

Hyper-Parallel Decoding enables parallel generation of independent sequences in LLMs via position ID manipulation, delivering up to 13.8X speedup for attribute value extraction.

HORIZON: A Benchmark for In-the-wild User Behaviour Modeling

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

HORIZON creates a cross-domain, long-horizon user modeling benchmark from Amazon Reviews that tests generalization across time, domains, and unseen users, exposing gaps in sequential and LLM-based recommendation models.

DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

DynLP is a parallel dynamic batch update algorithm for label propagation that achieves significant speedups by updating only relevant parts of the graph on GPUs.

GenRecEdit: Adapting Model Editing for Generative Recommendation with Cold-Start Items

cs.IR · 2026-03-15 · conditional · novelty 7.0

GenRecEdit injects cold-start items into generative recommendation models via context-aware token editing and interference-reducing triggers, boosting cold-start accuracy while using only 9.5% of retraining time.

ItemRAG: Item-Based Retrieval-Augmented Generation for LLM-Based Recommendation

cs.IR · 2025-11-19 · conditional · novelty 7.0

ItemRAG augments LLM recommendation prompts with item-level retrievals that blend semantic and co-purchase signals, outperforming user-history RAG in both standard and cold-start settings.

VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

cs.IR · 2025-07-29 · unverdicted · novelty 7.0

VoteGCL augments graph-based recommendation systems with high-confidence synthetic interactions generated via majority-voting LLM reranks and integrates them into graph contrastive learning to improve accuracy and reduce popularity bias.

PipeANN-Filter: An Efficient Filtered Vector Search System on SSD

cs.OS · 2026-05-18 · unverdicted · novelty 6.0

PipeANN-Filter improves filtered vector search latency and throughput on SSD by exploring a superset of valid vectors identified via probabilistic filters and verifying attributes only after selecting top-k candidates.

Conditional Attribute Estimation with Autoregressive Sequence Models

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.

Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models

cs.IR · 2026-05-13 · unverdicted · novelty 6.0

APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.

CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

CAMPA resolves modal conflicts in decoupled multimodal GNNs via cross-modal aligned propagation and trajectory aligned aggregation, outperforming coupled and decoupled baselines on benchmarks while retaining efficiency.

LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries

cs.IR · 2026-05-10 · unverdicted · novelty 6.0

LLM agents enable users to integrate cross-platform and offline data for personalization that outperforms single-platform baselines in proof-of-concept tests.

Bridging Textual Profiles and Latent User Embeddings for Personalization

cs.IR · 2026-05-07 · unverdicted · novelty 6.0

BLUE aligns LLM-generated textual user profiles with embedding-based recommendation objectives via reinforcement learning and next-item text supervision, yielding better zero-shot performance and cross-domain transfer than baselines.

PREFER: Personalized Review Summarization with Online Preference Learning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

PREFER is an online preference learning system that generates personalized review summaries and improves alignment with user interests in simulations on Amazon review data.

One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving

cs.DC · 2026-05-06 · unverdicted · novelty 6.0

HELM adaptively partitions HBM between EMB and KV caches via a three-layer PPO controller and EMB-KV-aware scheduling, reducing P99 latency by 24-38% while achieving 93.5-99.6% SLO satisfaction on production workloads.

Decision-aware User Simulation Agent for Evaluating Conversational Recommender Systems

cs.IR · 2026-05-05 · unverdicted · novelty 6.0

Hesitator is a theory-grounded simulator that separates utility-based item selection from overload-aware commitment decisions to reduce unrealistic high acceptance rates in conversational recommender evaluations.

From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems

cs.IR · 2026-04-21 · unverdicted · novelty 6.0

A unified benchmark of eleven CE methods shows effectiveness-sparsity trade-offs vary by method and format, performance is consistent from item to list level, and graph-based explainers face scalability limits on large graphs.

Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems

cs.IR · 2026-04-11 · unverdicted · novelty 6.0

CoARS enables co-evolving recommender and user agents by using interaction-derived rewards and self-distilled credit assignment to internalize multi-turn feedback into model parameters, outperforming prior agentic baselines.

citing papers explorer

Showing 37 of 37 citing papers.

Towards Robust Federated Multimodal Graph Learning under Modality Heterogeneity cs.LG · 2026-05-12 · unverdicted · none · ref 34
FedMPO recovers missing modalities via topology-aware generation, filters noisy recoveries with missing-aware routing, and uses reliability-aware aggregation to achieve up to 5.65% gains over baselines in high-missing and non-IID federated graph settings.
RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents cs.IR · 2026-05-11 · unverdicted · none · ref 38
RecoAtlas is a benchmark that evaluates LLM recommendation agents on behavior-grounded metrics for relevance, complementarity, and diversity in addition to semantic coherence.
fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery cs.LG · 2026-05-10 · conditional · none · ref 55
fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent latents than standard crosscoders on GPT2-Small, Pythia, and Gemma2 models.
FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence cs.CV · 2026-05-09 · unverdicted · none · ref 15
FraudBench shows that current multimodal LLMs and specialized AI-image detectors often fail to spot AI-generated fake damage in refund evidence, with true positive rates frequently below 50% on synthetic subsets while producing false positives on real damage.
The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs cs.LG · 2026-05-09 · unverdicted · none · ref 16
On-policy distillation has an extrapolation cliff at closed-form lambda*(p,b,c) set by teacher modal probability, warm-start mass, and clip strength, past which training shifts from format-preserving to format-collapsing.
Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation cs.IR · 2026-05-07 · unverdicted · none · ref 13
Autoregressive semantic ID generation creates tree-induced probability correlations that prevent generative recommenders from capturing simple patterns; Latte adds latent tokens to relax these correlations.
One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation cs.IR · 2026-04-30 · conditional · none · ref 12
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.
Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction cs.CL · 2026-04-29 · unverdicted · none · ref 3
Hyper-Parallel Decoding enables parallel generation of independent sequences in LLMs via position ID manipulation, delivering up to 13.8X speedup for attribute value extraction.
HORIZON: A Benchmark for In-the-wild User Behaviour Modeling cs.IR · 2026-04-19 · unverdicted · none · ref 1
HORIZON creates a cross-domain, long-horizon user modeling benchmark from Amazon Reviews that tests generalization across time, domains, and unseen users, exposing gaps in sequential and LLM-based recommendation models.
DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning cs.DC · 2026-04-08 · unverdicted · none · ref 16
DynLP is a parallel dynamic batch update algorithm for label propagation that achieves significant speedups by updating only relevant parts of the graph on GPUs.
GenRecEdit: Adapting Model Editing for Generative Recommendation with Cold-Start Items cs.IR · 2026-03-15 · conditional · none · ref 10
GenRecEdit injects cold-start items into generative recommendation models via context-aware token editing and interference-reducing triggers, boosting cold-start accuracy while using only 9.5% of retraining time.
ItemRAG: Item-Based Retrieval-Augmented Generation for LLM-Based Recommendation cs.IR · 2025-11-19 · conditional · none · ref 2
ItemRAG augments LLM recommendation prompts with item-level retrievals that blend semantic and co-purchase signals, outperforming user-history RAG in both standard and cold-start settings.
VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation cs.IR · 2025-07-29 · unverdicted · none · ref 12
VoteGCL augments graph-based recommendation systems with high-confidence synthetic interactions generated via majority-voting LLM reranks and integrates them into graph contrastive learning to improve accuracy and reduce popularity bias.
PipeANN-Filter: An Efficient Filtered Vector Search System on SSD cs.OS · 2026-05-18 · unverdicted · none · ref 20
PipeANN-Filter improves filtered vector search latency and throughput on SSD by exploring a superset of valid vectors identified via probabilistic filters and verifying attributes only after selecting top-k candidates.
Conditional Attribute Estimation with Autoregressive Sequence Models cs.AI · 2026-05-13 · unverdicted · none · ref 31
Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.
Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models cs.IR · 2026-05-13 · unverdicted · none · ref 19
APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation cs.AI · 2026-05-12 · unverdicted · none · ref 14
CAMPA resolves modal conflicts in decoupled multimodal GNNs via cross-modal aligned propagation and trajectory aligned aggregation, outperforming coupled and decoupled baselines on benchmarks while retaining efficiency.
LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries cs.IR · 2026-05-10 · unverdicted · none · ref 20
LLM agents enable users to integrate cross-platform and offline data for personalization that outperforms single-platform baselines in proof-of-concept tests.
Bridging Textual Profiles and Latent User Embeddings for Personalization cs.IR · 2026-05-07 · unverdicted · none · ref 7
BLUE aligns LLM-generated textual user profiles with embedding-based recommendation objectives via reinforcement learning and next-item text supervision, yielding better zero-shot performance and cross-domain transfer than baselines.
PREFER: Personalized Review Summarization with Online Preference Learning cs.AI · 2026-05-07 · unverdicted · none · ref 16
PREFER is an online preference learning system that generates personalized review summaries and improves alignment with user interests in simulations on Amazon review data.
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving cs.DC · 2026-05-06 · unverdicted · none · ref 21
HELM adaptively partitions HBM between EMB and KV caches via a three-layer PPO controller and EMB-KV-aware scheduling, reducing P99 latency by 24-38% while achieving 93.5-99.6% SLO satisfaction on production workloads.
Decision-aware User Simulation Agent for Evaluating Conversational Recommender Systems cs.IR · 2026-05-05 · unverdicted · none · ref 19
Hesitator is a theory-grounded simulator that separates utility-based item selection from overload-aware commitment decisions to reduce unrealistic high acceptance rates in conversational recommender evaluations.
From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems cs.IR · 2026-04-21 · unverdicted · none · ref 17
A unified benchmark of eleven CE methods shows effectiveness-sparsity trade-offs vary by method and format, performance is consistent from item to list level, and graph-based explainers face scalability limits on large graphs.
Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems cs.IR · 2026-04-11 · unverdicted · none · ref 32
CoARS enables co-evolving recommender and user agents by using interaction-derived rewards and self-distilled credit assignment to internalize multi-turn feedback into model parameters, outperforming prior agentic baselines.
PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context cs.IR · 2026-04-09 · unverdicted · none · ref 3
PeReGrINE is a graph-based benchmark that restructures Amazon Reviews 2023 with temporal cutoffs and introduces dissonance analysis to measure how well retrieval-conditioned models match user style and product consensus.
TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning cs.AI · 2026-04-02 · unverdicted · none · ref 12
TRU is a plug-and-play unlearning method for multimodal recommenders that applies ranking fusion, modality scaling, and layer isolation to achieve better retain-forget trade-offs than uniform baselines.
Detecting LLM-Generated Spam Reviews by Integrating Language Model Embeddings and Graph Neural Network cs.CL · 2025-10-02 · unverdicted · none · ref 18
Introduces FraudSquad, a hybrid model using language model embeddings and a gated graph transformer that outperforms baselines on newly created LLM-generated spam review datasets.
SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding cs.CL · 2025-07-27 · unverdicted · none · ref 9
SessionIntentBench is a large-scale multimodal benchmark for inter-session intention-shift modeling in e-commerce, with 1.95M intention entries and human-annotated gold labels showing current L(V)LMs struggle but improve when intention is injected.
Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target cs.LG · 2026-05-17 · unverdicted · none · ref 13
ABPO combines group-relative policy optimization with anchored exposure correction and asymmetric feedback handling to enable effective continual updates for LLM recommenders under bandit feedback constraints.
RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching cs.DC · 2026-05-08 · unverdicted · none · ref 18
RcLLM accelerates generative recommendation inference by 1.31x-9.51x in TTFT through beyond-prefix KV caching, replicated user caches, sharded item caches, affinity scheduling, and selective attention with negligible accuracy loss.
Stable Multimodal Graph Unlearning via Feature-Dimension Aware Quantile Selection cs.LG · 2026-05-05 · unverdicted · none · ref 40
FDQ improves stability in multimodal graph unlearning by using feature-dimension aware quantile selection to protect sensitive high-dimensional layers while preserving utility and enabling effective forgetting.
Rethinking Semantic Collaborative Integration: Why Alignment Is Not Enough cs.IR · 2026-04-24 · unverdicted · none · ref 14
Semantic and collaborative representations show low item-level overlap on sparse data, so global alignment suppresses complementary signals and a shared-plus-private fusion design is needed instead.
Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation cs.IR · 2025-11-24 · unverdicted · none · ref 20
HaNoRec dynamically weights harder preference samples and applies Gaussian perturbations to output distributions to improve multimodal LLM performance on sequential recommendation tasks.
Learning Decomposed Contextual Token Representations from Pretrained and Collaborative Signals for Generative Recommendation cs.IR · 2025-08-22 · unverdicted · none · ref 10
DECOR learns decomposed contextual token representations by combining pretrained semantics with collaborative signals to fix objective misalignment in two-stage generative recommendation systems.
To GPU or Not to GPU: Vector Search in Relational Engines cs.DB · 2026-05-15 · conditional · none · ref 13
Relational engines achieve faster SQL+vector-search queries on GPU than CPU when using compact vector indexes and fast interconnects, reversing the CPU-only design in current systems.
Multistakeholder Impacts of Profile Portability in a Recommender Ecosystem cs.IR · 2026-04-23 · unverdicted · none · ref 34
Data portability scenarios in algorithmic pluralism produce varying effects on user utility across different recommendation algorithms.
Verbalized Algorithms: Classical Algorithms are All You Need (Mostly) cs.CL · 2025-09-09 · unreviewed · ref 5