Mixed citations
Maxwell Harper and Joseph A
Mixed citation behavior. Most common role is background (50%).
citing papers explorer
- F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking
F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.
- Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means
A Bayesian predictive model adaptively selects martingale factors to construct asymptotically log-optimal confidence sequences for bounded means while preserving anytime validity under misspecification.
- One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations by combining a structured attention mask that blocks cross-candidate interactions with shared positional framing under RoPE, enabling stable rankings in one forward pass.
- Green-Red Watermarking for Recommender Systems
GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.
- HORIZON: A Benchmark for In-the-wild User Behaviour Modeling
HORIZON creates a cross-domain, long-horizon user modeling benchmark from Amazon Reviews that tests generalization across time, domains, and unseen users, exposing gaps in sequential and LLM-based recommendation models.
- Creator Incentives in Recommender Systems: A Cooperative Game-Theoretic Approach for Stable and Fair Collaboration in Multi-Agent Bandits
For homogeneous agents in multi-agent linear bandits, the regret-based TU game is convex and has a non-empty core that contains the Shapley value; for heterogeneous agents, a simple regret-based payout lies in the core and satisfies three Shapley axioms.
- LLMAR: A Tuning-Free Recommendation Framework for Sparse and Text-Rich Industrial Domains
LLMAR applies LLM reasoning with a self-correction reflection loop to generate semantic user motives for tuning-free recommendations, showing up to 54.6% nDCG@10 gains on a sparse industrial dataset over trained baselines.
- The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.
- Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models
APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
- Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
The paper develops the COF algorithm for multi-armed bandits with cost subsidy (MAB-CS), which checks the feasibility of cheap arms by pooling samples, and gives generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.
- Graphify: Automated Synthesis of Type-Safe Graph Backends via $O(S)$ GraphQL-to-Gremlin Transpilation
Graphify automates the synthesis of type-safe graph backends via a formal GraphQL-to-Gremlin mapping and an O(S) recursive transpilation algorithm that supports CRUD and nested queries.
- From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space
GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baselines on benchmarks and industrial data.
- WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation
WPGRec is a new sequential recommender that performs multi-scale temporal modeling via stationary wavelet packets and injects high-order collaborative information through scale-aligned graph propagation with energy-aware gated fusion.
- Adaptive Autoguidance for Item-Side Fairness in Diffusion Recommender Systems
A2G-DiffRec applies adaptive autoguidance in diffusion recommenders, learning to balance main and weak model outputs via fairness-aware regularization to improve item exposure fairness with only marginal accuracy loss.
- Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection
The paper introduces bounded fake data injection attacks that force a class of stochastic bandit algorithms to select a target arm in nearly all rounds at sublinear attack cost.
- Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach
ILASP approximates neural networks for recipe preference learning as both global and local models, using weak constraints and PCA to maintain fidelity and interpretability.
- Offline Evaluation Measures of Fairness in Recommender Systems
The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.
- Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.
- A tutorial on learning from preferences and choices with Gaussian Processes
A tutorial on a Gaussian process (GP) framework for preference and choice learning that unifies random utility models, limits of discernment, and multi-utility scenarios via customized likelihoods for object and label preferences.
- Auto-Conditioned Frank-Wolfe Algorithms
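
Two of the summaries above (LLMAR and APG4RecSim) report gains in nDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of one common formulation: linear gain with a log2 position discount, normalized by the ideal ordering of the same list. The function names and the sample relevance labels are illustrative assumptions and are not taken from any of the listed papers, which may use a different gain function or compute the ideal ranking over a larger item pool.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    # Position 1 has discount log2(2) = 1, position 2 has log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k divided by the DCG@k of the same labels sorted in descending order.

    Assumption: every relevant item appears somewhere in `relevances`, so the
    ideal ranking can be obtained by sorting this list.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical binary relevance labels, in the order a model ranked the items.
ranked = [0, 1, 0, 1]
score = ndcg_at_k(ranked, k=10)
print(f"nDCG@10 = {score:.4f}")
```

A ranking that places all relevant items first scores 1.0, and the score falls as relevant items move down the list, which is why the metric is sensitive to top-ranked performance.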