RAR retrieves candidate items from a 300k-movie corpus then uses LLM generation with RL feedback to produce context-aware recommendations that outperform baselines on benchmarks.
Asynchronous methods for deep reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
ZeroSiam is an asymmetric architecture using a learnable predictor and stop-gradient that prevents collapse in test-time entropy minimization while also regularizing biased signals for improved performance.
citing papers explorer
-
Retrieval Augmented Conversational Recommendation with Reinforcement Learning
RAR retrieves candidate items from a 300k-movie corpus then uses LLM generation with RL feedback to produce context-aware recommendations that outperform baselines on benchmarks.
-
ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse
ZeroSiam is an asymmetric architecture using a learnable predictor and stop-gradient that prevents collapse in test-time entropy minimization while also regularizing biased signals for improved performance.