Reinforcement learning for optimizing RAG for domain chatbots. arXiv preprint arXiv:2401.06800

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

representative citing papers

State Contamination in Memory-Augmented LLM Agents

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.

Self-Aligned Reward: Towards Effective and Efficient Reasoners

cs.LG · 2025-09-05 · unverdicted · novelty 5.0

Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.
