RLearner-LLM achieves up to 6x gains in NLI entailment over standard fine-tuning by using an automated hybrid DPO pipeline that balances logic and fluency across multiple model sizes and domains.
In Advances in Neural Information Processing Systems (NeurIPS), volume 38
3 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Polarities: background (1)
Representative citing papers:
MI-EPO maximizes joint conditional mutual information among responses, feedback, and preference vectors, using probabilistic routing to improve alignment and controllability in multi-objective LLM optimization.
MedSynapse-V proposes a latent memory evolution framework with meta-query prior retrieval, causal counterfactual refinement via RL, and intrinsic memory transition to improve diagnostic accuracy over chain-of-thought baselines in medical VLMs.