4 Pith papers cite this work, all dated 2026.
Citing papers
- Variance-aware Reward Modeling with Anchor Guidance
  Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
- Semi-supervised Method for Risk Prediction with Doubly Censored EHR Data
  Proposes a novel semi-supervised estimator for risk prediction under double censoring that combines limited gold-standard labels with large-scale surrogates, proves theoretical validity, and shows efficiency gains over supervised methods in simulations and a T2D EHR application.
- Structure Learning for Directed Trees with Zero-Inflated Compositional Nodes
  A new directed tree structure learning framework for zero-inflated compositional nodes uses KL divergence scoring and column-stochastic transition matrices for conditional expectations, with proven consistency and finite-sample guarantees.
- SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation
  SeqLoRA applies bilevel optimization to sequential LoRA adaptation for continual multi-concept text-to-image generation with theoretical bounds on forgetting and interference.