Dwaracherla, V., Asghari, S

URL https://arxiv · 2024 · arXiv 2409.10164

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

StoryAlign: Evaluating and Training Reward Models for Story Generation

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

StoryReward, trained on a new 100k story preference dataset, sets state-of-the-art performance on the introduced StoryRMB benchmark for aligning LLM stories with human preferences.

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.

A Unifying Lens on Reward Uncertainty in RLHF

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

A distributional reward model $p(r \mid x,y)$ yields the closed-form effective reward $\tilde r(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}]$ (pessimistic branch), which unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.
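
As a rough illustration of the summary above, the sketch below estimates an effective reward of the form beta * log E_p[exp(r / beta)] from a handful of reward samples. It is not taken from the cited paper: the function name `effective_reward`, the use of a plain sample mean for the expectation, and the reading that a negative `beta` gives the pessimistic branch are all assumptions made here for illustration.

```python
import math


def effective_reward(samples, beta):
    """Estimate beta * log(mean(exp(r / beta))) from reward samples.

    Hypothetical helper, not from the cited paper. The expectation over
    p(r | x, y) is approximated by a plain average over `samples`.
    """
    if beta == 0:
        raise ValueError("beta must be non-zero")
    if not samples:
        raise ValueError("need at least one reward sample")
    scaled = [r / beta for r in samples]
    # Subtract the maximum before exponentiating (log-sum-exp trick)
    # so that large |r / beta| values do not overflow.
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return beta * log_mean_exp


rewards = [0.0, 1.0, 2.0]            # made-up reward samples
mean_reward = sum(rewards) / len(rewards)

# Assumption: negative beta is the pessimistic setting, positive the optimistic one.
pessimistic = effective_reward(rewards, beta=-1.0)
optimistic = effective_reward(rewards, beta=1.0)
print(pessimistic, mean_reward, optimistic)
```

By Jensen's inequality the value sits below the sample mean for negative `beta` and above it for positive `beta`, and it approaches the mean as `|beta|` grows, which is one way to see how a single formula could cover several aggregation heuristics.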

Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

cs.CE · 2026-04-07 · unverdicted · novelty 6.0

Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training

cs.LG · 2025-12-03 · unverdicted · novelty 5.0

DVPO learns token-level value distributions and uses asymmetric risk regularization to contract lower tails while expanding upper tails, outperforming PPO and GRPO under noisy supervision in dialogue, math, and QA tasks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

StoryAlign: Evaluating and Training Reward Models for Story Generation cs.CL · 2026-05-06 · unverdicted · none · ref 9
Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context cs.CL · 2026-04-22 · unverdicted · none · ref 30