PEO: Improving bi-factorial preference alignment with post-training policy extrapolation

Yuxuan Liu · 2025 · arXiv 2503.01233

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models

cs.LG · 2026-03-18 · unverdicted · novelty 6.0

VC-Soup uses a cosine-similarity consistency metric to filter data, trains value-consistent policies, and applies linear merging with Pareto filtering to improve multi-value LLM alignment trade-offs.

Steerable Adversarial Scenario Generation through Test-Time Preference Alignment

cs.AI · 2025-09-24 · unverdicted · novelty 6.0

SAGE reframes adversarial scenario generation as multi-objective preference alignment, using hierarchical group-based optimization and test-time linear interpolation of two expert policies to enable steerable control over adversariality-realism trade-offs.

citing papers explorer

Showing 2 of 2 citing papers.

VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
cs.LG · 2026-03-18 · unverdicted · none · ref 24
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
cs.AI · 2025-09-24 · unverdicted · none · ref 25
