Modular Pluralism: Pluralistic alignment via multi-LLM collaboration

Modular Pluralism: Pluralistic alignment via multi-LLM collaboration · arXiv 2406.15951

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Spectral Souping: A Unified Framework for Online Preference Alignment

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Spectral Souping learns offline specialized policies for fine-grained preferences and merges them online using a discovered universal spectral representation for efficient LLM alignment.

Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

Personalized RewardBench reveals that state-of-the-art reward models reach only 75.94% accuracy on personalized preferences and shows stronger correlation with downstream BoN and PPO performance than prior benchmarks.

From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement

cs.AI · 2026-05-14 · unverdicted · novelty 5.0

Pluralistic AI alignment requires surfacing value conflicts via scoping, signalling, and repair rather than preference aggregation alone, as evidenced by low repair quality on contested prompts in tested frontier models.
