Gomez, Lukasz Kaiser, and Illia Polosukhin
2 Pith papers cite this work.
citing papers explorer
- One Model for All: Multi-Objective Controllable Language Models
  Multi-Objective Control trains a single LLM as a preference-conditioned policy, using multi-objective optimization in RLHF to produce outputs in user-specified regions of the Pareto front.
- Multimodal Chain-of-Thought Reasoning in Language Models
  Multimodal-CoT achieves state-of-the-art results on ScienceQA with models under 1 billion parameters, using a two-stage process that incorporates vision into chain-of-thought rationale generation.