STOMP extends direct preference optimization to the multi-objective setting via smooth Tchebysheff scalarization and standardization of observed rewards, achieving the highest hypervolume in eight of nine protein engineering evaluations.
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
A GNN-based DRL model with two actor-critics produces comparable Pareto fronts for multi-objective fog application placement in milliseconds versus hours for genetic algorithms.
citing papers explorer
- Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
  STOMP extends direct preference optimization to the multi-objective setting via smooth Tchebysheff scalarization and standardization of observed rewards, achieving the highest hypervolume in eight of nine protein engineering evaluations.
- Multi-objective application placement in fog computing using graph neural network-based reinforcement learning
  A GNN-based DRL model with two actor-critics produces comparable Pareto fronts for multi-objective fog application placement in milliseconds versus hours for genetic algorithms.