2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
  DReST training makes RL agents and LLMs neutral to trajectory lengths and useful at goals, generalizing to halve shutdown influence probability in out-of-distribution tests.
- Empirical Evidence for the New Definitions in Financial Markets and Equity Premium Puzzle
  Historical US market data from 1889-1978 indicates equity investors were risk-averse in 1977 while risk-free asset investors exhibited insufficient risk-loving behavior under new definitions.