Hindsight Preference Optimization lets a 4B model outperform a 235B model on S&P 500 advisory accuracy and quality by generating DPO preference pairs from outcome-based LLM judgments on time series predictions.
The model generates structured advisory based solely on this visual input: no ticker symbols, dates, or axis labels that would identify the security or time period are provided.
Hindsight Preference Optimization for Financial Time Series Advisory
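The summary describes building DPO preference pairs from outcome-based judgments. A minimal sketch of that pairing step follows, and it is not the paper's implementation. It assumes each candidate advisory already carries a numeric score from a judge that saw the realized outcome. The names `Candidate`, `hindsight_score`, `build_preference_pairs`, and `min_margin` are hypothetical, and the `prompt`/`chosen`/`rejected` record layout is a commonly used convention for DPO data rather than something the source specifies.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    prompt: str             # identifier or text of the input the model saw
    response: str           # advisory text generated by the model
    hindsight_score: float  # hypothetical judge score, assigned after the outcome is known


def build_preference_pairs(candidates, min_margin=1.0):
    """Turn judged candidates into (prompt, chosen, rejected) records.

    Candidates are grouped by prompt. Within a group, every pair whose
    score gap is at least `min_margin` yields one record, with the
    higher-scored response as `chosen`.
    """
    by_prompt = {}
    for cand in candidates:
        by_prompt.setdefault(cand.prompt, []).append(cand)

    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            # Skip near-ties: the judge's preference is too weak to trust.
            if abs(a.hindsight_score - b.hindsight_score) < min_margin:
                continue
            chosen, rejected = (a, b) if a.hindsight_score > b.hindsight_score else (b, a)
            pairs.append(
                {
                    "prompt": prompt,
                    "chosen": chosen.response,
                    "rejected": rejected.response,
                }
            )
    return pairs


# Illustrative data only; the scores are made up for the example.
example = [
    Candidate("chart_001", "Hold; trend is intact.", 8.0),
    Candidate("chart_001", "Sell immediately.", 3.0),
    Candidate("chart_001", "Hold with a tight stop.", 7.5),
]
pairs = build_preference_pairs(example, min_margin=1.0)
```

The margin filter is a design choice of this sketch: dropping near-tied pairs trades dataset size for cleaner preference signal, and the threshold would need tuning against whatever scoring scale the judge actually uses.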