LLMs used as judges of sustainable city-trip recommendations show model-specific biases and high variance across four evaluation dimensions, even when their overall rankings agree. A three-phase human calibration process clarifies some reasoning but exposes disagreements on sustainability.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop