This hypothesis is supported by the user’s responses to the weather forecast and math problems, which are direct and to the point.

The user tends to provide literal, straightforward answers, often without embellishment or creative interpretation, and may struggle with abstract or open-ended questions.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Personalized Benchmarking: Evaluating LLMs by Individual Preferences

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

Personalized LLM rankings using Elo and Bradley-Terry on 115 users show low correlation with aggregate rankings (BT ρ=0.04), highlighting the need for user-specific benchmarks.

citing papers explorer

Personalized Benchmarking: Evaluating LLMs by Individual Preferences cs.AI · 2026-04-21 · unverdicted · none · ref 46