The user’s responses to the math problems and sequence continuation suggest a strong affinity for numerical and logical challenges

The user has a strong preference for numerical and logical problems and may struggle with more abstract or creative tasks.

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Personalized Benchmarking: Evaluating LLMs by Individual Preferences

cs.AI · 2026-04-21 · unverdicted · novelty 6.0 · ref 43

Personalized LLM rankings using Elo and Bradley-Terry on 115 users show low correlation with aggregate rankings (BT ρ=0.04), highlighting the need for user-specific benchmarks.
