Large language models are not fair evaluators

2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

An Empirical Study of Proactive Coding Assistants in Real-World Software Development

cs.SE · 2026-05-07 · unverdicted · novelty 7.0 · ref 36

Real developer IDE traces differ substantially from LLM simulations in behavior and structure; current proactive assistants are unreliable on real traces, and simulated data cannot substitute for real data in training.