2018. Content analysis: An introduction to its methodology
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Representative citing papers
ARGUS extracts fragmented code change rationales from multiple documents using LLMs and generates summaries that developers rate as useful for review and maintenance.
Citing papers explorer
- Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards
Analysis of the LMArena dataset reveals heavy topic skew and varying model rankings, leading to an interactive visualization tool for users to define custom evaluation priorities on LLM leaderboards.
- Fine-grained Multi-Document Extraction and Generation of Code Change Rationale
ARGUS extracts fragmented code change rationales from multiple documents using LLMs and generates summaries that developers rate as useful for review and maintenance.