MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

years

2026: 7 · 2025: 4

verdicts

unverdicted: 11

representative citing papers

TRINITY: An Evolved LLM Coordinator

cs.LG · 2025-12-04 · unverdicted · novelty 6.0

A compact 0.6B-parameter coordinator with a 10K-parameter head uses an evolutionary strategy to dynamically delegate roles to LLMs, achieving state-of-the-art results such as 86.2% on LiveCodeBench.
