CTFusion is a live-CTF streaming benchmark that prevents data contamination by forwarding only the first correct flag per challenge under a shared team account.
LiveBench: A Challenging, Contamination-Free
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
citing papers explorer
-
CTFusion: A CTF-based Benchmark for LLM Agent Evaluation
CTFusion is a live-CTF streaming benchmark that prevents data contamination by forwarding only the first correct flag per challenge under a shared team account.
-
Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.