CL-Bench is the first expert-validated benchmark for continual learning in frontier LLMs across six real-world domains, showing limited gains and that naive in-context learning outperforms dedicated memory systems.
Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Introduces a matched four-condition protocol and ONCU metric to diagnose evidence utilization in long-context and RAG models across synthetic and multi-hop QA tasks.
citing papers explorer
-
Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments
CL-Bench is the first expert-validated benchmark for continual learning in frontier LLMs across six real-world domains, showing limited gains and that naive in-context learning outperforms dedicated memory systems.
-
Diagnosing Evidence Utilization in Long-Context and Retrieval-Augmented Language Models under Matched Evidence Conditions
Introduces a matched four-condition protocol and ONCU metric to diagnose evidence utilization in long-context and RAG models across synthetic and multi-hop QA tasks.