Data contamination quiz: A tool to detect and estimate contamination in large language models

Golchin, S · 2024 · arXiv 2311.06233

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

WARP: Weight-Space Analysis for Recovering Training Data Portfolios

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

WARP recovers training domain mixtures from fine-tuned model weights using weight-space interpolation via model merging to generate pseudo-checkpoints and geometric features mapped to proportions.

SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

SPENCE shows older NL2SQL benchmarks like Spider have high performance sensitivity to syntactic changes, indicating likely training contamination, while newer ones like BIRD show little sensitivity and appear largely clean.

Benchmark Data Contamination of Large Language Models: A Survey

cs.CL · 2024-06-06 · unverdicted · novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.

citing papers explorer

Benchmark Data Contamination of Large Language Models: A Survey

cs.CL · 2024-06-06 · unverdicted · none · ref 50

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
