SPECTRA generates reproducible synthetic IR corpora up to 60,000 documents with controllable distractors, long-tail vocabulary, and graded relevance labels via a single-process Python prototype.
GenTREC: The first test collection generated by large language models for evaluating information retrieval systems,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.IR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics
SPECTRA generates reproducible synthetic IR corpora up to 60,000 documents with controllable distractors, long-tail vocabulary, and graded relevance labels via a single-process Python prototype.