The SimEval-IR toolkit and benchmarks show that human-likeness classifiers have negligible pooled predictive power (r = 0.09) for the validity of simulator-based system rankings, whereas marginal click-depth distance and Fréchet distance on session embeddings give stronger signals (r = 0.43 and r = 0.40, respectively).
2 Pith papers cite this work.
fields
cs.IR 2
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
IIRSim Studio is a new web workbench offering visual pipeline composition, Git-backed component sharing, explicit provenance for replication, and shared-task support on top of existing IR user simulation libraries.
citing papers explorer
- SimEval-IR: A Unified Toolkit and Benchmark Suite for Evaluating User Simulators and Search Sessions
- IIRSim Studio: A Dashboard for User Simulation