Pith Number

pith:YVDWE34U

pith:2026:YVDWE34UGGI3H57B36RNMADRTJ
not attested · not anchored · not stored · refs resolved

A Standardized Re-evaluation of Conversational Recommender Systems on the ReDial Dataset

Ivica Kostric, Krisztian Balog

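A standardized re-evaluation on ReDial shows that nearly half of the reported accuracy of conversational recommender systems (CRS) comes from repetition shortcuts that disappear under novelty-focused evaluation, and that gains track LLM backbone capacity more than architectural innovation.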

arxiv:2605.13053 v1 · 2026-05-13 · cs.IR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YVDWE34UGGI3H57B36RNMADRTJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our reproducibility study reveals a granularity gap, where fine-grained ranking (Recall@1) is highly sensitive to implementation details, while our replicability analysis shows that nearly 50% of reported accuracy stems from repetition shortcuts that are absent in novelty-focused evaluation. Furthermore, we find that performance gains are often driven more by the capacity of the LLM backbone than by specific architectural innovations.
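The gap between standard and novelty-focused evaluation can be illustrated with a small sketch. This is not the authors' protocol, which this record does not describe; it assumes that a "repetition shortcut" means recommending an item already mentioned in the dialogue, and that novelty-focused evaluation drops such items from both the targets and the ranking. The function names and the toy movie titles are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Fraction of target items that appear in the top-k of the ranking."""
    if not targets:
        # Assumption: turns without targets score 0.0; a real evaluation
        # would more likely skip them.
        return 0.0
    return len(set(ranked[:k]) & targets) / len(targets)


def novel_recall_at_k(
    ranked: Sequence[str], targets: Set[str], mentioned: Set[str], k: int
) -> float:
    """Recall@k after removing items already mentioned in the dialogue.

    Assumed definition of "novelty-focused": previously mentioned items
    are excluded from both the ground truth and the candidate ranking.
    """
    novel_targets = targets - mentioned
    novel_ranked = [item for item in ranked if item not in mentioned]
    return recall_at_k(novel_ranked, novel_targets, k)


# Toy example: the top-ranked item merely repeats a title from the context.
ranked = ["Inception", "Heat", "Alien"]
targets = {"Inception", "Alien"}
mentioned = {"Inception"}

standard = recall_at_k(ranked, targets, k=1)
novel = novel_recall_at_k(ranked, targets, mentioned, k=1)
print(standard, novel)
```

In this toy case the standard score at k=1 is earned entirely by the repeated title, so the novelty-focused score drops to zero.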

C2 weakest assumption

The chosen seven methods and three architectural families are representative of the broader CRS literature, and the single standardized preprocessing pipeline adopted here is the correct reference point against which all prior results should be judged.

C3 one line summary

Standardized re-evaluation of CRS methods on ReDial finds that nearly half of reported accuracy stems from repetition shortcuts absent in novelty-focused tests, performance tracks LLM capacity more than architecture, and traditional recall overstates conversational utility.

Receipt and verification
First computed 2026-05-18T03:08:59.256717Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c547626f943191b3f7e1dfa2d600719a611ffcc0f587672b07830f8dd3dfc064

Aliases

arxiv: 2605.13053 · arxiv_version: 2605.13053v1 · doi: 10.48550/arxiv.2605.13053 · pith_short_12: YVDWE34UGGI3 · pith_short_16: YVDWE34UGGI3H57B · pith_short_8: YVDWE34U
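The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 hash, and the short aliases are prefixes of it. This is an observation from the values shown here, not a documented specification, so the sketch below should be read as an assumption. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "c547626f943191b3f7e1dfa2d600719a611ffcc0f587672b07830f8dd3dfc064"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier is derived; it is inferred
    from the values on this page only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# The short aliases look like plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

For the hash on this page, `full` comes out as `YVDWE34UGGI3H57B36RNMADRTJ`, and the three prefixes match the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases.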
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YVDWE34UGGI3H57B36RNMADRTJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c547626f943191b3f7e1dfa2d600719a611ffcc0f587672b07830f8dd3dfc064
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "8934a7c69321f3ed3b134abcfd6e7f53f1156f1e13dde036cf45b534af80efb4",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2026-05-13T06:20:43Z",
    "title_canon_sha256": "5972e8ac4d3b93e44b9bc4601eb12cf99e15b6ab773f51968a6f1c48f491a9c4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13053",
    "kind": "arxiv",
    "version": 1
  }
}
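The same check can be run offline against the record printed above, without `curl` or `jq`. The sketch below reuses the serialization settings from the one-liner in the verification command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and hashes the result with SHA-256. The function name `canonical_sha256` is hypothetical, and whether the printed digest equals the canonical hash depends on the page's recipe being the one the builder actually used.

```python
import hashlib
import json

# Canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "8934a7c69321f3ed3b134abcfd6e7f53f1156f1e13dde036cf45b534af80efb4",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2026-05-13T06:20:43Z",
    "title_canon_sha256": "5972e8ac4d3b93e44b9bc4601eb12cf99e15b6ab773f51968a6f1c48f491a9c4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13053",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the record's fields.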