Pith Number

pith:JXNDUWUH

pith:2026:JXNDUWUH4STOE2SI6PIJ6IMW23
not attested · not anchored · not stored · refs resolved

What Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Prediction

Adam Nohejl, Hitomi Yanaka, Maria Angelica Riera Machin, Xuanxin Wu, Yi-Ning Chang, Yusuke Ide

Spelling difficulty and test item construction often drive ratings in standard vocabulary difficulty lists beyond genuine word production demands.

arxiv:2605.14257 v1 · 2026-05-14 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JXNDUWUH4STOE2SI6PIJ6IMW23}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

The black-box model achieved r > 0.91 and topped the open track, while the explainable model reached r > 0.77 and showed that KVL item difficulty is affected by spelling difficulty or test item construction in addition to genuine production difficulty.

C2 · weakest assumption

That the shared task dataset and KVL lists provide a clean measure of genuine word production difficulty without significant confounding from test design or spelling factors that the models are capturing post-hoc.

C3 · one-line summary

A fine-tuned LLM with a soft-target loss tops the shared task on vocabulary difficulty prediction at r > 0.91, while an explainable model at r > 0.77 shows that spelling and item construction affect difficulty beyond word production.
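
The claims above report model quality as a correlation r between predicted and reference difficulty. The record does not say which coefficient is meant; assuming it is Pearson's r, the computation can be sketched as follows. The `gold` and `pred` values are hypothetical illustrations, not data from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical reference difficulties and model predictions (assumed values).
gold = [0.10, 0.35, 0.50, 0.80, 0.95]
pred = [0.15, 0.30, 0.55, 0.70, 0.90]
print(round(pearson_r(gold, pred), 3))
```

Because r measures only linear agreement, a high value does not by itself show that the predictions are calibrated to the reference scale.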

Receipt and verification
First computed: 2026-05-17T23:39:10.516776Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

4dda3a5a87e4a6e26a48f3d09f2196d6ed1794bb13e3fe9fec276c8429898ac1

Aliases

arxiv: 2605.14257 · arxiv_version: 2605.14257v1 · doi: 10.48550/arxiv.2605.14257 · pith_short_12: JXNDUWUH4STO · pith_short_16: JXNDUWUH4STOE2SI · pith_short_8: JXNDUWUH
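
The page does not state how the Pith Number relates to the canonical hash. For this record, though, the 26-character identifier matches the first 26 characters of the RFC 4648 Base32 encoding of the hash bytes, and the `pith_short_*` aliases are prefixes of it. The sketch below checks that observed relationship. It is an inference from the values shown here, not a documented rule, and the helper name `base32_prefix` is hypothetical.

```python
import base64

CANONICAL_HASH = "4dda3a5a87e4a6e26a48f3d09f2196d6ed1794bb13e3fe9fec276c8429898ac1"
PITH_NUMBER = "JXNDUWUH4STOE2SI6PIJ6IMW23"

def base32_prefix(hex_digest, length=26):
    """Return the first `length` Base32 characters of a hex-encoded digest."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]

derived = base32_prefix(CANONICAL_HASH)
print(derived == PITH_NUMBER)

# The short aliases listed above are plain prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}:", derived[:n])
```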
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JXNDUWUH4STOE2SI6PIJ6IMW23 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4dda3a5a87e4a6e26a48f3d09f2196d6ed1794bb13e3fe9fec276c8429898ac1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "388081cf826dabe090d85772e1c019e4773ab53e10efc84ed5535f1767cb96c2",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-14T01:57:35Z",
    "title_canon_sha256": "7d53ea9ef2e5768cab9f4ed43337e4448ba83cb0818ad7a4e6511ac45e362c0f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14257",
    "kind": "arxiv",
    "version": 1
  }
}
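
The one-liner under "Verify this Pith Number yourself" can be unpacked into a small offline script. This sketch applies the same serialization as that command (sorted keys, compact separators, non-ASCII characters kept, UTF-8 encoding) to the record shown above. It assumes the JSON printed on this page is exactly what the API returns as `canonical_record`; the function name `canonical_hash` is hypothetical.

```python
import hashlib
import json

EXPECTED = "4dda3a5a87e4a6e26a48f3d09f2196d6ed1794bb13e3fe9fec276c8429898ac1"

def canonical_hash(record):
    """SHA-256 over a key-sorted, whitespace-free JSON serialization."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "388081cf826dabe090d85772e1c019e4773ab53e10efc84ed5535f1767cb96c2",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-14T01:57:35Z",
        "title_canon_sha256": "7d53ea9ef2e5768cab9f4ed43337e4448ba83cb0818ad7a4e6511ac45e362c0f",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14257", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of how the JSON happened to be formatted or ordered in transit, so any mirror that serializes the record this way should arrive at the same hash.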