Pith Number

pith:UQJ4RKXA

pith:2026:UQJ4RKXAIS2X74HPYHURKAWPVU
not attested · not anchored · not stored · refs resolved

Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

Antoine Bourgois, Olga Seminck, Thierry Poibeau

A two-stage adapter strategy on Gemma-3-27b with XML mention formatting secures first place in the LLM track of a multilingual coreference shared task.

arxiv:2605.16984 v1 · 2026-05-16 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UQJ4RKXAIS2X74HPYHURKAWPVU}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
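The page does not spell out the merge algorithm, so the following is only a minimal sketch of what a deterministic, order-independent merge could look like. The event fields (`id`, `at`, `field`, `value`) and the last-writer-wins rule are assumptions made for illustration, not the Pith schema.

```python
from typing import Dict, Iterable, List

Event = Dict[str, str]  # hypothetical shape: {"id", "at", "field", "value"}


def merge_events(*event_logs: Iterable[Event]) -> Dict[str, str]:
    """Fold several event logs into one state, independent of input order."""
    # Deduplicate by event id (assumed to identify an event uniquely,
    # e.g. a content hash, so equal ids imply equal events).
    unique: Dict[str, Event] = {}
    for log in event_logs:
        for event in log:
            unique[event["id"]] = event

    # Sort on (timestamp, id) so every mirror replays events identically.
    ordered: List[Event] = sorted(
        unique.values(), key=lambda e: (e["at"], e["id"])
    )

    # Replay: later events overwrite earlier ones for the same field.
    state: Dict[str, str] = {}
    for event in ordered:
        state[event["field"]] = event["value"]
    return state


mirror_a = [
    {"id": "e1", "at": "2026-05-20T00:03:34Z", "field": "refs", "value": "resolved"},
    {"id": "e2", "at": "2026-05-21T09:00:00Z", "field": "author_claim", "value": "open"},
]
mirror_b = [
    {"id": "e2", "at": "2026-05-21T09:00:00Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "at": "2026-05-22T12:00:00Z", "field": "author_claim", "value": "claimed"},
]

print(merge_events(mirror_a, mirror_b))
```

Because events are deduplicated and sorted before replay, two mirrors holding the same set of events reach the same state no matter which bundle they read first.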

Claims

C1 · strongest claim

Our system is based on the Gemma-3-27b model, fine-tuned using a two-stage strategy with a multilingual base adapter followed by dataset-specific adapters. We represent mention spans by their headword using an XML-inspired format with local reindexing and annotate documents iteratively. These design choices proved effective across languages, document lengths, and annotation guidelines.
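To make the representation concrete, here is a hedged sketch of headword tagging with local reindexing. The tag name `m`, the `id` attribute, and the input layout are hypothetical; the exact format used by the authors is not given on this page.

```python
from typing import Dict, List
from xml.sax.saxutils import escape


def tag_headwords(tokens: List[str], heads: Dict[int, int]) -> str:
    """Wrap mention headwords in XML-like tags with locally renumbered ids.

    `heads` maps a token index to a global entity id (hypothetical input).
    Entity ids are renumbered 1, 2, 3, ... in order of first appearance,
    so the ids seen in one chunk stay small and self-contained.
    """
    local_ids: Dict[int, int] = {}
    out: List[str] = []
    for index, token in enumerate(tokens):
        text = escape(token)  # keep '<', '>' and '&' in the text from breaking tags
        if index in heads:
            local = local_ids.setdefault(heads[index], len(local_ids) + 1)
            text = f'<m id="{local}">{text}</m>'
        out.append(text)
    return " ".join(out)


tokens = ["Marie", "said", "she", "liked", "the", "book"]
heads = {0: 17, 2: 17, 5: 42}  # global entity ids 17 and 42
print(tag_headwords(tokens, heads))
# <m id="1">Marie</m> said <m id="1">she</m> liked the <m id="2">book</m>
```

Tagging only the headword keeps the markup short, and local renumbering means the model never has to reproduce large, document-wide entity ids.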

C2 · weakest assumption

That the performance gain is driven by the two-stage adapter strategy and XML-inspired representation rather than by the base capabilities of Gemma-3-27b or by unstated choices in data preprocessing and hyperparameter selection.

C3 · one line summary

Two-stage adapter fine-tuning of Gemma-3-27b with XML-inspired headword mention representation and iterative document annotation achieved 74.32 CoNLL F1 and first place in the LLM track of CRAC 2026.

References

81 extracted · 81 resolved · 1 Pith anchors

[1] Scaling Laws for Neural Language Models 2020 · arXiv:2001.08361
[2] Context-Aware Machine Translation with Source Coreference Explanation 2024 · doi:10.1162/tacl_a_00677
[3] Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes 2020
[4] Tourille, Julien and Ferret, Olivier and Névéol, Aurélie and Tannier, Xavier. Modèle neuronal pour la résolution de la coréférence dans les dossiers médicaux électro 2020
[5] Coreference Resolution for the Biomedical Domain: A Survey 2021 · doi:10.18653/v1/2021.crac-1.2
Receipt and verification
First computed 2026-05-20T00:03:34.409642Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a413c8aae044b57ff0efc1e91502cfad26d6b9a7dd398929c3f566f73140c73f

Aliases

arxiv: 2605.16984 · arxiv_version: 2605.16984v1 · doi: 10.48550/arxiv.2605.16984 · pith_short_12: UQJ4RKXAIS2X · pith_short_16: UQJ4RKXAIS2X74HP · pith_short_8: UQJ4RKXA
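The values shown on this page are consistent with the identifier being the unpadded Base32 encoding of the first 16 bytes of the canonical hash, with the short aliases as plain prefixes. This is an observation from the values above, not a documented rule; the function name and the 16-byte cut-off below are assumptions.

```python
import base64

CANONICAL_HASH = "a413c8aae044b57ff0efc1e91502cfad26d6b9a7dd398929c3f566f73140c73f"


def pith_number_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the leading bytes of a SHA-256 hex digest, without padding."""
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)     # UQJ4RKXAIS2X74HPYHURKAWPVU
print(aliases)  # prefixes of length 8, 12 and 16
```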
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UQJ4RKXAIS2X74HPYHURKAWPVU \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a413c8aae044b57ff0efc1e91502cfad26d6b9a7dd398929c3f566f73140c73f
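The same check can be done without `curl` and `jq`. The sketch below mirrors the serialization settings of the shell pipeline (sorted keys, compact separators, no ASCII escaping) and applies them to the canonical record as displayed on this page. The function name is illustrative, and no expected digest is printed here: compare the output with the canonical hash listed above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON canonicalization as the shell pipeline."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1e2da6cbc40a7227d8ec36d07c9a44bad235086c3de7f6fb6275010db48aa4a8",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-16T13:07:07Z",
        "title_canon_sha256": "a9d4cb4055f8f7167cdeaae0a3cab32d49c8e5a77c07c2353c81a07c46d0e273",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16984", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys and fixing the separators is what makes the hash reproducible: two mirrors that load the same record in different key order still serialize it to identical bytes.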
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1e2da6cbc40a7227d8ec36d07c9a44bad235086c3de7f6fb6275010db48aa4a8",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-16T13:07:07Z",
    "title_canon_sha256": "a9d4cb4055f8f7167cdeaae0a3cab32d49c8e5a77c07c2353c81a07c46d0e273"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16984",
    "kind": "arxiv",
    "version": 1
  }
}