Pith Number

pith:Z7QLAXPZ

pith:2026:Z7QLAXPZNEOC6HXYDPQ6MFIUMY

Status: not attested · not anchored · not stored · refs resolved

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

Guixian Xu, Longfei Zheng, Rong Fu, Wentao Zhang, Xiaolu Zhang, Xuexian Song, Zeli Su, Zhankai Xu, Zhou Liu, Ziyin Zhang

Reinforcement learning with embedding-level semantic rewards lets LLMs add low-resource languages without the usual loss of general skills.

arxiv:2605.14366 v1 · 2026-05-14 · cs.CL · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{Z7QLAXPZNEOC6HXYDPQ6MFIUMY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
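The merge algorithm itself is not specified on this page. As a rough illustration of why a deterministic merge lets independent mirrors agree, the sketch below assumes a hypothetical event shape (`timestamp`, `event_id`, `field`, `value`) and a last-writer-wins fold over a canonical sort order. The event fields, the tie-breaking rule, and the sample values are all invented for the example and may differ from what Pith actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; not taken from the Pith schema.
    timestamp: str  # ISO 8601 UTC string, so string order is chronological
    event_id: str   # unique id, used only to break timestamp ties
    field: str
    value: str


def merge(events):
    """Fold events into a state dict in one canonical order (last writer wins)."""
    state = {}
    # set() drops duplicate deliveries; sorting makes arrival order irrelevant.
    for ev in sorted(set(events), key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value
    return state


# Made-up events for illustration only.
events = [
    Event("2026-05-17T23:39:07Z", "e1", "author_claim", "open"),
    Event("2026-05-18T08:00:00Z", "e2", "author_claim", "claimed"),
    Event("2026-05-18T08:00:00Z", "e3", "replications", "open"),
]

mirror_a = merge(events)
# A second mirror sees the same events reversed, with one delivered twice.
mirror_b = merge(list(reversed(events)) + [events[0]])

print(mirror_a == mirror_b)  # True
```

Because the result depends only on the set of events and not on the order in which a mirror received them, any host holding the same bundle would arrive at the same state under these assumptions.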

Claims

C1 · strongest claim

Experiments show that our method acquires low-resource capabilities while markedly mitigating alignment tax, preserving general competence more effectively than SFT.

C2 · weakest assumption

That embedding-level semantic rewards reliably capture and preserve intended meaning across languages without introducing new biases or requiring the model to have strong pretrained semantic understanding in the target language.

C3 · one line summary

Reinforcement learning with semantic rewards lets LLMs gain low-resource language skills without the alignment tax that degrades general capabilities in supervised fine-tuning.
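This page does not give the paper's reward formulation. To make "embedding-level semantic reward" concrete, here is a minimal sketch that assumes the reward is the cosine similarity between an embedding of the model output and an embedding of a reference. The function names and the three-dimensional toy vectors are hypothetical; a real setup would obtain embeddings from a sentence encoder and feed the reward into an RL objective, neither of which is shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_reward(output_embedding, reference_embedding):
    # Assumption: reward is plain cosine similarity; the paper may use another score.
    return cosine_similarity(output_embedding, reference_embedding)


# Toy embeddings, invented for illustration.
reference = [0.2, 0.7, 0.1]
close_paraphrase = [0.25, 0.65, 0.05]
off_topic = [0.9, -0.1, 0.3]

# A paraphrase scores higher than an unrelated output.
print(semantic_reward(close_paraphrase, reference) > semantic_reward(off_topic, reference))  # True
```

A reward of this kind scores meaning rather than exact token overlap, which is one plausible reading of how the approach differs from token-level supervision in SFT.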

References

38 extracted · 38 resolved · 10 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-17T23:39:07.886359Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

cfe0b05df9691c2f1ef81be1e615146610fa8b61949ed71f6c188e9c19d90c27

Aliases

arxiv: 2605.14366 · arxiv_version: 2605.14366v1 · doi: 10.48550/arxiv.2605.14366 · pith_short_12: Z7QLAXPZNEOC · pith_short_16: Z7QLAXPZNEOC6HXY · pith_short_8: Z7QLAXPZ
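The identifiers on this page appear to be related to the canonical hash: Base32-encoding (RFC 4648 alphabet) the first 16 bytes of the hash listed above and stripping the padding reproduces the 26-character Pith Number, and the short aliases are its 8-, 12-, and 16-character prefixes. That relationship is inferred from the values shown here, not from a published specification, so treat the helper below as an observation rather than the official derivation. The function name is made up for the example.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "cfe0b05df9691c2f1ef81be1e615146610fa8b61949ed71f6c188e9c19d90c27"


def pith_id_from_hash(hex_digest):
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding."""
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


full_id = pith_id_from_hash(CANONICAL_HASH)
print(full_id)       # Z7QLAXPZNEOC6HXYDPQ6MFIUMY
print(full_id[:8])   # Z7QLAXPZ
print(full_id[:12])  # Z7QLAXPZNEOC
print(full_id[:16])  # Z7QLAXPZNEOC6HXY
```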

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/Z7QLAXPZNEOC6HXYDPQ6MFIUMY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: cfe0b05df9691c2f1ef81be1e615146610fa8b61949ed71f6c188e9c19d90c27

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "49333ace483f564e023ef1d6138125147e77396eb457ea09bcd626547a90e4d7",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-14T04:47:22Z",
    "title_canon_sha256": "f4863bbe0621727b76fdaf7b4479f8558cd16a2da763d0125b128359c6c033ea"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14366",
    "kind": "arxiv",
    "version": 1
  }
}
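For an offline check, the same canonicalization used in the shell one-liner above (sorted keys, compact separators, UTF-8 bytes, SHA-256) can be applied to the record as printed here. This is a sketch under the assumption that the JSON shown on this page is identical to the `canonical_record` the resolver returns; if the served record differs in any field, the digest will not match. The function name `canonical_sha256` is a label chosen for the example.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "49333ace483f564e023ef1d6138125147e77396eb457ea09bcd626547a90e4d7",
    "cross_cats_sorted": ["cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-14T04:47:22Z",
    "title_canon_sha256": "f4863bbe0621727b76fdaf7b4479f8558cd16a2da763d0125b128359c6c033ea"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.14366", "kind": "arxiv", "version": 1}
}
"""

EXPECTED = "cfe0b05df9691c2f1ef81be1e615146610fa8b61949ed71f6c188e9c19d90c27"


def canonical_sha256(record):
    """SHA-256 over JSON with sorted keys and no insignificant whitespace."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(json.loads(RECORD_JSON))
print(digest)
print("matches expected hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators removes the two main sources of variation between JSON serializers, which is why the digest does not depend on how the record happens to be pretty-printed.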