Pith Number

pith:VDK6MBX5

pith:2026:VDK6MBX55VGO2B46SXUS5ESXZY
not attested · not anchored · not stored · refs resolved

Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data

Shiwon Kim, Yu Rang Park

Relative representations via learnable anchors align token-level structures across modalities using only limited paired examples.

arxiv:2605.16834 v1 · 2026-05-16 · cs.CV · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VDK6MBX55VGO2B46SXUS5ESXZY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
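
As a rough illustration of why a deterministic merge lets independent mirrors converge on the same state, the sketch below unions event logs by a unique id and replays them in sorted order. The event fields (`id`, `field`, `value`) and both function names are hypothetical, chosen only for this example; the actual bundle format and merge rules are not described on this page.

```python
def merge_events(*event_logs):
    """Union event logs by id and return them in a canonical (sorted) order."""
    by_id = {}
    for log in event_logs:
        for event in log:
            # Hypothetical assumption: an id uniquely identifies one signed event.
            by_id.setdefault(event["id"], event)
    return [by_id[key] for key in sorted(by_id)]


def current_state(events):
    """Replay events in order; later events overwrite earlier values."""
    state = {}
    for event in events:
        state[event["field"]] = event["value"]
    return state


# Two mirrors hold overlapping logs in different orders (made-up data).
mirror_a = [
    {"id": "001", "field": "author_claim", "value": "open"},
    {"id": "003", "field": "citations", "value": "open"},
]
mirror_b = [
    {"id": "002", "field": "author_claim", "value": "claimed"},
    {"id": "001", "field": "author_claim", "value": "open"},
]

state_ab = current_state(merge_events(mirror_a, mirror_b))
state_ba = current_state(merge_events(mirror_b, mirror_a))
print(state_ab == state_ba)  # True: the result does not depend on merge order
```

Because the replay order depends only on the event ids and not on which mirror supplied them, any host that holds the same set of events arrives at the same state.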

Claims

C1 strongest claim

Despite learning only the anchors without heavy projection layers, our approach consistently outperforms existing methods in zero-shot classification, cross-modal retrieval, and zero-shot segmentation by a substantial margin.

C2 weakest assumption

That training a set of learnable anchors to induce consistent cross-modal similarity patterns for matched pairs is sufficient to capture fine-grained token-level relations without requiring additional projection layers or larger paired datasets.

C3 one line summary

A new post-hoc alignment technique uses learnable anchors to capture token-level relative similarities between modalities, outperforming global alignment baselines on zero-shot classification, retrieval, and segmentation with scarce paired examples.
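
To make the idea of a relative representation concrete, here is a minimal sketch of the general technique from reference [4]: a token is described by its cosine similarity to each of a set of anchors, so the description is unchanged when the whole embedding space is rotated. This is not the authors' implementation. The anchors below are fixed by hand rather than learned, the 2-D vectors are toy data, and the rotation is only a stand-in for a second encoder's latent space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_representation(token, anchors):
    """Describe a token by its cosine similarity to each anchor."""
    return [cosine(token, anchor) for anchor in anchors]


def rotate(vec, theta):
    """Rotate a 2-D vector by theta radians."""
    x, y = vec
    return [x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta)]


# Toy anchors and token (assumed values; the paper learns its anchors).
anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token = [2.0, 1.0]

rel_original = relative_representation(token, anchors)

# The same token and anchors as seen through a rotated embedding space.
theta = 0.7
rel_rotated = relative_representation(
    rotate(token, theta), [rotate(a, theta) for a in anchors]
)

same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(rel_original, rel_rotated))
print(same)  # True: similarities to anchors survive the rotation
```

The absolute coordinates differ between the two spaces, but the pattern of similarities to the anchors is identical, which is the property that anchor-based alignment relies on.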

References

31 extracted · 31 resolved · 2 Pith anchors

[1] Learning transferable visual models from natural language supervision 2021
[2] Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig 2021
[3] The platonic representation hypothesis 2024
[4] Relative representations enable zero-shot latent space communication 2023
[5] Linearly mapping from image to text space. arXiv preprint arXiv:2209.15162 2022

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:03:25.229675Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a8d5e606fded4ced079e95e92e9257ce3c77e92056f481849826a2b782aa5116

Aliases

arxiv: 2605.16834 · arxiv_version: 2605.16834v1 · doi: 10.48550/arxiv.2605.16834 · pith_short_12: VDK6MBX55VGO · pith_short_16: VDK6MBX55VGO2B46 · pith_short_8: VDK6MBX5
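
For this record, the 26-character Pith Number matches the unpadded RFC 4648 base32 encoding of the first 16 bytes of the canonical hash, and the short aliases are its 8-, 12- and 16-character prefixes. The sketch below reproduces that relationship. It is an observation about the values shown on this page, not a documented specification, and the function name `pith_number_from_hash` is made up for the example.

```python
import base64

# Canonical hash as listed on this page.
canonical_hash = "a8d5e606fded4ced079e95e92e9257ce3c77e92056f481849826a2b782aa5116"


def pith_number_from_hash(hex_digest):
    """Base32-encode the first 16 bytes of a hex digest and drop '=' padding.

    Assumption: this mapping is inferred from the values on this page.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


pith_number = pith_number_from_hash(canonical_hash)
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)  # VDK6MBX55VGO2B46SXUS5ESXZY
for name, value in aliases.items():
    print(name, value)
```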
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VDK6MBX55VGO2B46SXUS5ESXZY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a8d5e606fded4ced079e95e92e9257ce3c77e92056f481849826a2b782aa5116
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bc1667a62d044b06d87094c993236f37b2668814313f48ae7ffeead1c276eec0",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T06:33:38Z",
    "title_canon_sha256": "e824f1a709c8e7b834f17d7af1268258aae06bbd1521fd7f212830335d200e83"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16834",
    "kind": "arxiv",
    "version": 1
  }
}
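
The hashing step of the one-liner above can also be written out as a small Python function, which makes the canonicalization easier to read: keys are sorted, separators are compact, and non-ASCII characters are kept as-is before UTF-8 encoding. The function name `canonical_sha256` is an assumption for this sketch. Whether the digest printed for the record shown here equals the published canonical hash depends on the API returning exactly this JSON, so the sketch only demonstrates that the digest is independent of key order.

```python
import hashlib
import json

# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "bc1667a62d044b06d87094c993236f37b2668814313f48ae7ffeead1c276eec0",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-16T06:33:38Z",
        "title_canon_sha256": "e824f1a709c8e7b834f17d7af1268258aae06bbd1521fd7f212830335d200e83",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16834", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj):
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


digest = canonical_sha256(record)

# Same content with top-level keys in a different order.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

print(digest)
print(digest == canonical_sha256(reordered))  # True: key order does not matter
```

Compare the first printed value against the `# expect:` line above to check the record by hand.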