Pith Number

pith:YPHKM4EJ

pith:2026:YPHKM4EJRCZ5UJCQL2UWL3Y2P2

Not attested · not anchored · not stored · refs resolved

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

Guanghao Zhang, Hao Jiang, Longxiang Zhang, Pipei Huang, Weilong Dai

A dual-LoRA architecture with a routing gate lets multimodal embeddings add chain-of-thought reasoning only when it improves results.

arxiv:2605.14448 v1 · 2026-05-14 · cs.CV · cs.CL · cs.IR

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{YPHKM4EJRCZ5UJCQL2UWL3Y2P2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
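The page does not spell out the merge algorithm. The sketch below is a hypothetical illustration of what "deterministic" buys. It folds events into the record in a fixed total order (timestamp, then event id), so any mirror holding the same events reaches the same state regardless of arrival order. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the sample events are assumptions, not Pith's schema. Signature checking is omitted.

```python
from typing import Any, Iterable


def merge_state(record: dict[str, Any], events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical deterministic merge: fold events into a copy of the record.

    Assumed event shape: {"id": str, "ts": str, "field": str, "value": Any}, where
    "ts" is a fixed-width UTC ISO 8601 string so that string order equals time order.
    """
    # Deduplicate by event id; assumes an id always names the same (signed) event,
    # so it does not matter which duplicate copy is kept.
    unique = {event["id"]: event for event in events}
    state = dict(record)
    # A total order on (timestamp, id) makes the result independent of arrival order.
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins under that order
    return state


# Made-up events, delivered to two mirrors in opposite orders.
events = [
    {"id": "e2", "ts": "2026-05-18T10:00:00Z", "field": "author_claim", "value": "claimed"},
    {"id": "e1", "ts": "2026-05-17T23:40:00Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "ts": "2026-05-18T10:00:00Z", "field": "citations", "value": 1},
]
forward = merge_state({"schema_version": "1.0"}, events)
backward = merge_state({"schema_version": "1.0"}, reversed(events))
```

Sorting before folding is what removes the dependence on delivery order. A real implementation would also need a rule for conflicting events that share a timestamp, which the id tiebreak stands in for here.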

Claims

C1 · strongest claim

On the 78 tasks of MMEB-V2, TWN achieves state-of-the-art embedding quality while being substantially more efficient than existing generative methods, requiring only 3-5% additional parameters relative to the backbone and up to 50% fewer reasoning tokens compared to the full generative mode.
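Here is how the parameter overhead arises. A LoRA adapter of rank r on a d_in × d_out weight adds r(d_in + d_out) trainable parameters, and a dual-LoRA design pays that twice. The configuration below is hypothetical and is not the paper's setup: hidden size 4096, 36 layers, rank 64, adapters on four square attention projections per layer, and an 8-billion-parameter backbone. It only shows how the overhead scales.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def dual_lora_overhead(hidden: int, layers: int, rank: int,
                       matrices_per_layer: int, backbone_params: int,
                       adapters: int = 2) -> tuple[int, float]:
    """Total adapter parameters and their fraction of the frozen backbone.

    Assumes every adapted matrix is square (hidden x hidden), as for attention
    projections in many decoder-only models; real backbones may differ.
    """
    per_adapter = layers * matrices_per_layer * lora_params(hidden, hidden, rank)
    total = adapters * per_adapter
    return total, total / backbone_params


# Hypothetical configuration, not taken from the paper.
total, fraction = dual_lora_overhead(hidden=4096, layers=36, rank=64,
                                     matrices_per_layer=4, backbone_params=8_000_000_000)
print(f"{total:,} adapter parameters = {fraction:.2%} of the backbone")
```

With these made-up numbers, the pair of adapters comes to roughly 1.9% of the backbone. Doubling the rank or adapting more matrices per layer scales the figure linearly.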

C2 · weakest assumption

The self-supervised routing gate is assumed to accurately identify inputs for which reasoning is unnecessary or harmful, and detaching gradients at the LoRA interface is assumed to fully resolve optimization conflicts without introducing new biases into the learned adapters.

C3 · one-line summary

TWN attaches separate reasoning and embedding LoRA adapters to a frozen backbone with gradient detachment and a self-supervised gate that decides per input whether to generate CoT, achieving SOTA on MMEB-V2 with 3-5% added parameters and up to 50% fewer reasoning tokens.
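The following is a minimal sketch of the routing described in C3. It assumes a logistic gate over a feature vector and treats the reasoning and embedding passes as opaque callables. `lora_linear` shows a shared frozen weight with a per-task low-rank update. `should_reason`, `think_when_needed`, and the stub callables in the usage lines are hypothetical names, not the paper's code. Plain Python has no autodiff, so the gradient detachment at the LoRA interface is not modelled; the rationale reaches the embedding pass only as text.

```python
import math
from typing import Callable, Sequence

Vec = list[float]
Mat = list[list[float]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_linear(w: Mat, a: Mat, b: Mat, x: Vec, scale: float = 1.0) -> Vec:
    """Frozen weight W plus a low-rank update: W x + scale * B (A x).

    A reasoning adapter and an embedding adapter would share the same W
    and differ only in their (A, B) pair.
    """
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [p + scale * q for p, q in zip(base, update)]


def should_reason(gate_w: Vec, gate_b: float, h: Vec, threshold: float = 0.5) -> bool:
    """Hypothetical logistic gate: reason only if sigmoid(gate_w . h + gate_b) >= threshold."""
    z = sum(g * v for g, v in zip(gate_w, h)) + gate_b
    return 1.0 / (1.0 + math.exp(-z)) >= threshold


def think_when_needed(query: str,
                      features: Callable[[str], Vec],
                      gate: Callable[[Vec], bool],
                      reason: Callable[[str], str],
                      embed: Callable[[str], Vec]) -> tuple[Vec, bool]:
    """Route one input: generate a rationale only when the gate fires, then embed."""
    use_cot = gate(features(query))
    text = f"{query}\n{reason(query)}" if use_cot else query
    return embed(text), use_cot


# Usage with stub callables standing in for the backbone and its adapters.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, 0.5]]    # rank-1 down-projection (1 x 2)
b = [[0.0], [0.0]]  # up-projection initialised to zero, the usual LoRA start
unchanged = lora_linear(w, a, b, [2.0, 3.0])  # zero B leaves W x as is

vec, used = think_when_needed(
    "find the red mug",
    features=lambda q: [float(len(q.split()))],
    gate=lambda h: should_reason([1.0], -2.0, h),
    reason=lambda q: "rationale: mug, red",
    embed=lambda t: [float(len(t))],
)
```

Keeping the two adapters as separate (A, B) pairs over one frozen W is what keeps the added parameter count small. The gate decides per input whether the extra generation cost is paid at all.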

References

43 extracted · 43 resolved · 7 Pith anchors

[1] Qwen3-VL Technical Report · 2025 · arXiv:2511.21631

[2] LLM2Vec: Large language models are secretly powerful text encoders · 2024

[3] InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks · 2024

[4] Think then embed: Generative context improves multimodal embedding · 2025

[5] FlashAttention-2: Faster attention with better parallelism and work partitioning · 2024

Receipt and verification

First computed	2026-05-17T23:39:06.927990Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c3cea6708988b3da24505ea965ef1a7ea68849a2dfc53da52becc64b1d2f27aa

Aliases

arxiv: 2605.14448 · arxiv_version: 2605.14448v1 · doi: 10.48550/arxiv.2605.14448 · pith_short_12: YPHKM4EJRCZ5 · pith_short_16: YPHKM4EJRCZ5UJCQ · pith_short_8: YPHKM4EJ
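The short aliases on this record are leading prefixes of the full Pith number. The version and DOI aliases follow the usual arXiv patterns. The sketch below rebuilds the alias table from the identifiers in the canonical record. The function name is hypothetical, and the prefix rule is inferred from this one record rather than taken from a published specification.

```python
def build_aliases(pith_number: str, arxiv_id: str, version: int) -> dict[str, str]:
    """Rebuild the alias table shown on the record page (rules inferred from this record)."""
    aliases = {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        # arXiv registers DOIs under the DataCite prefix 10.48550.
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    # Short forms are fixed-length prefixes of the full 26-character number.
    for n in (8, 12, 16):
        aliases[f"pith_short_{n}"] = pith_number[:n]
    return aliases


aliases = build_aliases("YPHKM4EJRCZ5UJCQL2UWL3Y2P2", "2605.14448", 1)
```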

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YPHKM4EJRCZ5UJCQL2UWL3Y2P2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c3cea6708988b3da24505ea965ef1a7ea68849a2dfc53da52becc64b1d2f27aa
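The same check can be done without `jq`. The sketch below uses only the Python standard library to reproduce the one-liner's canonicalization: sorted keys, compact separators, and UTF-8 without ASCII escaping. It also encodes an observation inferred from the values on this page rather than documented anywhere: the 26-character Pith number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. Treat `pith_number_from_hash` as a consistency check, not as the official derivation. The helper names are hypothetical.

```python
import base64
import hashlib
import json
import urllib.request


def canonical_sha256(record: dict) -> str:
    """Same canonicalization as the one-liner: sorted keys, no whitespace, UTF-8."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Inferred relation: leading base32 characters of the raw digest bytes."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def fetch_canonical_record(url: str) -> dict:
    """Fetch the JSON-LD record and return its canonical_record member."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["canonical_record"]


# Offline check of the inferred hash -> Pith number relation for this record.
digest = "c3cea6708988b3da24505ea965ef1a7ea68849a2dfc53da52becc64b1d2f27aa"
print(pith_number_from_hash(digest))  # prints YPHKM4EJRCZ5UJCQL2UWL3Y2P2
# Online: canonical_sha256(fetch_canonical_record(
#     "https://pith.science/pith/YPHKM4EJRCZ5UJCQL2UWL3Y2P2")) should equal digest.
```

Sorting keys and fixing the separators is what makes the hash independent of key order and whitespace in the served JSON. The record holds only strings and integers, so number formatting does not come into play.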

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "405fe0f1b8407cb4324f8cb7cfe2a184ddbb68ca88a963435733d0569cc84b45",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.IR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T06:41:53Z",
    "title_canon_sha256": "81f20f7a08c7da0115639609c88d140d5c2e5c032065544a9c24fb5f14f2b914"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14448",
    "kind": "arxiv",
    "version": 1
  }
}