pith:XXMJ2NHH
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA unifies segmentation and language models for referring tasks on both images and videos using minimal instruction tuning.
arxiv:2501.04001 v3 · 2025-01-07 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XXMJ2NHHAXOHWMNFGFBXUC4CXI}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Sa2VA is the first comprehensive, unified model for dense grounded understanding of both images and videos that supports referring segmentation and conversation with minimal one-shot instruction tuning.
LLM-generated instruction tokens can reliably guide SAM-2 to produce precise masks across complex video scenes without task-specific architectural changes or heavy fine-tuning.
Sa2VA unifies SAM-2 segmentation with MLLM reasoning into a single model for referring segmentation and conversation on images and videos, supported by a new 72k-expression Ref-SAV dataset.
Receipt and verification
| First computed | 2026-05-17T23:38:48.000065Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
bdd89d34e705dc7b31a531437a0b82ba1979ecf24ec99ee74181c8f9372b81a1
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XXMJ2NHHAXOHWMNFGFBXUC4CXI \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bdd89d34e705dc7b31a531437a0b82ba1979ecf24ec99ee74181c8f9372b81a1
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "af1b72fcb24f808b68a624b1c50a2756d0e0ee2cfca9157d605e2e71789746e5",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.CV",
"submitted_at": "2025-01-07T18:58:54Z",
"title_canon_sha256": "182fbd81701d2e83bb759bbc154509c069a63c6d97465120834f6e8e7d179ffb"
},
"schema_version": "1.0",
"source": {
"id": "2501.04001",
"kind": "arxiv",
"version": 3
}
}
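The shell one-liner above can also be reproduced offline. The following is a minimal sketch, assuming the canonical hash is the SHA-256 of the record serialized exactly as in that one-liner (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes). The helper name `canonical_hash` is hypothetical and not part of any Pith tooling; the record literal is copied from the JSON on this page.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record in compact, key-sorted JSON form.

    Mirrors the serialization used in the shell one-liner; the function name
    is an illustrative assumption, not an official API.
    """
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "af1b72fcb24f808b68a624b1c50a2756d0e0ee2cfca9157d605e2e71789746e5",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-01-07T18:58:54Z",
        "title_canon_sha256": "182fbd81701d2e83bb759bbc154509c069a63c6d97465120834f6e8e7d179ffb",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.04001", "kind": "arxiv", "version": 3},
}

digest = canonical_hash(record)
print(digest)  # compare by eye against the canonical hash listed on this page
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields were written or parsed, which is what makes the check repeatable across clients.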