pith.
Pith Number

pith:VQFPYMTY

pith:2023:VQFPYMTY4WGRVXXHEWI3HRQAC2
not attested · not anchored · not stored · refs resolved

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Bowen Zhang, Haotian Zhang, Haoxuan You, Liangliang Cao, Shih-Fu Chang, Xianzhi Du, Yinfei Yang, Zhe Gan, Zirui Wang

Ferret unifies referring and grounding in multimodal LLMs via a hybrid region representation of coordinates and continuous features.

arxiv:2310.07704 v1 · 2023-10-11 · cs.CV · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VQFPYMTY4WGRVXXHEWI3HRQAC2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
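The page does not specify the merge algorithm, so the following is only a hypothetical sketch of one way a merge can be deterministic: take the union of events from every mirror, deduplicate them, sort them by a total order, and fold them into the base record (last writer wins per field). The `Event` fields, the `merge` function, and the ordering key are all assumptions made for illustration, and signature verification is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical schema: Pith's real signed-event format is not shown on this page.
    event_id: str   # assumed unique and content-derived (e.g. a hash of the signed payload)
    timestamp: str  # assumed fixed-format UTC ISO-8601, so string order == time order
    field: str      # record field the event sets
    value: str


def merge(base: Dict[str, str], *logs: Iterable[Event]) -> Dict[str, str]:
    """Fold events from any number of mirrors into one state.

    Signature checks are omitted; a real mirror would first discard
    events whose signature does not verify.
    """
    unique: Dict[str, Event] = {}
    for log in logs:
        for event in log:
            unique[event.event_id] = event  # the same event seen on two mirrors collapses
    state = dict(base)  # never mutate the caller's base record
    # Total order (timestamp, then event_id as tie-breaker): the result does not
    # depend on the order in which logs or events arrive.
    for event in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id)):
        state[event.field] = event.value
    return state
```

Because the sort key is a total order over deduplicated events, two mirrors holding the same set of events compute the same state even if they received them in different orders.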

Claims

C1 · strongest claim

The resulting model not only achieves superior performance in classical referring and grounding tasks, but also greatly outperforms existing MLLMs in region-based and localization-demanded multimodal chatting.

C2 · weakest assumption

That the spatial-aware visual sampler can reliably extract continuous features from regions of arbitrary shape and sparsity without introducing systematic bias or information loss that would affect downstream grounding accuracy.

C3 · one-line summary

Ferret introduces a hybrid region representation and the GRIT dataset to let MLLMs refer to and ground arbitrary image regions, outperforming prior models on referring, grounding, and localization-aware chatting while reducing object hallucination.
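The hybrid region representation pairs a discrete part (region coordinates written as text) with a continuous part (a feature vector pooled from the region). The sketch below illustrates the two halves under simplifying assumptions: coordinates are quantized into 1000 integer bins (the bin count is an assumption here), and the spatial-aware visual sampler is replaced by plain average pooling over a free-form binary mask. `quantize_box`, `region_feature`, and `hybrid_region` are hypothetical names; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def quantize_box(box: Box, width: int, height: int, bins: int = 1000) -> List[int]:
    """Discrete part: map pixel coordinates to integer bins in [0, bins - 1]."""
    def q(v: float, size: int) -> int:
        return min(bins - 1, max(0, int(v / size * bins)))

    x1, y1, x2, y2 = box
    return [q(x1, width), q(y1, height), q(x2, width), q(y2, height)]


def region_feature(feature_map: Sequence[Sequence[Sequence[float]]],
                   mask: Sequence[Sequence[bool]]) -> List[float]:
    """Continuous part (simplified): mean of the feature vectors of all grid
    cells inside the mask. A stand-in for Ferret's spatial-aware sampler,
    which handles sparse and irregular regions differently."""
    picked = [feature_map[r][c]
              for r, row in enumerate(mask)
              for c, inside in enumerate(row) if inside]
    if not picked:
        raise ValueError("mask selects no cells")
    dim = len(picked[0])
    return [sum(v[d] for v in picked) / len(picked) for d in range(dim)]


def hybrid_region(box: Box, width: int, height: int,
                  feature_map: Sequence[Sequence[Sequence[float]]],
                  mask: Sequence[Sequence[bool]]) -> Tuple[str, List[float]]:
    """Return (coordinate text, pooled feature) for one referred region."""
    coords = quantize_box(box, width, height)
    text = "[" + ", ".join(str(c) for c in coords) + "]"
    return text, region_feature(feature_map, mask)
```

Keeping both halves means the region stays readable as text while the pooled vector carries shape detail that a bounding box alone cannot express, which is the gap the weakest assumption (C2) is about.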

References

20 extracted · 20 resolved · 0 Pith anchors

[1] The length of the output list needs to be exactly equal to the input list
[2] Do not explain the reasons
[3] Do not mention the input entities, at least the output name and input name needs to be different
[4] Do not mention something abstract, like ¨alien¨
[5] When dealing with quantities, focus solely on increasing the numbers during revision

Formal links

1 machine-checked theorem link

Cited by

28 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.675730Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

ac0afc3278e58d1adee72591b3c6001694212189e5f52ccc1f05310c974c13ff

Aliases

arxiv: 2310.07704 · arxiv_version: 2310.07704v1 · doi: 10.48550/arxiv.2310.07704 · pith_short_12: VQFPYMTY4WGR · pith_short_16: VQFPYMTY4WGRVXXH · pith_short_8: VQFPYMTY
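The identifiers on this page are consistent with a simple derivation: the 26-character Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, and the `pith_short_*` aliases are prefixes of it. This is an inference checked against this one record, not a documented specification, and `pith_number_from_hash` is a hypothetical helper name.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Assumed derivation: first `length` base32 characters of the SHA-256 digest."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


# Canonical hash as printed on this page.
CANONICAL_HASH = "ac0afc3278e58d1adee72591b3c6001694212189e5f52ccc1f05310c974c13ff"

full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases are prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full, aliases)
```

Since each base32 character carries 5 bits, the 26-character form covers the first 130 bits of the digest, and the 8-, 12- and 16-character aliases cover 40, 60 and 80 bits.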
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VQFPYMTY4WGRVXXHEWI3HRQAC2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ac0afc3278e58d1adee72591b3c6001694212189e5f52ccc1f05310c974c13ff
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "419cebae751a0f18a1d1ab104687a18f345ab96416311add427dc504104289dd",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-10-11T17:55:15Z",
    "title_canon_sha256": "e0f7ae8a85a0a0fb3081cee3c416f47a53ccb17e4f03f15ee206de24cb7bc640"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.07704",
    "kind": "arxiv",
    "version": 1
  }
}
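The hashing step of the verification command can also be run offline, for example against a saved copy of the canonical record above. A minimal sketch that mirrors the one-liner's `json.dumps` settings; the function names are illustrative, and only the standard library is used.

```python
import hashlib
import json


def canonical_json(record: dict) -> str:
    # Same settings as the verification one-liner: sorted keys,
    # no insignificant whitespace, non-ASCII characters kept verbatim.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def canonical_sha256(record: dict) -> str:
    # SHA-256 over the UTF-8 bytes of the canonical JSON text, as a hex string.
    return hashlib.sha256(canonical_json(record).encode("utf-8")).hexdigest()


def hash_file(path: str) -> str:
    # Hash a saved JSON file; its original formatting and key order do not matter.
    with open(path, encoding="utf-8") as fh:
        return canonical_sha256(json.load(fh))
```

If the saved copy holds exactly the same values the server returns for `.canonical_record`, `hash_file` should print the digest listed under Canonical hash; indentation and key order in the file are irrelevant, but any changed value produces a different digest.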