Pith Number

pith:C6RUBASQ

pith:2023:C6RUBASQYQQFQ4DIJKLTWVQRZM
not attested · not anchored · not stored · refs resolved

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

Aojun Zhou, Conghui He, Hongsheng Li, Jiaming Han, Pan Lu, Peng Gao, Renrui Zhang, Shijie Geng, Wei Zhang, Xiangyu Yue, Yu Qiao, Ziyi Lin

LLaMA-Adapter V2 turns LLaMA into an open-ended visual instruction follower by adding only 14 million parameters.

arxiv:2304.15010 v1 · 2023-04-28 · cs.CV · cs.AI · cs.CL · cs.LG · cs.MM

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{C6RUBASQYQQFQ4DIJKLTWVQRZM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Compared to the original LLaMA-Adapter, our LLaMA-Adapter V2 can perform open-ended multi-modal instructions by merely introducing 14M parameters over LLaMA.

C2 weakest assumption

That the early-fusion placement and the disjoint-parameter joint training will continue to prevent task interference and maintain generalization when the instruction data distribution shifts or when larger base models are used.

C3 one-line summary

LLaMA-Adapter V2 achieves open-ended visual instruction following in LLMs by unlocking more learnable parameters, fusing visual tokens early, and jointly training disjoint parameter groups, adding only 14M parameters.

References

79 extracted · 79 resolved · 19 Pith anchors

[1] https://sharegpt.com/
[2] Flamingo: a visual language model for few-shot learning
[3] Bottom-up and top-down attention for image captioning and visual question answering 2018
[4] Language models are few-shot learners 2020
[5] Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts 2021

Formal links

2 machine-checked theorem links

Cited by

46 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.997953Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

17a3408250c4205870684a973b5611cb1d7ab91dd3d6445d7076680005a5045b

Aliases

arxiv: 2304.15010 · arxiv_version: 2304.15010v1 · doi: 10.48550/arxiv.2304.15010 · pith_short_12: C6RUBASQYQQF · pith_short_16: C6RUBASQYQQFQ4DI · pith_short_8: C6RUBASQ
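The identifiers on this page appear to be related in a simple way, although the page itself does not say so: the short aliases are prefixes of the full Pith Number, and the full number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash. The Python sketch below checks that observation for this record only. The helper name `pith_from_hash` and the 16-byte truncation are assumptions inferred from the values shown here, not a documented part of the Pith schema.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "17a3408250c4205870684a973b5611cb1d7ab91dd3d6445d7076680005a5045b"
PITH_NUMBER = "C6RUBASQYQQFQ4DIJKLTWVQRZM"


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Hypothetical helper: Base32-encode the first n_bytes of a hex digest.

    The '=' padding that Base32 appends is stripped. The 16-byte prefix is an
    assumption inferred from this record, not a documented rule.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


derived = pith_from_hash(CANONICAL_HASH)

# The short aliases listed above are plain prefixes of the full number.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

If the relationship holds in general, a short alias can be matched against a hash prefix without fetching the record, but that should be confirmed against the Pith schema before relying on it.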
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/C6RUBASQYQQFQ4DIJKLTWVQRZM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 17a3408250c4205870684a973b5611cb1d7ab91dd3d6445d7076680005a5045b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "44f870f9d7255528a47e73343d36a3f6ffeee3ff8d7d32938a2bb5d2da548e9a",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG",
      "cs.MM"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-04-28T17:59:25Z",
    "title_canon_sha256": "c3cc9569425a0dcc2b6287f8b1cc9dda45a8c9740ce9805e43e60a104fda39b6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2304.15010",
    "kind": "arxiv",
    "version": 1
  }
}
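
The shell pipeline under "Verify this Pith Number yourself" serializes the canonical record as compact JSON with sorted keys and hashes the UTF-8 bytes with SHA-256. The same steps can be sketched in Python as follows. The function name `canonical_sha256` is illustrative, and the sketch works on a local copy of the record shown above rather than fetching it, so comparing the digest with the published canonical hash is left to the reader.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Illustrative helper mirroring the one-liner in the shell pipeline:
    sorted keys, compact separators, no ASCII escaping, UTF-8, SHA-256."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Local copy of the canonical record shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "44f870f9d7255528a47e73343d36a3f6ffeee3ff8d7d32938a2bb5d2da548e9a",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.LG", "cs.MM"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-04-28T17:59:25Z",
        "title_canon_sha256": "c3cc9569425a0dcc2b6287f8b1cc9dda45a8c9740ce9805e43e60a104fda39b6",
    },
    "schema_version": "1.0",
    "source": {"id": "2304.15010", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)

# Sorting keys makes the digest independent of the order keys were written in.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

print(digest)
print(canonical_sha256(reordered) == digest)
```

Sorting keys and fixing the separators is what makes the serialization reproducible across mirrors; list order, such as the entries of `cross_cats_sorted`, is still significant and must be preserved.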