Pith Number

pith:T7PRIAHH

pith:2026:T7PRIAHHEGNOECVUL26FFSFZ4Z
not attested · not anchored · not stored · refs resolved

GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models

Hao Wen, Liang Mi, Lichen Pang, Mingzhe Huang, Shansong Yang, Ting Cao, Weijun Wang, Xin Ding, Yuanchun Li, Yunxin Liu

GRIP-VLM uses reinforcement learning to optimize discrete visual token pruning in VLMs directly, avoiding the suboptimal local minima that gradient relaxations can fall into.

arxiv:2605.13375 v1 · 2026-05-13 · cs.CV · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{T7PRIAHHEGNOECVUL26FFSFZ4Z}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
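The page does not specify the merge algorithm, so the following is only a sketch of one way a deterministic merge could behave. It assumes a last-writer-wins fold over events sorted by a total order. The `Event` fields, the field names, and the sample events are all hypothetical, and signature checking is left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    timestamp: str  # ISO 8601 UTC in one fixed format, so string order equals time order
    event_id: str   # hypothetical unique id, used only to break timestamp ties
    field: str      # hypothetical name of the state field the event updates
    value: str

def merge(record, events):
    """Fold events into the record in a total order that ignores arrival order."""
    state = dict(record)
    # set() drops duplicate deliveries; the sort key makes the fold order unique.
    for ev in sorted(set(events), key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value
    return state

# Made-up events for illustration only.
record = {"author_claim": "open", "citations": "open"}
events = [
    Event("2026-05-19T08:00:00Z", "e2", "citations", "1"),
    Event("2026-05-18T09:00:00Z", "e1", "author_claim", "claimed"),
    Event("2026-05-19T08:00:00Z", "e3", "citations", "2"),
]

state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events[:1])  # reordered, one duplicate
print(state_a == state_b)
```

Because the fold order depends only on the events themselves, two mirrors holding the same set of events reach the same state regardless of the order in which they received them.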

Claims

C1 strongest claim

GRIP-VLM consistently outperforms heuristic and supervised-learning baselines, achieving a superior Pareto frontier and delivering up to a 15% inference speedup at equal accuracy.

C2 weakest assumption

That the Group Relative Policy Optimization agent can reliably discover high-quality discrete pruning masks across varying compression budgets without retraining and without the instability typical of RL on combinatorial spaces.

C3 one line summary

GRIP-VLM uses Group Relative Policy Optimization, a reinforcement learning method, to prune visual tokens in VLMs, yielding up to a 15% inference speedup at matched accuracy over prior methods.
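As a rough illustration of the group-relative idea, the sketch below samples a group of discrete keep/drop masks from per-token Bernoulli probabilities, scores each mask, and normalizes every reward against the mean and standard deviation of its own group. It is not the paper's implementation: `toy_reward`, the `importance` values, and the plain REINFORCE-style update (no clipping, no KL term, no VLM in the loop) are assumptions made to keep the example self-contained.

```python
import math
import random
import statistics

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_mask(keep_probs, rng):
    """Draw one discrete mask: 1 keeps a visual token, 0 prunes it."""
    return [1 if rng.random() < p else 0 for p in keep_probs]

def toy_reward(mask, importance, budget):
    """Hypothetical reward: kept importance minus 1 per token over the budget."""
    kept_score = sum(w for w, m in zip(importance, mask) if m)
    over_budget = max(0, sum(mask) - budget)
    return kept_score - over_budget

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample relative to the other samples in its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def policy_step(logits, importance, budget, rng, group_size=8, lr=0.5):
    probs = [sigmoid(z) for z in logits]
    group = [sample_mask(probs, rng) for _ in range(group_size)]
    rewards = [toy_reward(m, importance, budget) for m in group]
    advantages = group_relative_advantages(rewards)
    new_logits = []
    for j, (z, p) in enumerate(zip(logits, probs)):
        # For a Bernoulli policy, d log pi / d logit = (mask bit - keep probability).
        grad = sum(a * (m[j] - p) for a, m in zip(advantages, group)) / group_size
        new_logits.append(z + lr * grad)
    return new_logits, advantages

rng = random.Random(0)
importance = [0.9, 0.1, 0.7, 0.05, 0.6, 0.2]  # made-up per-token usefulness
logits = [0.0] * len(importance)
for _ in range(200):
    logits, advantages = policy_step(logits, importance, budget=3, rng=rng)
print([round(sigmoid(z), 2) for z in logits])  # learned keep probabilities
```

The update works on sampled binary masks throughout, so no continuous relaxation of the mask is needed; the group mean serves as the baseline in place of a learned value function.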

References

42 extracted · 42 resolved · 12 Pith anchors

Receipt and verification
First computed: 2026-05-18T02:44:47.917854Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

9fdf1400e7219ae20ab45ebc52c8b9e67fe8a9e37215a3affeadd29e38c0b516

Aliases

arxiv: 2605.13375 · arxiv_version: 2605.13375v1 · doi: 10.48550/arxiv.2605.13375 · pith_short_12: T7PRIAHHEGNO · pith_short_16: T7PRIAHHEGNOECVU · pith_short_8: T7PRIAHH
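The identifiers on this page are consistent with the Pith Number being the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, with the `pith_short_*` aliases as shorter prefixes of it. That relationship is inferred from the values shown here rather than from a documented rule, and the function name `pith_number` is hypothetical. A small sketch:

```python
import base64

CANONICAL_HASH = "9fdf1400e7219ae20ab45ebc52c8b9e67fe8a9e37215a3affeadd29e38c0b516"

def pith_number(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the leading characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

full = pith_number(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)     # T7PRIAHHEGNOECVUL26FFSFZ4Z
print(aliases)
```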
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/T7PRIAHHEGNOECVUL26FFSFZ4Z \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9fdf1400e7219ae20ab45ebc52c8b9e67fe8a9e37215a3affeadd29e38c0b516
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "668b20bdedb2dd38cdba55b0c79ff2b87480be5eb068d63703fd1a8a3cecbe48",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-13T11:32:03Z",
    "title_canon_sha256": "2d1a3380f6da5b8e97e08fba4ecd517b81b89b2a512b1ccbac00732ea21f7347"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13375",
    "kind": "arxiv",
    "version": 1
  }
}
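The verification one-liner under Agent API can also be run offline against the record shown above. The sketch below repeats the same canonicalization (sorted keys, compact separators, UTF-8, no ASCII escaping) as a named function; `canonical_sha256` is a hypothetical helper name, not part of any Pith tooling. If the published hash is computed this way, the printed digest should equal the canonical hash listed on this page.

```python
import hashlib
import json

def canonical_sha256(record):
    """Serialize with sorted keys and no extra whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

record = {
    "metadata": {
        "abstract_canon_sha256": "668b20bdedb2dd38cdba55b0c79ff2b87480be5eb068d63703fd1a8a3cecbe48",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-13T11:32:03Z",
        "title_canon_sha256": "2d1a3380f6da5b8e97e08fba4ecd517b81b89b2a512b1ccbac00732ea21f7347",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13375", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)

# Key order in the input does not matter, because sort_keys=True fixes it.
reordered = dict(reversed(list(record.items())))
print(canonical_sha256(reordered) == digest)
```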