Pith Number

pith:7UUOAUYU

pith:2024:7UUOAUYUO6BUPKPARCDZXJZJQH

not attested · not anchored · not stored · refs resolved

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Conghui He, Dahua Lin, Feng Wu, Jiajie Lu, Jiaqi Wang, Long Xing, Pan Zhang, Qidong Huang, Xiaoyi Dong, Yuhang Cao, Yuhang Zang

PyramidDrop reduces image tokens progressively through the layers of large vision-language models to cut training time by 40% and inference FLOPs by 55% with comparable performance.

arxiv:2410.17247 v2 · 2024-10-22 · cs.CV · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{7UUOAUYUO6BUPKPARCDZXJZJQH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open · sign in to claim

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
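Why a deterministic merge lets every mirror arrive at the same state can be illustrated with a minimal sketch. The event shape used here (`id`, `timestamp`, `kind`, `payload`) and the function name `merge_events` are hypothetical, chosen only for illustration; the actual bundle format and merge rules are not described on this page.

```python
def merge_events(record, events):
    """Fold events into a state in a fixed order, independent of arrival order.

    Assumption (hypothetical schema): each event is a dict with the keys
    'id', 'timestamp', 'kind' and 'payload', and two events that share an
    'id' are identical copies of the same event.
    """
    state = {"record": record, "status": {}}
    # Drop duplicate copies of the same event.
    unique = {event["id"]: event for event in events}
    # A total order on (timestamp, id) makes the fold reproducible.
    ordered = sorted(unique.values(), key=lambda e: (e["timestamp"], e["id"]))
    for event in ordered:
        # Later events of the same kind overwrite earlier ones.
        state["status"][event["kind"]] = event["payload"]
    return state
```

Because the events are sorted on a total order before they are applied, two mirrors that received the same events in different orders would compute identical states under these assumptions.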

Claims

C1 · strongest claim

PyramidDrop reduces LLaVA-NeXT training time by 40% and inference FLOPs by 55% with comparable performance. It can also serve as a plug-and-play strategy for inference acceleration without training, with better performance and lower inference cost than its counterparts.

C2 · weakest assumption

The method assumes that a lightweight similarity-based dropping rule at stage boundaries preserves all task-critical information across diverse images and downstream tasks. This assumption is supported only by the reported experiments on LLaVA-NeXT.

C3 · one line summary

PyramidDrop accelerates LVLMs by staged, similarity-based dropping of visual tokens that become redundant in deeper layers, cutting training time by 40% and inference FLOPs by 55% with comparable accuracy.
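The idea of staged dropping summarized in these claims can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names, the equal-length stages, the fixed keep ratio, and the per-token scores are all hypothetical, and the scoring rule itself is left abstract.

```python
def staged_token_counts(num_tokens, num_layers, num_stages, keep_ratio):
    """Number of visual tokens each layer sees when a fixed fraction is
    kept at every stage boundary (assumes equal-length stages)."""
    layers_per_stage = num_layers // num_stages
    counts = []
    for layer in range(num_layers):
        stage = min(layer // layers_per_stage, num_stages - 1)
        counts.append(int(num_tokens * keep_ratio ** stage))
    return counts


def drop_tokens(scores, keep_ratio):
    """Indices of the tokens to keep: the highest-scoring fraction,
    returned in their original order so positions stay meaningful."""
    keep = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Illustrative numbers only (not taken from this page):
# 576 visual tokens, 32 layers, 4 stages, half the tokens kept per boundary.
counts = staged_token_counts(576, 32, 4, 0.5)
relative_load = sum(counts) / (576 * 32)
```

With these illustrative settings the layers see 576, 288, 144 and 72 visual tokens in successive stages, so the summed visual-token load is about 47% of the undropped model. That ratio counts tokens only; it is not a FLOPs measurement and should not be read as reproducing the figures reported above.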

References

56 extracted · 56 resolved · 24 Pith anchors

[1] … Vandierendonck, Hans; John, Deepu; Ji, Bo

[2] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966

[3] Token Merging: Your ViT But Faster 2022 · arXiv:2210.09461

[4] Pumer: Pruning and merging tokens for efficient vision language models 2023

[5] Llavolta: Efficient multi-modal models via stage-wise visual context compression 2024

Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models

Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

Receipt and verification

First computed	2026-05-17T23:38:52.581127Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

fd28e05314778347a9e088879ba72981d88bee57e5735546a2b6bed4a380c89a

Aliases

arxiv: 2410.17247 · arxiv_version: 2410.17247v2 · doi: 10.48550/arxiv.2410.17247 · pith_short_12: 7UUOAUYUO6BU · pith_short_16: 7UUOAUYUO6BUPKPA · pith_short_8: 7UUOAUYU
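The identifiers shown on this page appear to be related in a simple way: the short aliases are prefixes of the full Pith Number, and the full number matches the first 26 characters of the standard base32 encoding of the canonical hash. The sketch below checks that relationship. It is an observation drawn from the values on this page, not a documented specification, and the function names are hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex, length=26):
    """Base32-encode the raw hash bytes (RFC 4648 alphabet) and keep
    the first `length` characters."""
    raw = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number):
    """Short aliases taken as fixed-length prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


canonical_hash = "fd28e05314778347a9e088879ba72981d88bee57e5735546a2b6bed4a380c89a"
pith_number = pith_number_from_hash(canonical_hash)
aliases = short_aliases(pith_number)
```

For the hash listed on this page, `pith_number` evaluates to `7UUOAUYUO6BUPKPARCDZXJZJQH`, and the three aliases match the `pith_short_8`, `pith_short_12` and `pith_short_16` values above.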

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7UUOAUYUO6BUPKPARCDZXJZJQH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fd28e05314778347a9e088879ba72981d88bee57e5735546a2b6bed4a380c89a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4b89e9e203d463d2e8d7523502838057237d47c51d79b18c5a0f200ce59b85dd",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-10-22T17:59:53Z",
    "title_canon_sha256": "1d500d068a96591af0d35cd55147e28c00ef34b0380d4446340ae066d9a215e2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.17247",
    "kind": "arxiv",
    "version": 2
  }
}
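The canonicalization step in the verification command can also be run offline against a record that is already in hand. The sketch below mirrors the serialization used by that command (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes); the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a record after serializing it the same way as the command
    above: sorted keys, no whitespace, UTF-8, no ASCII escaping."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record exactly as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4b89e9e203d463d2e8d7523502838057237d47c51d79b18c5a0f200ce59b85dd",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-10-22T17:59:53Z",
        "title_canon_sha256": "1d500d068a96591af0d35cd55147e28c00ef34b0380d4446340ae066d9a215e2",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.17247", "kind": "arxiv", "version": 2},
}
digest = canonical_sha256(record)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON fields were written. If the JSON printed here is identical to the served canonical record, `digest` should equal the canonical hash listed on this page; that equality has not been checked here.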