Pith Number

pith:F5SAP2TV

pith:2026:F5SAP2TVLX4NM6K2KGMP5NZQA6
not attested · not anchored · not stored · refs resolved

OmniDrop: Layer-wise Token Pruning for Omni-modal LLMs via Query-Guidance

Hyemi Jang, Jongsun Lee, Jooyoung Choi, Minseo Choi, Yeo Jeong Park, Yongkweon Jeon

Layer-wise token pruning inside the LLM decoder, guided by text queries, allows omni-modal models to process audiovisual inputs faster while maintaining or improving accuracy.

arxiv:2605.14458 v1 · 2026-05-14 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{F5SAP2TVLX4NM6K2KGMP5NZQA6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Experimental results across various audiovisual benchmarks demonstrate that OmniDrop outperforms all baselines by up to 3.58 points while reducing prefill latency by up to 40% and memory usage by up to 14.7%.

C2 weakest assumption

That performing initial fusion in early layers followed by aggressive pruning in deeper layers, guided by text queries, reliably preserves semantic information without task-specific degradation.

C3 one line summary

OmniDrop is a training-free layer-wise token pruning framework for omni-modal LLMs that uses query guidance and temporal diversity to reduce prefill latency by up to 40% and memory by 14.7% while improving benchmark scores by up to 3.58 points.
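The summary above describes query-guided, layer-wise pruning only at a high level. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: tokens are scored by dot-product similarity to a text query vector, and a per-layer keep ratio leaves early layers untouched and prunes harder in later ones. The function names, the scoring rule, and the `keep_schedule` parameter are all assumptions made for illustration, and the temporal-diversity component mentioned in the summary is not modeled.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as a stand-in relevance score."""
    return sum(x * y for x, y in zip(a, b))


def prune_by_query(tokens: List[Vector], query: Vector, keep_ratio: float) -> List[int]:
    """Return indices of the tokens most similar to the query, in original order.

    Hypothetical scoring rule: the source does not specify how relevance is computed.
    """
    k = max(1, round(len(tokens) * keep_ratio))
    scores = [dot(t, query) for t in tokens]
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)  # preserve the original (temporal) token order


def layerwise_prune(tokens: List[Vector], query: Vector, keep_schedule: List[float]) -> List[int]:
    """Apply one keep ratio per layer; a ratio of 1.0 means no pruning at that layer."""
    kept = list(range(len(tokens)))
    for ratio in keep_schedule:
        local = prune_by_query([tokens[i] for i in kept], query, ratio)
        kept = [kept[j] for j in local]
    return kept


# Toy example: four 2-D "tokens", a query aligned with the first axis,
# no pruning at the first layer and 50% pruning at the second.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
toy_query = [1.0, 0.0]
print(layerwise_prune(toy_tokens, toy_query, [1.0, 0.5]))  # [0, 2]
```

In this sketch the surviving indices stay in their original order, so positional or temporal structure is not scrambled by the pruning step.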

References

33 extracted · 33 resolved · 5 Pith anchors

[1] DivPrune: Diversity-based visual token pruning for large multimodal models, 2025
[2] Token merging: Your ViT but faster, 2023
[3] An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models, 2024
[4] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities, 2025 · arXiv:2507.06261
[5] FlashAttention-2: Faster attention with better parallelism and work partitioning, 2024
Receipt and verification
First computed 2026-05-17T23:39:06.813478Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

2f6407ea755df8d6795a5198feb730078f9458cac5c43164d03ff50bdcdbf789

Aliases

arxiv: 2605.14458 · arxiv_version: 2605.14458v1 · doi: 10.48550/arxiv.2605.14458 · pith_short_12: F5SAP2TVLX4N · pith_short_16: F5SAP2TVLX4NM6K2 · pith_short_8: F5SAP2TV
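For this record, the 26-character Pith Number matches the first 26 characters of the standard (RFC 4648) Base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. The sketch below reproduces that relationship. It is an observation about this one record, not a documented derivation rule, and the function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as printed on this record.
CANONICAL_HASH = "2f6407ea755df8d6795a5198feb730078f9458cac5c43164d03ff50bdcdbf789"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: truncation to 26 characters is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)
print(full)       # F5SAP2TVLX4NM6K2KGMP5NZQA6
print(full[:8])   # F5SAP2TV
print(full[:12])  # F5SAP2TVLX4N
print(full[:16])  # F5SAP2TVLX4NM6K2
```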
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/F5SAP2TVLX4NM6K2KGMP5NZQA6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2f6407ea755df8d6795a5198feb730078f9458cac5c43164d03ff50bdcdbf789
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a2aec49115402a41aa1f34d23c264ee1108ba0bcde0b78447ce43214e684e684",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-14T06:54:37Z",
    "title_canon_sha256": "2f0d6d89da70f9c913e7892280f53c6cf2693e18e4a8da795994f65cbc2d4b85"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14458",
    "kind": "arxiv",
    "version": 1
  }
}
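The shell command above can also be written as a small standalone Python script. This sketch applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record shown above and compares the digest with the canonical hash printed on this page. The helper name `canonical_sha256` is an assumption for illustration; whether the digests match depends on the record being reproduced byte for byte.

```python
import hashlib
import json

# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "a2aec49115402a41aa1f34d23c264ee1108ba0bcde0b78447ce43214e684e684",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-14T06:54:37Z",
        "title_canon_sha256": "2f0d6d89da70f9c913e7892280f53c6cf2693e18e4a8da795994f65cbc2d4b85",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14458", "kind": "arxiv", "version": 1},
}

# Digest the page tells the reader to expect.
EXPECTED = "2f6407ea755df8d6795a5198feb730078f9458cac5c43164d03ff50bdcdbf789"


def canonical_sha256(obj: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.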