LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Yanwei Li, Chengyao Wang, Jiaya Jia · 2024

One Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Seeing Through Fog: Towards Fog-Invariant Action Recognition

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Introduces FogAct, a paired clean/foggy video dataset, and FogNet, a two-stream CLIP model that learns fog-invariant semantic representations via clean-video guidance.

citing papers explorer

Seeing Through Fog: Towards Fog-Invariant Action Recognition cs.CV · 2026-05-20 · unverdicted · none · ref 21