Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Long-term recurrent convolutional networks for visual recognition and description

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PaLI: A Jointly-Scaled Multilingual Language-Image Model

cs.CV · 2022-09-14 · conditional · novelty 7.0

PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

cs.CV · 2023-10-03 · unverdicted · novelty 6.0

LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.

Online Hand Gesture Recognition Using 3D Convolutional Neural Networks

cs.CV · 2026-05-22 · unverdicted · novelty 2.0

Proposes an online hand gesture recognition system using 3D CNNs achieving 98%+ detector accuracy and 90%+ classifier accuracy on Jester, with 37.5% Levenshtein accuracy on a homemade dataset.

citing papers explorer

Showing 3 of 3 citing papers.

PaLI: A Jointly-Scaled Multilingual Language-Image Model
cs.CV · 2022-09-14 · conditional · none · ref 163

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
cs.CV · 2023-10-03 · unverdicted · none · ref 127

Online Hand Gesture Recognition Using 3D Convolutional Neural Networks
cs.CV · 2026-05-22 · unverdicted · none · ref 16
