MM-VID: Advancing video understanding with GPT-4V(ision). arXiv preprint arXiv:2310.19773, 2023b

Pan, Chaoyi; Yan, Yuzi; Zhang, Zexu; Shen, Yuan · 2022 · arXiv 5093.2022

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints

cs.LG · 2026-02-10 · unverdicted · novelty 7.0

SnareNet introduces a repair layer that navigates the range space of constraints plus adaptive relaxation training to enforce hard non-convex constraints on neural network outputs more reliably than prior methods.

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

cs.CV · 2024-06-11 · unverdicted · novelty 4.0

VideoLLaMA 2 improves video LLMs via a new STC connector for spatial-temporal dynamics and joint audio training, reaching competitive results on video QA and captioning benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints

cs.LG · 2026-02-10 · unverdicted · none · ref 20

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

cs.LG · 2026-05-15 · unverdicted · none · ref 167

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

cs.CV · 2024-06-11 · unverdicted · none · ref 29
